
Channels and Capacity

Shannon 1948 (public domain) · Wikipedia (CC BY-SA 4.0)

A channel is a conditional distribution P(Y|X). Channel capacity C = max_{P(X)} I(X;Y) is the highest rate at which you can send information reliably. Shannon's noisy channel theorem: reliable communication at rate R < C is possible; at R > C it is not.

What is a channel

A channel takes an input X and produces an output Y according to a conditional probability P(Y|X). The input is what you send. The output is what arrives. The channel adds noise. A perfect channel copies X to Y unchanged. A useless channel ignores X and outputs random noise. In the categorical view, channels are morphisms in the Stoch category, where the Giry monad formalizes the probability distributions that define them.

[Diagram: input X enters the channel P(Y|X), noise corrupts it, output Y exits]
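Operationally, a channel is a randomized function from input to output, sampled independently on each use. A minimal Python sketch of the binary symmetric channel (the name `bsc` echoes the notation table below; the sampling code itself is an illustrative assumption, not the page's original example):

```python
import random

def bsc(x, flip_prob):
    """Binary symmetric channel: flip the input bit with probability flip_prob."""
    return x ^ 1 if random.random() < flip_prob else x

def send(bits, flip_prob):
    """Pass each bit through the channel independently."""
    return [bsc(b, flip_prob) for b in bits]

random.seed(0)
received = send([0, 1, 0, 1, 1, 0, 1, 0], flip_prob=0.1)
```

With flip_prob = 0 the channel is a perfect copy; with flip_prob = 0.5 the output is independent of the input.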

Channel capacity

Channel capacity C = max_{P(X)} I(X;Y). You choose the input distribution P(X) to maximize mutual information. For the binary symmetric channel with flip probability p, the capacity is C = 1 - H(p), achieved by a uniform input.

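For the BSC the maximization is already solved: a uniform input achieves C = 1 - H(p). A Python sketch of the two formulas (function names mirror the Scheme identifiers in the notation table below; the implementation is illustrative):

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p log2(p) - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p == 0 or p == 1:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Capacity of the binary symmetric channel with flip probability p."""
    return 1 - binary_entropy(p)

bsc_capacity(0.0)  # 1.0 : noiseless channel, one full bit per use
bsc_capacity(0.5)  # 0.0 : pure noise, nothing gets through
```

Note the symmetry: a channel that flips with probability p carries exactly as much as one that flips with probability 1 - p, since you can invert the output.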

Shannon's noisy channel theorem

The noisy channel theorem has two parts. The achievability part: for any rate R < C, there exists a coding scheme that achieves arbitrarily low error probability. The converse: for any R > C, the error probability is bounded away from zero. The boundary is sharp. Shannon proved that such codes exist but did not construct them; practical capacity-approaching codes took another fifty years.

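The gap between "exists" and "constructed" is easy to feel with the crudest code of all, repetition: send each bit n times and take a majority vote. Error probability falls as n grows, but the rate 1/n falls toward zero, so this never approaches capacity; the theorem promises codes with low error at a fixed rate below C. A Monte Carlo sketch (function names and trial counts are illustrative assumptions):

```python
import random

def bsc_bit(x, p):
    """One use of a binary symmetric channel with flip probability p."""
    return x ^ 1 if random.random() < p else x

def repetition_error_prob(n, p, trials=20000):
    """Estimate the error probability when one bit is sent as n copies
    over a BSC(p) and decoded by majority vote."""
    errors = 0
    for _ in range(trials):
        received = [bsc_bit(0, p) for _ in range(n)]
        if sum(received) * 2 > n:  # majority decoded as 1: an error
            errors += 1
    return errors / trials

random.seed(1)
errs = {n: repetition_error_prob(n, 0.1) for n in (1, 3, 5, 9)}
# error probability drops rapidly with n, but the rate 1/n drops too
```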

Notation reference

Symbol           Scheme               Meaning
P(Y|X)           (bsc x flip-prob)    Channel (conditional distribution)
C = max I(X;Y)   (bsc-capacity p)     Channel capacity
H(p)             (binary-entropy p)   Binary entropy function
R < C            (< r c)              Rate below capacity: reliable

Translation notes

All examples use the binary symmetric channel, the simplest non-trivial channel. Shannon's theorem applies to arbitrary discrete memoryless channels with finite alphabets, and extends to continuous channels (the Gaussian/AWGN channel) with integral formulas. The capacity formula C = 1 - H(p) is specific to the BSC. The general formula C = max I(X;Y) requires an optimization over all input distributions, which for most channels has no closed-form solution and is instead computed numerically, classically by the Blahut-Arimoto algorithm.
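The Blahut-Arimoto algorithm does that optimization by alternating maximization: compute the output distribution induced by the current input distribution, then reweight each input by how far its output profile diverges from it. A Python sketch (matrix layout and iteration count are assumptions), checked against the BSC, where the answer C = 1 - H(p) is known in closed form:

```python
from math import log2, exp, log

def mutual_information(r, W):
    """I(X;Y) in bits for input distribution r and channel matrix W[x][y] = P(y|x)."""
    ny = len(W[0])
    py = [sum(r[x] * W[x][y] for x in range(len(r))) for y in range(ny)]
    return sum(r[x] * W[x][y] * log2(W[x][y] / py[y])
               for x in range(len(r)) for y in range(ny)
               if r[x] > 0 and W[x][y] > 0)

def blahut_arimoto(W, iters=200):
    """Numerically maximize I(X;Y) over the input distribution P(X)."""
    nx, ny = len(W), len(W[0])
    r = [1.0 / nx] * nx  # start from the uniform input
    for _ in range(iters):
        py = [sum(r[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # c(x) = exp( sum_y P(y|x) ln( P(y|x) / P(y) ) ), the divergence weight
        c = [exp(sum(W[x][y] * log(W[x][y] / py[y])
                     for y in range(ny) if W[x][y] > 0))
             for x in range(nx)]
        z = sum(r[x] * c[x] for x in range(nx))
        r = [r[x] * c[x] / z for x in range(nx)]
    return mutual_information(r, W), r

p = 0.1
W = [[1 - p, p], [p, 1 - p]]  # BSC(0.1) as a channel matrix
C, r = blahut_arimoto(W)      # C ≈ 1 - H(0.1) ≈ 0.531, r ≈ uniform
```

For the BSC the uniform input is a fixed point, so the iteration converges immediately; for asymmetric channels (e.g. the Z-channel) the optimal input is non-uniform and the iteration finds it.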