
Channels and Capacity

Shannon 1948 (public domain) · Wikipedia (CC BY-SA 4.0)

A channel is a conditional distribution P(Y|X). Channel capacity C = max_{P(X)} I(X;Y) is the highest rate at which you can send information reliably. Shannon's noisy channel theorem: reliable communication at rate R < C is possible; at R > C it is not.

What is a channel

A channel takes an input X and produces an output Y according to a conditional probability P(Y|X). The input is what you send. The output is what arrives. The channel adds noise. A perfect channel copies X to Y unchanged. A useless channel ignores X and outputs random noise. In the categorical view, channels are morphisms in the Stoch category, where the Giry monad formalizes the probability distributions that define them.

[Diagram: input X enters the channel P(Y|X), noise corrupts it, output Y exits]
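Operationally, a channel is a randomized function from input to output, sampled independently on each use. A minimal Python sketch of the binary symmetric channel (the name `bsc` echoes the notation table below; the sampling code itself is an illustrative assumption, not the page's original example):

```python
import random

def bsc(x, flip_prob):
    """Binary symmetric channel: flip the input bit with probability flip_prob."""
    return x ^ 1 if random.random() < flip_prob else x

def send(bits, flip_prob):
    """Pass each bit through the channel independently."""
    return [bsc(b, flip_prob) for b in bits]

random.seed(0)
received = send([0, 1, 0, 1, 1, 0, 1, 0], flip_prob=0.1)
```

With flip_prob = 0 the channel is a perfect copy; with flip_prob = 0.5 the output is independent of the input.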

Channel capacity

Channel capacity C = max_{P(X)} I(X;Y). You choose the input distribution P(X) to maximize mutual information. For the binary symmetric channel with flip probability p, the capacity is C = 1 - H(p), achieved by a uniform input.

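For the BSC the maximization is already solved: a uniform input achieves C = 1 - H(p). A Python sketch of the two formulas (function names mirror the Scheme identifiers in the notation table below; the implementation is illustrative):

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p log2(p) - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p == 0 or p == 1:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def bsc_capacity(p):
    """Capacity of the binary symmetric channel with flip probability p."""
    return 1 - binary_entropy(p)

bsc_capacity(0.0)  # 1.0 : noiseless channel, one full bit per use
bsc_capacity(0.5)  # 0.0 : pure noise, nothing gets through
```

Note the symmetry: a channel that flips with probability p carries exactly as much as one that flips with probability 1 - p, since you can invert the output.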

Shannon's noisy channel theorem

The noisy channel theorem has two parts. The achievability part: for any rate R < C, there exists a coding scheme that achieves arbitrarily low error probability. The converse: for any R > C, the error probability is bounded away from zero. The boundary is sharp. Shannon proved that such codes exist but did not construct them; practical capacity-approaching codes took another fifty years.

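The gap between "exists" and "constructed" is easy to feel with the crudest code of all, repetition: send each bit n times and take a majority vote. Error probability falls as n grows, but the rate 1/n falls toward zero, so this never approaches capacity; the theorem promises codes with low error at a fixed rate below C. A Monte Carlo sketch (function names and trial counts are illustrative assumptions):

```python
import random

def bsc_bit(x, p):
    """One use of a binary symmetric channel with flip probability p."""
    return x ^ 1 if random.random() < p else x

def repetition_error_prob(n, p, trials=20000):
    """Estimate the error probability when one bit is sent as n copies
    over a BSC(p) and decoded by majority vote."""
    errors = 0
    for _ in range(trials):
        received = [bsc_bit(0, p) for _ in range(n)]
        if sum(received) * 2 > n:  # majority decoded as 1: an error
            errors += 1
    return errors / trials

random.seed(1)
errs = {n: repetition_error_prob(n, 0.1) for n in (1, 3, 5, 9)}
# error probability drops rapidly with n, but the rate 1/n drops too
```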

Notation reference

Symbol           Scheme               Meaning
P(Y|X)           (bsc x flip-prob)    Channel (conditional distribution)
C = max I(X;Y)   (bsc-capacity p)     Channel capacity
H(p)             (binary-entropy p)   Binary entropy function
R < C            (< r c)              Rate below capacity: reliable

Translation notes

All examples use the binary symmetric channel, the simplest non-trivial channel. Shannon's theorem applies to arbitrary discrete memoryless channels with finite alphabets, and extends to continuous channels (the Gaussian/AWGN channel) with integral formulas. The capacity formula C = 1 - H(p) is specific to the BSC. The general formula C = max I(X;Y) requires an optimization over all input distributions, which for most channels has no closed-form solution and is instead computed numerically, classically by the Blahut-Arimoto algorithm.
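The Blahut-Arimoto algorithm does that optimization by alternating maximization: compute the output distribution induced by the current input distribution, then reweight each input by how far its output profile diverges from it. A Python sketch (matrix layout and iteration count are assumptions), checked against the BSC, where the answer C = 1 - H(p) is known in closed form:

```python
from math import log2, exp, log

def mutual_information(r, W):
    """I(X;Y) in bits for input distribution r and channel matrix W[x][y] = P(y|x)."""
    ny = len(W[0])
    py = [sum(r[x] * W[x][y] for x in range(len(r))) for y in range(ny)]
    return sum(r[x] * W[x][y] * log2(W[x][y] / py[y])
               for x in range(len(r)) for y in range(ny)
               if r[x] > 0 and W[x][y] > 0)

def blahut_arimoto(W, iters=200):
    """Numerically maximize I(X;Y) over the input distribution P(X)."""
    nx, ny = len(W), len(W[0])
    r = [1.0 / nx] * nx  # start from the uniform input
    for _ in range(iters):
        py = [sum(r[x] * W[x][y] for x in range(nx)) for y in range(ny)]
        # c(x) = exp( sum_y P(y|x) ln( P(y|x) / P(y) ) ), the divergence weight
        c = [exp(sum(W[x][y] * log(W[x][y] / py[y])
                     for y in range(ny) if W[x][y] > 0))
             for x in range(nx)]
        z = sum(r[x] * c[x] for x in range(nx))
        r = [r[x] * c[x] / z for x in range(nx)]
    return mutual_information(r, W), r

p = 0.1
W = [[1 - p, p], [p, 1 - p]]  # BSC(0.1) as a channel matrix
C, r = blahut_arimoto(W)      # C ≈ 1 - H(0.1) ≈ 0.531, r ≈ uniform
```

For the BSC the uniform input is a fixed point, so the iteration converges immediately; for asymmetric channels (e.g. the Z-channel) the optimal input is non-uniform and the iteration finds it.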