
Surprise: the atomic unit

Shannon 1948 (public domain) · Wikipedia (CC BY-SA 4.0)

The information content of an event is I(x) = −log₂ P(x). Rare events carry more information. Certain events carry none. The unit is the bit.

Why logarithm?

Shannon needed a measure of "surprise" with one property: independent events should add, not multiply. If you flip a coin and roll a die, the total surprise should be the sum. Since P(A and B) = P(A) × P(B) for independent events, and log turns products into sums, the logarithm is the only choice. This measure of surprise per event is the foundation of substance detection: high information content signals that something is worth attending to.

[Figure: I(x) = −log₂ P(x) as a function of P(x), with I = 0 at P = 1 and I = 1 at P = ½]
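The interactive Scheme example that accompanied the figure did not survive extraction. A minimal sketch of the `self-info` procedure named in the notation table below, assuming only that it computes I(x) = −log₂ P(x) (the one-argument `log` is the natural log, so we divide by `(log 2)` to get base 2):

```scheme
;; Self-information in bits: I(x) = -log2 P(x).
;; (log p) is the natural logarithm; dividing by (log 2)
;; converts it to base 2.
(define (self-info p)
  (- (/ (log p) (log 2))))

(self-info 1)    ; certain event: 0 bits of surprise
(self-info 1/2)  ; fair coin flip: 1 bit
(self-info 1/64) ; rare event: 6 bits
```

Rarer events yield larger values, and a certain event yields zero, matching the prose above.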

Additivity forces the logarithm

Shannon's key insight: any function f(p) that satisfies f(p × q) = f(p) + f(q) must be a logarithm. This is the Cauchy functional equation in multiplicative form. Adding the constraints that f is continuous and f(1/2) = 1 (defining the bit), the unique solution is f(p) = −log₂(p).

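The code block that belonged here was lost; presumably it checked additivity numerically. A sketch under that assumption, redefining the hypothetical `self-info` so the snippet stands alone:

```scheme
;; Additivity: for independent events, surprise adds.
;; I(p * q) = I(p) + I(q), because log turns products into sums.
(define (self-info p)
  (- (/ (log p) (log 2))))

(self-info (* 1/2 1/6))              ; coin AND die as one joint event
(+ (self-info 1/2) (self-info 1/6))  ; coin surprise plus die surprise
;; Both expressions evaluate to the same value, about 3.585 bits.
```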

Choosing the base

The base of the logarithm picks the unit. Base 2 gives bits (Shannon's choice for digital communication). Base e gives nats (convenient for calculus). Base 10 gives hartleys. They differ only by a constant factor. The structure is the same.

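The missing example here can be sketched by parameterizing the base. The procedure name `surprise` is an assumption, not from the original page; only the unit names (bits, nats, hartleys) come from the text:

```scheme
;; The base of the logarithm only fixes the unit:
;; bits (base 2), nats (base e), hartleys (base 10)
;; differ by the constant factor (log base).
(define (surprise p base)
  (- (/ (log p) (log base))))

(surprise 1/2 2)        ; 1 bit
(surprise 1/2 (exp 1))  ; about 0.693 nats
(surprise 1/2 10)       ; about 0.301 hartleys
```

The three results describe the same event; multiplying by a constant converts one unit to another, e.g. 1 bit = ln 2 ≈ 0.693 nats.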

Notation reference

Symbol                 Scheme                            Meaning
I(x) = −log₂ P(x)      (self-info p)                     Self-information / surprise
bit                    log base 2                        Unit when base = 2
nat                    log base e                        Unit when base = e
I(x,y) = I(x) + I(y)   (+ (self-info p) (self-info q))   Additivity (independent events)