Surprise: the atomic unit
Shannon 1948 (public domain) · Wikipedia (CC BY-SA 4.0)
The information content of an event is I(x) = −log2 P(x). Rare events carry more information. Certain events carry none. The unit is the bit.
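The definition can be sketched in a few lines of Python (the function name `self_info` is illustrative, not from the source):

```python
import math

def self_info(p: float) -> float:
    """Self-information I(x) = -log2 P(x), in bits."""
    return -math.log2(p)

print(self_info(0.5))   # fair coin flip: 1 bit
print(self_info(1.0))   # certain event: no surprise, 0 bits
print(self_info(1/8))   # rare event: 3 bits
```

Note how the rarer the event, the larger the value: halving the probability adds exactly one bit.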
Why logarithm?
Shannon needed a measure of "surprise" with one property: independent events should add, not multiply. If you flip a coin and roll a die, the total surprise should be the sum. Since P(A and B) = P(A) × P(B) for independent events, and log turns products into sums, the logarithm is the only choice. This measure of surprise per event is the foundation of substance detection: high information content signals that something is worth attending to.
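The coin-and-die example can be checked numerically, a small sketch assuming a helper `self_info` as defined above:

```python
import math

def self_info(p):
    return -math.log2(p)

coin = self_info(1/2)            # 1 bit
die  = self_info(1/6)            # log2(6) ≈ 2.585 bits
both = self_info((1/2) * (1/6))  # joint probability of independent events

# Additivity: surprise of the joint event equals the sum of the surprises.
assert math.isclose(both, coin + die)
```

The assertion holds because −log2(pq) = −log2(p) − log2(q) for any positive p and q.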
Additivity forces the logarithm
Shannon's key insight: any function f(p) that satisfies f(p × q) = f(p) + f(q) must be a logarithm. This is the Cauchy functional equation. Adding the constraints that f is continuous and f(1/2) = 1 (defining the bit), the unique solution is f(p) = −log2(p).
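A quick spot-check of the two constraints, Cauchy additivity and the bit normalization f(1/2) = 1, for the candidate f(p) = −log2(p) (sampling ranges are illustrative):

```python
import math
import random

f = lambda p: -math.log2(p)   # the claimed unique solution

# Normalization defining the bit: one unit of surprise for a fair coin.
assert f(0.5) == 1.0

# Cauchy additivity f(p*q) = f(p) + f(q), checked on random probabilities.
random.seed(0)
for _ in range(1000):
    p, q = random.uniform(0.01, 1.0), random.uniform(0.01, 1.0)
    assert math.isclose(f(p * q), f(p) + f(q))
```

This is of course a numeric check, not a proof; the uniqueness argument needs the continuity assumption stated above.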
Choosing the base
The base of the logarithm picks the unit. Base 2 gives bits (Shannon's choice for digital communication). Base e gives nats (convenient for calculus). Base 10 gives hartleys. They differ only by a constant factor. The structure is the same.
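The constant-factor relationship between units can be verified directly (the probability 0.2 is an arbitrary example value):

```python
import math

p = 0.2
bits     = -math.log2(p)
nats     = -math.log(p)     # natural log, base e
hartleys = -math.log10(p)

# Units differ only by a constant factor:
# ln(2) nats per bit, log10(2) hartleys per bit.
assert math.isclose(nats, bits * math.log(2))
assert math.isclose(hartleys, bits * math.log10(2))
```

Changing the base rescales every value by the same constant, so all structural results (additivity, uniqueness) are base-independent.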
Notation reference
| Symbol | Scheme | Meaning |
|---|---|---|
| I(x) = −log2 P(x) | (self-info p) | Self-information / surprise |
| bit | log base 2 | Unit when base = 2 |
| nat | log base e | Unit when base = e |
| I(x,y) = I(x) + I(y) | (+ (self-info p) (self-info q)) | Additivity (independent events) |
Neighbors
- 📡 Shannon 02 — entropy is expected surprise
- 🎰 Probability Ch.1 — probability is the foundation: self-information is −log P(event)
- 🍞 Baez & Fritz 2011 — entropy as a functor: the category-theoretic characterization
- 🧠 Lovelace Ch.7 Language — surprisal as a linking hypothesis between language models and reading data