
Entropy as Functor

Shannon 1948 (public domain) · Wikipedia (CC BY-SA 4.0)

Prereqs: Entropy, Baez-Fritz-Leinster 2011. 5 min.

Shannon entropy is the unique functorial information measure. Baez, Fritz, and Leinster proved: any continuous function from finite probability spaces to real numbers that satisfies the chain rule and maximality must be Shannon entropy (up to a constant). Shannon did not choose entropy. The axioms forced it.

Entropy respects composition

The chain rule says H(X,Y) = H(X) + H(Y|X). When you process data in two steps, the total information equals the information from step 1 plus the conditional information from step 2. This is functoriality: the measure of the composite equals the sum of the measures of the parts.
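The chain rule is easy to verify numerically. The page's worked examples are in Scheme; here is the same computation as a Python sketch, using an illustrative joint distribution chosen for this example:

```python
import math

def H(probs):
    """Shannon entropy in bits of a finite distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# An illustrative joint distribution p(x, y) over X in {0,1}, Y in {0,1}.
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}

# Marginal p(x), then H(Y|X) = sum_x p(x) * H(Y | X = x).
px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
H_joint = H(joint.values())
H_X = H(px.values())
H_Y_given_X = sum(
    px[x] * H([joint[(x, y)] / px[x] for y in (0, 1)]) for x in (0, 1)
)

# Chain rule: H(X,Y) = H(X) + H(Y|X).
assert abs(H_joint - (H_X + H_Y_given_X)) < 1e-12
```

Here H(X,Y) = 1.75 bits, split as H(X) ≈ 0.811 plus H(Y|X) ≈ 0.939.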

[Diagram: composable morphisms f : P → Q and g : Q → R in FinProb, illustrating H(g ; f) = H(f) + H(g | f).]

The three axioms

Baez-Fritz-Leinster proved that Shannon entropy is the unique function satisfying three properties:

  1. Continuity — small changes in probabilities produce small changes in entropy
  2. Maximality — the uniform distribution has the highest entropy among all distributions on n outcomes
  3. Chain rule — H(X,Y) = H(X) + H(Y|X). Composition is additive.

Drop any one axiom and other measures become possible. All three together force Shannon's formula.
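Maximality can be probed empirically by comparing random distributions against the uniform one. A minimal Python sketch (sample count and seed are arbitrary choices for this illustration):

```python
import math
import random

def H(probs):
    """Shannon entropy in bits of a finite distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 4
uniform = [1 / n] * n  # H(uniform) = log2(n) = 2 bits

random.seed(0)
for _ in range(1000):
    w = [random.random() for _ in range(n)]
    p = [x / sum(w) for x in w]
    # Maximality: no distribution on n outcomes beats the uniform one.
    assert H(p) <= H(uniform) + 1e-9
```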


The payoff: foundations meet category theory

This is the bridge page. Shannon defined entropy operationally in 1948. Baez-Fritz-Leinster proved in 2011 that it is the unique functor from the category of finite probability spaces to the real numbers that satisfies continuity, maximality, and the chain rule. The foundations (pages 1–7) are not arbitrary definitions. They are the only definitions that compose. The Stoch/Giry framework extends this categorical view to stochastic channels, connecting entropy to the broader structure of probabilistic computation.
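The information-loss functor itself fits in a few lines. A Python sketch, assuming the BFL convention F(f) = H(p) − H(q) with c = 1; the helper names `pushforward` and `info_loss` are illustrative, not from the paper:

```python
import math

def H(probs):
    """Shannon entropy in bits of a finite distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def pushforward(p, f, codomain_size):
    """Push p on {0..n-1} forward along f: q(y) = sum over f(x)=y of p(x)."""
    q = [0.0] * codomain_size
    for x, px in enumerate(p):
        q[f(x)] += px
    return q

def info_loss(p, q):
    """Information loss of a measure-preserving map (X,p) -> (Y,q)."""
    return H(p) - H(q)

p = [0.125, 0.125, 0.25, 0.5]   # distribution on 4 points
f = lambda x: x // 2            # merge {0,1} and {2,3}: 4 points -> 2
g = lambda y: 0                 # collapse everything: 2 points -> 1
q = pushforward(p, f, 2)        # q = [0.25, 0.75]
r = pushforward(q, g, 1)        # r = [1.0]

# Functoriality: loss of the composite = sum of the losses.
lhs = info_loss(p, r)
rhs = info_loss(p, q) + info_loss(q, r)
assert abs(lhs - rhs) < 1e-12
```

The losses telescope: H(p) − H(r) = (H(p) − H(q)) + (H(q) − H(r)), which is exactly why composition is additive.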


Notation reference

Symbol                       Scheme                  Meaning
H(X,Y) = H(X) + H(Y|X)       chain rule              Entropy is additive under composition
FinProb                      list of probabilities   Category of finite probability spaces
H : FinProb → R              (entropy probs)         Entropy as functor
F(f) = c · (H(p) − H(q))     (info-loss p q)         Information loss (unique up to scale)
H(g ; f) = H(f) + H(g|f)     functoriality           Loss of composite = sum of losses

Translation notes

All examples use finite uniform or explicit distributions. The Baez-Fritz-Leinster theorem works over FinProb, the category of finite probability distributions with measure-preserving maps as morphisms. The "functor" here maps each morphism to its information loss (a real number), and the theorem says this functor is unique. Rényi entropy, Tsallis entropy, and other generalizations satisfy some but not all three axioms. The chain rule is the one that eliminates them.
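The elimination can be seen concretely with Rényi entropy of order 2 on a dependent pair. Conditional Rényi entropy has several competing definitions; this sketch uses the naive weighted average, one common convention, and the joint distribution is an illustrative choice:

```python
import math

def renyi2(probs):
    """Rényi entropy of order 2, in bits: H2(p) = -log2(sum of p_i^2)."""
    return -math.log2(sum(p * p for p in probs))

# A dependent joint distribution p(x, y) and its marginal p(x).
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.125, (1, 1): 0.125}
px = {0: 0.75, 1: 0.25}

lhs = renyi2(joint.values())  # H2(X, Y)
# Naive conditional: sum_x p(x) * H2(Y | X = x).
rhs = renyi2(px.values()) + sum(
    px[x] * renyi2([joint[(x, y)] / px[x] for y in (0, 1)]) for x in (0, 1)
)

# The chain rule fails: H2(X,Y) != H2(X) + H2(Y|X).
assert abs(lhs - rhs) > 0.01
```

Rényi entropy is continuous and maximized by the uniform distribution, so the chain rule is precisely the axiom it loses.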
