Entropy as Functor
Shannon 1948 (public domain) · Wikipedia (CC BY-SA 4.0)
Prereqs: 💡 Entropy, 📄 Baez-Fritz-Leinster 2011. 5 min.
Shannon entropy is the unique functorial information measure. Baez, Fritz, and Leinster proved that any continuous function from finite probability spaces to the real numbers satisfying the chain rule and maximality must be Shannon entropy, up to a constant factor. Shannon did not choose entropy. The axioms forced it.
Entropy respects composition
The chain rule says H(X,Y) = H(X) + H(Y|X). When you process data in two steps, the total information equals the information from step 1 plus the conditional information from step 2. This is functoriality: the measure of the composite equals the sum of the measures of the parts.
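The chain rule can be checked numerically. Below is a minimal sketch in Python; the helper name `entropy` is an assumption mirroring the page's Scheme snippet `(entropy probs)`, and the joint distribution is an invented example of a dependent pair, not data from the sources.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a list of probabilities (Python analogue
    of the page's Scheme (entropy probs))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Invented joint distribution p(x, y) for a dependent pair:
# X is a fair coin; Y given X=0 is biased (0.9, 0.1), Y given X=1 is fair.
joint = [0.45, 0.05, 0.25, 0.25]

p_x = [0.5, 0.5]  # marginal of X
# H(Y|X) = average over x of H(Y | X = x)
h_y_given_x = 0.5 * entropy([0.9, 0.1]) + 0.5 * entropy([0.5, 0.5])

h_joint = entropy(joint)           # H(X,Y) from the joint directly
h_chain = entropy(p_x) + h_y_given_x  # H(X) + H(Y|X) via the chain rule

print(abs(h_joint - h_chain))  # agrees to floating-point precision
```

The two computations take different routes through the numbers, yet land on the same value: that is the additivity the chain rule asserts.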
The three axioms
Baez-Fritz-Leinster proved that Shannon entropy is the unique function satisfying three properties:
- Continuity — small changes in probabilities produce small changes in entropy
- Maximality — the uniform distribution has the highest entropy among all distributions on n outcomes
- Chain rule — H(X,Y) = H(X) + H(Y|X). Composition is additive.
Drop any one axiom and other measures become possible. All three together force Shannon's formula.
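The maximality axiom is easy to probe empirically. A sketch in Python (helper names are assumptions, chosen to mirror the page's Scheme forms): random distributions on four outcomes never exceed the uniform one's entropy of log2(4) = 2 bits.

```python
import math
import random

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 4
h_max = entropy([1 / n] * n)  # uniform distribution: log2(n) = 2 bits

random.seed(0)
for _ in range(1000):
    # Draw a random distribution on n outcomes by normalizing weights.
    w = [random.random() for _ in range(n)]
    total = sum(w)
    probs = [x / total for x in w]
    # Maximality: no distribution beats the uniform one.
    assert entropy(probs) <= h_max + 1e-9
```

This only samples, of course; the axiom asserts the bound for every distribution on n outcomes.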
The payoff: foundations meet category theory
This is the bridge page. Shannon defined entropy operationally in 1948. Baez-Fritz-Leinster proved in 2011 that it is, up to a constant factor, the unique functor from the category of finite probability spaces to the real numbers satisfying continuity, maximality, and the chain rule. The foundations (pages 1–7) are not arbitrary definitions. They are the only definitions that compose.
The Stoch/Giry framework extends this categorical view to stochastic channels, connecting entropy to the broader structure of probabilistic computation.
Notation reference
| Symbol | Scheme | Meaning |
|---|---|---|
| H(X,Y) = H(X) + H(Y\|X) | chain rule | Entropy is additive under composition |
| FinProb | list of probabilities | Category of finite probability spaces |
| H : FinProb → R | (entropy probs) | Entropy as functor |
| F(f) = c · (H(p) − H(q)) | (info-loss p q) | Information loss (unique up to scale) |
| F(g ∘ f) = F(f) + F(g) | functoriality | Loss of composite = sum of losses |
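The table's `(entropy probs)` and `(info-loss p q)` can be sketched as Python analogues (the function names and the die example are assumptions for illustration, not code from the paper):

```python
import math

def entropy(probs):
    """Shannon entropy in bits; analogue of the page's (entropy probs)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info_loss(p, q, c=1.0):
    """Information loss of a measure-preserving map from p onto q;
    analogue of the page's (info-loss p q): F(f) = c * (H(p) - H(q))."""
    return c * (entropy(p) - entropy(q))

# Coarse-graining a fair 4-sided die into "low"/"high" merges outcomes:
p = [0.25, 0.25, 0.25, 0.25]   # source space, H = 2 bits
q = [0.5, 0.5]                 # image after merging {1,2} and {3,4}, H = 1 bit
r = [1.0]                      # everything merged to a point, H = 0 bits

print(info_loss(p, q))  # 1.0 bit lost in the first coarse-graining

# Functoriality: loss of the composite equals the sum of the losses.
assert info_loss(p, r) == info_loss(p, q) + info_loss(q, r)
```

The final assertion is exactly the table's functoriality row, computed on a concrete chain of coarse-grainings.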
Neighbors
- 💡 Entropy — the definition that the uniqueness theorem vindicates
- 📄 Baez, Fritz, Leinster 2011 — the main theorem: entropy is the unique functorial measure
- 📄 Chen, Vigneaux 2023 — entropy equals magnitude, connecting information to size
- 📄 Leinster 2021 — entropy and diversity in ecology, same uniqueness result
Shannon entropy
Functor
Translation notes
All examples use finite uniform or explicit distributions. The Baez-Fritz-Leinster theorem works over FinProb, the category of finite probability spaces with measure-preserving maps as morphisms. The "functor" here maps each morphism to its information loss (a real number), and the theorem says this functor is unique up to scale. Rényi entropy, Tsallis entropy, and other generalizations satisfy some but not all three axioms. The chain rule is the one that eliminates them.
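To see the chain rule doing the eliminating, here is a sketch in Python. The helper names and the joint distribution are assumptions; note also that conditional Rényi entropy has several competing definitions, and the arithmetic average used below is chosen purely for illustration. Shannon satisfies the chain rule exactly, while order-2 Rényi entropy misses by a visible margin on a dependent pair.

```python
import math

def shannon(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def renyi2(probs):
    """Rényi (collision) entropy of order 2, in bits."""
    return -math.log2(sum(p * p for p in probs))

# Invented dependent pair: X fair coin; Y|X=0 ~ (0.9, 0.1), Y|X=1 ~ (0.5, 0.5).
joint = [0.45, 0.05, 0.25, 0.25]
marg_x = [0.5, 0.5]

def chain_gap(h):
    """Residual H(X,Y) - (H(X) + avg of H(Y|X=x)) for entropy function h."""
    cond = 0.5 * h([0.9, 0.1]) + 0.5 * h([0.5, 0.5])
    return h(joint) - (h(marg_x) + cond)

print(abs(chain_gap(shannon)))  # ~0: Shannon satisfies the chain rule
print(abs(chain_gap(renyi2)))   # clearly nonzero: Rényi-2 does not
```

The same joint distribution that makes Shannon's two sides agree exposes the gap for Rényi-2, which is precisely why the chain rule axiom singles Shannon entropy out.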