Data Processing Inequality
Shannon 1948 (public domain) · Wikipedia (CC BY-SA 4.0)
If X → Y → Z is a Markov chain, then I(X;Z) ≤ I(X;Y). Processing cannot create information: every step in a pipeline can only lose information or preserve it, never gain it. In slogan form: functors can't increase information.
The Markov chain condition
X → Y → Z means Z depends on X only through Y. Given Y, Z is conditionally independent of X. Whatever Y forgets about X is gone forever. No downstream processing of Y can recover it.
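The chain condition can be checked numerically. A minimal sketch, assuming a made-up prior p(x) and two made-up channel matrices (none of these numbers come from the text): build p(x,y), push it through p(z|y) to get p(x,z), and verify I(X;Z) ≤ I(X;Y) directly from the probability vectors.

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def mutual_information(joint):
    """I(A;B) = H(A) + H(B) - H(A,B) from a joint dict {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), pr in joint.items():
        pa[a] = pa.get(a, 0.0) + pr
        pb[b] = pb.get(b, 0.0) + pr
    return entropy(pa.values()) + entropy(pb.values()) - entropy(joint.values())

# Hypothetical three-stage chain (all numbers made up for illustration).
px = {0: 0.5, 1: 0.3, 2: 0.2}
p_y_given_x = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}, 2: {0: 0.5, 1: 0.5}}
p_z_given_y = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.1, 1: 0.9}}

# p(x, y) = p(x) p(y|x)
joint_xy = {(x, y): px[x] * p_y_given_x[x][y] for x in px for y in p_y_given_x[x]}

# p(x, z) = sum_y p(x, y) p(z|y) -- Z sees X only through Y
joint_xz = {}
for (x, y), pr in joint_xy.items():
    for z, pz in p_z_given_y[y].items():
        joint_xz[(x, z)] = joint_xz.get((x, z), 0.0) + pr * pz

ixy = mutual_information(joint_xy)
ixz = mutual_information(joint_xz)
assert ixz <= ixy + 1e-12   # the data processing inequality
```

The key line is the construction of `joint_xz`: because Z is sampled from p(z|y) alone, marginalizing over y is all it takes — no reference to x appears in the second stage.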
Why it matters
DPI is the reason compression and summarization are lossy. If your pipeline is X → Y → Z, then Z cannot know more about X than Y does. No clever algorithm applied to Y can recover what Y already lost. This bounds every learning algorithm, every codec, every statistical estimator. It also constrains multi-stage embedding pipelines: each stage in the chain can only preserve or destroy information, never create it.
The functorial perspective
In categorical terms, a deterministic function f : X → Y between finite probability spaces is a morphism, and entropy assigns to each morphism its information loss H(X) − H(f(X)) ≥ 0. Baez, Fritz, and Leinster (2011) proved that Shannon entropy is, up to a constant factor, the unique continuous, convex-linear, functorial such assignment. The fact that processing can only lose information is the axiom that makes this characterization work.
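The functorial property can be illustrated concretely. A minimal sketch, assuming a made-up four-point distribution and two made-up merging maps: information loss H(X) − H(f(X)) is non-negative, and it adds up along composites — the loss of g ∘ f equals the loss of f plus the loss of g, exactly as functoriality demands.

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def pushforward(p, f):
    """Distribution of f(X) given the distribution of X."""
    out = {}
    for x, pr in p.items():
        out[f(x)] = out.get(f(x), 0.0) + pr
    return out

def info_loss(p, f):
    """Information loss of a deterministic map: H(X) - H(f(X))."""
    return entropy(p.values()) - entropy(pushforward(p, f).values())

# Hypothetical maps on a four-point space (numbers made up for illustration).
px = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
f = lambda x: x // 2    # merge {0,1} and {2,3}
g = lambda y: 0         # merge everything

# Functoriality: loss along the composite equals the sum of the losses.
lhs = info_loss(px, lambda x: g(f(x)))
rhs = info_loss(px, f) + info_loss(pushforward(px, f), g)
assert abs(lhs - rhs) < 1e-12
assert info_loss(px, f) >= 0    # H(f(X)) <= H(X)
```

The additivity is a telescoping sum — H(X) − H(f(X)) plus H(f(X)) − H(g(f(X))) — which is why composing stages can only accumulate loss, never reverse it.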
Notation reference
| Symbol | Scheme | Meaning |
|---|---|---|
| X → Y → Z | chain of channels | Markov chain |
| I(X;Z) ≤ I(X;Y) | (<= ixz ixy) | Data processing inequality |
| H(f(X)) ≤ H(X) | (<= hfx hx) | Entropy can't increase under functions |
| f : X → Y | (lambda (x) ...) | Deterministic channel (functor) |
Neighbors
- 📡 Mutual information — the quantity that DPI bounds
- 📡 Channels and capacity — DPI constrains what channels can transmit
- 🍞 Baez, Fritz, Leinster 2011 — DPI is the key axiom in their characterization
- 🍞 Panangaden 2009 — labelled Markov processes and information flow
Translation notes
All examples use discrete finite distributions; the data processing inequality also holds for continuous distributions, Markov kernels, and general measurable spaces. The inequality is tight (I(X;Z) = I(X;Y)) exactly when Z is a sufficient statistic of Y for X, i.e., when X → Z → Y is also a Markov chain. Every example computes entropy directly from probability vectors rather than simulating the Markov chain.
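The equality case can also be checked from probability vectors. A minimal sketch, assuming a made-up channel engineered so that y and y+2 induce the same posterior on X: collapsing them with Z = Y mod 2 is then a sufficient statistic, and I(X;Z) = I(X;Y) exactly.

```python
from math import log2

def entropy(p):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(pi * log2(pi) for pi in p if pi > 0)

def mutual_information(joint):
    """I(A;B) = H(A) + H(B) - H(A,B) from a joint dict {(a, b): prob}."""
    pa, pb = {}, {}
    for (a, b), pr in joint.items():
        pa[a] = pa.get(a, 0.0) + pr
        pb[b] = pb.get(b, 0.0) + pr
    return entropy(pa.values()) + entropy(pb.values()) - entropy(joint.values())

# X uniform; channel chosen so y and y+2 give the same posterior on X.
px = {0: 0.5, 1: 0.5}
p_y_given_x = {0: [0.4, 0.1, 0.4, 0.1],
               1: [0.1, 0.4, 0.1, 0.4]}
joint_xy = {(x, y): px[x] * p_y_given_x[x][y] for x in px for y in range(4)}

# Z = Y mod 2 collapses y with y+2 -- a sufficient statistic for X here.
joint_xz = {}
for (x, y), pr in joint_xy.items():
    joint_xz[(x, y % 2)] = joint_xz.get((x, y % 2), 0.0) + pr

ixy = mutual_information(joint_xy)
ixz = mutual_information(joint_xz)
assert abs(ixz - ixy) < 1e-12   # DPI is tight: nothing about X was lost
```

Merging y = 0 with y = 2 (and y = 1 with y = 3) discards information about Y, but none about X, which is exactly what sufficiency means.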