
Joint and conditional entropy

Shannon 1948 (public domain) · Wikipedia (CC BY-SA 4.0)

Joint entropy H(X,Y) measures the total uncertainty of two variables together. Conditional entropy H(X|Y) = H(X,Y) − H(Y) is what remains uncertain about X after observing Y. Conditioning never increases entropy.

Joint entropy

The joint entropy H(X,Y) = −∑ P(x,y) log2 P(x,y) measures the total surprise of observing both X and Y together. If X and Y are independent, H(X,Y) = H(X) + H(Y). If they are dependent, H(X,Y) < H(X) + H(Y): shared structure reduces total uncertainty.
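A minimal sketch of both cases in Python (the distributions and the helper `joint_entropy` are illustrative, not from the original): two independent fair bits give H(X,Y) = 2 bits, while a fully dependent pair collapses to 1 bit.

```python
from math import log2

def joint_entropy(pxy):
    """H(X,Y) = -sum P(x,y) log2 P(x,y) over a dict {(x, y): prob}."""
    return -sum(p * log2(p) for p in pxy.values() if p > 0)

# Independent fair bits: H(X,Y) = H(X) + H(Y) = 1 + 1 = 2 bits.
indep = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
print(joint_entropy(indep))  # 2.0

# Fully dependent (Y = X): only two equally likely outcomes, so 1 bit.
dep = {(0, 0): 0.5, (1, 1): 0.5}
print(joint_entropy(dep))  # 1.0
```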

[Figure: Venn diagram relating H(X), H(Y), H(X,Y), H(X|Y), H(Y|X), and I(X;Y)]

Conditional entropy

H(X|Y) = H(X,Y) − H(Y) tells you how much uncertainty remains about X once you know Y. This is the chain rule of entropy: H(X,Y) = H(Y) + H(X|Y). Conditioning never increases entropy: H(X|Y) ≤ H(X). Knowing something can only help.
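To see the subtraction H(X|Y) = H(X,Y) − H(Y) on a concrete distribution, here is a hedged sketch (the joint distribution `pxy` is a made-up example, a noisy binary channel with 20% flips; it is not from the original):

```python
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

# Hypothetical joint distribution: X is a copy of Y flipped 20% of the time.
pxy = {(0, 0): 0.4, (1, 0): 0.1, (0, 1): 0.1, (1, 1): 0.4}

# Marginals by summing out the other variable.
px, py = {}, {}
for (x, y), p in pxy.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

h_x_given_y = entropy(pxy) - entropy(py)  # H(X|Y) = H(X,Y) - H(Y)
print(round(h_x_given_y, 4))              # 0.7219

# Conditioning never increases entropy: H(X|Y) <= H(X) = 1 bit here.
assert h_x_given_y <= entropy(px)
```

Observing Y cuts the uncertainty about X from 1 bit to about 0.72 bits, exactly the residual noise in the channel.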


The chain rule

The chain rule generalizes: H(X₁, …, Xₙ) = H(X₁) + H(X₂|X₁) + … + H(Xₙ|X₁, …, Xₙ₋₁). Each new variable adds only its residual uncertainty, conditioned on everything before it.
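The n-variable chain rule can be checked numerically by writing each conditional term as a difference of joint entropies. A sketch under assumed data (the three-bit Markov chain below, each bit copying the previous one with probability 0.9, is a hypothetical example):

```python
from math import log2

def entropy(dist):
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def marginal(dist, idxs):
    """Marginalize a joint dict {tuple: prob} onto the given coordinates."""
    out = {}
    for outcome, p in dist.items():
        key = tuple(outcome[i] for i in idxs)
        out[key] = out.get(key, 0) + p
    return out

# Hypothetical chain: X1 fair; X2 copies X1, X3 copies X2, each w.p. 0.9.
p = {}
for x1 in (0, 1):
    for x2 in (0, 1):
        for x3 in (0, 1):
            p[(x1, x2, x3)] = (0.5 * (0.9 if x2 == x1 else 0.1)
                                   * (0.9 if x3 == x2 else 0.1))

# H(X1,X2,X3) = H(X1) + H(X2|X1) + H(X3|X1,X2),
# each conditional written as a difference of joint entropies.
h1 = entropy(marginal(p, (0,)))
h2_given_1 = entropy(marginal(p, (0, 1))) - entropy(marginal(p, (0,)))
h3_given_12 = entropy(p) - entropy(marginal(p, (0, 1)))
assert abs(entropy(p) - (h1 + h2_given_1 + h3_given_12)) < 1e-12
```

The sum telescopes: every intermediate joint entropy cancels, which is why the identity holds for any joint distribution, not just this one.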


Notation reference

Symbol                      Scheme                  Meaning
H(X,Y)                      (entropy joint)         Joint entropy
H(X|Y) = H(X,Y) − H(Y)      (- h-joint h-y)         Conditional entropy
H(X,Y) = H(X) + H(Y|X)      (+ h-x h-y-given-x)     Chain rule
H(X|Y) ≤ H(X)               (<= h-x-given-y h-x)    Conditioning reduces entropy