
Mutual information

Shannon 1948 (public domain) · Wikipedia (CC BY-SA 4.0)

Mutual information I(X;Y) = H(X) + H(Y) − H(X,Y) measures the information shared between two variables. It is always non-negative, and zero if and only if X and Y are independent.

The shared information

Mutual information answers: how much does knowing X tell you about Y? Equivalently: how much does the joint distribution differ from the product of the marginals? Three equivalent formulas:

I(X;Y) = H(X) + H(Y) − H(X,Y)
I(X;Y) = H(X) − H(X|Y)
I(X;Y) = H(Y) − H(Y|X)

The first says mutual information is the overlap in the Venn diagram. The second says it is how much knowing Y reduces uncertainty about X. The third is the symmetric version: how much knowing X reduces uncertainty about Y.
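The three formulas can be checked numerically. Below is a minimal Python sketch (the page's own snippets are in Scheme, but Python keeps the example self-contained); the joint distribution is a made-up example, not one from this page.

```python
import math

def H(dist):
    """Shannon entropy in bits of a probability dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical joint distribution p(x, y) over two binary variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

# Marginals p(x) and p(y).
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

Hx, Hy, Hxy = H(px), H(py), H(joint)

i1 = Hx + Hy - Hxy       # overlap form
i2 = Hx - (Hxy - Hy)     # H(X) - H(X|Y), using H(X|Y) = H(X,Y) - H(Y)
i3 = Hy - (Hxy - Hx)     # H(Y) - H(Y|X)

# All three forms agree.
assert abs(i1 - i2) < 1e-12 and abs(i2 - i3) < 1e-12
```

For this joint distribution all three expressions give roughly 0.278 bits: the diagonal weight of 0.8 makes X and Y strongly but not perfectly correlated.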

[Venn diagram: H(X) and H(Y) as overlapping circles; the overlap is the shared information I(X;Y), the non-overlapping parts are H(X|Y) and H(Y|X), and the union is H(X,Y).]

Non-negativity and independence

Mutual information is always non-negative: I(X;Y) ≥ 0. It equals zero if and only if X and Y are independent, because independent variables share no information. This follows from the fact that H(X|Y) ≤ H(X): conditioning never increases entropy.
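The independence case can be verified directly: if p(x,y) = p(x)p(y), then H(X,Y) = H(X) + H(Y) and the mutual information vanishes. A minimal Python sketch, with marginals chosen arbitrarily for illustration:

```python
import math

def H(dist):
    """Shannon entropy in bits of a probability dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical independent pair: the joint is p(x)*p(y) by construction.
px = {0: 0.7, 1: 0.3}
py = {0: 0.2, 1: 0.8}
joint = {(x, y): px[x] * py[y] for x in px for y in py}

mi = H(px) + H(py) - H(joint)
assert abs(mi) < 1e-9   # independent variables share no information
```

Any other joint with the same marginals (e.g. one concentrating mass on the diagonal) would give mi > 0.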


Symmetry

Mutual information is symmetric: I(X;Y) = I(Y;X). Knowing X tells you as much about Y as knowing Y tells you about X. This is not true of conditional entropy: H(X|Y) and H(Y|X) can differ.
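The contrast between symmetric mutual information and asymmetric conditional entropy shows up in any deterministic relationship. A minimal Python sketch, using a made-up example where Y is a function of X (so H(Y|X) = 0) but X is not a function of Y:

```python
import math

def H(dist):
    """Shannon entropy in bits of a probability dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Hypothetical example: X uniform on {0,1,2}, Y indicates whether X is zero.
joint = {(0, 1): 1/3, (1, 0): 1/3, (2, 0): 1/3}
px = {0: 1/3, 1: 1/3, 2: 1/3}
py = {1: 1/3, 0: 2/3}

Hx, Hy, Hxy = H(px), H(py), H(joint)
h_x_given_y = Hxy - Hy   # 2/3 bit: knowing Y still leaves doubt about X
h_y_given_x = Hxy - Hx   # 0: Y is fully determined by X

i_xy = Hx - h_x_given_y
i_yx = Hy - h_y_given_x
assert abs(i_xy - i_yx) < 1e-12              # I(X;Y) = I(Y;X)
assert abs(h_x_given_y - h_y_given_x) > 0.5  # but H(X|Y) != H(Y|X)
```

Both directions give I(X;Y) = H(Y) ≈ 0.918 bits, even though the two conditional entropies differ.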


Notation reference

Symbol · Scheme · Meaning
I(X;Y) = H(X) + H(Y) − H(X,Y) · (mutual-info joint mx my) · Mutual information
I(X;Y) = H(X) − H(X|Y) · (- h-x h-x-given-y) · Reduction in uncertainty
I(X;Y) ≥ 0 · (>= mi 0) · Non-negativity
I(X;Y) = 0 iff independent · (= mi 0) · Zero means no shared info