
PCA

Machine Learning · Ch.6 of 12

Find the directions of maximum variance. Project the data onto those directions. The principal components are eigenvectors of the covariance matrix.

[Figure: 2D point cloud with principal axes PC1 and PC2. PC1 captures most variance; PC2 is orthogonal.]

Covariance matrix

The covariance matrix C measures how pairs of features vary together. C[i,j] = E[(x_i - mean_i)(x_j - mean_j)]. Diagonal entries are variances; off-diagonal entries are covariances. PCA finds the eigenvectors of this matrix.

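A minimal sketch of this in Scheme, assuming the population convention of dividing by n. The `cov` name matches the notation table below; `mean` and `cov-matrix` are helper names introduced here:

```scheme
;; Mean of a list of numbers.
(define (mean xs)
  (/ (apply + xs) (length xs)))

;; Covariance of two equal-length lists of numbers:
;; E[(x - mean_x)(y - mean_y)], dividing by n (population convention).
(define (cov xs ys)
  (let ((mx (mean xs))
        (my (mean ys)))
    (/ (apply + (map (lambda (x y) (* (- x mx) (- y my))) xs ys))
       (length xs))))

;; 2x2 covariance matrix for two features, as a list of rows.
;; Diagonal entries are variances; off-diagonal entries are the
;; shared covariance, so the matrix is symmetric.
(define (cov-matrix xs ys)
  (list (list (cov xs xs) (cov xs ys))
        (list (cov xs ys) (cov ys ys))))
```

With exact rationals, `(cov '(1 2 3 4) '(2 4 6 8))` evaluates to 5/2, and the diagonal entries `(cov xs xs)` are just the variances.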

Eigendecomposition

The eigenvectors of the covariance matrix point in the directions of maximum variance. The eigenvalues tell you how much variance each direction captures. For a 2x2 matrix, we can solve the characteristic equation directly: det(C - lambda*I) = 0.

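A sketch under the same 2x2 assumption (the names `eigenvalues-2x2` and `eigenvector-2x2` are introduced here, not from the text): for C = ((a b) (b d)), det(C - lambda*I) = 0 reduces to the quadratic lambda^2 - (a + d)*lambda + (ad - b^2) = 0, solved with the quadratic formula.

```scheme
;; Eigenvalues of a symmetric 2x2 matrix ((a b) (b d)), from the
;; characteristic equation lambda^2 - tr(C) lambda + det(C) = 0.
(define (eigenvalues-2x2 C)
  (let* ((a (car (car C))) (b (cadr (car C))) (d (cadr (cadr C)))
         (tr (+ a d))
         (det (- (* a d) (* b b)))
         (disc (sqrt (- (* tr tr) (* 4 det))))) ; real for symmetric C
    (list (/ (+ tr disc) 2)      ; l1, the larger eigenvalue (PC1)
          (/ (- tr disc) 2))))   ; l2 (PC2)

;; Unit eigenvector for eigenvalue l, assuming b is nonzero:
;; v = (b, l - a) satisfies (C - l I) v = 0, then normalize.
(define (eigenvector-2x2 C l)
  (let* ((a (car (car C))) (b (cadr (car C)))
         (vx b) (vy (- l a))
         (norm (sqrt (+ (* vx vx) (* vy vy)))))
    (list (/ vx norm) (/ vy norm))))
```

For C = ((2 1) (1 2)) the eigenvalues are 3 and 1, and PC1 points along (1, 1) normalized to unit length.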

Projecting data

To reduce dimensionality, project each data point onto the top k principal components. The projection is a dot product with each eigenvector. This gives a k-dimensional representation that preserves the most variance from the original data.

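A sketch of the projection (`dot` matches the notation table; `project` is a name introduced here): each coordinate of the reduced representation is the dot product of the point with one eigenvector.

```scheme
;; Dot product of two equal-length vectors, represented as lists.
(define (dot v x)
  (apply + (map * v x)))

;; Project a point onto the top-k principal components, given as a
;; list of unit eigenvectors ordered by decreasing eigenvalue.
;; Returns the point's k-dimensional representation.
(define (project point components)
  (map (lambda (v) (dot v point)) components))
```

`(project '(3 4) '((1 0)))` keeps only the first coordinate, giving (3); with both standard basis vectors the point comes back unchanged.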

Notation reference

Math             Scheme              Meaning
C = X^T X / n    (cov x y)           Covariance matrix (X centered)
λ                l1, l2              Eigenvalue (variance along a PC)
v · x            (dot v x)           Projection onto an eigenvector
λ1 / Σλ          (/ l1 (+ l1 l2))    Fraction of variance explained
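The last row of the table reads as a one-liner; a minimal sketch, with `variance-explained` as an introduced name:

```scheme
;; Fraction of total variance captured by PC1, given the two
;; eigenvalues of the 2x2 covariance matrix.
(define (variance-explained l1 l2)
  (/ l1 (+ l1 l2)))

(variance-explained 9 1)  ; => 9/10, i.e. PC1 alone keeps 90%
```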

Translation notes

PCA is a change of basis that diagonalizes the covariance matrix. The principal components form an orthonormal basis aligned with the data's natural axes. Dimensionality reduction is projection onto a subspace -- the same operation as in linear algebra, but chosen to maximize information retention. The embedding gap explores what happens when this kind of projection fails to capture the structure that matters.


Ready for the real thing?

This chapter covers the 2D intuition. For high-dimensional PCA, SVD-based computation, and connections to factor analysis, see Shalev-Shwartz & Ben-David Ch.23 or Bishop Ch.12.