Find the directions of maximum variance. Project the data onto those directions. The principal components are eigenvectors of the covariance matrix.
Covariance matrix
The covariance matrix C measures how pairs of features vary together. C[i,j] = E[(x_i - mean_i)(x_j - mean_j)]. Diagonal entries are variances; off-diagonal entries are covariances. PCA finds the eigenvectors of this matrix.
Scheme
; Covariance matrix of 2D data
; C[i,j] = (1/n) * sum((x_i - mean_i)(x_j - mean_j))
(define data (list (list 1 2) (list 3 4) (list 5 6)
                   (list 2 3) (list 4 5)))

(define (mean lst)
  (/ (let loop ((l lst) (s 0))
       (if (null? l) s (loop (cdr l) (+ s (car l)))))
     (length lst)))

(define (col data j)
  (map (lambda (row) (list-ref row j)) data))

(define (cov xs ys)
  (let ((mx (mean xs)) (my (mean ys)) (n (length xs)))
    (/ (let loop ((xs xs) (ys ys) (s 0))
         (if (null? xs) s
             (loop (cdr xs) (cdr ys)
                   (+ s (* (- (car xs) mx)
                           (- (car ys) my))))))
       n)))

(define x (col data 0))
(define y (col data 1))

(display "Cov matrix:") (newline)
(display " [") (display (cov x x)) (display " ")
(display (cov x y)) (display "]") (newline)
(display " [") (display (cov y x)) (display " ")
(display (cov y y)) (display "]")
Python
# Covariance matrix of 2D data
data = [[1,2],[3,4],[5,6],[2,3],[4,5]]

def mean(lst):
    return sum(lst) / len(lst)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x-mx)*(y-my) for x,y in zip(xs,ys)) / len(xs)

x = [row[0] for row in data]
y = [row[1] for row in data]

print("Cov matrix:")
print(" [{:.2f} {:.2f}]".format(cov(x,x), cov(x,y)))
print(" [{:.2f} {:.2f}]".format(cov(y,x), cov(y,y)))
Eigendecomposition
The eigenvectors of the covariance matrix point in the directions of maximum variance. The eigenvalues tell you how much variance each direction captures. For a 2x2 matrix, we can solve the characteristic equation directly: det(C - lambda*I) = 0.
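For a 2x2 matrix the characteristic equation is just a quadratic, so the eigenpairs can be computed with the quadratic formula. Here is a minimal Python sketch (the helper `eig2x2` is not part of the chapter's code; it assumes a symmetric matrix, which a covariance matrix always is):

```python
import math

# Eigenvalues of a symmetric 2x2 matrix [[a, b], [b, d]] via the
# characteristic equation: lambda^2 - (a+d)*lambda + (a*d - b*b) = 0
def eig2x2(a, b, d):
    tr, det = a + d, a*d - b*b
    disc = math.sqrt(tr*tr - 4*det)   # always real for a symmetric matrix
    l1, l2 = (tr + disc) / 2, (tr - disc) / 2
    # Eigenvector for l1: any nonzero solution of (a - l1)*vx + b*vy = 0
    if b != 0:
        v = (b, l1 - a)
    else:
        v = (1.0, 0.0) if a >= d else (0.0, 1.0)
    norm = math.hypot(v[0], v[1])
    return (l1, l2), (v[0]/norm, v[1]/norm)

# The example data's covariance matrix is [[2, 2], [2, 2]]
(l1, l2), pc1 = eig2x2(2.0, 2.0, 2.0)
print("eigenvalues:", l1, l2)   # 4.0 and 0.0
print("PC1:", pc1)              # ~(0.7071, 0.7071)
```

This recovers the PC1 = [0.7071, 0.7071] direction used in the projection code below.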
To reduce dimensionality, project each data point onto the top k principal components. The projection is a dot product with each eigenvector. This gives a k-dimensional representation that preserves the most variance from the original data.
Scheme
; Project 2D data onto PC1 = [0.707, 0.707]
; This reduces 2D -> 1D while keeping max variance
(define (dot xs ys)
  (let loop ((xs xs) (ys ys) (s 0))
    (if (null? xs) s
        (loop (cdr xs) (cdr ys)
              (+ s (* (car xs) (car ys)))))))

(define pc1 (list 0.7071 0.7071))
(define data (list (list 1 2) (list 3 4) (list 5 6)
                   (list 2 3) (list 4 5)))

; Center the data first
(define mean-x 3.0)
(define mean-y 4.0)
(define (center pt)
  (list (- (car pt) mean-x) (- (cadr pt) mean-y)))

(define (project pt)
  (dot (center pt) pc1))

(display "Projections onto PC1:") (newline)
(for-each (lambda (pt)
            (display " ") (display pt) (display " -> ")
            (display (/ (round (* (project pt) 1000)) 1000))
            (newline))
          data)
Python
# Project 2D data onto PC1 = [0.707, 0.707]
data = [[1,2],[3,4],[5,6],[2,3],[4,5]]
mean_x, mean_y = 3.0, 4.0
pc1 = [0.7071, 0.7071]

def project(pt):
    centered = [pt[0]-mean_x, pt[1]-mean_y]
    return sum(a*b for a,b in zip(centered, pc1))

print("Projections onto PC1:")
for pt in data:
    print(" {} -> {:.3f}".format(pt, project(pt)))
Notation reference
Math             Scheme              Meaning
C = XᵀX / n      (cov x y)           Covariance matrix (X centered)
λ                l1, l2              Eigenvalue (variance along PC)
v · x            (dot v x)           Projection onto eigenvector
λ1 / Σλ          (/ l1 (+ l1 l2))    Fraction of variance explained
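The last row of the table can be checked against the running example: the eigenvalues of the covariance matrix work out to 4 and 0, so PC1 explains all of the variance — the five points lie exactly on the line y = x + 1. A quick Python check:

```python
# Fraction of variance explained by PC1: lambda1 / (lambda1 + lambda2).
# Eigenvalues for the example data's covariance matrix [[2, 2], [2, 2]]
# are 4 and 0 (the points lie exactly on the line y = x + 1).
l1, l2 = 4.0, 0.0
explained = l1 / (l1 + l2)
print("Variance explained by PC1: {:.0%}".format(explained))   # 100%
```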
Translation notes
PCA is a change of basis that diagonalizes the covariance matrix. The principal components form an orthonormal basis aligned with the data's natural axes. Dimensionality reduction is projection onto a subspace -- the same operation as in linear algebra, but chosen to maximize information retention. The embedding gap explores what happens when this kind of projection fails to capture the structure that matters.
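The "diagonalizes" claim can be verified numerically: rotating C into the basis of its principal components should leave only the eigenvalues on the diagonal. A Python sketch, assuming the eigenvectors [0.7071, 0.7071] and [-0.7071, 0.7071] of the example's covariance matrix:

```python
# Check that the change of basis into principal components diagonalizes C.
# V's rows are the (orthonormal) principal components.
C = [[2.0, 2.0], [2.0, 2.0]]
V = [[0.7071, 0.7071], [-0.7071, 0.7071]]

def matmul(A, B):
    return [[sum(A[i][k]*B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(A):
    return [[A[j][i] for j in range(2)] for i in range(2)]

D = matmul(matmul(V, C), transpose(V))   # V C V^T
for row in D:
    print(["{:.3f}".format(v) for v in row])
# Diagonal entries are the eigenvalues (~4 and ~0); off-diagonals are ~0
```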
This chapter covers the 2D intuition. For high-dimensional PCA, SVD-based computation, and connections to factor analysis, see Shalev-Shwartz & Ben-David Ch.23 or Bishop Ch.12.