
PCA

Machine Learning · Ch.6 of 12

Find the directions of maximum variance. Project the data onto those directions. The principal components are eigenvectors of the covariance matrix.

[Figure: 2D point cloud with principal axes PC1 and PC2. PC1 captures most variance; PC2 is orthogonal.]

Covariance matrix

The covariance matrix C measures how pairs of features vary together. C[i,j] = E[(x_i - mean_i)(x_j - mean_j)]. Diagonal entries are variances; off-diagonal entries are covariances. PCA finds the eigenvectors of this matrix.

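A minimal sketch of this in Scheme, assuming the population convention of dividing by n. The `cov` name matches the notation table below; `mean` and `cov-matrix` are helper names introduced here:

```scheme
;; Mean of a list of numbers.
(define (mean xs)
  (/ (apply + xs) (length xs)))

;; Covariance of two equal-length lists of numbers:
;; E[(x - mean_x)(y - mean_y)], dividing by n (population convention).
(define (cov xs ys)
  (let ((mx (mean xs))
        (my (mean ys)))
    (/ (apply + (map (lambda (x y) (* (- x mx) (- y my))) xs ys))
       (length xs))))

;; 2x2 covariance matrix for two features, as a list of rows.
;; Diagonal entries are variances; off-diagonal entries are the
;; shared covariance, so the matrix is symmetric.
(define (cov-matrix xs ys)
  (list (list (cov xs xs) (cov xs ys))
        (list (cov xs ys) (cov ys ys))))
```

With exact rationals, `(cov '(1 2 3 4) '(2 4 6 8))` evaluates to 5/2, and the diagonal entries `(cov xs xs)` are just the variances.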

Eigendecomposition

The eigenvectors of the covariance matrix point in the directions of maximum variance. The eigenvalues tell you how much variance each direction captures. For a 2x2 matrix, we can solve the characteristic equation directly: det(C - lambda*I) = 0.

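A sketch under the same 2x2 assumption (the names `eigenvalues-2x2` and `eigenvector-2x2` are introduced here, not from the text): for C = ((a b) (b d)), det(C - lambda*I) = 0 reduces to the quadratic lambda^2 - (a + d)*lambda + (ad - b^2) = 0, solved with the quadratic formula.

```scheme
;; Eigenvalues of a symmetric 2x2 matrix ((a b) (b d)), from the
;; characteristic equation lambda^2 - tr(C) lambda + det(C) = 0.
(define (eigenvalues-2x2 C)
  (let* ((a (car (car C))) (b (cadr (car C))) (d (cadr (cadr C)))
         (tr (+ a d))
         (det (- (* a d) (* b b)))
         (disc (sqrt (- (* tr tr) (* 4 det))))) ; real for symmetric C
    (list (/ (+ tr disc) 2)      ; l1, the larger eigenvalue (PC1)
          (/ (- tr disc) 2))))   ; l2 (PC2)

;; Unit eigenvector for eigenvalue l, assuming b is nonzero:
;; v = (b, l - a) satisfies (C - l I) v = 0, then normalize.
(define (eigenvector-2x2 C l)
  (let* ((a (car (car C))) (b (cadr (car C)))
         (vx b) (vy (- l a))
         (norm (sqrt (+ (* vx vx) (* vy vy)))))
    (list (/ vx norm) (/ vy norm))))
```

For C = ((2 1) (1 2)) the eigenvalues are 3 and 1, and PC1 points along (1, 1) normalized to unit length.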

Projecting data

To reduce dimensionality, project each data point onto the top k principal components. The projection is a dot product with each eigenvector. This gives a k-dimensional representation that preserves the most variance from the original data.

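A sketch of the projection (`dot` matches the notation table; `project` is a name introduced here): each coordinate of the reduced representation is the dot product of the point with one eigenvector.

```scheme
;; Dot product of two equal-length vectors, represented as lists.
(define (dot v x)
  (apply + (map * v x)))

;; Project a point onto the top-k principal components, given as a
;; list of unit eigenvectors ordered by decreasing eigenvalue.
;; Returns the point's k-dimensional representation.
(define (project point components)
  (map (lambda (v) (dot v point)) components))
```

`(project '(3 4) '((1 0)))` keeps only the first coordinate, giving (3); with both standard basis vectors the point comes back unchanged.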

Notation reference

Math             Scheme              Meaning
C = X^T X / n    (cov x y)           Covariance matrix (X centered)
λ                l1, l2              Eigenvalue (variance along a PC)
v · x            (dot v x)           Projection onto an eigenvector
λ1 / Σλ          (/ l1 (+ l1 l2))    Fraction of variance explained
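The last row of the table reads as a one-liner; a minimal sketch, with `variance-explained` as an introduced name:

```scheme
;; Fraction of total variance captured by PC1, given the two
;; eigenvalues of the 2x2 covariance matrix.
(define (variance-explained l1 l2)
  (/ l1 (+ l1 l2)))

(variance-explained 9 1)  ; => 9/10, i.e. PC1 alone keeps 90%
```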

Translation notes

PCA is a change of basis that diagonalizes the covariance matrix. The principal components form an orthonormal basis aligned with the data's natural axes. Dimensionality reduction is projection onto a subspace -- the same operation as in linear algebra, but chosen to maximize information retention. The embedding gap explores what happens when this kind of projection fails to capture the structure that matters.


Ready for the real thing?

This chapter covers the 2D intuition. For high-dimensional PCA, SVD-based computation, and connections to factor analysis, see Shalev-Shwartz & Ben-David Ch.23 or Bishop Ch.12.