Kernel Methods & SVMs
Machine Learning · Ch.5 of 12
Map data to a higher-dimensional space where it becomes linearly separable. The kernel trick computes the inner product in that space without ever visiting it. The SVM maximizes the margin between classes.
Polynomial kernel
A kernel k(x,y) computes the inner product in a feature space without explicitly mapping there. The polynomial kernel k(x,y) = (x·y + c)^d implicitly maps to all monomials up to degree d. Higher degree means a more flexible decision boundary.
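The "without explicitly mapping there" claim can be checked directly for degree 2 and c = 0, where the explicit feature map for 2-D input is φ(x) = (x₁², √2·x₁x₂, x₂²). A minimal sketch (the function names `poly_kernel` and `phi` are illustrative, not from any library):

```python
import math

def poly_kernel(x, y, c=0.0, d=2):
    """Polynomial kernel: (x . y + c)^d, computed without the feature map."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d

def phi(x):
    """Explicit degree-2 feature map for 2-D input with c = 0:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 0.5)
k_trick = poly_kernel(x, y)                               # stay in 2-D
k_explicit = sum(a * b for a, b in zip(phi(x), phi(y)))   # visit 3-D feature space
# Both equal (x.y)^2 = 4^2 = 16: the kernel computes the inner
# product in feature space without ever constructing phi.
```

The kernel trick's payoff is the left-hand path: for degree d in n dimensions the explicit feature space has roughly n^d coordinates, while the kernel costs one dot product plus one exponentiation.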
RBF kernel
The radial basis function (RBF) kernel k(x,y) = exp(-γ‖x−y‖²) maps to an infinite-dimensional space. It measures similarity: nearby points get kernel values close to 1, distant points approach 0. The parameter γ controls the width of the Gaussian bump: larger γ means a narrower bump and a wigglier decision boundary.
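The similarity behavior is easy to see numerically. A short sketch (the helper name `rbf_kernel` is illustrative):

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

origin = (0.0, 0.0)
near = (0.1, 0.0)
far = (5.0, 0.0)

k_self = rbf_kernel(origin, origin)   # identical points: exactly 1
k_near = rbf_kernel(origin, near)     # close points: just under 1
k_far = rbf_kernel(origin, far)       # distant points: effectively 0
# Larger gamma narrows the bump, so the same nearby point looks less similar:
k_near_sharp = rbf_kernel(origin, near, gamma=100.0)
```

With γ = 1 the point at distance 0.1 scores exp(−0.01) ≈ 0.99; with γ = 100 the same point scores exp(−1) ≈ 0.37, which is the "narrower bump" in action.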
SVM decision boundary via kernel
The SVM decision function is f(x) = sign(sum of alpha_i * y_i * k(x_i, x) + b), where the sum runs over support vectors. Only a few training points (the support vectors) determine the boundary. This is classification without ever computing coordinates in feature space.
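The decision function above can be sketched directly. This is a hand-assembled toy, not a trained model: the support vectors, alphas, and bias below are made-up values chosen so each class has one support vector.

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def svm_decision(x, support_vectors, alphas, labels, b, kernel=rbf):
    """f(x) = sign(sum_i alpha_i * y_i * k(x_i, x) + b).
    The sum runs only over the support vectors; no feature-space
    coordinates are ever computed."""
    s = sum(a * y * kernel(sv, x)
            for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if s + b >= 0 else -1

# Toy "model": one support vector per class (illustrative values).
svs = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [-1, 1]
b = 0.0

pred_neg = svm_decision((0.2, 0.1), svs, alphas, labels, b)  # near the -1 vector
pred_pos = svm_decision((1.9, 2.1), svs, alphas, labels, b)  # near the +1 vector
```

A query point near the −1 support vector gets a large negative term and a tiny positive one, so the sign comes out −1, and symmetrically for the +1 side: the few support vectors really do determine the whole boundary.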
Notation reference
| Math | Scheme | Meaning |
|---|---|---|
| k(x,y) | (kernel x y) | Kernel function (inner product in feature space) |
| (x·y + c)^d | (expt (+ (dot x y) c) d) | Polynomial kernel |
| exp(-γ||x-y||²) | (exp (* (- g) (sq-dist x y))) | RBF kernel |
| αᵢ yᵢ | (* alpha label) | Support vector weight |
Translation notes
The kernel trick is a functor in disguise: it maps from a data category to a feature-space category, preserving inner-product structure. The SVM optimization problem is convex, so every local optimum is global; in practice it is solved as a quadratic program (e.g. by SMO) rather than by plain gradient descent. Support vectors are the training points that pin the decision boundary in place.
Neighbors
- Linear Algebra Ch.2 – vector spaces: the "higher-dimensional space" the kernel maps into
- Category Theory Ch.7 – functors as structure-preserving maps between spaces
Ready for the real thing?
This chapter covers the geometric intuition. For the full optimization story (dual problem, KKT conditions, soft margins), see Bishop's Pattern Recognition and Machine Learning Ch.7 or Schölkopf & Smola's Learning with Kernels.