Kernel Methods & SVMs
Machine Learning · Ch.5 of 12
Map data to a higher-dimensional space where it becomes linearly separable. The kernel trick computes the inner product in that space without ever visiting it. The SVM maximizes the margin between classes.
Polynomial kernel
A kernel k(x,y) computes the inner product in a feature space without explicitly mapping there. The polynomial kernel k(x,y) = (x·y + c)^d implicitly maps to all monomials up to degree d. Higher degree means a more flexible decision boundary.
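The "without explicitly mapping there" claim can be checked directly for degree 2 and c = 0, where the explicit feature map for 2-D input is φ(x) = (x₁², √2·x₁x₂, x₂²). A minimal sketch (the function names `poly_kernel` and `phi` are illustrative, not from any library):

```python
import math

def poly_kernel(x, y, c=0.0, d=2):
    """Polynomial kernel: (x . y + c)^d, computed without the feature map."""
    return (sum(a * b for a, b in zip(x, y)) + c) ** d

def phi(x):
    """Explicit degree-2 feature map for 2-D input with c = 0:
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 0.5)
k_trick = poly_kernel(x, y)                               # stay in 2-D
k_explicit = sum(a * b for a, b in zip(phi(x), phi(y)))   # visit 3-D feature space
# Both equal (x.y)^2 = 4^2 = 16: the kernel computes the inner
# product in feature space without ever constructing phi.
```

The kernel trick's payoff is the left-hand path: for degree d in n dimensions the explicit feature space has roughly n^d coordinates, while the kernel costs one dot product plus one exponentiation.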
RBF kernel
The radial basis function (RBF) kernel k(x,y) = exp(-γ‖x−y‖²) maps to an infinite-dimensional space. It measures similarity: nearby points get kernel values close to 1, distant points approach 0. The parameter γ controls the width of the Gaussian bump: larger γ means a narrower bump and a wigglier decision boundary.
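The similarity behavior is easy to see numerically. A short sketch (the helper name `rbf_kernel` is illustrative):

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

origin = (0.0, 0.0)
near = (0.1, 0.0)
far = (5.0, 0.0)

k_self = rbf_kernel(origin, origin)   # identical points: exactly 1
k_near = rbf_kernel(origin, near)     # close points: just under 1
k_far = rbf_kernel(origin, far)       # distant points: effectively 0
# Larger gamma narrows the bump, so the same nearby point looks less similar:
k_near_sharp = rbf_kernel(origin, near, gamma=100.0)
```

With γ = 1 the point at distance 0.1 scores exp(−0.01) ≈ 0.99; with γ = 100 the same point scores exp(−1) ≈ 0.37, which is the "narrower bump" in action.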
SVM decision boundary via kernel
The SVM decision function is f(x) = sign(sum of alpha_i * y_i * k(x_i, x) + b), where the sum runs over support vectors. Only a few training points (the support vectors) determine the boundary. This is classification without ever computing coordinates in feature space.
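The decision function above can be sketched directly. This is a hand-assembled toy, not a trained model: the support vectors, alphas, and bias below are made-up values chosen so each class has one support vector.

```python
import math

def rbf(x, y, gamma=1.0):
    """RBF kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def svm_decision(x, support_vectors, alphas, labels, b, kernel=rbf):
    """f(x) = sign(sum_i alpha_i * y_i * k(x_i, x) + b).
    The sum runs only over the support vectors; no feature-space
    coordinates are ever computed."""
    s = sum(a * y * kernel(sv, x)
            for sv, a, y in zip(support_vectors, alphas, labels))
    return 1 if s + b >= 0 else -1

# Toy "model": one support vector per class (illustrative values).
svs = [(0.0, 0.0), (2.0, 2.0)]
alphas = [1.0, 1.0]
labels = [-1, 1]
b = 0.0

pred_neg = svm_decision((0.2, 0.1), svs, alphas, labels, b)  # near the -1 vector
pred_pos = svm_decision((1.9, 2.1), svs, alphas, labels, b)  # near the +1 vector
```

A query point near the −1 support vector gets a large negative term and a tiny positive one, so the sign comes out −1, and symmetrically for the +1 side: the few support vectors really do determine the whole boundary.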
Notation reference
| Math | Scheme | Meaning |
|---|---|---|
| k(x,y) | (kernel x y) | Kernel function (inner product in feature space) |
| (x·y + c)^d | (expt (+ (dot x y) c) d) | Polynomial kernel |
| exp(-γ||x-y||²) | (exp (* (- g) (sq-dist x y))) | RBF kernel |
| αᵢ yᵢ | (* alpha label) | Support vector weight |
Translation notes
The kernel trick is a functor in disguise: it maps from a data category to a feature-space category, preserving inner-product structure. The SVM optimization problem is convex, so every local optimum is global; in practice it is solved as a quadratic program (e.g. by SMO) rather than by plain gradient descent. Support vectors are the training points that pin the decision boundary in place.
Neighbors
- Linear Algebra Ch.2 – vector spaces: the "higher-dimensional space" the kernel maps into
- Category Theory Ch.7 – functors as structure-preserving maps between spaces
Ready for the real thing?
This chapter covers the geometric intuition. For the full optimization story (dual problem, KKT conditions, soft margins), see Bishop's Pattern Recognition and Machine Learning Ch.7 or Schölkopf & Smola's Learning with Kernels.