
What is ML?

Deisenroth et al., Mathematics for Machine Learning (CC BY 4.0) · mml-book.github.io

ML is function approximation from data. Supervised learning fits a function from labeled examples. Unsupervised learning discovers structure without labels. Reinforcement learning learns from reward signals. Every ML algorithm is choosing a function from a hypothesis space that best explains the data it has seen.

Three paradigms, one goal: approximate the right function.
Supervised: labeled examples, (x, y) pairs.
Unsupervised: structure discovery, x only.
Reinforcement: reward signals, action → reward.

Train/test split

The most basic discipline in ML: never evaluate on the data you trained on. Hold out a portion for testing. The training set teaches; the test set judges.

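A minimal Python sketch of a holdout split. The 80/20 ratio and the fixed shuffle seed are illustrative choices, not prescribed by the text:

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle the examples, then hold out the last fraction for testing."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)   # deterministic shuffle for reproducibility
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]   # (train, test)

examples = [(x, x % 2) for x in range(10)]  # toy labeled data
train, test = train_test_split(examples)
```

Shuffling before the cut matters: if the data arrives sorted by label, an unshuffled split would put all of one class in the test set.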

Nearest-neighbor classifier

The simplest possible classifier: given a new point, find the closest training point and copy its label. No parameters to learn. The training data is the model.

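One way to sketch the classifier in Python. The `dist` helper and the 2-D toy points are illustrative, not from the text:

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def nearest_neighbor(train, query):
    """Copy the label of the closest training point: a linear scan, O(n) per query."""
    x_best, y_best = min(train, key=lambda pair: dist(pair[0], query))
    return y_best

train = [((0.0, 0.0), "a"), ((1.0, 1.0), "b")]
nearest_neighbor(train, (0.2, 0.1))  # → "a", since (0.2, 0.1) is closest to (0, 0)
```

Note that nothing is fitted: `train` is passed straight to the query function, which is exactly the sense in which the training data is the model.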

Evaluating accuracy

Accuracy is the fraction of test examples the model gets right. It is the most basic metric — good enough to start, too coarse for serious work. Function approximation alone is not intelligence; that requires closed feedback loops where the system consolidates what it has learned back into its own processing.

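A Python sketch of the metric. The parity model and the deliberately mislabeled test example are illustrative:

```python
def accuracy(model, test):
    """Fraction of (x, y) test examples where the model's prediction matches y."""
    correct = sum(1 for x, y in test if model(x) == y)
    return correct / len(test)

# Toy model predicting parity; (3, 0) is mislabeled on purpose.
test_set = [(1, 1), (2, 0), (3, 0)]
accuracy(lambda x: x % 2, test_set)  # → 2/3
```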

Notation reference

Math | Scheme | Python | Meaning
f: X → Y | (define (f x) ...) | def f(x): ... | Function mapping
(xᵢ, yᵢ) | (list xi yi) | (xi, yi) | Labeled example
‖x - y‖ | (dist x y) | dist(x, y) | Euclidean distance
argmin | (find-best ...) | min(..., key=...) | Index of minimum
accuracy | (/ correct n) | correct / n | Fraction correct

Translation notes

Nearest-neighbor is purely functional: no mutation, no learned parameters. The training data is passed as an argument, making the model explicit. Python's min with a key function is the imperative equivalent of the recursive find-best. Both are linear scans: O(n) per query.
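The equivalence can be seen directly. Here is a hypothetical find_best written as explicit recursion in Python, mirroring the Scheme version, next to min with a key:

```python
def find_best(items, score):
    """Recursive linear scan for the lowest-scoring item, in the style of Scheme's find-best."""
    if len(items) == 1:
        return items[0]
    rest_best = find_best(items[1:], score)
    # Prefer the earlier item on ties, matching min's behavior.
    return items[0] if score(items[0]) <= score(rest_best) else rest_best

points = [3.0, -1.0, 2.5]
assert find_best(points, abs) == min(points, key=abs)  # both pick -1.0
```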

Real ML frameworks (scikit-learn, PyTorch) add indexing structures for speed, but the core idea is identical: find the function that best explains the data.

Ready for the real thing? Read Mathematics for Machine Learning Ch. 8 and Dive into Deep Learning Ch. 1.