
Central Limit Theorem

Grinstead & Snell · GFDL · PDF

Standardize the sum of n independent random variables: subtract its mean nμ, divide by σ√n. As n grows, the distribution of the result converges to N(0, 1). This is why the normal distribution appears everywhere.

The standardized sum

Let X₁, X₂, …, Xₙ be independent random variables with mean μ and variance σ². Their sum Sₙ = X₁ + … + Xₙ has mean nμ and variance nσ². The standardized sum is Sₙ* = (Sₙ − nμ) / (σ√n). The CLT says the distribution of Sₙ* converges to the standard normal.
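A minimal Python sketch of the definition above, using dice as the Xᵢ. The helpers `roll_die` and `standardized_sum` mirror the Scheme forms in the notation table below; the die parameters μ = 3.5 and σ² = 35/12 are standard facts about a fair die.

```python
import random
import statistics

def roll_die():
    """One fair die roll: mean mu = 3.5, variance sigma^2 = 35/12."""
    return random.randint(1, 6)

def standardized_sum(n, mu=3.5, sigma=(35 / 12) ** 0.5):
    """S_n* = (S_n - n*mu) / (sigma * sqrt(n))."""
    s_n = sum(roll_die() for _ in range(n))
    return (s_n - n * mu) / (sigma * n ** 0.5)

random.seed(42)  # fixed seed, as in the site's deterministic demo
samples = [standardized_sum(100) for _ in range(5000)]
print(round(statistics.mean(samples), 2))      # near 0
print(round(statistics.variance(samples), 2))  # near 1
```

The printed mean and variance land near 0 and 1, as the CLT predicts; a histogram of `samples` would trace the standard normal bell.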

[Figure: histogram of standardized sums Sₙ* of sample means against the N(0, 1) density, x-axis from −3 to 3]

Continuity correction

When approximating a discrete distribution with the continuous normal, shift by ½. To find P(Sₙ = k), compute P(k − ½ < Sₙ < k + ½) under the normal curve. This continuity correction dramatically improves accuracy for small n.
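The correction is easy to check against an exact discrete answer. A Python sketch for 100 fair coin flips (Φ computed from the error function; the coin example is an illustration, not the site's demo):

```python
from math import erf, sqrt, comb

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# 100 fair coin flips: mu = n*p = 50, sigma = sqrt(n*p*(1-p)) = 5
n, p, k = 100, 0.5, 50
mu, sigma = n * p, sqrt(n * p * (1 - p))

exact = comb(n, k) * p ** k * (1 - p) ** (n - k)
# Continuity correction: P(S = k) ≈ P(k − 1/2 < S < k + 1/2)
approx = phi((k + 0.5 - mu) / sigma) - phi((k - 0.5 - mu) / sigma)

print(round(exact, 4), round(approx, 4))  # agree to about 3 decimals
```

Without the half-unit shift, the naive continuous answer for P(S = 50) would be zero; with it, the normal curve recovers the binomial probability almost exactly.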


Why the normal distribution?

The normal distribution is not arbitrary. It is the maximum-entropy distribution for a given mean and variance. If all you know about a quantity is its mean and variance, the least-presumptuous distribution is normal. The CLT makes this concrete: sums of independent variables lose all structure except mean and variance, so what remains is the max-entropy distribution.
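The max-entropy claim can be checked numerically. The closed-form differential entropies below are standard results; each distribution is scaled to mean 0 and variance 1, and the normal comes out largest.

```python
from math import log, pi, e, sqrt

# Differential entropies (in nats) of three distributions,
# each with mean 0 and variance 1.
entropy_normal  = 0.5 * log(2 * pi * e)  # N(0, 1)
entropy_laplace = 1 + log(sqrt(2))       # Laplace(0, 1/sqrt(2))
entropy_uniform = log(sqrt(12))          # Uniform(-sqrt(3), sqrt(3))

print(round(entropy_normal, 3))   # largest
print(round(entropy_laplace, 3))
print(round(entropy_uniform, 3))  # smallest
```

Any other unit-variance distribution would likewise fall below the normal's ≈1.419 nats; that gap is what "least presumptuous" means quantitatively.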


Notation reference

Textbook                     Scheme                            Meaning
Sₙ = X₁ + … + Xₙ             (loop ... (+ total (roll-die)))   Sum of n trials
Sₙ* = (Sₙ − nμ) / (σ√n)      (standardized-sum n)              Standardized sum
Φ(z)                         (phi z)                           Standard normal CDF
N(0, 1)                      standard normal                   Mean 0, variance 1
Neighbors

Probability chapters

  • 🎰 Ch 8 — Law of Large Numbers (convergence of averages)
  • 🎰 Ch 10 — Generating Functions (another route to the CLT proof)
  • 🎰 Ch 7 — Sums of Random Variables (convolution)


Translation notes

The standardized sum here uses a fixed pseudo-random generator, so the "randomness" is deterministic. A real demonstration would run thousands of trials and plot the histogram. The continuity correction uses an approximation to Φ(z) from Abramowitz & Stegun, accurate to about 5 decimal places. The textbook proves the CLT via moment generating functions (Chapter 10). The entropy characterization follows from constrained optimization with Lagrange multipliers.
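For reference, one widely used Abramowitz & Stegun approximation to Φ(z) is formula 26.2.17, sketched below in Python. The handbook also has shorter, less accurate formulas, so the site's Scheme version may be a different one; the constants here are the published 26.2.17 values (|error| < 7.5 × 10⁻⁸).

```python
from math import erf, exp, pi, sqrt

def phi_as(z):
    """Φ(z) via Abramowitz & Stegun 26.2.17."""
    p = 0.2316419
    b = (0.319381530, -0.356563782, 1.781477937,
         -1.821255978, 1.330274429)
    t = 1.0 / (1.0 + p * abs(z))
    poly = sum(bi * t ** (i + 1) for i, bi in enumerate(b))
    tail = exp(-z * z / 2) / sqrt(2 * pi) * poly  # upper tail for |z|
    return 1 - tail if z >= 0 else tail

def phi_exact(z):
    """Reference value via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

for z in (-2.0, 0.0, 1.0, 1.96):
    print(z, round(phi_as(z), 6), round(phi_exact(z), 6))
```

The two columns agree to six decimal places, comfortably better than the "about 5 decimal places" the notes mention.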

Ready for the real thing? Read Chapter 9 of Grinstead & Snell.
