The law of large numbers says: the sample mean converges to the true mean as the sample grows. Flip a fair coin a million times. The fraction of heads will be close to 0.5. Not because outcomes balance out, but because the noise gets drowned by the signal.
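A quick simulation makes the claim concrete (a sketch in Python, using the standard `random` module):

```python
import random

# Flip a fair coin a million times and look at the fraction of heads.
n = 1_000_000
heads = sum(random.randint(0, 1) for _ in range(n))
print(f"fraction of heads: {heads / n:.4f}")  # close to 0.5

# Note: the absolute count can still drift far from n/2;
# it is only the fraction that converges.
print(f"deviation from n/2 in flips: {abs(heads - n / 2):.0f}")
```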
Chebyshev's inequality
Before proving the LLN, Grinstead and Snell establish the key tool: Chebyshev's inequality. For any random variable X with mean mu and variance sigma^2: P(|X - mu| ≥ k sigma) ≤ 1/k^2. The probability of being far from the mean is bounded by the variance. No distributional assumptions needed.
Scheme
; Chebyshev's inequality: P(|X - mu| >= k*sigma) <= 1/k^2
; No assumption about the shape of the distribution.
(define (chebyshev-bound k)
  ; At most 1/k^2 probability of being k standard deviations away
  (/ 1.0 (* k k)))
(display "Chebyshev bounds:") (newline)
(display "P(|X-mu| >= 2*sigma) <= ") (display (chebyshev-bound 2)) (newline)
(display "P(|X-mu| >= 3*sigma) <= ") (display (chebyshev-bound 3)) (newline)
(display "P(|X-mu| >= 5*sigma) <= ") (display (chebyshev-bound 5)) (newline)
(display "P(|X-mu| >= 10*sigma) <= ") (display (chebyshev-bound 10)) (newline)
; For the sample mean of n coin flips:
;   Var(sample mean) = Var(X)/n = 0.25/n
;   sigma(sample mean) = 0.5/sqrt(n)
; As n grows, sigma -> 0, so the bound tightens.
(display "sigma(mean) for n=100: ") (display (/ 0.5 (sqrt 100))) (newline)
(display "sigma(mean) for n=10000: ") (display (/ 0.5 (sqrt 10000))) (newline)
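The "no distributional assumptions" point can be checked empirically. A sketch in Python, using an exponential distribution purely as an example of a skewed, non-normal shape (mean 1, sigma 1):

```python
import random

# Empirical check of Chebyshev: P(|X - mu| >= k*sigma) <= 1/k^2,
# here for an exponential distribution with rate 1 (mu = sigma = 1).
mu, sigma = 1.0, 1.0
trials = 100_000
for k in [2, 3, 5]:
    far = sum(1 for _ in range(trials)
              if abs(random.expovariate(1.0) - mu) >= k * sigma)
    # The empirical frequency sits well under the 1/k^2 bound.
    print(f"k={k}: empirical={far / trials:.4f}, bound={1 / k**2:.4f}")
```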
Weak law of large numbers
Let X_1, X_2, ..., X_n be independent with the same distribution, mean mu, and finite variance sigma^2. The sample mean S_n/n satisfies: for any epsilon > 0, P(|S_n/n - mu| ≥ epsilon) ≤ sigma^2 / (n epsilon^2). As n grows, this bound goes to zero. The sample mean converges in probability to the true mean.
Scheme
; Weak LLN: P(|S_n/n - mu| >= eps) <= sigma^2 / (n * eps^2)
; The bound shrinks as n grows.
(define (lln-bound sigma-sq n eps)
  (/ sigma-sq (* n eps eps)))
; Fair coin: sigma^2 = 0.25
(define sigma-sq 0.25)
(define eps 0.05) ; within 5% of true mean
(display "P(|mean - 0.5| >= 0.05):") (newline)
(display " n=100: <= ") (display (lln-bound sigma-sq 100 eps)) (newline)
(display " n=1000: <= ") (display (lln-bound sigma-sq 1000 eps)) (newline)
(display " n=10000: <= ") (display (lln-bound sigma-sq 10000 eps)) (newline)
(display " n=100000: <= ") (display (lln-bound sigma-sq 100000 eps)) (newline)
; At n=10000, probability of being off by 5% is at most 1%.
; The bound is loose (actual probability much smaller)
; but it proves convergence for ANY distribution with finite variance.
Python
# Weak LLN bound vs simulation
import random

def lln_bound(var, n, eps):
    return var / (n * eps**2)

# Fair coin
var, eps = 0.25, 0.05

for n in [100, 1000, 10000]:
    # Chebyshev bound
    bound = lln_bound(var, n, eps)
    # Simulate: fraction of times |mean - 0.5| >= eps
    trials = 1000
    violations = 0
    for _ in range(trials):
        flips = [random.randint(0, 1) for _ in range(n)]
        if abs(sum(flips) / n - 0.5) >= eps:
            violations += 1
    actual = violations / trials
    print(f"n={n:5d}: bound={bound:.4f}, actual~{actual:.4f}")
Bernoulli's theorem
The LLN was first proved by Jacob Bernoulli in 1713, specifically for coin flips. His version: the fraction of heads in n flips of a coin with probability p converges to p. This is the special case of the weak LLN applied to Bernoulli trials (Ch 5). Bernoulli called it the "golden theorem." It took him twenty years to prove.
Scheme
; Bernoulli's theorem: special case of LLN for coin flips
; Fraction of heads -> p as n -> infinity
; Simulate convergence by computing running averages
; (We use a deterministic "pseudo-random" sequence for reproducibility)
(define (pseudo-flip seed p)
  ; Simple hash-based pseudo-random: returns 0 or 1
  (let ((hash (- (* seed 1103515245) (* seed seed) -12345)))
    (if (< (abs (remainder hash 1000)) (* p 1000)) 1 0)))
(define (running-mean p n-max)
  (let loop ((i 1) (sum 0))
    (if (> i n-max) 'done
        (let* ((flip (pseudo-flip i p))
               (new-sum (+ sum flip))
               (mean (/ (* 1.0 new-sum) i)))
          (if (= (remainder i (/ n-max 5)) 0)
              (begin
                (display "n=") (display i)
                (display " mean=") (display mean)
                (display " |error|=") (display (abs (- mean p)))
                (newline)))
          (loop (+ i 1) new-sum)))))
(display "Bernoulli trials, p=0.3:") (newline)
(running-mean 0.3 500)
; Error shrinks as n grows -- Bernoulli's theorem in action
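The same convergence shows up with real (pseudo-)randomness. A sketch in Python using the standard `random` module:

```python
import random

# Bernoulli trials with p = 0.3: the running fraction of successes
# approaches p as n grows.
p = 0.3
successes = 0
for i in range(1, 10_001):
    successes += random.random() < p  # True counts as 1
    if i in (100, 1000, 10000):
        mean = successes / i
        print(f"n={i:5d}: mean={mean:.4f}, |error|={abs(mean - p):.4f}")
```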
What LLN does not say
The LLN does not say outcomes "balance out." After 10 heads in a row, the next flip is still 50/50. What happens is dilution: those 10 heads become a smaller fraction of the total as n grows. The law works by addition, not correction. The gambler's fallacy confuses these two.
Scheme
; The gambler's fallacy: outcomes don't "correct" themselves
; After 10 heads, the coin is still fair.
; Suppose we start with 10 heads in a row.
; Track how the running proportion evolves.
(define (dilution initial-heads fair-flips)
  ; initial-heads out of initial-heads trials already happened
  ; Then flip a fair coin fair-flips more times (assume exactly half heads)
  (let* ((total-n (+ initial-heads fair-flips))
         (total-heads (+ initial-heads (/ fair-flips 2)))
         (proportion (/ (* 1.0 total-heads) total-n)))
    proportion))
(display "Start: 10 heads in 10 flips = ")
(display (dilution 10 0)) (newline)
(display "After 10 more fair flips: ")
(display (dilution 10 10)) (newline)
(display "After 100 more: ")
(display (dilution 10 100)) (newline)
(display "After 1000 more: ")
(display (dilution 10 1000)) (newline)
(display "After 10000 more: ")
(display (dilution 10 10000)) (newline)
; Proportion approaches 0.5 by dilution, not correction
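The "no correction" half of the argument can also be checked directly: simulate many flips, find every run of heads, and look at the flip that follows. A sketch in Python (run length 5 rather than 10 is an assumption made here, just so enough streaks occur to count):

```python
import random

# If the gambler's fallacy were right, the flip after a streak of heads
# would lean toward tails. It doesn't.
flips = [random.randint(0, 1) for _ in range(1_000_000)]  # 1 = heads
streak_len = 5
# Collect every flip immediately preceded by streak_len heads (overlapping runs counted).
after_streak = [flips[i] for i in range(streak_len, len(flips))
                if all(flips[i - j] == 1 for j in range(1, streak_len + 1))]
print(f"flips following {streak_len} heads: {len(after_streak)}")
print(f"fraction heads on those flips: {sum(after_streak) / len(after_streak):.4f}")
# still about 0.5 -- no correction
```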
Notation reference
Notation               | Scheme                | Meaning
-----------------------|-----------------------|----------------------------
S_n / n                | (/ sum n)             | Sample mean
P(|X-μ| ≥ kσ) ≤ 1/k²   | (chebyshev-bound k)   | Chebyshev's inequality
σ²/(nε²)               | (lln-bound var n eps) | Weak LLN convergence bound
X_n → μ in probability | bound → 0 as n → ∞    | Convergence in probability
Neighbors
Probability chapters
🎰 Ch 7 — sums of random variables (the machinery behind LLN)
🎰 Ch 6 — expected value and variance (the quantities LLN connects)
🎰 Ch 5 — Bernoulli distribution (the original LLN setting)