Distributions

OpenIntro Statistics · Ch. 4 · openintro.org/book/os

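A distribution describes how probability is spread across possible outcomes. The normal distribution is the workhorse of statistics: by the Central Limit Theorem, the distribution of a sample mean approaches a normal shape as the sample size grows, provided the population has finite variance. Other distributions model more specific situations: the binomial counts successes, the Poisson counts rare events, and the geometric measures waiting times.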

The normal distribution and Z-scores

The normal distribution is symmetric, bell-shaped, and fully described by its mean (μ) and standard deviation (σ). A Z-score standardizes any value: Z = (x - μ) / σ. A Z-score of 2 means the value is 2 standard deviations above the mean; a negative Z-score means the value is below the mean.

Scheme

; Z-scores: standardize values
; Z = (x - mean) / sd

(define (z-score x mean sd)
  (/ (- x mean) sd))

; SAT scores: mean=1060, sd=195
(define sat-mean 1060)
(define sat-sd 195)

(define score-1 1250)
(define score-2 900)

(display "SAT score 1250, Z = ")
(display (exact->inexact (z-score score-1 sat-mean sat-sd)))
(newline)

(display "SAT score 900, Z = ")
(display (exact->inexact (z-score score-2 sat-mean sat-sd)))
(newline)

; Reverse: what score is at Z = 2?
(define (from-z z mean sd) (+ mean (* z sd)))
(display "Score at Z=2: ")
(display (from-z 2 sat-mean sat-sd))
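
A Z-score becomes a probability once it is passed through the standard normal CDF. Standard Scheme has no built-in error function, so the sketch below assumes the Abramowitz-Stegun polynomial approximation (formula 7.1.26, accurate to roughly 1.5e-7). The names erf-approx, normal-cdf, and within are illustrative and are not taken from OpenIntro.

```scheme
; Approximate erf(x) with Abramowitz-Stegun formula 7.1.26.
; This is an approximation (assumed here), not an exact value.
(define (erf-approx x)
  (let* ((ax (abs x))
         (u (/ 1.0 (+ 1.0 (* 0.3275911 ax))))
         (poly (* u (+ 0.254829592
                 (* u (+ -0.284496736
                 (* u (+ 1.421413741
                 (* u (+ -1.453152027
                 (* u 1.061405429))))))))))
         (y (- 1.0 (* poly (exp (- (* ax ax)))))))
    ; erf is an odd function: erf(-x) = -erf(x)
    (if (< x 0) (- y) y)))

; Standard normal CDF: P(Z <= z)
(define (normal-cdf z)
  (* 0.5 (+ 1.0 (erf-approx (/ z (sqrt 2.0))))))

(define (z-score x mean sd)
  (/ (- x mean) sd))

; Share of SAT scores at or below 1250 (mean=1060, sd=195)
(define p-below-1250 (normal-cdf (z-score 1250.0 1060 195)))
(display "P(score <= 1250) = ")
(display p-below-1250) (newline)

; Share of values within k standard deviations of the mean
(define (within k)
  (- (normal-cdf k) (normal-cdf (- k))))

(display "Within 1 sd: ") (display (within 1)) (newline)
(display "Within 2 sd: ") (display (within 2)) (newline)
(display "Within 3 sd: ") (display (within 3)) (newline)
```

The three within values should land close to 0.68, 0.95, and 0.997, which is where the 68-95-99.7 rule gets its name.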

Binomial distribution

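The binomial distribution counts successes in n independent trials, each with the same success probability p. For k = 0, 1, ..., n: P(X = k) = C(n,k) * p^k * (1-p)^(n-k), where C(n,k) = n! / (k! (n-k)!) is the number of ways to choose which k trials succeed. Mean = np, standard deviation = sqrt(np(1-p)).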

Scheme

; Binomial distribution: n trials, probability p
; P(X = k) = C(n,k) * p^k * (1-p)^(n-k)

(define (factorial n)
  (if (<= n 1) 1 (* n (factorial (- n 1)))))

(define (choose n k)
  (/ (factorial n) (* (factorial k) (factorial (- n k)))))

(define (binomial-pmf n p k)
  (* (choose n k)
     (expt p k)
     (expt (- 1 p) (- n k))))

; Example: 10 coin flips, P(exactly 7 heads)
(define n 10)
(define p 0.5)

(display "P(7 heads in 10 flips) = ")
(display (exact->inexact (binomial-pmf n p 7))) (newline)

; Distribution of successes
(display "Full distribution:") (newline)
(let loop ((k 0))
  (when (<= k n)
    (display "  P(X=") (display k) (display ") = ")
    (display (exact->inexact (binomial-pmf n p k)))
    (newline)
    (loop (+ k 1))))
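
The mean and standard deviation formulas can be checked against the distribution itself. The sketch below repeats the definitions so it runs by itself, then compares np with the mean computed from the definition, the sum of k * P(X = k). The helper names (binomial-mean, binomial-sd, pmf-mean, trials, prob) are illustrative.

```scheme
(define (factorial n)
  (if (<= n 1) 1 (* n (factorial (- n 1)))))

(define (choose n k)
  (/ (factorial n) (* (factorial k) (factorial (- n k)))))

(define (binomial-pmf n p k)
  (* (choose n k)
     (expt p k)
     (expt (- 1 p) (- n k))))

; Closed-form summaries: mean = np, sd = sqrt(np(1-p))
(define (binomial-mean n p) (* n p))
(define (binomial-sd n p) (sqrt (* n p (- 1 p))))

; Mean from the definition: sum of k * P(X = k) for k = 0..n
(define (pmf-mean n p)
  (let loop ((k 0) (total 0))
    (if (> k n)
        total
        (loop (+ k 1) (+ total (* k (binomial-pmf n p k)))))))

; Same example as above: 10 coin flips
(define trials 10)
(define prob 0.5)

(display "Mean (np) = ")
(display (binomial-mean trials prob)) (newline)
(display "Mean (from pmf) = ")
(display (pmf-mean trials prob)) (newline)
(display "SD = ")
(display (binomial-sd trials prob)) (newline)
```

Both means should agree at 5 heads, up to floating-point rounding, with a standard deviation of about 1.58.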

Poisson and geometric

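The Poisson distribution models the number of rare events in a fixed interval of time when events occur independently at an average rate λ: P(X = k) = e^(-λ) * λ^k / k! for k = 0, 1, 2, .... The geometric distribution models the waiting time until the first success, counted as the trial on which it occurs: P(X = k) = (1-p)^(k-1) * p for k = 1, 2, 3, ....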

Scheme

; Poisson: events per unit time
; P(X = k) = e^(-lambda) * lambda^k / k!

(define (factorial n)
  (if (<= n 1) 1 (* n (factorial (- n 1)))))

; Use the built-in exp for e^(-lambda) rather than a truncated constant
(define (poisson-pmf lam k)
  (* (exp (- lam))
     (/ (expt lam k) (factorial k))))

; Average 3 emails per hour. P(exactly 5)?
(display "Poisson(lambda=3):") (newline)
(display "  P(X=5) = ")
(display (poisson-pmf 3 5)) (newline)
(display "  P(X=0) = ")
(display (poisson-pmf 3 0)) (newline)

; Geometric: trials until first success
; P(X = k) = (1-p)^(k-1) * p
(define (geometric-pmf p k)
  (* (expt (- 1 p) (- k 1)) p))

; P(first success on 4th trial), p=0.2
(newline)
(display "Geometric(p=0.2):") (newline)
(display "  P(first success on trial 4) = ")
(display (geometric-pmf 0.2 4))
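
Questions such as "at most 2 emails" or "a success within 4 trials" need a sum of point probabilities rather than a single one. The sketch below repeats the definitions so it runs by itself and adds a hypothetical helper, sum-pmf, that adds up a pmf over a range of k. For the geometric case it also compares the sum with the closed form 1 - (1-p)^k.

```scheme
(define (factorial n)
  (if (<= n 1) 1 (* n (factorial (- n 1)))))

(define (poisson-pmf lam k)
  (* (exp (- lam))
     (/ (expt lam k) (factorial k))))

(define (geometric-pmf p k)
  (* (expt (- 1 p) (- k 1)) p))

; Sum (pmf k) for k = lo, lo+1, ..., hi
(define (sum-pmf pmf lo hi)
  (let loop ((k lo) (total 0))
    (if (> k hi)
        total
        (loop (+ k 1) (+ total (pmf k))))))

; Poisson(lambda=3): P(at most 2 emails in an hour)
(define poisson-at-most-2
  (sum-pmf (lambda (k) (poisson-pmf 3 k)) 0 2))

; Geometric(p=0.2): P(first success on or before trial 4)
(define geometric-within-4
  (sum-pmf (lambda (k) (geometric-pmf 0.2 k)) 1 4))

; Closed form: 1 - P(four failures in a row)
(define geometric-closed
  (- 1 (expt (- 1 0.2) 4)))

(display "P(X <= 2), Poisson(3) = ")
(display poisson-at-most-2) (newline)
(display "P(X <= 4), Geometric(0.2) = ")
(display geometric-within-4) (newline)
(display "Closed form = ")
(display geometric-closed) (newline)
```

The last two values should match up to floating-point rounding, since a success within 4 trials is the complement of 4 failures in a row.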

Neighbors

Related chapters

🎰 Grinstead Ch.5 — important distributions derived from first principles
🎰 Grinstead Ch.9 — the Central Limit Theorem, why means converge to normal

Translation notes

OpenIntro devotes significant space to normal probability tables and calculator use. This page computes Z-scores and probabilities directly in code instead. The 68-95-99.7 rule (empirical rule) is the key heuristic for the normal distribution: about 68%, 95%, and 99.7% of values fall within 1, 2, and 3 standard deviations of the mean. For the Central Limit Theorem that justifies the normal's dominance, see Grinstead Ch. 9.

Want the full treatment? Read OpenIntro Statistics, Ch. 4.

by june.kim