A distribution is a recipe for randomness. The binomial counts successes, the Poisson counts rare events, the geometric waits for the first success, and the normal emerges when you add enough of anything together.
Binomial distribution
Flip a coin n times, each with success probability p. The number of successes k follows the binomial distribution: P(X = k) = C(n,k) p^k (1-p)^(n-k). Each bar in the diagram is one possible count of successes.
Scheme
; Binomial distribution: P(X = k) = C(n,k) * p^k * (1-p)^(n-k)
; n independent trials, each with success probability p
(define (factorial n)
  (if (<= n 1) 1 (* n (factorial (- n 1)))))
(define (choose n k)
  (/ (factorial n) (* (factorial k) (factorial (- n k)))))
(define (binomial-pmf n p k)
  (* (choose n k)
     (expt p k)
     (expt (- 1 p) (- n k))))
; n=10 trials, p=0.3 success probability
(define n 10)
(define p 0.3)
(display "P(X = 0) = ") (display (binomial-pmf n p 0)) (newline)
(display "P(X = 3) = ") (display (binomial-pmf n p 3)) (newline)
(display "P(X = 5) = ") (display (binomial-pmf n p 5)) (newline)
; Verify: all probabilities sum to 1
(define (sum-pmf k)
  (if (> k n) 0
      (+ (binomial-pmf n p k) (sum-pmf (+ k 1)))))
(display "Sum = ") (display (sum-pmf 0))
Python
# Binomial distribution
from math import comb

def binomial_pmf(n, p, k):
    return comb(n, k) * p**k * (1-p)**(n-k)

n, p = 10, 0.3
for k in range(n+1):
    bar = "#" * int(binomial_pmf(n, p, k) * 200)
    print(f"P(X={k:2d}) = {binomial_pmf(n, p, k):.4f} {bar}")
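Two sanity checks mirror the Scheme verification: the probabilities must sum to 1, and the mean of a Binomial(n, p) is n*p (expected value is covered properly in Ch 6). A minimal sketch:

```python
from math import comb, isclose

def binomial_pmf(n, p, k):
    return comb(n, k) * p**k * (1-p)**(n-k)

n, p = 10, 0.3
# Sum over all possible counts of successes -- should be 1
total = sum(binomial_pmf(n, p, k) for k in range(n + 1))
# Mean of the distribution -- should be n*p = 3
mean = sum(k * binomial_pmf(n, p, k) for k in range(n + 1))
print(total)  # ~1.0
print(mean)   # ~3.0
assert isclose(total, 1.0) and isclose(mean, n * p)
```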
Poisson distribution
When n is large and p is small, the binomial approaches the Poisson distribution with parameter lambda = np. It counts rare events: typos per page, emails per hour, mutations per generation. P(X = k) = e^(-lambda) lambda^k / k!.
Scheme
; Poisson distribution: limit of binomial for rare events
; P(X = k) = e^(-lambda) * lambda^k / k!
(define (factorial n)
  (if (<= n 1) 1 (* n (factorial (- n 1)))))
(define e 2.718281828)
(define (poisson-pmf lam k)
  (* (expt e (- lam))
     (/ (expt lam k) (factorial k))))
; lambda = 3 (e.g., 3 typos per page on average)
(define lam 3)
(display "Poisson(3):") (newline)
(display "P(X = 0) = ") (display (poisson-pmf lam 0)) (newline)
(display "P(X = 1) = ") (display (poisson-pmf lam 1)) (newline)
(display "P(X = 3) = ") (display (poisson-pmf lam 3)) (newline)
(display "P(X = 5) = ") (display (poisson-pmf lam 5)) (newline)
; Compare with Binomial(1000, 0.003) -- should be close
(define (choose n k)
  (if (= k 0) 1 (* (/ (- n (- k 1)) k) (choose n (- k 1)))))
(define (binom-pmf n p k)
  (* (choose n k) (expt p k) (expt (- 1 p) (- n k))))
(display "Binomial(1000,0.003) at k=3: ")
(display (binom-pmf 1000 0.003 3))
; Close to Poisson(3) at k=3
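Python

The same comparison runs in Python; `math.comb` gives exact binomial coefficients, so this is a direct translation of the Scheme version rather than anything new:

```python
from math import comb, exp, factorial

def poisson_pmf(lam, k):
    return exp(-lam) * lam**k / factorial(k)

def binomial_pmf(n, p, k):
    return comb(n, k) * p**k * (1-p)**(n-k)

# Binomial(1000, 0.003) should closely track Poisson(3)
for k in range(6):
    print(f"k={k}: Poisson {poisson_pmf(3, k):.5f}  "
          f"Binomial {binomial_pmf(1000, 0.003, k):.5f}")
```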
Geometric distribution
How many trials until the first success? The geometric distribution answers this: P(X = k) = (1-p)^(k-1) p. It is memoryless. No matter how many failures you have seen, the probability of success on the next trial stays at p.
Scheme
; Geometric distribution: waiting for first success
; P(X = k) = (1-p)^(k-1) * p   (k = 1, 2, 3, ...)
(define (geometric-pmf p k)
  (* (expt (- 1 p) (- k 1)) p))
; p = 0.2 (20% chance each trial)
(define p 0.2)
(display "Geometric(0.2):") (newline)
(display "P(first on trial 1) = ") (display (geometric-pmf p 1)) (newline)
(display "P(first on trial 3) = ") (display (geometric-pmf p 3)) (newline)
(display "P(first on trial 5) = ") (display (geometric-pmf p 5)) (newline)
(display "P(first on trial 10) = ") (display (geometric-pmf p 10)) (newline)
; Expected number of trials = 1/p
(display "E[X] = 1/p = ") (display (/ 1.0 p))
; On average, 5 trials to first success
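Python

Memorylessness can be checked numerically. Since P(X > n) = (1-p)^n (n straight failures), the conditional probability P(X = n+k | X > n) should equal the unconditional P(X = k). A small sketch (the helper names here are mine, not from the text):

```python
def geometric_pmf(p, k):
    return (1 - p)**(k - 1) * p

def tail(p, n):
    # P(X > n): the first n trials all fail
    return (1 - p)**n

p = 0.2
n, k = 4, 3
# P(X = n+k | X > n) = P(X = n+k) / P(X > n)
cond = geometric_pmf(p, n + k) / tail(p, n)
print(cond, geometric_pmf(p, k))  # both ~0.128: the past failures don't matter
```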
Normal (Gaussian) distribution
The normal distribution is continuous. Its PDF is f(x) = (1 / sqrt(2 pi sigma^2)) exp(-(x - mu)^2 / (2 sigma^2)). It arises whenever many small independent effects add up, which is why it appears everywhere. The Central Limit Theorem (Ch 9) explains why.
Scheme
; Normal distribution PDF
; f(x) = (1 / sqrt(2*pi*sigma^2)) * exp(-(x-mu)^2 / (2*sigma^2))
(define pi 3.14159265)
(define e 2.71828183)
(define (normal-pdf mu sigma x)
  (* (/ 1 (sqrt (* 2 pi sigma sigma)))
     (expt e (/ (- (* (- x mu) (- x mu)))
                (* 2 sigma sigma)))))
; Standard normal: mu=0, sigma=1
(display "Standard normal N(0,1):") (newline)
(display "f(0) = ") (display (normal-pdf 0 1 0)) (newline)
(display "f(1) = ") (display (normal-pdf 0 1 1)) (newline)
(display "f(-1) = ") (display (normal-pdf 0 1 -1)) (newline)
(display "f(2) = ") (display (normal-pdf 0 1 2)) (newline)
; Wider distribution: sigma=2
(display "N(0,2):") (newline)
(display "f(0) = ") (display (normal-pdf 0 2 0)) (newline)
; Shorter peak, wider spread
Python
# Normal PDF
import math

def normal_pdf(mu, sigma, x):
    return (1 / math.sqrt(2 * math.pi * sigma**2)) * math.exp(-(x - mu)**2 / (2 * sigma**2))

# Standard normal
for x in [-3, -2, -1, 0, 1, 2, 3]:
    bar = "#" * int(normal_pdf(0, 1, x) * 100)
    print(f"f({x:+d}) = {normal_pdf(0, 1, x):.4f} {bar}")
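The claim that many small independent effects add up to a bell curve can be previewed numerically (the full story is the Central Limit Theorem in Ch 9). This sketch sums uniform random draws, which individually look nothing like a normal distribution:

```python
import random

random.seed(1)
# A sum of 12 Uniform(0,1) draws has mean 6 and variance 1,
# so (sum - 6) is approximately standard normal
samples = [sum(random.random() for _ in range(12)) - 6
           for _ in range(100_000)]

# Fraction of samples within one standard deviation of the mean
within_one = sum(-1 < s < 1 for s in samples) / len(samples)
print(within_one)  # close to 0.6827 -- the 68% rule
```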
PMF vs PDF
Discrete distributions have a probability mass function (PMF): P(X = k) gives the probability of each exact value. Continuous distributions have a probability density function (PDF): f(x) gives density, and probability comes from integrating over an interval. The density at a single point is not a probability.
Scheme
; PMF: each value has a probability (sums to 1)
; PDF: density function (integrates to 1)
; PMF example: fair die
(define (die-pmf k) (/ 1.0 6))
(display "Fair die PMF:") (newline)
(display "P(X=1) = ") (display (die-pmf 1)) (newline)
(display "P(X=4) = ") (display (die-pmf 4)) (newline)
; PDF example: approximate integral of standard normal from -1 to 1,
; using the rectangle rule with small steps
(define pi 3.14159265)
(define e 2.71828183)
(define (std-normal x)
  (* (/ 1 (sqrt (* 2 pi)))
     (expt e (/ (- (* x x)) 2))))
(define (integrate f a b steps)
  (let ((dx (/ (- b a) steps)))
    (let loop ((i 0) (sum 0))
      (if (>= i steps) (* sum dx)
          (loop (+ i 1) (+ sum (f (+ a (* dx (+ i 0.5))))))))))
(display "P(-1 < X < 1) ~ ") (display (integrate std-normal -1 1 100))
; About 0.6827 -- the 68% rule
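Python

The same midpoint-rule integral in Python, extended to two and three standard deviations to recover the full 68-95-99.7 rule:

```python
import math

def std_normal(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def integrate(f, a, b, steps):
    # Midpoint (rectangle) rule, as in the Scheme version
    dx = (b - a) / steps
    return sum(f(a + dx * (i + 0.5)) for i in range(steps)) * dx

for s in (1, 2, 3):
    print(f"P(-{s} < X < {s}) ~ {integrate(std_normal, -s, s, 1000):.4f}")
# Approximately 0.6827, 0.9545, 0.9973
```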
Notation reference
Notation | Scheme | Meaning
B(n,p) | (binomial-pmf n p k) | Binomial: k successes in n trials
Pois(λ) | (poisson-pmf lam k) | Poisson: rare events at rate λ
Geom(p) | (geometric-pmf p k) | Geometric: trials until first success
N(μ,σ²) | (normal-pdf mu sigma x) | Normal: bell curve with mean μ, variance σ²
P(X = k) | (pmf k) | PMF: probability of exact value
f(x) | (pdf x) | PDF: probability density
Neighbors
Probability chapters
🎰 Ch 4 — conditional probability (needed for Bayes with these distributions)
🎰 Ch 6 — expected value and variance of these distributions