A probability function assigns a number between 0 and 1 to every outcome, and the numbers must add up to 1. That is the entire foundation. Everything else in probability is built on these two constraints.
Sample spaces and events
An experiment has a set of possible outcomes called the sample space (written Ω). An event is any subset of the sample space. Rolling a die: Ω = {1, 2, 3, 4, 5, 6}. "Rolling even" is the event {2, 4, 6}.
Scheme
; Sample space = list of outcomes; Event = subset (filtered list)
(define die '(123456))
(define (event? outcome)
(= 0 (modulo outcome 2)))
(define even-event (filter event? die))
(display "sample space: ") (display die) (newline)
(display "even event: ") (display even-event) (newline)
(display "size of event: ") (display (length even-event))
Python
# Sample space and events
die = [1, 2, 3, 4, 5, 6]
even_event = [x for x in die if x % 2 == 0]
print(f"sample space: {die}")
print(f"even event: {even_event}")
print(f"size of event: {len(even_event)}")
The probability function axioms
A probability function P assigns a weight to each outcome. Three axioms define it:
Non-negative: P(ω) ≥ 0 for every outcome ω
Sums to 1: the weights of all outcomes add to 1
Additive: for disjoint events A and B, P(A ∪ B) = P(A) + P(B)
The probability of an event is the sum of the probabilities of its outcomes.
Scheme
; Probability function: assign weights, check axioms
(define (make-prob-fn weights)
; weights is a list of (outcome . probability) pairs
(lambda (outcome)
(let ((pair (assoc outcome weights)))
(if pair (cdr pair) 0))))
; Fair die: each outcome gets 1/6
(define fair-die
(map (lambda (x) (cons x (/ 16))) '(123456)))
(define P (make-prob-fn fair-die))
; Check axiom 1: non-negative
(display "P(1) = ") (display (P 1)) (newline)
; Check axiom 2: sums to 1
(define total (apply + (map cdr fair-die)))
(display "sum = ") (display total) (newline)
; Probability of an event = sum of outcome probabilities
(define (prob-event P outcomes)
(apply + (map P outcomes)))
(display "P(even) = ")
(display (prob-event P '(246))) (newline)
(display "P(odd) = ")
(display (prob-event P '(135)))
Python
# Probability function on a fair die
probs = {i: 1/6for i inrange(1, 7)}
# Axiom 1: non-negativeprint(f"P(1) = {probs[1]:.4f}")
# Axiom 2: sums to 1print(f"sum = {sum(probs.values()):.4f}")
# P(event) = sum of outcome probs
even = [2, 4, 6]
print(f"P(even) = {sum(probs[x] for x in even):.4f}")
print(f"P(odd) = {sum(probs[x] for x in [1,3,5]):.4f}")
Uniform distribution on finite sets
When every outcome is equally likely, P(ω) = 1/n where n = |Ω|. The probability of an event E is just |E|/|Ω|. Counting favorable outcomes over total outcomes is the oldest probability calculation there is.
Scheme
; Uniform distribution: P(E) = |E| / |omega|; This is the "classical" definition of probability
(define (uniform-prob event sample-space)
(/ (length event) (length sample-space)))
; Two dice: sample space is all pairs
(define two-dice
(let loop ((i 1) (pairs '()))
(if (> i 6) pairs
(loop (+ i 1)
(append pairs
(map (lambda (j) (list i j))
'(123456)))))))
(display "total outcomes: ") (display (length two-dice)) (newline)
; Event: sum equals 7
(define sum-7 (filter (lambda (p) (= 7 (+ (car p) (cadr p)))) two-dice))
(display "ways to get 7: ") (display (length sum-7)) (newline)
(display "P(sum=7) = ") (display (uniform-prob sum-7 two-dice)) (newline)
; Event: doubles
(define doubles (filter (lambda (p) (= (car p) (cadr p))) two-dice))
(display "P(doubles) = ") (display (uniform-prob doubles two-dice))
Python
# Uniform distribution: count favorable / totalfromitertoolsimport product
two_dice = list(product(range(1,7), repeat=2))
print(f"total outcomes: {len(two_dice)}")
sum_7 = [(a,b) for a,b in two_dice if a+b == 7]
print(f"ways to get 7: {len(sum_7)}")
print(f"P(sum=7) = {len(sum_7)/len(two_dice):.4f}")
doubles = [(a,b) for a,b in two_dice if a == b]
print(f"P(doubles) = {len(doubles)/len(two_dice):.4f}")
Coin flips and repeated trials
Flip a fair coin n times. The sample space has 2^n outcomes, each equally likely. The number of sequences with exactly k heads is "n choose k" (a preview of Chapter 3). For 3 flips, the probability of exactly 2 heads is 3/8.
Scheme
; Coin flips: enumerate all sequences of length n
(define (coin-sequences n)
(if (= n 0) '(())
(let ((rest (coin-sequences (- n 1))))
(append
(map (lambda (s) (cons 'H s)) rest)
(map (lambda (s) (cons 'T s)) rest)))))
(define flips3 (coin-sequences 3))
(display "all 3-flip sequences: ") (display (length flips3)) (newline)
; Count heads in a sequence
(define (count-heads seq)
(length (filter (lambda (x) (eq? x 'H)) seq)))
; P(exactly 2 heads in 3 flips)
(define two-heads (filter (lambda (s) (= 2 (count-heads s))) flips3))
(display "sequences with 2 heads: ") (display two-heads) (newline)
(display "P(2 heads) = ") (display (/ (length two-heads) (length flips3)))