Expected value is the weighted average of outcomes: E[X] = sum of x P(x). Variance measures how far outcomes spread from the mean. Linearity of expectation holds even when variables are dependent.
Expected value
The expected value E[X] is the long-run average. Weight each outcome by its probability and add. For a fair die: E[X] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. You never roll 3.5, but that is what you get on average.
Scheme
; Expected value: E[X] = sum of x * P(x)
; The probability-weighted average of all outcomes
(define (expected-value outcomes probs)
  (if (null? outcomes) 0
      (+ (* (car outcomes) (car probs))
         (expected-value (cdr outcomes) (cdr probs)))))
; Fair die
(define die-vals '(1 2 3 4 5 6))
(define die-probs '(0.1667 0.1667 0.1667 0.1667 0.1667 0.1667))
(display "E[fair die] = ")
(display (expected-value die-vals die-probs)) (newline)
; Loaded die: 4 is twice as likely
(define loaded-probs '(0.143 0.143 0.143 0.286 0.143 0.143))
(display "E[loaded die] = ")
(display (expected-value die-vals loaded-probs)) (newline)
; Higher because 4 is overweighted
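"Long-run average" can be checked directly by sampling. This sketch (not part of the chapter's listings; the function name `sample_mean` is mine) draws many rolls from each die and shows the sample mean landing near E[X]. For the loaded die, the exact value is (1+2+3+2·4+5+6)/7 = 25/7 ≈ 3.571.

```python
# Sketch: the sample mean of many rolls approaches E[X].
# Stdlib only; seed fixed so the run is reproducible.
import random

random.seed(0)

def sample_mean(vals, probs, n):
    """Average of n draws from the given distribution."""
    draws = random.choices(vals, weights=probs, k=n)
    return sum(draws) / n

vals = [1, 2, 3, 4, 5, 6]
fair = [1/6] * 6
loaded = [1, 1, 1, 2, 1, 1]  # 4 twice as likely; weights need not sum to 1

print(sample_mean(vals, fair, 100_000))    # close to 3.5
print(sample_mean(vals, loaded, 100_000))  # close to 25/7 ≈ 3.571
```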
Variance and standard deviation
Variance measures spread: Var(X) = E[(X - mu)^2] = E[X^2] - (E[X])^2. Standard deviation sigma = sqrt(Var(X)) lives in the same units as X. Low variance means the distribution clusters near the mean. High variance means it sprawls.
Python
# Expected value and variance
import math

def expected_value(vals, probs):
    return sum(x * p for x, p in zip(vals, probs))

def variance(vals, probs):
    mu = expected_value(vals, probs)
    return expected_value([x**2 for x in vals], probs) - mu**2
vals = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6
mu = expected_value(vals, probs)
var = variance(vals, probs)
print(f"E[X] = {mu:.4f}")
print(f"Var(X) = {var:.4f}")
print(f"SD(X) = {math.sqrt(var):.4f}")
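The shortcut Var(X) = E[X²] - (E[X])² is an identity, not an approximation. A quick sketch (variable names `var_def` and `var_short` are mine) computes the fair-die variance both ways and confirms they agree; the exact value is 35/12 ≈ 2.9167.

```python
# Sketch: Var(X) = E[(X - mu)^2] = E[X^2] - (E[X])^2 give the same number.
def expected_value(vals, probs):
    return sum(x * p for x, p in zip(vals, probs))

vals = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6
mu = expected_value(vals, probs)

# Definition form: average squared deviation from the mean
var_def = expected_value([(x - mu)**2 for x in vals], probs)
# Shortcut form: second moment minus squared first moment
var_short = expected_value([x**2 for x in vals], probs) - mu**2

print(var_def, var_short)  # both 35/12 ≈ 2.9167
```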
Linearity of expectation
The most useful property in probability: E[X + Y] = E[X] + E[Y], always. It does not matter whether X and Y are independent. E[aX + b] = a E[X] + b. This lets you compute expected values of complicated sums by breaking them into simple pieces.
Scheme
; Linearity: E[X + Y] = E[X] + E[Y] — always, even if dependent!
; Example: expected number of matches in a hat problem
; n people throw hats in a pile, each grabs one at random.
; X_i = 1 if person i gets their own hat, 0 otherwise.
; P(X_i = 1) = 1/n. By linearity:
; E[total matches] = E[X_1] + ... + E[X_n] = n * (1/n) = 1
(define (hat-expected n)
  ; Each indicator has E[X_i] = 1/n. Sum n of them.
  (* n (/ 1.0 n)))
(display "E[matches] for 5 people: ") (display (hat-expected 5)) (newline)
(display "E[matches] for 100 people: ") (display (hat-expected 100)) (newline)
(display "E[matches] for 1000000 people: ") (display (hat-expected 1000000))
; Always 1. Linearity makes the hard problem trivial.
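The indicator argument is worth checking against brute force. This Monte Carlo sketch (my own, not from the chapter) shuffles hats at random and counts fixed points; the average sits near 1 for any group size, even though the indicators X_i are dependent.

```python
# Sketch: simulate the hat problem; E[matches] = 1 regardless of n.
import random

random.seed(1)

def avg_matches(n, trials=20_000):
    """Average number of people who grab their own hat."""
    total = 0
    for _ in range(trials):
        hats = list(range(n))
        random.shuffle(hats)
        total += sum(1 for person, hat in enumerate(hats) if person == hat)
    return total / trials

print(avg_matches(5))    # close to 1
print(avg_matches(100))  # close to 1
```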
Connection: expected value and entropy
Shannon entropy H(X) = -sum P(x) log P(x) is itself an expected value: the average surprise. Baez and Fritz (2011) showed that entropy is the unique functor preserving this expectation structure. Expected value is not just a summary statistic. It is the interface between probability and information.
Scheme
; Shannon entropy is an expected value: E[-log P(x)]
; H(X) = -sum P(x) * log2(P(x))
(define (log2 x) (/ (log x) (log 2)))
(define (entropy probs)
  (let loop ((ps probs) (h 0))
    (if (null? ps) h
        (let ((p (car ps)))
          (if (<= p 0) (loop (cdr ps) h)
              (loop (cdr ps) (- h (* p (log2 p)))))))))
; Fair coin: maximum entropy for 2 outcomes
(display "Fair coin: H = ") (display (entropy '(0.5 0.5))) (newline)
; Biased coin
(display "p=0.9 coin: H = ") (display (entropy '(0.9 0.1))) (newline)
; Fair die
(display "Fair die: H = ") (display (entropy '(0.1667 0.1667 0.1667 0.1667 0.1667 0.1667))) (newline)
; Entropy is the expected surprise per outcome
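To make "entropy is an expected value" literal, this sketch (mine, reusing the chapter's `expected_value`) feeds the surprisals -log2 P(x) into the expected-value function itself: H(X) drops out as a plain weighted average.

```python
# Sketch: H(X) = E[-log2 P(X)], computed as an ordinary expected value.
import math

def expected_value(vals, probs):
    return sum(x * p for x, p in zip(vals, probs))

def entropy(probs):
    kept = [p for p in probs if p > 0]           # skip zero-probability outcomes
    surprisals = [-math.log2(p) for p in kept]   # surprise of each outcome, in bits
    return expected_value(surprisals, kept)

print(entropy([0.5, 0.5]))   # 1.0 bit: fair coin
print(entropy([0.9, 0.1]))   # ≈ 0.469 bits: biased coin is less surprising
print(entropy([1/6] * 6))    # ≈ 2.585 bits = log2(6): fair die
```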
Notation reference
Notation     Scheme                       Meaning
E[X]         (expected-value xs ps)       Expected value: weighted average
Var(X)       (variance xs ps)             Variance: E[X²] - (E[X])²
σ            (sqrt (variance xs ps))      Standard deviation
E[aX + b]    (+ (* a mu) b)               Linearity of expectation
H(X)         (entropy probs)              Shannon entropy: expected surprise
Neighbors
Probability chapters
🎰 Ch 5 — the distributions whose expected values and variances we compute here
🎰 Ch 7 — sums of random variables: linearity in action
🎰 Ch 8 — law of large numbers: sample means converge to E[X]
Paper pages
🍞 Baez & Fritz 2011 — entropy as a functor, the categorical view of expected information