Expected value is the weighted average of outcomes: E[X] = sum of x P(x). Variance measures how far outcomes spread from the mean. Linearity of expectation holds even when variables are dependent.
Expected value
The expected value E[X] is the long-run average. Weight each outcome by its probability and add. For a fair die: E[X] = (1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5. You never roll 3.5, but that is what you get on average.
Scheme
; Expected value: E[X] = sum of x * P(x)
; The probability-weighted average of all outcomes
(define (expected-value outcomes probs)
  (if (null? outcomes) 0
      (+ (* (car outcomes) (car probs))
         (expected-value (cdr outcomes) (cdr probs)))))
; Fair die
(define die-vals '(1 2 3 4 5 6))
(define die-probs '(0.1667 0.1667 0.1667 0.1667 0.1667 0.1667))
(display "E[fair die] = ")
(display (expected-value die-vals die-probs)) (newline)
; Loaded die: 4 is twice as likely
(define loaded-probs '(0.143 0.143 0.143 0.286 0.143 0.143))
(display "E[loaded die] = ")
(display (expected-value die-vals loaded-probs)) (newline)
; Higher because 4 is overweighted
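"Long-run average" can be checked directly by sampling. This sketch (not part of the chapter's listings; the function name `sample_mean` is mine) draws many rolls from each die and shows the sample mean landing near E[X]. For the loaded die, the exact value is (1+2+3+2·4+5+6)/7 = 25/7 ≈ 3.571.

```python
# Sketch: the sample mean of many rolls approaches E[X].
# Stdlib only; seed fixed so the run is reproducible.
import random

random.seed(0)

def sample_mean(vals, probs, n):
    """Average of n draws from the given distribution."""
    draws = random.choices(vals, weights=probs, k=n)
    return sum(draws) / n

vals = [1, 2, 3, 4, 5, 6]
fair = [1/6] * 6
loaded = [1, 1, 1, 2, 1, 1]  # 4 twice as likely; weights need not sum to 1

print(sample_mean(vals, fair, 100_000))    # close to 3.5
print(sample_mean(vals, loaded, 100_000))  # close to 25/7 ≈ 3.571
```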
Variance and standard deviation
Variance measures spread: Var(X) = E[(X - mu)^2] = E[X^2] - (E[X])^2. Standard deviation sigma = sqrt(Var(X)) lives in the same units as X. Low variance means the distribution clusters near the mean. High variance means it sprawls.
Python
# Expected value and variance
import math

def expected_value(vals, probs):
    return sum(x * p for x, p in zip(vals, probs))

def variance(vals, probs):
    mu = expected_value(vals, probs)
    return expected_value([x**2 for x in vals], probs) - mu**2
vals = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6
mu = expected_value(vals, probs)
var = variance(vals, probs)
print(f"E[X] = {mu:.4f}")
print(f"Var(X) = {var:.4f}")
print(f"SD(X) = {math.sqrt(var):.4f}")
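The shortcut Var(X) = E[X²] - (E[X])² is an identity, not an approximation. A quick sketch (variable names `var_def` and `var_short` are mine) computes the fair-die variance both ways and confirms they agree; the exact value is 35/12 ≈ 2.9167.

```python
# Sketch: Var(X) = E[(X - mu)^2] = E[X^2] - (E[X])^2 give the same number.
def expected_value(vals, probs):
    return sum(x * p for x, p in zip(vals, probs))

vals = [1, 2, 3, 4, 5, 6]
probs = [1/6] * 6
mu = expected_value(vals, probs)

# Definition form: average squared deviation from the mean
var_def = expected_value([(x - mu)**2 for x in vals], probs)
# Shortcut form: second moment minus squared first moment
var_short = expected_value([x**2 for x in vals], probs) - mu**2

print(var_def, var_short)  # both 35/12 ≈ 2.9167
```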
Linearity of expectation
The most useful property in probability: E[X + Y] = E[X] + E[Y], always. It does not matter whether X and Y are independent. E[aX + b] = a E[X] + b. This lets you compute expected values of complicated sums by breaking them into simple pieces.
Scheme
; Linearity: E[X + Y] = E[X] + E[Y] — always, even if dependent!
; Example: expected number of matches in a hat problem
; n people throw hats in a pile, each grabs one at random.
; X_i = 1 if person i gets their own hat, 0 otherwise.
; P(X_i = 1) = 1/n. By linearity:
; E[total matches] = E[X_1] + ... + E[X_n] = n * (1/n) = 1
(define (hat-expected n)
  ; Each indicator has E[X_i] = 1/n. Sum n of them.
  (* n (/ 1.0 n)))
(display "E[matches] for 5 people: ") (display (hat-expected 5)) (newline)
(display "E[matches] for 100 people: ") (display (hat-expected 100)) (newline)
(display "E[matches] for 1000000 people: ") (display (hat-expected 1000000))
; Always 1. Linearity makes the hard problem trivial.
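The indicator argument is worth checking against brute force. This Monte Carlo sketch (my own, not from the chapter) shuffles hats at random and counts fixed points; the average sits near 1 for any group size, even though the indicators X_i are dependent.

```python
# Sketch: simulate the hat problem; E[matches] = 1 regardless of n.
import random

random.seed(1)

def avg_matches(n, trials=20_000):
    """Average number of people who grab their own hat."""
    total = 0
    for _ in range(trials):
        hats = list(range(n))
        random.shuffle(hats)
        total += sum(1 for person, hat in enumerate(hats) if person == hat)
    return total / trials

print(avg_matches(5))    # close to 1
print(avg_matches(100))  # close to 1
```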
Connection: expected value and entropy
Shannon entropy H(X) = -sum P(x) log P(x) is itself an expected value: the average surprise. Baez and Fritz (2011) showed that entropy is the unique functor preserving this expectation structure. Expected value is not just a summary statistic. It is the interface between probability and information.
Scheme
; Shannon entropy is an expected value: E[-log P(x)]
; H(X) = -sum P(x) * log2(P(x))
(define (log2 x) (/ (log x) (log 2)))
(define (entropy probs)
  (let loop ((ps probs) (h 0))
    (if (null? ps) h
        (let ((p (car ps)))
          (if (<= p 0) (loop (cdr ps) h)
              (loop (cdr ps) (- h (* p (log2 p)))))))))
; Fair coin: maximum entropy for 2 outcomes
(display "Fair coin: H = ") (display (entropy '(0.5 0.5))) (newline)
; Biased coin
(display "p=0.9 coin: H = ") (display (entropy '(0.9 0.1))) (newline)
; Fair die
(display "Fair die: H = ") (display (entropy '(0.1667 0.1667 0.1667 0.1667 0.1667 0.1667))) (newline)
; Entropy is the expected surprise per outcome
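To make "entropy is an expected value" literal, this sketch (mine, reusing the chapter's `expected_value`) feeds the surprisals -log2 P(x) into the expected-value function itself: H(X) drops out as a plain weighted average.

```python
# Sketch: H(X) = E[-log2 P(X)], computed as an ordinary expected value.
import math

def expected_value(vals, probs):
    return sum(x * p for x, p in zip(vals, probs))

def entropy(probs):
    kept = [p for p in probs if p > 0]           # skip zero-probability outcomes
    surprisals = [-math.log2(p) for p in kept]   # surprise of each outcome, in bits
    return expected_value(surprisals, kept)

print(entropy([0.5, 0.5]))   # 1.0 bit: fair coin
print(entropy([0.9, 0.1]))   # ≈ 0.469 bits: biased coin is less surprising
print(entropy([1/6] * 6))    # ≈ 2.585 bits = log2(6): fair die
```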
Notation reference
Notation     Scheme                       Meaning
E[X]         (expected-value xs ps)       Expected value: weighted average
Var(X)       (variance xs ps)             Variance: E[X²] - (E[X])²
σ            (sqrt (variance xs ps))      Standard deviation
E[aX + b]    (+ (* a mu) b)               Linearity of expectation
H(X)         (entropy probs)              Shannon entropy: expected surprise
Neighbors
Probability chapters
🎰 Ch 5 — the distributions whose expected values and variances we compute here
🎰 Ch 7 — sums of random variables: linearity in action
🎰 Ch 8 — law of large numbers: sample means converge to E[X]
Paper pages
🍞 Baez & Fritz 2011 — entropy as a functor, the categorical view of expected information