Bayes' theorem tells you how to update beliefs when new evidence arrives. Start with a prior (what you believed before), multiply by the likelihood (how probable the evidence is under each hypothesis), and normalize. The posterior is your updated belief. This is the engine underneath every Bayesian model of cognition.
Bayes' theorem
P(H|D) = P(D|H) P(H) / P(D). The prior P(H) encodes what you believed before seeing data. The likelihood P(D|H) says how probable the observed data is if hypothesis H is true. Your updated belief is the posterior P(H|D). The denominator P(D), obtained by summing P(D|H) P(H) over all hypotheses, is a normalizing constant that ensures the posterior sums to one.
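For a finite set of hypotheses, the whole update is: multiply each prior by its likelihood, then divide by the total. A minimal Python sketch (the `bayes_update` helper is hypothetical; the numbers reuse the disease-test example below):

```python
def bayes_update(priors, likelihoods):
    """Posterior P(H|D) for each hypothesis H in a finite set.

    priors:      dict mapping hypothesis -> P(H)
    likelihoods: dict mapping hypothesis -> P(D|H)
    """
    unnormalized = {h: likelihoods[h] * p for h, p in priors.items()}
    p_d = sum(unnormalized.values())  # P(D), the normalizing constant
    return {h: u / p_d for h, u in unnormalized.items()}

# Disease-test numbers: 1% prior, 90% sensitivity, 5% false positive rate
post = bayes_update({"disease": 0.01, "healthy": 0.99},
                    {"disease": 0.90, "healthy": 0.05})
print(post["disease"])  # about 0.154
```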
Scheme
; Bayes' theorem: P(H|D) = P(D|H) * P(H) / P(D)
;
; Example: a disease test.
; Prior: 1% of people have the disease.
; Likelihood: test is 90% sensitive (true positive rate).
; False positive rate: 5%.
(define p-disease 0.01)
(define p-healthy 0.99)
(define p-positive-given-disease 0.90)
(define p-positive-given-healthy 0.05)
; P(D) = total probability of a positive test
(define p-positive
  (+ (* p-positive-given-disease p-disease)
     (* p-positive-given-healthy p-healthy)))
; Posterior: P(disease | positive test)
(define p-disease-given-positive
  (/ (* p-positive-given-disease p-disease) p-positive))
(display "P(disease | positive test) = ")
(display (exact->inexact p-disease-given-positive))
(newline)
; About 15%: most positive tests are false positives
; because the disease is rare (low prior).
(display "P(healthy | positive test) = ")
(display (exact->inexact (- 1 p-disease-given-positive)))
The posterior from one observation becomes the prior for the next, so Bayesian inference is naturally sequential: each new piece of evidence sharpens (or broadens) your beliefs. For observations that are conditionally independent given the hypothesis, the order of evidence does not matter: two observations yield the same posterior regardless of which comes first.
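This order invariance can be checked numerically. A Python sketch, assuming two conditionally independent tests whose characteristics are made up for illustration:

```python
import math

def bayes_update(prior, lik_if_true, lik_if_false):
    """One binary-hypothesis update: P(H|D) from P(H), P(D|H), P(D|not H)."""
    num = lik_if_true * prior
    return num / (num + lik_if_false * (1 - prior))

# Two different (hypothetical) tests for the same disease, 1% prior:
# test A: sensitivity 0.90, false positive rate 0.05
# test B: sensitivity 0.80, false positive rate 0.10
p_ab = bayes_update(bayes_update(0.01, 0.90, 0.05), 0.80, 0.10)  # A then B
p_ba = bayes_update(bayes_update(0.01, 0.80, 0.10), 0.90, 0.05)  # B then A

print(math.isclose(p_ab, p_ba))  # True: order does not matter
```

In odds form this is obvious: each update multiplies the prior odds by that test's likelihood ratio, and multiplication commutes.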
A prior is conjugate to a likelihood function when the posterior has the same functional form as the prior. The Beta distribution is conjugate to the Binomial likelihood: if your prior on a coin's bias is Beta(a, b), then after observing h heads and t tails, the posterior is Beta(a+h, b+t). The update is just adding counts to the parameters.
Scheme
; Beta-Binomial conjugacy
; Prior: Beta(a, b). After h heads and t tails: Beta(a+h, b+t).
; The mean of Beta(a, b) is a/(a+b).
(define (beta-mean a b) (/ a (+ a b)))
; Uniform prior: Beta(1, 1) — mean = 0.5
(define prior-a 1)
(define prior-b 1)
(display "Prior mean: ")
(display (exact->inexact (beta-mean prior-a prior-b)))
(newline)
; Observe 7 heads, 3 tails
(define post-a (+ prior-a 7))
(define post-b (+ prior-b 3))
(display "After 7H 3T: Beta(")
(display post-a) (display ", ") (display post-b) (display ")")
(newline)
(display "Posterior mean: ")
(display (exact->inexact (beta-mean post-a post-b)))
(newline)
; More data: 70 heads, 30 tails total
(define post2-a (+ prior-a 70))
(define post2-b (+ prior-b 30))
(display "After 70H 30T: Beta(")
(display post2-a) (display ", ") (display post2-b) (display ")")
(newline)
(display "Posterior mean: ")
(display (exact->inexact (beta-mean post2-a post2-b)))
; More data = tighter posterior, converging on 0.7
Python
# Beta-Binomial conjugacy
def beta_mean(a, b):
    return a / (a + b)
a, b = 1, 1  # uniform prior
print("Prior mean: " + str(beta_mean(a, b)))
# After 7 heads, 3 tails
a1, b1 = a + 7, b + 3
print("After 7H 3T: Beta(" + str(a1) + ", " + str(b1) + ")")
print("Posterior mean: " + "{:.4f}".format(beta_mean(a1, b1)))
# After 70 heads, 30 tails
a2, b2 = a + 70, b + 30
print("After 70H 30T: Beta(" + str(a2) + ", " + str(b2) + ")")
print("Posterior mean: " + "{:.4f}".format(beta_mean(a2, b2)))
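"Tighter" can be made precise with the Beta variance, ab / ((a+b)^2 (a+b+1)). A sketch in the same style (the `beta_var` helper is an addition, not part of the example above):

```python
def beta_var(a, b):
    # Variance of Beta(a, b): a*b / ((a+b)^2 * (a+b+1))
    return a * b / ((a + b) ** 2 * (a + b + 1))

print(beta_var(8, 4))    # after 7H 3T from a Beta(1, 1) prior
print(beta_var(71, 31))  # after 70H 30T: roughly 8x smaller
```

Ten times the data with the same 70/30 ratio leaves the mean near 0.7 but shrinks the variance by almost an order of magnitude.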
Translation notes
The Lovelace textbook introduces probability from scratch with interactive visualizations. This page assumes basic familiarity with probability and focuses on the Bayesian update machinery that subsequent chapters build on. The textbook also covers joint distributions, marginalization, and independence, which are prerequisites for the graphical models in Chapter 3.