Bayes' theorem tells you how to update beliefs when new evidence arrives. Start with a prior (what you believed before), multiply by the likelihood (how probable the evidence is under each hypothesis), and normalize. The posterior is your updated belief. This is the engine underneath every Bayesian model of cognition.
Bayes' theorem
P(H|D) = P(D|H) P(H) / P(D). The prior P(H) encodes what you believed before seeing data. The likelihood P(D|H) says how probable the observed data is if hypothesis H is true. Your updated belief is the posterior P(H|D). The denominator P(D), obtained by summing P(D|H) P(H) over all hypotheses, is a normalizing constant that ensures the posterior sums to one.
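For a finite set of hypotheses, the whole update is: multiply each prior by its likelihood, then divide by the total. A minimal Python sketch (the `bayes_update` helper is hypothetical; the numbers reuse the disease-test example below):

```python
def bayes_update(priors, likelihoods):
    """Posterior P(H|D) for each hypothesis H in a finite set.

    priors:      dict mapping hypothesis -> P(H)
    likelihoods: dict mapping hypothesis -> P(D|H)
    """
    unnormalized = {h: likelihoods[h] * p for h, p in priors.items()}
    p_d = sum(unnormalized.values())  # P(D), the normalizing constant
    return {h: u / p_d for h, u in unnormalized.items()}

# Disease-test numbers: 1% prior, 90% sensitivity, 5% false positive rate
post = bayes_update({"disease": 0.01, "healthy": 0.99},
                    {"disease": 0.90, "healthy": 0.05})
print(post["disease"])  # about 0.154
```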
Scheme
; Bayes' theorem: P(H|D) = P(D|H) * P(H) / P(D)
;
; Example: a disease test.
; Prior: 1% of people have the disease.
; Likelihood: test is 90% sensitive (true positive rate).
; False positive rate: 5%.
(define p-disease 0.01)
(define p-healthy 0.99)
(define p-positive-given-disease 0.90)
(define p-positive-given-healthy 0.05)
; P(D) = total probability of a positive test
(define p-positive
  (+ (* p-positive-given-disease p-disease)
     (* p-positive-given-healthy p-healthy)))
; Posterior: P(disease | positive test)
(define p-disease-given-positive
  (/ (* p-positive-given-disease p-disease) p-positive))
(display "P(disease | positive test) = ")
(display (exact->inexact p-disease-given-positive))
(newline)
; About 15%: most positive tests are false positives
; because the disease is rare (low prior).
(display "P(healthy | positive test) = ")
(display (exact->inexact (- 1 p-disease-given-positive)))
The posterior from one observation becomes the prior for the next, so Bayesian inference is naturally sequential: each new piece of evidence sharpens (or broadens) your beliefs. For observations that are conditionally independent given the hypothesis, the order of evidence does not matter: two observations yield the same posterior regardless of which comes first.
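This order invariance can be checked numerically. A Python sketch, assuming two conditionally independent tests whose characteristics are made up for illustration:

```python
import math

def bayes_update(prior, lik_if_true, lik_if_false):
    """One binary-hypothesis update: P(H|D) from P(H), P(D|H), P(D|not H)."""
    num = lik_if_true * prior
    return num / (num + lik_if_false * (1 - prior))

# Two different (hypothetical) tests for the same disease, 1% prior:
# test A: sensitivity 0.90, false positive rate 0.05
# test B: sensitivity 0.80, false positive rate 0.10
p_ab = bayes_update(bayes_update(0.01, 0.90, 0.05), 0.80, 0.10)  # A then B
p_ba = bayes_update(bayes_update(0.01, 0.80, 0.10), 0.90, 0.05)  # B then A

print(math.isclose(p_ab, p_ba))  # True: order does not matter
```

In odds form this is obvious: each update multiplies the prior odds by that test's likelihood ratio, and multiplication commutes.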
A prior is conjugate to a likelihood function when the posterior has the same functional form as the prior. The Beta distribution is conjugate to the Binomial likelihood: if your prior on a coin's bias is Beta(a, b), then after observing h heads and t tails, the posterior is Beta(a+h, b+t). The update is just adding counts to the parameters.
Scheme
; Beta-Binomial conjugacy
; Prior: Beta(a, b). After h heads and t tails: Beta(a+h, b+t).
; The mean of Beta(a, b) is a/(a+b).
(define (beta-mean a b) (/ a (+ a b)))
; Uniform prior: Beta(1, 1) — mean = 0.5
(define prior-a 1)
(define prior-b 1)
(display "Prior mean: ")
(display (exact->inexact (beta-mean prior-a prior-b)))
(newline)
; Observe 7 heads, 3 tails
(define post-a (+ prior-a 7))
(define post-b (+ prior-b 3))
(display "After 7H 3T: Beta(")
(display post-a) (display ", ") (display post-b) (display ")")
(newline)
(display "Posterior mean: ")
(display (exact->inexact (beta-mean post-a post-b)))
(newline)
; More data: 70 heads, 30 tails total
(define post2-a (+ prior-a 70))
(define post2-b (+ prior-b 30))
(display "After 70H 30T: Beta(")
(display post2-a) (display ", ") (display post2-b) (display ")")
(newline)
(display "Posterior mean: ")
(display (exact->inexact (beta-mean post2-a post2-b)))
; More data = tighter posterior, converging on 0.7
Python
# Beta-Binomial conjugacy
def beta_mean(a, b):
    return a / (a + b)
a, b = 1, 1  # uniform prior
print("Prior mean: " + str(beta_mean(a, b)))
# After 7 heads, 3 tails
a1, b1 = a + 7, b + 3
print("After 7H 3T: Beta(" + str(a1) + ", " + str(b1) + ")")
print("Posterior mean: " + "{:.4f}".format(beta_mean(a1, b1)))
# After 70 heads, 30 tails
a2, b2 = a + 70, b + 30
print("After 70H 30T: Beta(" + str(a2) + ", " + str(b2) + ")")
print("Posterior mean: " + "{:.4f}".format(beta_mean(a2, b2)))
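"Tighter" can be made precise with the Beta variance, ab / ((a+b)^2 (a+b+1)). A sketch in the same style (the `beta_var` helper is an addition, not part of the example above):

```python
def beta_var(a, b):
    # Variance of Beta(a, b): a*b / ((a+b)^2 * (a+b+1))
    return a * b / ((a + b) ** 2 * (a + b + 1))

print(beta_var(8, 4))    # after 7H 3T from a Beta(1, 1) prior
print(beta_var(71, 31))  # after 70H 30T: roughly 8x smaller
```

Ten times the data with the same 70/30 ratio leaves the mean near 0.7 but shrinks the variance by almost an order of magnitude.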
Translation notes
The Lovelace textbook introduces probability from scratch with interactive visualizations. This page assumes basic familiarity with probability and focuses on the Bayesian update machinery that subsequent chapters build on. The textbook also covers joint distributions, marginalization, and independence, which are prerequisites for the graphical models in Chapter 3.