Bayesian Models of Cognition
Lovelace textbook · CC BY-SA 4.0 · computationalcognitivescience.github.io/lovelace/home
Concept learning is hypothesis testing. Given a few examples, learners infer which concept generated them by computing a posterior over a structured hypothesis space. The size principle favors smaller, tighter hypotheses: a concept that could generate fewer examples gets more credit for generating the ones you saw. Abstract knowledge helps rather than hurts. This is the blessing of abstraction.
Concept learning as hypothesis testing
You see the numbers 2, 4, 8. What is the rule? "Powers of two" is a tighter hypothesis than "even numbers," which is tighter than "all numbers." Bayesian inference naturally favors the tightest hypothesis consistent with the data, because the likelihood of generating exactly those examples is higher under a smaller set.
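The number-game computation above can be sketched in a few lines. This is a minimal illustration, not the textbook's implementation: the hypothesis ranges (numbers 1..100), the uniform prior, and the exact hypothesis sets are assumptions chosen to make the size principle visible.

```python
# Tenenbaum-style number game, minimal sketch.
# Hypotheses are sets of numbers in 1..100 (range is an assumption).
data = [2, 4, 8]

hypotheses = {
    "powers of two": {2 ** k for k in range(1, 7)},   # {2, 4, ..., 64}
    "even numbers": set(range(2, 101, 2)),
    "all numbers": set(range(1, 101)),
}

prior = {h: 1 / len(hypotheses) for h in hypotheses}  # uniform prior (assumed)

def likelihood(examples, h_set):
    """Size principle: each example is drawn uniformly from the hypothesis,
    so P(data | H) = (1/|H|)^n if every example lies in H, else 0."""
    if all(x in h_set for x in examples):
        return (1 / len(h_set)) ** len(examples)
    return 0.0

unnorm = {h: prior[h] * likelihood(data, s) for h, s in hypotheses.items()}
z = sum(unnorm.values())
posterior = {h: p / z for h, p in unnorm.items()}
```

With three examples, "powers of two" (|H| = 6) beats "even numbers" (|H| = 50) by a factor of roughly (50/6)³ in likelihood, which is the Bayesian Occam's razor at work.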
Causal reasoning
Bayesian models extend to causal reasoning. Given a causal graph (A causes B, B causes C), you can infer causes from effects by inverting the generative model with Bayes' theorem. Observing wet grass, you infer rain is more likely. Observing that the sprinkler is on reduces the evidence for rain. This "explaining away" falls out naturally from the posterior computation.
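Explaining away can be checked directly by enumerating the joint distribution of the classic rain/sprinkler/wet-grass network. All numeric probabilities below are illustrative assumptions chosen only to make the effect visible, not values from the textbook.

```python
# Explaining away in the rain -> wet <- sprinkler network.
# Probabilities are illustrative assumptions.
from itertools import product

P_RAIN = 0.2
P_SPRINKLER = 0.3

def p_wet(rain, sprinkler):
    # Grass is very likely wet if either cause is active (noisy-OR flavor).
    if rain and sprinkler:
        return 0.99
    if rain:
        return 0.90
    if sprinkler:
        return 0.85
    return 0.05

def joint(rain, sprinkler, wet):
    pr = P_RAIN if rain else 1 - P_RAIN
    ps = P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
    pw = p_wet(rain, sprinkler) if wet else 1 - p_wet(rain, sprinkler)
    return pr * ps * pw

# P(rain | wet): wet grass raises the posterior on rain above its prior.
num = sum(joint(True, s, True) for s in (True, False))
den = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
p_rain_given_wet = num / den

# P(rain | wet, sprinkler on): the sprinkler explains the wet grass away,
# pulling the posterior on rain back down.
num2 = joint(True, True, True)
den2 = sum(joint(r, True, True) for r in (True, False))
p_rain_given_wet_sprinkler = num2 / den2
```

No special "explaining away" rule is coded anywhere; the effect falls out of conditioning the joint on both the effect and the competing cause.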
The blessing of abstraction
Hierarchical Bayesian models learn at multiple levels simultaneously. Abstract knowledge (e.g., "animals in this ecosystem tend to be small") constrains lower-level inference (e.g., "this new species is probably small too"). More abstract hypotheses are learnable from fewer examples because they constrain many lower-level hypotheses at once. Abstraction does not cost you data efficiency. It buys you data efficiency.
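A two-level sketch of the ecosystem example, under strong simplifying assumptions: known variances, a conjugate normal-normal model, and made-up species sizes. The ecosystem-level mean (abstract knowledge) is learned from several observed species, then acts as a prior that shrinks the estimate for a new species seen only once.

```python
# Hierarchical (normal-normal) sketch. Data and variances are
# illustrative assumptions, not values from the textbook.

species_means = [3.1, 2.8, 3.4, 2.9, 3.2]  # observed species sizes (kg, assumed)

TAU2 = 1.0    # between-species variance (assumed known)
SIGMA2 = 4.0  # within-species observation noise (assumed known)

# Level 2: abstract knowledge = ecosystem-wide mean size,
# estimated from the species observed so far.
ecosystem_mean = sum(species_means) / len(species_means)

# Level 1: a single noisy measurement of a brand-new species.
x_new = 9.0

# Standard conjugate normal update: the posterior mean is a
# precision-weighted blend of the observation and the ecosystem prior.
w = TAU2 / (TAU2 + SIGMA2)
posterior_mean = w * x_new + (1 - w) * ecosystem_mean
```

One observation of the new species is enough to get a sensible estimate precisely because the abstract level has already been learned, which is the blessing of abstraction in miniature.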
Notation reference
| Term | Meaning |
|---|---|
| Size principle | P(data\|H) = (1/\|H\|)^n; smaller hypotheses get more credit |
| Explaining away | Observing one cause reduces the posterior of competing causes |
| Hierarchical Bayes | Priors at one level are learned from data at another |
| Blessing of abstraction | Abstract knowledge helps rather than hurts data efficiency |
Neighbors
- Lovelace Ch.2 – the Bayesian machinery this chapter applies
- Parzygnat 2020 – Bayesian inversion as a categorical construction
- Bayesian cognition – the broader research program
Translation notes
The Lovelace textbook walks through Tenenbaum's number game in detail and includes interactive sliders for hypothesis spaces. This page extracts the core principles: the size principle as the source of Bayesian Occam's razor, causal reasoning as posterior inference over generative models, and the blessing of abstraction as a scaling argument for hierarchical models. The textbook also covers iterated learning and cultural transmission, which connect to Chapter 7.