Compositional Imprecise Probability
Liell-Cock, Staton · POPL 2025 · arXiv:2405.09391
Prereqs: 📄 Fritz 2020 (Markov categories). 5 min.
When you don't know the exact probability, only that it lies in some set of distributions, you have imprecise probability. Liell-Cock and Staton show that these form a Markov category (nLab) via the Para construction, and that graded monads organize the imprecision levels.
Imprecise probability → sets of distributions
A precise Markov kernel maps an input to one distribution. An imprecise kernel maps an input to a set of distributions, a credal set: the output is distributed according to one of these distributions, but you don't know which one. This is a weaker commitment than a single distribution (you claim less) and a stronger one than full ignorance (you still claim something).
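In the finite setting this page uses throughout (see the translation notes below), a credal set is just a list of distributions. A minimal sketch, with illustrative names not taken from the paper:

```python
# A finite distribution: a dict from outcomes to probabilities.
# A credal set: a finite list of such distributions. The true
# distribution is one of these; you don't know which.
# (Representation follows this page's translation notes; the names
# here are illustrative, not from the paper.)

def is_distribution(d, tol=1e-9):
    """Probabilities are nonnegative and sum to one."""
    return all(p >= 0 for p in d.values()) and abs(sum(d.values()) - 1.0) < tol

# A coin whose bias toward heads is only known to lie in {0.4, 0.5, 0.6}:
coin_credal = [
    {"heads": 0.4, "tails": 0.6},
    {"heads": 0.5, "tails": 0.5},
    {"heads": 0.6, "tails": 0.4},
]
assert all(is_distribution(d) for d in coin_credal)
```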
The Para construction
The Para construction takes a Markov category C and builds a new category Para(C) where morphisms have a hidden parameter. A morphism in Para(C) from X to Y is a morphism P × X → Y in C, where P is the parameter space. Imprecise kernels arise when you existentially quantify over the parameter: "there exists some p in P such that..."
Confidence: Simplified. The real Para construction works over arbitrary Markov categories, with tensor products for parameters. Same idea: a hidden parameter generates a family.
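The hidden-parameter idea can be sketched directly in the finite setting: a parameterized kernel is a function of (parameter, input), and existentially quantifying over the parameter means collecting every reachable output into a credal set. Function names are illustrative; the paper's construction is categorical, not this concrete:

```python
# A Para-style morphism X -> Y: a precise kernel that also takes a
# hidden parameter, i.e. a function (param, x) -> distribution over Y.
# Hiding the parameter existentially yields an imprecise kernel:
# x maps to the credal set of all outputs as the parameter ranges over P.

def para_to_imprecise(params, kernel):
    """Hide the parameter: turn (P x X -> Y) into X -> credal set over Y."""
    def imprecise(x):
        return [kernel(p, x) for p in params]
    return imprecise

# Here the parameter is an unknown coin bias in {0.4, 0.5, 0.6}.
def coin_kernel(bias, _x):
    return {"heads": bias, "tails": 1.0 - bias}

flip = para_to_imprecise([0.4, 0.5, 0.6], coin_kernel)
flip(None)  # a credal set of three coin distributions, one per possible bias
```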
Lower and upper expectations
With a credal set, you compute bounds on expectations. The lower expectation is the minimum expected value over all distributions in the set; the upper expectation is the maximum. These brackets give you the range of possible answers, i.e. how uncertain you are.
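Concretely, with three coin distributions the brackets are a min and a max over the list. A Python rendering of the `lower-exp`/`upper-exp` helpers from the notation table below (illustrative, assuming finite credal sets as lists):

```python
def expectation(dist, f):
    """Expected value of f under a single finite distribution."""
    return sum(p * f(x) for x, p in dist.items())

def lower_exp(credal, f):
    """E-lower: the minimum expectation over the credal set."""
    return min(expectation(d, f) for d in credal)

def upper_exp(credal, f):
    """E-upper: the maximum expectation over the credal set."""
    return max(expectation(d, f) for d in credal)

# Three coin distributions: bias 0.4, 0.5, or 0.6 toward heads.
credal = [{"heads": b, "tails": 1 - b} for b in (0.4, 0.5, 0.6)]
payoff = lambda x: 1.0 if x == "heads" else 0.0

lower_exp(credal, payoff)  # 0.4
upper_exp(credal, payoff)  # 0.6
```

The bracket [0.4, 0.6] is exactly the range of answers a precise model with any of the three biases could have given.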
Graded monads → imprecision levels
Different amounts of imprecision form a graded monad. The grade tracks how much imprecision you have: grade 0 = precise (one distribution), higher grades = wider credal sets. Composition accumulates imprecision, so the grades add up, just like effect grades in 📄 Gaboardi 2021.
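A toy sketch of grade accumulation, assuming the finite-list credal sets from the translation notes and taking the grade of a kernel to be the number of distributions it lists (so "precise" is grade 1 here). In the paper, grades live in a quantale and combine by its monoid operation; in this count-based toy they combine by multiplication. `compose_graded` is a hypothetical helper, not the paper's:

```python
# Toy graded kernel: a pair (grade, kernel) where kernel(x) returns a
# credal set listing exactly `grade` distributions, one per
# hidden-parameter choice. Composing two stages pairs up their
# parameter choices, so the grades combine.

def compose_graded(stage1, stage2):
    grade1, k1 = stage1
    grade2, k2 = stage2
    def composed(x):
        out = []
        for d1 in k1(x):                  # one hidden choice for stage 1
            for j in range(grade2):       # one hidden choice for stage 2
                mixed = {}
                for y, p in d1.items():   # marginalize the intermediate value
                    for z, q in k2(y)[j].items():
                        mixed[z] = mixed.get(z, 0.0) + p * q
                out.append(mixed)
        return out
    return (grade1 * grade2, composed)

# A coin of unknown bias (grade 2) followed by a precise payoff (grade 1):
coin = (2, lambda _x: [{"h": 0.4, "t": 0.6}, {"h": 0.6, "t": 0.4}])
payoff = (1, lambda y: [{"win": 1.0}] if y == "h"
                       else [{"win": 0.0, "lose": 1.0}])

grade, k = compose_graded(coin, payoff)
# grade is 2: the composite is exactly as imprecise as its most
# imprecise stage allows, and no hidden choices were lost.
```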
The bridge: graded monads = Markov categories
Liell-Cock and Staton's main result: the Kleisli category of a graded monad on a Markov category is itself a Markov category. Imprecise probability inherits all the structure (copy, discard, composition) from the precise case. You can do everything Fritz does in FinStoch, but with credal sets instead of single distributions.
Confidence: Simplified. Real composition requires careful treatment of the parameter space tensor. Same compositional structure.
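In the finite sketch, the inherited structural maps are easy to exhibit: copy and discard are deterministic, hence precise, so their credal sets are singletons. They are the FinStoch copy/discard lifted into the imprecise setting unchanged. Names here are illustrative:

```python
# Structural maps of the Markov category, as imprecise kernels in the
# finite sketch. Both are deterministic: each returns a singleton
# credal set containing one point-mass distribution (no imprecision).

def dirac(x):
    """Point mass on x, wrapped as a singleton credal set."""
    return [{x: 1.0}]

def copy(x):
    """Copy: duplicate the value into a pair, deterministically."""
    return dirac((x, x))

def discard(_x):
    """Discard: forget the value, landing in the unit object."""
    return dirac(())

copy("heads")    # [{("heads", "heads"): 1.0}]
discard("heads") # [{(): 1.0}]
```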
Notation reference
| Paper | Scheme | Meaning |
|---|---|---|
| Para(C) | (make-para params kernel) | Para construction |
| credal set | (make-credal d1 d2 ...) | Set of distributions |
| E̲[f] | (lower-exp credal f) | Lower expectation |
| Ē[f] | (upper-exp credal f) | Upper expectation |
| T_e | # credal set at grade e | Graded monad at imprecision level e |
Neighbors
Other paper pages
- 📄 Fritz 2020 – the Markov category that Para generalizes
- 📄 Gaboardi 2021 – graded monads for effects (same grading idea)
- 📄 Fritz, Perrone 2021 – support is the maximally imprecise shadow
- 📄 Capucci 2021 – the Para construction appears in cybernetics too
Foundations (Wikipedia)
Translation notes
All examples use finite lists of finite distributions as credal sets. The paper works with convex sets of probability measures over measurable spaces, the Para construction on arbitrary Markov categories, and graded monads indexed by a quantale of imprecision levels. For example: the lower/upper expectation example on this page iterates over three coin distributions. In the paper, the same construction applies to credal sets of Gaussian measures over ℝⁿ, where the lower expectation is an optimization problem over an infinite-dimensional convex set. The bracket structure (min/max over the set) is identical. The measure theory and optimization are not.
Read the paper. Start at §3 for the Para construction, §5 for graded monads and Markov categories.
Framework connection: Imprecise probability via the Para construction generalizes the Natural Framework's ambient category from precise to credal-set-valued stages. (Ambient Category)