
Divergences on Monads for Relational Program Logics

Sato, Katsumata · MSCS 2023 · arXiv:2206.05716

Assumes you've seen 🍞 Markov categories and 🍞 Hoare triples. 5 minutes each.

The distance between two programs IS the enrichment on the category. Divergences on a monad give you a metric on morphism spaces — and that metric powers relational program logics.

Divergences as distance

A divergence measures how different two distributions are. KL divergence, total variation, Rényi divergence — all are ways to ask "how far apart are these?" Sato and Katsumata showed that putting a divergence on a monad's outputs gives you a metric on the entire Kleisli category. The distance between two programs is the worst-case divergence of their outputs.

[Diagram: distributions P and Q with divergence D(P ‖ Q); in the enriched category the hom-object is a number, not a set: distance replaces functions.]
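A minimal sketch of such a divergence, assuming finite discrete distributions represented as Python dicts from outcomes to probabilities (a representation chosen for illustration, not the paper's):

```python
import math

def kl(p, q):
    """KL divergence D(p ‖ q) for finite discrete distributions.

    p, q are dicts mapping outcomes to probabilities. Returns inf
    when p puts mass where q has none (support mismatch).
    """
    total = 0.0
    for x, px in p.items():
        if px == 0.0:
            continue
        qx = q.get(x, 0.0)
        if qx == 0.0:
            return math.inf
        total += px * math.log(px / qx)
    return total
```

Identical distributions are at distance zero, and the divergence grows (up to infinity) as the distributions pull apart.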

Enrichment — metric on morphisms

The insight: a divergence on the monad lifts to a divergence on Kleisli arrows. Two stochastic functions f, g : X → D(Y) are "close" if their output distributions are close for every input. This turns the hom-sets of the Kleisli category into metric spaces.

A non-expanding morphism doesn't increase distance. That's the data processing inequality for metrics — processing can't make programs more distinguishable than they already are.

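Both ideas fit in a short sketch over finite discrete distributions; the helper names `dist` and `pushforward` are ours, not the paper's:

```python
import math

def kl(p, q):
    """KL divergence over finite dicts; assumes supp(p) ⊆ supp(q)."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def dist(f, g, inputs):
    """d(f, g): worst-case divergence of output distributions over all inputs."""
    return max(kl(f(x), g(x)) for x in inputs)

def pushforward(h, p):
    """Post-process a distribution p by a deterministic map h."""
    out = {}
    for x, px in p.items():
        out[h(x)] = out.get(h(x), 0.0) + px
    return out

# Two stochastic "programs": biased coins (here ignoring their input).
f = lambda x: {0: 0.9, 1: 0.1}
g = lambda x: {0: 0.5, 1: 0.5}

# Post-processing by any map h cannot increase the distance (data
# processing inequality); this h forgets the outcome entirely.
h = lambda y: 0
fh = lambda x: pushforward(h, f(x))
gh = lambda x: pushforward(h, g(x))
```

After forgetting the outcome, the two programs are indistinguishable, so their distance collapses to zero — never above the original distance.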

Relational program logics

With a metric on morphisms, you can state relational properties: "these two programs produce outputs within ε of each other." That's a relational Hoare triple — instead of {P} c {Q}, you get {P} c₁ ~ c₂ {Q} where Q talks about the *relationship* between the two outputs. Divergences on monads give the semantic foundation for these relational triples.

Applications: differential privacy (programs are "close" if adding one record changes the output by at most ε), cost analysis (programs are "close" if their resource usage differs by at most k), and probabilistic coupling (two distributions are close if they can be jointly sampled to agree often).

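One way to sketch the semantic side of such a triple: the precondition P relates pairs of inputs, and the postcondition "outputs within ε" becomes a check over every related pair. The helper names and the choice of KL are illustrative, not the paper's:

```python
import math

def kl(p, q):
    """KL divergence over finite dicts; assumes supp(p) ⊆ supp(q)."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def eps_close(c1, c2, related_inputs, eps):
    """Semantic reading of {P} c1 ~ c2 {Q}: for every input pair
    related by the precondition P, the two output distributions
    are within eps in divergence."""
    return all(kl(c1(x1), c2(x2)) <= eps for x1, x2 in related_inputs)

def mech(bit):
    """Toy noisy query: the answer's bias leaks the secret bit."""
    p = 0.75 if bit else 0.25
    return {"yes": p, "no": 1.0 - p}

# Precondition: the two runs see "adjacent" inputs (the bit flipped).
adjacent = [(0, 1), (1, 0)]
```

Here `kl(mech(0), mech(1)) = 0.5·ln 3 ≈ 0.55`, so the relational triple holds at ε = 0.6 but fails at ε = 0.5.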

Notation reference

Paper              Python                              Meaning
D(p ‖ q)           kl(p, q)                            KL divergence between distributions
d(f, g)            max(kl(f(x), g(x)) for x in X)      Distance between morphisms
{P} c₁ ~ c₂ {Q}    # relational triple                 Two programs satisfy a relation
V-enriched         # metric on hom-sets                Hom-sets carry distance structure
Translation notes

All examples use KL divergence over finite discrete distributions. The paper works with arbitrary divergences (Rényi, total variation, hockey-stick) over arbitrary monads on arbitrary base categories. For example: the ε-closeness test on this page checks KL divergence of two functions over a finite set. In the paper, the same construction applies to differential privacy of database queries — the monad is the randomized response mechanism, the divergence is the hockey-stick divergence, and ε-closeness is the privacy guarantee. The structure (divergence enriches the Kleisli category) is identical. The monad and divergence are not.
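To make that example concrete, here is a sketch of the hockey-stick divergence and a randomized-response mechanism; the parameter names are ours, and the construction is the standard one from the differential-privacy literature, not code from the paper:

```python
import math

def hockey_stick(p, q, eps):
    """H_{e^eps}(p ‖ q) = Σ_x max(p(x) − e^eps · q(x), 0).

    A mechanism is eps-DP exactly when this is zero (in both
    directions) for every pair of adjacent inputs."""
    e = math.exp(eps)
    return sum(max(px - e * q.get(x, 0.0), 0.0) for x, px in p.items())

def randomized_response(bit, eps):
    """Report the true bit with probability e^eps / (1 + e^eps)."""
    keep = math.exp(eps) / (1.0 + math.exp(eps))
    return {bit: keep, 1 - bit: 1.0 - keep}
```

For adjacent inputs 0 and 1, the hockey-stick divergence at level ε vanishes (the privacy guarantee holds), while at level 0 it is strictly positive (the outputs do differ).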

Every example is simplified.

Ready for the real thing? Read the paper on arXiv. Start at §3 for divergences on monads, §5 for the connection to relational program logics.

Framework connection: The Natural Framework's NonExpanding typeclass captures exactly this — pipeline stages must be non-expanding morphisms in the divergence-enriched category. (See The Natural Framework.)