Diff

Chapter 4 · Part II begins · Calcagno 2009, O'Hearn 2019, Arjovsky 2019, Rubin 1915

Chapter 3 showed the same abstract schema under eight names. This chapter defines the primitive: snapshot before, snapshot after, XOR. What flipped is figure; what held is ground. Diff is the contrast that abduction operates over.

The mechanic

A mechanic taps the alternator and the engine stalls. Two hypotheses fire: test the battery, test the voltage regulator. Nobody taught her a hypothesis-generation algorithm. The shape of the failure named the next experiment.

Where did the hypotheses come from? She compared two states: engine running, engine stalled. She noted what changed (alternator tapped) and what stayed the same (fuel, spark, coolant). The change pointed at the electrical subsystem. The hypotheses followed from the diff.

Diff is computable. The mechanic performed it in her head. OBD-II performs it in silicon. Facebook Infer performs it on code. The representation changes; the operation stays the same.

The primitive

Diff is the substrate of abduction, not abduction entire. It produces the contrast that abduction operates over:

Given expected state E, observed state O, and background frame B, partition what differs from what remains invariant.

Abduction then uses that partition to generate candidate causes H: minimal revisions to B that would make O unsurprising. The partition itself is the primitive. All eight names in Chapter 3 instantiate it. The simplest encoding: take two snapshots of a system's state, one before an event and one after. The comparison partitions state into two sets:

Figure — what changed. Variables whose values differ between snapshots. In Gestalt terms, what pops out against a stable background.

Ground — what held. Variables whose values stayed the same. The context that remained invariant while the figure shifted. In separation logic, the frame that bi-abduction infers.

Formally:

diff(state_before, state_after) → (figure, ground)

The partition is symmetric: it doesn't matter which snapshot is “first.” But use is asymmetric. The before-state is the baseline; the after-state is the perturbation. The diff tells you what the perturbation touched.
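The symmetry claim can be checked directly. The following is a minimal sketch, assuming state is a flat dict of hashable keys; the three-variable snapshots `s1` and `s2` are illustrative values, not measurements. Swapping the snapshots leaves the figure/ground partition unchanged and only reverses each (was, now) pair.

```python
def diff(before: dict, after: dict) -> tuple[dict, dict]:
    """Partition keys into figure (changed) and ground (held)."""
    figure, ground = {}, {}
    for key in before.keys() | after.keys():
        b, a = before.get(key), after.get(key)
        if b == a:
            ground[key] = a
        else:
            figure[key] = (b, a)  # (was, now)
    return figure, ground

# Illustrative snapshots (hypothetical values).
s1 = {"engine": "running", "alternator": "untouched", "battery": "12.6V"}
s2 = {"engine": "stalled", "alternator": "tapped", "battery": "12.6V"}

fwd_figure, fwd_ground = diff(s1, s2)
rev_figure, rev_ground = diff(s2, s1)

# Same keys land in figure and ground regardless of argument order.
same_partition = set(fwd_figure) == set(rev_figure) and fwd_ground == rev_ground
# Only the direction of each (was, now) pair flips.
pairs_reversed = all(rev_figure[k] == fwd_figure[k][::-1] for k in fwd_figure)

print("same partition:", same_partition)
print("pairs reversed:", pairs_reversed)
```

The asymmetry of use lives entirely in the tuple order: which value is read as the baseline and which as the perturbation.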

Caution: diff gives candidates, not causes. A changed variable may be cause, effect, symptom, intervention, or coincident noise. The mechanic's diff puts both alternator tapped (intervention) and engine stalled (effect) in the figure. The diff alone does not know which is which. Separating cause from effect requires a dependency graph or an experiment. The diff just names what changed.
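One piece of outside knowledge that helps is a record of what the experimenter did by hand. The sketch below assumes such a record exists as a set of variable names (`interventions` and the helper `split_roles` are hypothetical names introduced here). It separates the intervention from everything else in the figure; it does not decide whether the remaining entries are effects, symptoms, or noise.

```python
def split_roles(figure: dict, interventions: set) -> tuple[dict, dict]:
    """Split a figure into what we did (acted) and what responded."""
    acted = {k: v for k, v in figure.items() if k in interventions}
    responded = {k: v for k, v in figure.items() if k not in interventions}
    return acted, responded

# Figure from the mechanic's diff: each value is (was, now).
figure = {
    "alternator": ("untouched", "tapped"),
    "engine": ("running", "stalled"),
}

# Assumption: the mechanic notes which variables she set herself.
interventions = {"alternator"}

acted, responded = split_roles(figure, interventions)
print("intervention:", sorted(acted))
print("response:", sorted(responded))
```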

The XOR

The simplest instantiation: state is a set of key-value pairs. The figure is every key whose value changed. The ground is every key whose value didn't.

Python

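_MISSING = object()  # sentinel: tells "key absent" apart from "value is None"

def diff(before: dict, after: dict) -> tuple[dict, dict]:
    """Compute figure (what changed) and ground (what held).

    Returns (figure, ground).
    figure: keys present in both with different values,
            plus keys added or removed (the absent side is None).
    ground: keys present in both with identical values.
    """
    # before's keys first, then keys new in after: deterministic order.
    all_keys = list(before) + [k for k in after if k not in before]
    figure = {}
    ground = {}

    for key in all_keys:
        b = before.get(key, _MISSING)
        a = after.get(key, _MISSING)
        if b == a:
            ground[key] = a
        else:
            # (was, now); report an absent key as None
            figure[key] = (None if b is _MISSING else b,
                           None if a is _MISSING else a)

    return figure, ground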

print("diff() defined")

Apply it to the mechanic's engine:

Python

def diff(before, after):
    all_keys = set(before) | set(after)
    figure, ground = {}, {}
    for key in all_keys:
        b, a = before.get(key), after.get(key)
        if b == a: ground[key] = a
        else: figure[key] = (b, a)
    return figure, ground

before = {
    "engine":     "running",
    "alternator": "untouched",
    "battery":    "12.6V",
    "fuel_pump":  "on",
    "spark":      "firing",
    "coolant":    "92C",
}

after = {
    "engine":     "stalled",
    "alternator": "tapped",
    "battery":    "12.6V",
    "fuel_pump":  "on",
    "spark":      "firing",
    "coolant":    "92C",
}

figure, ground = diff(before, after)

print("FIGURE (what changed):")
for k, (was, now) in figure.items():
    print(f"  {k}: {was} -> {now}")

print("\nGROUND (what held):")
for k, v in ground.items():
    print(f"  {k}: {v}")

Two figure entries: the alternator (input) and the engine state (output). Four ground entries: battery, fuel pump, spark, coolant, all unchanged.

The hypotheses follow from the figure. The alternator was tapped, the engine stalled. What connects them? The electrical subsystem. Battery supplies power; voltage regulator manages alternator output. Both sit upstream of the engine and both fit a tap-induced failure. The ground tells the mechanic what to deprioritize: fuel-pump and spark are unchanged (though that doesn't eliminate them unless the measurement set is complete).

This is the contrast primitive. Hypothesis generation, ranking, and testing build on top of it.

Degrees of freedom

The diff above is the minimal case: one before, one after, one partition. The variants in the literature add degrees of freedom to this primitive:

Variant	Inputs	Outputs	What it adds
Unary diff	One before, one after	Figure + ground	The primitive. This chapter.
Bi-abduction	Partial before, partial after	Inferred frame + inferred anti-frame	Infers the ground autonomously. Ch 5.
Incorrectness	One before, one after	Under-approximation of bugs	Flip polarity: attend to failure, not success.
Tri-abduction	Fork: shared start, two branches	Causal edge (what the branch changed)	Diff across branches, not just time. Ch 6.

Each step adds an operand. One snapshot pair gives one frame. Two pairs (actual and counterfactual) give one causal edge. N pairs across N branches give a typed subgraph. The pattern stays diff; the arity grows.
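The step from one pair to two can be illustrated with a shared start and two branches. This is a sketch of the "actual and counterfactual" idea only, not the tri-abduction procedure of Chapter 6; the snapshots and the helper `figure_of` are hypothetical. Changes that appear in both branches are treated as drift, and what remains is attributed to the branch point.

```python
def figure_of(before: dict, after: dict) -> dict:
    """Keys whose values differ, mapped to (was, now)."""
    keys = before.keys() | after.keys()
    return {k: (before.get(k), after.get(k))
            for k in keys if before.get(k) != after.get(k)}

# Hypothetical shared starting state.
start = {"engine": "running", "alternator": "untouched", "coolant": "90C"}

# Actual branch: the alternator is tapped.
actual = {"engine": "stalled", "alternator": "tapped", "coolant": "92C"}
# Counterfactual branch: same elapsed time, no tap.
counterfactual = {"engine": "running", "alternator": "untouched", "coolant": "92C"}

fig_actual = figure_of(start, actual)
fig_counter = figure_of(start, counterfactual)

# A change seen identically in both branches (coolant warming) is drift.
drift = {k: v for k, v in fig_actual.items() if fig_counter.get(k) == v}
# A change seen only in the actual branch is tied to the branch point.
branch_figure = {k: v for k, v in fig_actual.items() if fig_counter.get(k) != v}

print("branch-specific:", sorted(branch_figure))
print("shared drift:", sorted(drift))
```

A single before/after pair would have put the coolant in the figure alongside the engine; the second pair moves it out.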

Three witnesses

Three systems, three decades, three fields. Each encodes the diff. They are rarely presented as instances of one operation.

OBD-II (1996) — hardcoded diff

OBD-II reads sensor states, diffs against expected values, and generates fault codes. (Real OBD-II adds thresholds, monitors, and enable conditions; "hardcoded diff" is a simplification of the core logic.) Vehicles have run this since 1996.

But OBD-II is hardcoded. The fault tree is hand-authored, the hypotheses enumerated in advance. An engineer decided which deviations map to which codes. If a failure mode wasn't anticipated, nothing fires. The primitive works; it just runs on a fixed table.

Python

# OBD-II style: hardcoded fault table
FAULT_TABLE = {
    "alternator_voltage": {
        "low":  ["P0562 - System Voltage Low",
                 "Check battery", "Check voltage regulator"],
        "high": ["P0563 - System Voltage High",
                 "Check voltage regulator", "Check wiring"],
    },
    "coolant_temp": {
        "high": ["P0217 - Engine Overtemp",
                 "Check thermostat", "Check coolant level"],
    },
}

def obd_diff(expected: dict, observed: dict) -> list[str]:
    """Hardcoded diff: look up deviations in the fault table."""
    codes = []
    for sensor, exp_val in expected.items():
        obs_val = observed.get(sensor)
        if obs_val != exp_val and sensor in FAULT_TABLE:
            if obs_val in FAULT_TABLE[sensor]:
                codes.extend(FAULT_TABLE[sensor][obs_val])
    return codes

expected = {"alternator_voltage": "normal", "coolant_temp": "normal"}
observed = {"alternator_voltage": "low",    "coolant_temp": "normal"}

for code in obd_diff(expected, observed):
    print(code)

The limitation is the table. Every hypothesis must be written down before the system ships. No table entry, no hypothesis. The diff is there; the inference is manual.
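One of the simplifications noted earlier is thresholds: real sensors report numbers, not labels like "low". A minimal sketch of that missing step follows. The limits in `THRESHOLDS` are invented for illustration and are not taken from any standard or manufacturer calibration; `classify` is a hypothetical helper.

```python
# Hypothetical limits for illustration only; real calibrations are
# manufacturer-specific and are not drawn from any standard here.
THRESHOLDS = {
    "alternator_voltage": (13.0, 15.0),  # volts: (low limit, high limit)
    "coolant_temp": (None, 110.0),       # deg C: no low limit in this sketch
}

def classify(sensor: str, reading: float) -> str:
    """Map a numeric reading to 'low', 'normal', or 'high'."""
    low, high = THRESHOLDS[sensor]
    if low is not None and reading < low:
        return "low"
    if high is not None and reading > high:
        return "high"
    return "normal"

# Hypothetical raw readings.
readings = {"alternator_voltage": 11.8, "coolant_temp": 92.0}
observed = {sensor: classify(sensor, value) for sensor, value in readings.items()}
print(observed)
```

The resulting `observed` dict has the same shape as the one passed to `obd_diff` above, so the threshold step slots in front of the table lookup. The thresholds are one more hand-authored table, which is the section's point.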

Facebook Infer (2009) — automated diff

Infer (Calcagno et al. 2009) runs bi-abduction on millions of lines of production code. Given a function's precondition and postcondition, it infers the frame (heap the function didn't touch) and the anti-frame (heap that must exist for the function to be safe).

The figure is what the function modifies; the ground is what it leaves alone. Infer doesn't require the programmer to specify the ground. That's the “bi”: it solves for two unknowns at once, the anti-frame that must hold before the call (abduction) and the frame the function preserves (frame inference).

Infer moved the diff from a hand-authored table (OBD-II) to an automated inference engine. Same primitive, higher automation.
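A toy version makes the two unknowns concrete. Bi-abduction asks for an anti-frame M and a frame F such that A * M entails G * F, where A is what is known and G is what is required. The sketch below reduces symbolic heaps to plain sets of cell names, so the separating conjunction becomes disjoint union and the prover becomes set subtraction. It is an analogy under that assumption, not how Infer is implemented; `bi_abduce`, `have`, and `need` are hypothetical names.

```python
def bi_abduce(have: set, need: set) -> tuple[set, set]:
    """Toy bi-abduction over sets of heap cell names.

    Finds (anti_frame, frame) with have + anti_frame == need + frame,
    where + is disjoint union.
    """
    anti_frame = need - have  # cells required that we do not yet hold
    frame = have - need       # cells we hold that are never touched
    return anti_frame, frame

have = {"x", "y"}  # cells known to be allocated at the call site
need = {"y", "z"}  # cells the callee's precondition mentions

anti_frame, frame = bi_abduce(have, need)
print("anti-frame:", sorted(anti_frame))
print("frame:", sorted(frame))

# Both sides of the entailment cover the same cells, without overlap.
assert have | anti_frame == need | frame
assert not (have & anti_frame) and not (need & frame)
```

In the chapter's vocabulary, `frame` is inferred ground, and `anti_frame` is state the caller never mentioned but must supply.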

IRM (2019) — learned diff

Invariant Risk Minimization (Arjovsky et al. 2019) uses environment variation to force figure/ground separation. Train a model across multiple environments. Features that predict the outcome in all environments are invariant (ground). Features that predict in some but not others are spurious (figure).

IRM diffs across environments rather than across time. Instead of comparing two snapshots of one system, it compares the same learning task under different conditions. Invariant features are ground; environment-specific features are figure. (Note the inversion: in IRM, the invariant features are the causal signal you want. Calling them "ground" follows the changed/unchanged definition but reverses the ordinary sense of "figure = thing of interest." The role assignment holds; the valence flips.)

System	Year	Diff over	Figure	Ground
OBD-II	1996	Expected vs. observed sensor values	Fault codes (hand-authored)	Normal operating range (hand-authored)
Infer	2009	Precondition vs. postcondition	Modified heap (automated)	Frame: untouched heap (inferred)
IRM	2019	Environment A vs. environment B	Spurious features (learned)	Invariant features (learned)

Three encodings: a hardcoded table (OBD-II), automated inference (Infer), and learned separation (IRM). IRM is the loosest analogue; it seeks invariant representations rather than literally diffing states. But the structural role holds: partition observations into what varies and what holds.
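The cross-environment comparison can be shown on a toy dataset. This is not the IRM objective, which trains a representation under a gradient penalty; it is only a sketch of the comparison IRM exploits, assuming binary features and labels. The data, the feature names `shape` and `color`, and the `TOLERANCE` cutoff are all invented for illustration.

```python
def agreement(rows: list[dict], feature: str) -> float:
    """Fraction of rows where the binary feature equals the label."""
    return sum(r[feature] == r["label"] for r in rows) / len(rows)

# Two hypothetical environments. "shape" tracks the label in both;
# "color" tracks it in env_a and is reversed in env_b.
env_a = [
    {"shape": 1, "color": 1, "label": 1},
    {"shape": 0, "color": 0, "label": 0},
    {"shape": 1, "color": 1, "label": 1},
    {"shape": 0, "color": 0, "label": 0},
]
env_b = [
    {"shape": 1, "color": 0, "label": 1},
    {"shape": 0, "color": 1, "label": 0},
    {"shape": 1, "color": 0, "label": 1},
    {"shape": 0, "color": 1, "label": 0},
]

TOLERANCE = 0.1  # assumed cutoff for "the relationship held"

invariant, spurious = [], []
for feature in ("shape", "color"):
    scores = [agreement(env, feature) for env in (env_a, env_b)]
    spread = max(scores) - min(scores)
    # Small spread: the feature-label relationship held across environments.
    (invariant if spread <= TOLERANCE else spurious).append(feature)

print("ground (invariant):", invariant)
print("figure (spurious):", spurious)
```

Within `env_a` alone, `color` predicts the label as well as `shape` does. Only the second environment separates them.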

Code: the full loop

Combine the primitive with hypothesis generation. Given a diff, produce candidate explanations by examining what the figure touches in a dependency graph.

Python

from dataclasses import dataclass

@dataclass
class Hypothesis:
    component: str
    reason: str
    testable: bool = True

def diff(before: dict, after: dict) -> tuple[dict, dict]:
    """Partition state into figure (changed) and ground (held)."""
    all_keys = set(before) | set(after)
    figure, ground = {}, {}
    for key in all_keys:
        b, a = before.get(key), after.get(key)
        if b == a:
            ground[key] = a
        else:
            figure[key] = (b, a)
    return figure, ground

def abduct(figure: dict, dependencies: dict) -> list[Hypothesis]:
    """Generate hypotheses from the diff.

    dependencies: maps each state variable to components
    that could cause it to change.
    """
    candidates = []
    seen = set()
    for key in figure:
        for component in dependencies.get(key, []):
            if component not in seen:
                seen.add(component)
                was, now = figure[key]
                candidates.append(Hypothesis(
                    component=component,
                    reason=f"{key} changed ({was} -> {now})",
                ))
    return candidates


# --- The mechanic scenario ---

before = {
    "engine": "running", "alternator": "untouched",
    "battery": "12.6V", "fuel_pump": "on",
    "spark": "firing",  "coolant": "92C",
}

after = {
    "engine": "stalled", "alternator": "tapped",
    "battery": "12.6V",  "fuel_pump": "on",
    "spark": "firing",   "coolant": "92C",
}

# Which components can cause each state variable to change?
dependencies = {
    "engine":     ["battery", "voltage_regulator", "ECU"],
    "alternator": ["alternator_belt", "alternator_bearing"],
    "battery":    ["alternator", "parasitic_drain"],
    "fuel_pump":  ["fuel_relay", "fuel_filter"],
    "spark":      ["ignition_coil", "spark_plug"],
    "coolant":    ["thermostat", "water_pump"],
}

figure, ground = diff(before, after)
hypotheses = abduct(figure, dependencies)

print("Diff result:")
print(f"  Figure: {list(figure.keys())}")
print(f"  Ground: {list(ground.keys())}")
print(f"\n{len(hypotheses)} hypotheses generated:")
for h in hypotheses:
    print(f"  [{h.component}] {h.reason}")

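Five hypotheses from a two-variable diff. The dependency graph constrains which components are candidates. Because abduct only walks the figure, the upstream components of unchanged variables (fuel relay, ignition coil, thermostat, and so on) are never proposed. That deprioritizes them; it does not rule them out. The battery appears on both sides: its voltage reading sits in the ground, yet it remains a candidate because the engine depends on it. Battery, voltage regulator, ECU, alternator belt, alternator bearing: the five things to check.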

Notice what the code does not do. It does not test the hypotheses, rank them, or estimate their probability. It generates them. The diff is a hypothesis-generation primitive. Testing is induction. Ranking is economy of research (Ch 7). The diff names the candidates.

What breaks

The diff requires you to know what to observe. Every variable was chosen by someone: the mechanic who checked six gauges, the engineer who wired six sensors, the programmer who logged six fields. If the relevant state lives in a variable nobody snapshotted, the diff misses it.

Return to the mechanic. She checked six variables but not the serpentine belt tension. If the real cause is a worn belt that slipped when she tapped the alternator, the diff cannot find it. The belt isn't in either snapshot. Not in the figure, not in the ground. Absent.

Python

def diff(before, after):
    all_keys = set(before) | set(after)
    figure, ground = {}, {}
    for key in all_keys:
        b, a = before.get(key), after.get(key)
        if b == a: ground[key] = a
        else: figure[key] = (b, a)
    return figure, ground

# The variable that matters isn't in the snapshot.
before = {
    "engine": "running", "alternator": "untouched",
    "battery": "12.6V", "fuel_pump": "on",
    "spark": "firing",  "coolant": "92C",
    # "belt_tension": "4.2" — not measured!
}

after = {
    "engine": "stalled", "alternator": "tapped",
    "battery": "12.6V", "fuel_pump": "on",
    "spark": "firing",  "coolant": "92C",
    # "belt_tension": "1.1" — would have been figure, but we missed it
}

figure, ground = diff(before, after)

# The diff correctly reports what changed among observed variables.
# But the real cause (belt_tension: 4.2 -> 1.1) is invisible.
# The diff cannot find what it wasn't told to look at.
print("Figure:", list(figure.keys()))
print("Ground:", list(ground.keys()))
print("Belt tension: not in snapshot. Hypothesis space is incomplete.")

This is the structural limitation. The unary diff partitions observed state into figure and ground. It cannot reason about unobserved state. The hypothesis space is bounded by what you chose to measure.
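A partial mitigation is to compare the snapshot's keys against an inventory of variables the system is known to have. The sketch below assumes such an inventory exists (for instance, from a service manual); `known_variables`, `ground_strap`, and the helper `unobserved` are hypothetical. It narrows the gap without closing it, since a variable missing from the inventory stays invisible.

```python
def unobserved(snapshot: dict, known_variables: set) -> set:
    """Variables the system is known to have that the snapshot omits."""
    return known_variables - set(snapshot)

snapshot = {
    "engine": "stalled", "alternator": "tapped",
    "battery": "12.6V", "fuel_pump": "on",
    "spark": "firing", "coolant": "92C",
}

# Assumed inventory; it can itself be incomplete.
known_variables = set(snapshot) | {"belt_tension", "ground_strap"}

blind_spots = unobserved(snapshot, known_variables)
print("Not in snapshot:", sorted(blind_spots))
```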

OBD-II shows this concretely: if a failure involves a sensor the ECU doesn't monitor, no code fires. Engineers add more sensors, but you can never instrument everything. Some state will always be unmeasured.

Bi-abduction addresses this by inferring what the snapshots left out: the frame, the state the operation must not have touched for the result to be valid, and the anti-frame, the state that must have existed beforehand. It reasons backward from the postcondition, including state never explicitly observed. The diff goes from "compare what you measured" to "infer what you must have missed."

That is Chapter 5.

Sources

Rubin 1915	Synsoplevede Figurer. Figure-ground segregation in visual perception. The perceptual ancestor of diff.
SAE J1962 (1996)	OBD-II standard. Diagnostic connector, protocol, and trouble code conventions. The diff, hardcoded since 1996.
Calcagno et al. 2009	"Compositional Shape Analysis by Means of Bi-Abduction." POPL. The frame inference engine behind Facebook Infer. Automated figure/ground from separation logic.
Arjovsky et al. 2019	"Invariant Risk Minimization." Environment variation as the lever for figure/ground separation. Learned diff across training conditions.
O'Hearn 2019	"Incorrectness Logic." POPL. Flip the polarity of the diff: under-approximate bugs instead of over-approximating correctness.
Ernst et al. 2001	"Dynamically Discovering Likely Program Invariants to Support Program Evolution." Daikon: infer invariants (ground) from observed execution traces. More samples, sharper ground.

Neighbors

Methodeutics
Ch 2: Security and Uberty — the tradeoff that makes abduction fertile
Ch 7: Economy of Research — selecting among the hypotheses this chapter generates
Abduction — the blog post this chapter formalizes

External

Facebook Infer — bi-abduction in production
OBD-II (Wikipedia)
Arjovsky et al. 2019 — Invariant Risk Minimization (arXiv)