← back to methodeutics

Diff

Chapter 4 ยท Part II begins ยท Calcagno 2009, O'Hearn 2019, Arjovsky 2019, Rubin 1915

Chapter 3 showed the same abstract schema under eight names. This chapter defines the primitive: snapshot before, snapshot after, XOR. What flipped is figure; what held is ground. Diff is the contrast that abduction operates over.

The mechanic

A mechanic taps the alternator and the engine stalls. Two hypotheses fire: test the battery, test the voltage regulator. Nobody taught her a hypothesis-generation algorithm. The shape of the failure named the next experiment.

Where did the hypotheses come from? She compared two states: engine running, engine stalled. She noted what changed (alternator tapped) and what stayed the same (fuel, spark, coolant). The change pointed at the electrical subsystem. The hypotheses followed from the diff.

Diff is computable. The mechanic performed it in her head. OBD-II performs it in silicon. Facebook Infer performs it on code. The representation changes; the operation stays the same.

The primitive

Diff is the substrate of abduction, not abduction entire. It produces the contrast that abduction operates over:

Given expected state E, observed state O, and background frame B, partition what differs from what remains invariant.

Abduction then uses that partition to generate candidate causes H: minimal revisions to B that would make O unsurprising. The partition itself is the primitive. All eight names in Chapter 3 instantiate it. The simplest encoding: take two snapshots of a system's state, one before an event and one after. The comparison partitions state into two sets:

Figure — what changed. Variables whose values differ between snapshots. In Gestalt terms, what pops out against a stable background.

Ground — what held. Variables whose values stayed the same. The context that remained invariant while the figure shifted. In separation logic, the frame that bi-abduction infers.

Formally:

diff(state_before, state_after) → (figure, ground)

The partition is symmetric: it doesn't matter which snapshot is “first.” But use is asymmetric. The before-state is the baseline; the after-state is the perturbation. The diff tells you what the perturbation touched.

Caution: diff gives candidates, not causes. A changed variable may be cause, effect, symptom, intervention, or coincident noise. The mechanic's diff puts both alternator tapped (intervention) and engine stalled (effect) in the figure. The diff alone does not know which is which. Separating cause from effect requires a dependency graph or an experiment. The diff just names what changed.

The XOR

The simplest instantiation: state is a set of key-value pairs. The figure is every key whose value changed. The ground is every key whose value didn't.

Python

Apply it to the mechanic's engine:

Python

Two figure entries: the alternator (input) and the engine state (output). Four ground entries: battery, fuel pump, spark, coolant, all unchanged.

The hypotheses follow from the figure. The alternator was tapped, the engine stalled. What connects them? The electrical subsystem. Battery supplies power; voltage regulator manages alternator output. Both sit upstream of the engine and both fit a tap-induced failure. The ground tells the mechanic what to deprioritize: fuel-pump and spark are unchanged (though that doesn't eliminate them unless the measurement set is complete).

This is the contrast primitive. Hypothesis generation, ranking, and testing build on top of it.

Degrees of freedom

The diff above is the minimal case: one before, one after, one partition. The variants in the literature add degrees of freedom to this primitive:

Variant Inputs Outputs What it adds
Unary diff One before, one after Figure + ground The primitive. This chapter.
Bi-abduction Partial before, partial after Inferred frame + inferred anti-frame Infers the ground autonomously. Ch 5.
Incorrectness One before, one after Under-approximation of bugs Flip polarity: attend to failure, not success.
Tri-abduction Fork: shared start, two branches Causal edge (what the branch changed) Diff across branches, not just time. Ch 6.

Each step adds an operand. One snapshot pair gives one frame. Two pairs (actual and counterfactual) give one causal edge. N pairs across N branches give a typed subgraph. The pattern stays diff; the arity grows.

Three witnesses

Three systems, three decades, three fields. Each encodes the diff. They are rarely presented as instances of one operation.

OBD-II (1996) — hardcoded diff

OBD-II reads sensor states, diffs against expected values, and generates fault codes. (Real OBD-II adds thresholds, monitors, and enable conditions; "hardcoded diff" is a simplification of the core logic.) Vehicles have run this since 1996.

But OBD-II is hardcoded. The fault tree is hand-authored, the hypotheses enumerated in advance. An engineer decided which deviations map to which codes. If a failure mode wasn't anticipated, nothing fires. The primitive works; it just runs on a fixed table.

Python

The limitation is the table. Every hypothesis must be written down before the system ships. No table entry, no hypothesis. The diff is there; the inference is manual.

Facebook Infer (2009) — automated diff

Infer (Calcagno et al. 2009) runs bi-abduction on millions of lines of production code. Given a function's precondition and postcondition, it infers the frame (heap the function didn't touch) and the anti-frame (heap that must exist for the function to be safe).

The figure is what the function modifies; the ground is what it leaves alone. Infer doesn't require the programmer to specify the ground. That's the “bi”: abduction in both directions, computing what must hold before and what the function preserves.

Infer moved the diff from a hand-authored table (OBD-II) to an automated inference engine. Same primitive, higher automation.

IRM (2019) — learned diff

Invariant Risk Minimization (Arjovsky et al. 2019) uses environment variation to force figure/ground separation. Train a model across multiple environments. Features that predict the outcome in all environments are invariant (ground). Features that predict in some but not others are spurious (figure).

IRM diffs across environments rather than across time. Instead of comparing two snapshots of one system, it compares the same learning task under different conditions. Invariant features are ground; environment-specific features are figure. (Note the inversion: in IRM, the invariant features are the causal signal you want. Calling them "ground" follows the changed/unchanged definition but reverses the ordinary sense of "figure = thing of interest." The role assignment holds; the valence flips.)

System Year Diff over Figure Ground
OBD-II 1996 Expected vs. observed sensor values Fault codes (hand-authored) Normal operating range (hand-authored)
Infer 2009 Precondition vs. postcondition Modified heap (automated) Frame: untouched heap (inferred)
IRM 2019 Environment A vs. environment B Spurious features (learned) Invariant features (learned)

Three encodings. Handcoded table (OBD-II), automated inference (Infer), learned separation (IRM). IRM is the loosest analogue; it seeks invariant representations rather than literally diffing states. But the structural role holds: partition observations into signal and noise.

Code: the full loop

Combine the primitive with hypothesis generation. Given a diff, produce candidate explanations by examining what the figure touches in a dependency graph.

Python

Five hypotheses from a two-variable diff. The dependency graph constrains which components are candidates. The ground (fuel pump, spark, coolant) eliminates their upstream components. Battery, voltage regulator, ECU, alternator belt, alternator bearing: the five things to check.

Notice what the code does not do. It does not test the hypotheses, rank them, or estimate their probability. It generates them. The diff is a hypothesis-generation primitive. Testing is induction. Ranking is economy of research (ch 7). The diff names the candidates.


What breaks

The diff requires you to know what to observe. Every variable was chosen by someone: the mechanic who checked six gauges, the engineer who wired six sensors, the programmer who logged six fields. If the relevant state lives in a variable nobody snapshotted, the diff misses it.

Return to the mechanic. She checked six variables but not the serpentine belt tension. If the real cause is a worn belt that slipped when she tapped the alternator, the diff cannot find it. The belt isn't in either snapshot. Not in the figure, not in the ground. Absent.

Python

This is the structural limitation. The unary diff partitions observed state into figure and ground. It cannot reason about unobserved state. The hypothesis space is bounded by what you chose to measure.

OBD-II shows this concretely: if a failure involves a sensor the ECU doesn't monitor, no code fires. Engineers add more sensors, but you can never instrument everything. Some state will always be unmeasured.

Bi-abduction addresses this by inferring the frame: the state the operation must not have touched for the result to be valid. It reasons backward from the postcondition, including state never explicitly observed. The diff goes from "compare what you measured" to "infer what you must have missed."

That is Chapter 5.


Sources

Rubin 1915 Synsoplevede Figurer. Figure-ground segregation in visual perception. The perceptual ancestor of diff.
SAE J1962 (1996) OBD-II standard. Diagnostic connector, protocol, and trouble code conventions. The diff, hardcoded since 1996.
Calcagno et al. 2009 "Compositional Shape Analysis by Means of Bi-Abduction." POPL. The frame inference engine behind Facebook Infer. Automated figure/ground from separation logic.
Arjovsky et al. 2019 "Invariant Risk Minimization." Environment variation as the lever for figure/ground separation. Learned diff across training conditions.
O'Hearn 2019 "Incorrectness Logic." POPL. Flip the polarity of the diff: under-approximate bugs instead of over-approximating correctness.
Ernst et al. 2001 "Dynamically Discovering Likely Program Invariants to Support Program Evolution." Daikon: infer invariants (ground) from observed execution traces. More samples, sharper ground.
Neighbors

External