← back to methodeutics

Bi-abduction

Chapter 5 · Calcagno et al. 2009, Bylander et al. 1991, O'Hearn 2019

Facebook runs Infer on millions of lines of code every day. It infers the frame (what the diff didn't touch) automatically. Chapter 4's diff requires the observer to choose what to snapshot. Bi-abduction removes that requirement.

Diff requires a choice

Chapter 4 defined the primitive: snapshot before, snapshot after, XOR. The figure is what changed; the ground is what held. The mechanic taps the alternator. The engine stalls. The diff between "before tap" and "after tap" is the figure. Everything else — battery, fuel, spark — is the ground.

But the mechanic chose what to observe. She chose to look at the engine, not the headlights. She chose to snapshot RPM, not cabin temperature. The diff works on whatever you give it. It cannot tell you what you forgot to give it.

Scale breaks this. A function modifies some variables and leaves others untouched. Snapshot all of them, you get the right diff. Miss one, the ground is incomplete, and the hypothesis overgeneralizes. "Tile size 16 is optimal" is a different claim from "tile size 16 is optimal when loop order is column-major and vectorization is enabled." The ground is the qualification that makes the hypothesis precise.

The question: can a system infer the ground automatically, without the observer pre-selecting what to snapshot?


Separation logic: the frame rule

The answer comes from separation logic, a branch of formal verification. The key idea: the heap (all the memory a program can access) separates into two disjoint parts.

Part What it is In diff terms
Footprint The memory the function actually reads and writes Figure
Frame Everything else — memory the function doesn't touch Ground

Separation logic's frame rule: if a function is correct on its footprint alone, it stays correct on its footprint plus any frame. The frame passes through untouched. Figure/ground separation, stated as a theorem.

The separating conjunction * means "and separately." P * F asserts two disjoint heap regions: one satisfying P (footprint), one satisfying F (frame). The frame rule:

   {P} C {Q}
─────────────────────   (Frame Rule)
 {P * F} C {Q * F}

If command C transforms precondition P into postcondition Q, then running C on a larger heap P * F transforms it into Q * F. The frame F is unchanged. The ground, guaranteed by the logic itself.

Bi-abduction: inferring both sides

The frame rule assumes you already know the frame. Calcagno, Distefano, O'Hearn, and Yang (2009) asked: what if you don't?

Their insight: solve for two unknowns at once. Given a function, find:

  1. The anti-frame (missing precondition): what must be true before the function runs. The heap the function needs but that wasn't stated in the spec.
  2. The frame (untouched heap): what stays true after the function runs. The heap the function doesn't touch.

The bi-abduction problem: given A (known heap) and a function requiring precondition P, find M (anti-frame) and F (frame) such that:

A * M  ⊢  P * F

where:
  A = what we know (current heap state)
  M = anti-frame (what must also be true — inferred)
  P = what the function needs (precondition)
  F = frame (what the function won't touch — inferred)

Two unknowns, one equation, solved simultaneously. The name bi-abduction comes from this double inference: abducing the missing precondition (M) and abducing the frame (F) at the same time.

Anti-frame is hypothesis, frame is ground

Translate out of verification language and back into the language of inquiry.

Verification Methodeutics What it does
Anti-frame (M) Hypothesis What was missing that would explain the observation. The abductive leap.
Frame (F) Ground What was untouched: the context that qualifies the hypothesis. Automatically inferred.

Chapter 4's diff gives you figure and ground only if you choose the right snapshot. In program analysis, bi-abduction infers both from the function's local heap behavior, instead of requiring a global snapshot chosen in advance.


Infer scales bi-abduction to millions of lines

Infer is the tool Calcagno and O'Hearn built at Facebook. It runs bi-abduction on every diff submitted to the codebase: millions of lines, every day. For each function, Infer infers the precondition (what must the caller set up?) and the frame (what does the function leave alone?). When the inferred precondition conflicts with the calling context, Infer reports a bug: null dereference, resource leak, race condition.

The key to scaling: compositionality. Infer analyzes each function independently, using bi-abduction to infer its contract, then composes contracts across call boundaries. Function A calls B calls C. Start at C, infer its contract, use that contract when analyzing B, and so on up the call graph. Each function is analyzed once, with dependencies summarized as inferred contracts.

Bylander et al. (1991) proved that abduction in general is NP-hard: consider all possible explanations for a global observation and the search space explodes. Bi-abduction sidesteps this by reasoning locally: one function at a time, composing summaries up the call graph.

Approach Scope Complexity
General abduction Global: all possible explanations for all observations NP-hard (Bylander et al. 1991)
Bi-abduction Local: one function at a time, compose results Polynomial per function, linear composition

Abduction becomes tractable when you decompose the problem into local pieces with clear boundaries. Separation logic provides the decomposition. The separating conjunction * guarantees that the pieces don't interfere. Without that guarantee, local reasoning is unsound; a change in one function could silently affect another's frame.


Code: a toy model of bi-abduction bookkeeping

A simplified analogue outside verification. A function reads and writes keys in a state dictionary. Given explicitly supplied reads and writes (the hard part, which real bi-abduction infers from code), compute which keys must exist beforehand (anti-frame) and which are untouched (frame).

Python

The function never mentions theme, locale, or session_token, so bi-abduction infers they are untouched: the frame. It reads payment_method, which isn't in the known state, so bi-abduction infers it must exist: the anti-frame. From a known footprint, the system computed the ground and the hypothesis without the observer choosing what to snapshot.

Now compose. The order-confirmation function reads order_id (which checkout wrote) and email (which nobody mentioned). Bi-abduction infers email as a new anti-frame entry. Chain the two: composed anti-frame is {payment_method, email}, composed frame is {theme, locale, session_token}. Each function analyzed locally; composition is a set union. Infer's actual composition is richer, but the structure is the same.


Compositionality is the key

Three properties make bi-abduction work where general abduction doesn't.

Locality. Each function is analyzed in isolation. You don't need the whole program to infer one function's contract. The mechanic diagnoses one subsystem at a time: checking the alternator doesn't require understanding the transmission.

Separation. Frame and footprint are disjoint. What one function touches, another cannot silently depend on. The separating conjunction * is the formal guarantee. Without it, a change in function A could invalidate function B's frame, making local reasoning unsound.

Compositionality. Inferred contracts compose across boundaries. Function A's postcondition becomes function B's precondition context. The anti-frames and frames chain. The analysis scales linearly in the call graph, not exponentially in the state space.

The strategy transfers beyond programs, but only where the domain has something like separation: parts whose interactions are explicit enough that local conclusions can compose. Diagnose locally, compose globally.


What breaks

Bi-abduction reasons along a single execution path: one (anti-frame, frame) pair per path. But programs branch. Consider:

Python

Branch A touches {card_number, authorization_code}. Branch B touches {bank_account, transfer_id}. The two branches have different footprints and different frames. Bi-abduction on branch A says card_number must exist. Bi-abduction on branch B says bank_account must exist. Which anti-frame is correct?

Both, path-conditioned. If payment_method == "credit_card", require card_number; otherwise require bank_account. The precondition is disjunctive. Bi-abduction gives one answer per path; relating the paths requires understanding that the branch condition itself determines the outcome, that the choice between paths is the real structure.

Not a bug in bi-abduction. Its boundary. The primitive handles one path. When execution forks, you need the diff between branches, not just the diff within each branch.

That is tri-abduction. Chapter 6.


Sources

Calcagno et al. 2009 "Compositional Shape Analysis by Means of Bi-Abduction." POPL 2009. ACM DL. The original bi-abduction paper. Introduces simultaneous inference of anti-frame and frame for compositional program analysis.
Bylander et al. 1991 "The Computational Complexity of Abduction." Artificial Intelligence 49(1–3). Proves abduction is NP-hard in general. The result that makes local decomposition necessary.
O'Hearn 2019 "Separation Logic." Communications of the ACM 62(2). Retrospective on separation logic and the frame rule, from the co-inventor.
Reynolds 2002 "Separation Logic: A Logic for Shared Mutable Data Structures." LICS 2002. The foundational paper on separation logic and the frame rule.
Facebook Infer fbinfer.com. Open-source static analyzer using bi-abduction. Deployed at Facebook, Amazon, Uber, and others. The primitive in production.
Neighbors

External