Bi-abduction

Chapter 5 · Calcagno et al. 2009, Bylander et al. 1991, O'Hearn 2019

Facebook runs Infer on millions of lines of code every day. It infers the frame (what the diff didn't touch) automatically. Chapter 4's diff requires the observer to choose what to snapshot. Bi-abduction removes that requirement.

Diff requires a choice

Chapter 4 defined the primitive: snapshot before, snapshot after, XOR. The figure is what changed; the ground is what held. The mechanic taps the alternator. The engine stalls. The diff between "before tap" and "after tap" is the figure. Everything else — battery, fuel, spark — is the ground.

But the mechanic chose what to observe. She chose to look at the engine, not the headlights. She chose to snapshot RPM, not cabin temperature. The diff works on whatever you give it. It cannot tell you what you forgot to give it.

Scale breaks this. A function modifies some variables and leaves others untouched. Snapshot all of them, you get the right diff. Miss one, the ground is incomplete, and the hypothesis overgeneralizes. "Tile size 16 is optimal" is a different claim from "tile size 16 is optimal when loop order is column-major and vectorization is enabled." The ground is the qualification that makes the hypothesis precise.

The question: can a system infer the ground automatically, without the observer pre-selecting what to snapshot?

Separation logic: the frame rule

The answer comes from separation logic, a branch of formal verification. The key idea: the heap (all the memory a program can access) separates into two disjoint parts.

Part	What it is	In diff terms
Footprint	The memory the function actually reads and writes	Figure
Frame	Everything else — memory the function doesn't touch	Ground

Separation logic's frame rule: if a function is correct on its footprint alone, it stays correct on its footprint plus any frame. The frame passes through untouched. Figure/ground separation, stated as a theorem.

The separating conjunction * means "and separately." P * F asserts two disjoint heap regions: one satisfying P (footprint), one satisfying F (frame). The frame rule:

   {P} C {Q}
─────────────────────   (Frame Rule)
 {P * F} C {Q * F}

If command C transforms precondition P into postcondition Q, then running C on a larger heap P * F transforms it into Q * F. The frame F is unchanged. The ground, guaranteed by the logic itself.

Bi-abduction: inferring both sides

The frame rule assumes you already know the frame. Calcagno, Distefano, O'Hearn, and Yang (2009) asked: what if you don't?

Their insight: solve for two unknowns at once. Given a function, find:

The anti-frame (missing precondition): what must be true before the function runs. The heap the function needs but that wasn't stated in the spec.
The frame (untouched heap): what stays true after the function runs. The heap the function doesn't touch.

The bi-abduction problem: given A (known heap) and a function requiring precondition P, find M (anti-frame) and F (frame) such that:

A * M  ⊢  P * F

where:
  A = what we know (current heap state)
  M = anti-frame (what must also be true — inferred)
  P = what the function needs (precondition)
  F = frame (what the function won't touch — inferred)

Two unknowns, one equation, solved simultaneously. The name bi-abduction comes from this double inference: abducing the missing precondition (M) and abducing the frame (F) at the same time.

Anti-frame is hypothesis, frame is ground

Translate out of verification language and back into the language of inquiry.

Verification	Methodeutics	What it does
Anti-frame (M)	Hypothesis	What was missing that would explain the observation. The abductive leap.
Frame (F)	Ground	What was untouched: the context that qualifies the hypothesis. Automatically inferred.

Chapter 4's diff gives you figure and ground only if you choose the right snapshot. In program analysis, bi-abduction infers both from the function's local heap behavior, instead of requiring a global snapshot chosen in advance.

Infer scales bi-abduction to millions of lines

Infer is the tool Calcagno and O'Hearn built at Facebook. It runs bi-abduction on every diff submitted to the codebase: millions of lines, every day. For each function, Infer infers the precondition (what must the caller set up?) and the frame (what does the function leave alone?). When the inferred precondition conflicts with the calling context, Infer reports a bug: null dereference, resource leak, race condition.

The key to scaling: compositionality. Infer analyzes each function independently, using bi-abduction to infer its contract, then composes contracts across call boundaries. Function A calls B calls C. Start at C, infer its contract, use that contract when analyzing B, and so on up the call graph. Each function is analyzed once, with dependencies summarized as inferred contracts.

Bylander et al. (1991) proved that abduction in general is NP-hard: consider all possible explanations for a global observation and the search space explodes. Bi-abduction sidesteps this by reasoning locally: one function at a time, composing summaries up the call graph.

Approach	Scope	Complexity
General abduction	Global: all possible explanations for all observations	NP-hard (Bylander et al. 1991)
Bi-abduction	Local: one function at a time, compose results	Polynomial per function, linear composition

Abduction becomes tractable when you decompose the problem into local pieces with clear boundaries. Separation logic provides the decomposition. The separating conjunction * guarantees that the pieces don't interfere. Without that guarantee, local reasoning is unsound; a change in one function could silently affect another's frame.

Code: a toy model of bi-abduction bookkeeping

A simplified analogue outside verification. A function reads and writes keys in a state dictionary. Given explicitly supplied reads and writes (the hard part, which real bi-abduction infers from code), compute which keys must exist beforehand (anti-frame) and which are untouched (frame).

Python

def bi_abduce(known_state, reads, writes):
    """Infer the anti-frame and frame for a state-machine transition.

    known_state: dict of keys we know exist with their values
    reads:       set of keys the function reads
    writes:      dict of keys the function writes (key -> new_value)

    Returns (anti_frame, frame):
      anti_frame: keys the function needs but aren't in known_state
      frame:      keys in known_state the function doesn't touch
    """
    # Anti-frame: keys the function reads that aren't in known_state.
    # These MUST exist for the function to succeed.
    anti_frame = reads - set(known_state.keys())

    # Frame: keys in known_state that the function neither reads nor writes.
    # These are UNTOUCHED — the ground.
    touched = reads | set(writes.keys())
    frame = {k: v for k, v in known_state.items() if k not in touched}

    return anti_frame, frame


# --- Example: a checkout function in an e-commerce state machine ---

# What we know about the current state
known_state = {
    "cart": ["item_a", "item_b"],
    "user_id": 42,
    "session_token": "abc123",
    "theme": "dark",
    "locale": "en-US",
}

# The checkout function reads these keys:
reads = {"cart", "user_id", "payment_method"}

# And writes these keys:
writes = {"order_id": "ord_789", "cart": []}

anti_frame, frame = bi_abduce(known_state, reads, writes)

print("Anti-frame (missing preconditions):")
for key in sorted(anti_frame):
    print(f"  {key} — must exist before checkout runs")

print("\nFrame (untouched state):")
for key in sorted(frame):
    print(f"  {key} = {frame[key]!r}")

print("\nFigure (touched by checkout):")
touched = reads | set(writes.keys())
for key in sorted(touched):
    if key in writes:
        print(f"  {key} -> {writes[key]!r}")
    else:
        print(f"  {key} (read only)")

def bi_abduce(known_state, reads, writes):
    """Infer the anti-frame and frame for a state-machine transition.

known_state: dict of keys we know exist with their values
    reads:       set of keys the function reads
    writes:      dict of keys the function writes (key -> new_value)

Returns (anti_frame, frame):
      anti_frame: keys the function needs but aren't in known_state
      frame:      keys in known_state the function doesn't touch
    """
    # Anti-frame: keys the function reads that aren't in known_state.
    # These MUST exist for the function to succeed.
    anti_frame = reads - set(known_state.keys())

# Frame: keys in known_state that the function neither reads nor writes.
    # These are UNTOUCHED — the ground.
    touched = reads | set(writes.keys())
    frame = {k: v for k, v in known_state.items() if k not in touched}

return anti_frame, frame

# --- Example: a checkout function in an e-commerce state machine ---

# What we know about the current state
known_state = {
    "cart": ["item_a", "item_b"],
    "user_id": 42,
    "session_token": "abc123",
    "theme": "dark",
    "locale": "en-US",
}

# The checkout function reads these keys:
reads = {"cart", "user_id", "payment_method"}

# And writes these keys:
writes = {"order_id": "ord_789", "cart": []}

anti_frame, frame = bi_abduce(known_state, reads, writes)

print("Anti-frame (missing preconditions):")
for key in sorted(anti_frame):
    print(f"  {key} — must exist before checkout runs")

print("\nFrame (untouched state):")
for key in sorted(frame):
    print(f"  {key} = {frame[key]!r}")

print("\nFigure (touched by checkout):")
touched = reads | set(writes.keys())
for key in sorted(touched):
    if key in writes:
        print(f"  {key} -> {writes[key]!r}")
    else:
        print(f"  {key} (read only)")

The function never mentions theme, locale, or session_token, so bi-abduction infers they are untouched: the frame. It reads payment_method, which isn't in the known state, so bi-abduction infers it must exist: the anti-frame. From a known footprint, the system computed the ground and the hypothesis without the observer choosing what to snapshot.

Now compose. The order-confirmation function reads order_id (which checkout wrote) and email (which nobody mentioned). Bi-abduction infers email as a new anti-frame entry. Chain the two: composed anti-frame is {payment_method, email}, composed frame is {theme, locale, session_token}. Each function analyzed locally; composition is a set union. Infer's actual composition is richer, but the structure is the same.

Compositionality is the key

Three properties make bi-abduction work where general abduction doesn't.

Locality. Each function is analyzed in isolation. You don't need the whole program to infer one function's contract. The mechanic diagnoses one subsystem at a time: checking the alternator doesn't require understanding the transmission.

Separation. Frame and footprint are disjoint. What one function touches, another cannot silently depend on. The separating conjunction * is the formal guarantee. Without it, a change in function A could invalidate function B's frame, making local reasoning unsound.

Compositionality. Inferred contracts compose across boundaries. Function A's postcondition becomes function B's precondition context. The anti-frames and frames chain. The analysis scales linearly in the call graph, not exponentially in the state space.

The strategy transfers beyond programs, but only where the domain has something like separation: parts whose interactions are explicit enough that local conclusions can compose. Diagnose locally, compose globally.

What breaks

Bi-abduction reasons along a single execution path: one (anti-frame, frame) pair per path. But programs branch. Consider:

Python

def process_payment(state):
    if state["payment_method"] == "credit_card":
        # Branch A: reads card_number, writes authorization_code
        validate(state["card_number"])
        state["authorization_code"] = charge(state["card_number"])
    else:
        # Branch B: reads bank_account, writes transfer_id
        verify(state["bank_account"])
        state["transfer_id"] = transfer(state["bank_account"])

Branch A touches {card_number, authorization_code}. Branch B touches {bank_account, transfer_id}. The two branches have different footprints and different frames. Bi-abduction on branch A says card_number must exist. Bi-abduction on branch B says bank_account must exist. Which anti-frame is correct?

Both, path-conditioned. If payment_method == "credit_card", require card_number; otherwise require bank_account. The precondition is disjunctive. Bi-abduction gives one answer per path; relating the paths requires understanding that the branch condition itself determines the outcome, that the choice between paths is the real structure.

Not a bug in bi-abduction. Its boundary. The primitive handles one path. When execution forks, you need the diff between branches, not just the diff within each branch.

That is tri-abduction. Chapter 6.

Sources

Calcagno et al. 2009	"Compositional Shape Analysis by Means of Bi-Abduction." POPL 2009. ACM DL. The original bi-abduction paper. Introduces simultaneous inference of anti-frame and frame for compositional program analysis.
Bylander et al. 1991	"The Computational Complexity of Abduction." Artificial Intelligence 49(1–3). Proves abduction is NP-hard in general. The result that makes local decomposition necessary.
O'Hearn 2019	"Separation Logic." Communications of the ACM 62(2). Retrospective on separation logic and the frame rule, from the co-inventor.
Reynolds 2002	"Separation Logic: A Logic for Shared Mutable Data Structures." LICS 2002. The foundational paper on separation logic and the frame rule.
Facebook Infer	fbinfer.com. Open-source static analyzer using bi-abduction. Deployed at Facebook, Amazon, Uber, and others. The primitive in production.

Neighbors

🔭 Methodeutics
Abduction — the blog post that covers bi-abduction, tri-abduction, and three witnesses
Ch 7: Economy of Research — which hypotheses are worth testing?

External