The Hypothesis Graph

Chapter 10 · Kill conditions generate edges

A monotone trend test fires but curvature is indeterminate. Is the system decelerating or drifting? You can’t tell from the current data. But you know exactly what to run next: a longer experiment that resolves curvature. The failure mode named the next hypothesis.

Chapter 9 classified trajectories into four bins: convergent, divergent, oscillatory, aperiodic. Given a trajectory, you can assign a label. But a label is a noun, not a verb. It tells you what the system did, not what to do next.

After Chapter 9, you look at a trajectory and say "oscillatory." After this chapter, you say "oscillatory — test the interface between the two subsystems that are fighting." The label becomes an action. The mechanism is the kill condition.


Failure modes are constructive

The failure mode of a test names the next experiment.

Every test in the classification tree either fires (assigns a label) or misfires (the data is ambiguous). When a test fires, you get a classification. When it misfires, you get something more valuable: a specific statement about what the data cannot resolve. That statement is a hypothesis pointing to the experiment that would resolve it.

The kill condition is a decision point where the test lacks the information to proceed. What the test needed and didn't have specifies the next measurement. The failure mode builds the graph.
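
One way to make that concrete (the types and field names are illustrative, not taken from the original): a test result carries either a label or a record of what was missing and the experiment that would supply it.

Python

# Illustrative types: a test either fires (label) or misfires (edge).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Edge:
    """A hypothesis: what the data could not resolve, and the experiment that would resolve it."""
    missing: str       # what the test needed and did not have
    experiment: str    # the next measurement

@dataclass
class TestResult:
    label: Optional[str] = None   # set when the test fires
    edge: Optional[Edge] = None   # set when the test misfires

# Example: the opening misfire of this chapter.
result = TestResult(edge=Edge(
    missing="samples to distinguish deceleration from drift",
    experiment="run a longer experiment",
))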

The kill-condition decision tree

The classifier from Chapter 9, recast as a decision tree. Each branch has a characteristic failure mode, catalogued in the misfires table below.

  1. Test for monotone trend. Is the trajectory consistently increasing or decreasing? If no trend, skip to step 3 (periodicity).
  2. If monotone, test curvature. Is the rate of change decelerating (convergent) or constant/accelerating (divergent)?
  3. Test for spectral peaks. Does the frequency spectrum have a narrow peak? If yes: oscillatory.
  4. Test for aperiodic structure. Is there structure that isn't periodic? If yes: aperiodic.
  5. Nothing triggered. Null.

Five tests. Each test that fires produces a label. Each test that misfires produces a hypothesis: an edge pointing to the experiment that would resolve the ambiguity.

Misfires are edges

Three examples of misfires and the edges they generate.

Misfire: Monotone trend detected, curvature indeterminate
  What the test needed: more samples to distinguish deceleration from drift
  Edge (next experiment): run a longer experiment

Misfire: Spectral peak detected but broad
  What the test needed: resolution to distinguish a noisy cycle from colored noise
  Edge (next experiment): test at a different frequency / longer window

Misfire: Nothing triggered
  What the test needed: any detectable structure at all
  Edge (next experiment): test a different perturbation site

The first misfire is the opening example. The trend test fires: the trajectory is monotone. But the curvature test can't resolve whether the rate is decelerating (heading toward a fixed point) or constant (heading toward infinity). Both look the same in a short window. The failure mode "insufficient samples to distinguish deceleration from drift" names the remedy: collect more data. That edge points to a specific experiment with a specific expected outcome.

The second misfire: the spectral test detects a peak, but the peak is broad. A narrow peak means a clean oscillation with a definite period. A broad peak means either a noisy cycle (signal contaminated) or colored noise (no true period, just autocorrelation). The failure mode names two candidate hypotheses and the experiment that discriminates: sample at a different frequency, or extend the window until the peak sharpens or dissolves.

The third misfire: nothing triggered. No monotone trend, no spectral peak, no aperiodic structure. Either the system has no response to this perturbation (dead end) or you perturbed the wrong node. The edge points to a different perturbation site. It says "try elsewhere" without specifying where, but that is still more than "fail to reject."


Successes generate edges too

A successful classification also generates hypotheses. Each label implies a structural claim about the system, and that claim has consequences.

Classification: Convergent
  What it tells you: something absorbed the perturbation
  What to try next: test a different node; this one is stabilized

Classification: Divergent
  What it tells you: single point of failure
  What to try next: test what depends on it; find the blast radius

Classification: Oscillatory
  What it tells you: two subsystems are fighting
  What to try next: test the interface between them

Classification: Aperiodic
  What it tells you: input exceeds the architecture’s capacity for structured response
  What to try next: decompose differently; the current partitioning is wrong

Convergent: something compensated. The perturbation was absorbed. The next question is not "test this node harder" but "test a different node." The absorbing mechanism works; the interesting structure is elsewhere. The label redirects.

Divergent: this node is critical and nothing compensated. What depends on it? If it fails, what else fails? The edge points downstream.

Oscillatory: two constraints are in conflict. The system alternates between satisfying one and violating the other. The edge points to the coupling point between them, not to either subsystem individually.

Aperiodic: the response has structure but no repeating pattern. The current decomposition can't resolve it into periodic components. The edge points to the analyst's modeling choices, not to the system itself.
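
As data, the whole table fits in a small lookup that a graph-building loop can consult. A minimal sketch; the dictionary and its key names are illustrative, with the wording taken from the table above.

Python

# Successes generate edges too: each label implies a next experiment.
# Illustrative mapping; wording follows the table above.
SUCCESS_EDGES = {
    "convergent":  "test a different node; this one is stabilized",
    "divergent":   "test what depends on this node; find the blast radius",
    "oscillatory": "test the interface between the two fighting subsystems",
    "aperiodic":   "decompose the system differently; the current partitioning is wrong",
}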


The hypothesis graph

Apply the kill-condition tree iteratively and it builds a graph. Each node is an observation (experiment + classification). Each edge is a hypothesis (generated by the classification or misfire, pointing to the next experiment).

The algorithm:

  1. Pick a frontier node. (Choose what to test next.)
  2. Perturb the system. (Run the experiment.)
  3. Classify the response. (Run the kill-condition tree.)
  4. Generate edges. (The classification or misfire names the open questions.)
  5. Repeat until the frontier stops expanding.

This loop has been implicit since Chapter 1. Abduction, deduction, induction are phases of one cycle. Abduction generates the hypothesis (step 4). Deduction derives a prediction ("if this is oscillatory, testing the interface should show the coupling"). Induction tests it (step 2). The kill-condition tree closes the loop: observation produces hypothesis produces experiment produces observation.


The same mechanism in four fields

Lakatos: the guilty lemma (1976)

In Proofs and Refutations, Lakatos describes a recurring pattern: a proof is proposed, a counterexample appears, but the counterexample does not kill the theorem. It reveals which step of the proof was wrong. Lakatos calls this the "guilty lemma." The counterexample points to the specific assumption that fails, and that assumption is the site of the next revision.

Kill conditions work the same way. A classification test fires but the result is ambiguous. The ambiguity reveals which assumption was insufficient. That assumption is the guilty lemma. The next experiment targets it. Lakatos was describing mathematics, not empirical science, but the structure is identical: a test fails in a way that names the point of failure, and the failure generates the next move.

Reiter: model-based diagnosis (1987)

Reiter formalized diagnosis as hitting sets over conflicts. A conflict is a set of components that cannot all be working given the observations. A diagnosis is a minimal hitting set: a minimal set of components that intersects every conflict, so that declaring exactly those components faulty explains the observations.

In kill-condition terms: each misfire is a conflict, a set of hypotheses that cannot all hold given the data. The next experiment resolves the conflict by determining which member is faulty. Reiter's hitting-set algorithm is a systematic way to generate the edges.
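
A brute-force sketch of that definition, not Reiter's HS-tree algorithm: enumerate candidate component sets by size and keep those that hit every conflict.

Python

# Brute-force minimal hitting sets (illustrative; not Reiter's HS-tree algorithm).
from itertools import combinations

def minimal_diagnoses(conflicts):
    """Return every minimal set of components that hits all conflicts.

    conflicts: iterable of sets; each is a set of components that cannot
    all be working given the observations.
    """
    conflicts = [set(c) for c in conflicts]
    components = sorted(set().union(*conflicts))
    diagnoses = []
    for size in range(1, len(components) + 1):
        for candidate in map(set, combinations(components, size)):
            if any(d <= candidate for d in diagnoses):
                continue  # a smaller diagnosis is already inside this candidate
            if all(candidate & conflict for conflict in conflicts):
                diagnoses.append(candidate)
    return diagnoses

# Two conflicts over three components: the minimal diagnoses are {'B'} and {'A', 'C'}.
print(minimal_diagnoses([{"A", "B"}, {"B", "C"}]))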

De Kleer & Williams: GDE (1987)

The General Diagnostic Engine (Chapter 7) computes which measurement maximally discriminates among remaining diagnoses. Kill conditions produce the edges; GDE ranks them by information gain per cost. Together: observe → classify → generate edges → rank edges (GDE) → run the best one → observe again.
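
A sketch of the ranking step alone, assuming the expected information gain of each candidate edge has already been estimated by a GDE-style computation; the experiments, bit counts, and costs below are placeholders.

Python

# Ranking sketch: pick the edge with the most expected information per unit cost.
# The expected_bits estimates are placeholders standing in for a GDE-style
# entropy computation over the remaining diagnoses.
def rank_edges(edges):
    return sorted(edges, key=lambda e: e["expected_bits"] / e["cost"], reverse=True)

candidates = [
    {"experiment": "run a longer experiment",   "expected_bits": 1.0, "cost": 4.0},
    {"experiment": "test the interface",        "expected_bits": 0.9, "cost": 1.0},
    {"experiment": "perturb a different node",  "expected_bits": 0.5, "cost": 1.5},
]
print(rank_edges(candidates)[0]["experiment"])   # "test the interface"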

Schoenfeld: the proof manual (1985)

Schoenfeld filmed undergraduates solving math problems. The stuck pattern was diagnostic. Trying induction repeatedly on the same base case means the base case is wrong. Failing on a specific step of the contrapositive means that step's assumption is the weak point. The failure mode names the escalation technique.

When induction fails because the residual loses structure, the shape of the lost structure tells you which technique to try next. The shape of failure is the edge.


Code: the kill-condition decision tree

The classifier takes a trajectory (e-values over time) and returns both a label and an edge.
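
The listing below is a minimal reconstruction of that classifier, not the original: the statistics (sign-consistency for the trend test, second-difference SNR for curvature, spectral peak sharpness), the thresholds, and the demo trajectories are stand-ins chosen to exercise the same branches. The figures quoted in the next paragraph (confidence 0.89, SNR 0.38) come from the original run.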

Python
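
# Minimal reconstruction of the classifier's shape. Statistics, thresholds,
# and the demo trajectories below are illustrative assumptions.
import numpy as np

NEXT_STEP = {  # successes generate edges too
    "convergent": "test a different node; this one is stabilized",
    "divergent": "test what depends on this node; find the blast radius",
    "oscillatory": "test the interface between the coupled subsystems",
    "aperiodic": "decompose the system differently",
}

def classify(trajectory, trend_thresh=0.8, curv_snr_thresh=0.5, peak_sharpness_thresh=10.0):
    """Run the kill-condition tree on a trajectory. Returns (label, edge):
    a misfire returns label None and an edge naming the next experiment."""
    x = np.asarray(trajectory, dtype=float)
    dx = np.diff(x)

    # 1. Monotone trend: are the increments consistently one sign?
    trend_conf = max(np.mean(dx > 0), np.mean(dx < 0))
    if trend_conf >= trend_thresh:
        # 2. Curvature: decelerating (convergent) vs constant/accelerating (divergent).
        d2x = np.diff(dx)
        snr = abs(d2x.mean()) / (d2x.std() + 1e-12)
        if snr < curv_snr_thresh:
            return None, f"curvature indeterminate (SNR={snr:.2f}): run a longer experiment"
        label = "convergent" if np.sign(d2x.mean()) != np.sign(dx.mean()) else "divergent"
        return label, NEXT_STEP[label]

    # 3. Spectral peak: a narrow peak means oscillatory; a broad one is a misfire.
    power = np.abs(np.fft.rfft(x - x.mean()))[1:] ** 2
    if power.sum() > 0:
        sharpness = power.max() / power.mean()
        if sharpness >= peak_sharpness_thresh:
            return "oscillatory", NEXT_STEP["oscillatory"]
        if sharpness >= peak_sharpness_thresh / 2:
            return None, "broad spectral peak: test at a different frequency / longer window"

    # 4. Aperiodic structure: variance well above a crude white-noise floor (stand-in test).
    if x.std() > 3 * dx.std() / np.sqrt(2):
        return "aperiodic", NEXT_STEP["aperiodic"]

    # 5. Nothing triggered.
    return None, "no detectable structure: test a different perturbation site"

# Stand-in trajectories (not the original data).
t = np.arange(200)
rng = np.random.default_rng(0)
first = 1.0 - np.exp(-t / 30)                                    # long run, clearly decelerating
second = 0.05 * np.arange(40) + 0.02 * rng.standard_normal(40)   # short, noisy drift
print(classify(first))    # fires: ('convergent', 'test a different node; ...')
print(classify(second))   # misfires: (None, 'curvature indeterminate ...: run a longer experiment')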

The second trajectory is the key case. The trend test fires (monotone, confidence 0.89) but the curvature test misfires (SNR = 0.38, below threshold). The classifier does not guess. It reports the misfire and generates the edge: "run longer." A p-value system would report "significant trend, p < 0.01" and stop. The kill condition says: significant trend, yes, but the type of trend is unresolved. More data is the edge.


Building the graph

One classification produces one or two edges. Iterate the loop (classify, follow, classify again) and it builds a graph.
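
The sketch below reuses the classify() function from the previous listing. The run_experiment stub returns stand-in trajectories keyed on the suggested experiment; a real loop would perturb the system, and the node names are illustrative.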

Python
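
# Sketch of the loop, reusing classify() from the previous listing.
# run_experiment is a stub; a real loop would perturb the system.
import numpy as np

def run_experiment(suggestion, rng):
    t = np.arange(200)
    if "longer" in suggestion:                      # follow-up to the curvature misfire
        return 1.0 - np.exp(-t / 30)                # longer window: clearly decelerating
    if "different node" in suggestion:              # redirected after the convergent label
        return np.sin(2 * np.pi * 5 * t / 200)      # the new node: two subsystems fighting
    return 0.05 * np.arange(40) + 0.02 * rng.standard_normal(40)   # initial short window

rng = np.random.default_rng(1)
frontier = ["perturb node A (short window)"]        # seed experiment
observations, hypotheses = [], []                   # graph nodes and edges

for _ in range(3):
    experiment = frontier.pop(0)                    # 1. pick a frontier node
    trajectory = run_experiment(experiment, rng)    # 2. perturb the system
    label, edge = classify(trajectory)              # 3. classify the response
    observations.append((experiment, label))
    if edge is not None:                            # 4. the result names the next experiment
        hypotheses.append((experiment, edge))
        frontier.append(edge)                       # 5. repeat from the expanded frontier
    print(f"{experiment!r} -> {label} | next: {edge}")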

Three rounds, three edges. The first classification misfired (curvature indeterminate) and generated "run longer." The second succeeded (convergent) and redirected: "test a different node." The third succeeded (oscillatory) and targeted the coupling point. Each step followed from the previous result.

The frontier has three open nodes: three experiments suggested but not yet run. Each is a specific action generated by a specific kill condition.


Convergence is not guaranteed

Each kill condition generates one edge. Three rounds produced three edges and three frontier nodes. Ten rounds produce ten edges and up to ten new frontier nodes. The graph grows. Does it close?

Convergence means every frontier edge points to a node already tested and stably classified. With finite structure and stable classifications, the graph should close. But "should" is not a proof. Feedback can create cycles: convergent at node A points to node B, which oscillates, which points back to A with new data that changes A's classification. Classifications are data-dependent.

The pieces for a convergence proof exist independently: Chernoff (1959) proved adaptive experiment selection converges; Grünwald (2024) proved e-values compose across adaptive experiments. Connecting them to the hypothesis graph is open. That is Chapter 11.
