
Economy of Research

Chapter 7 · Peirce 1879, Chamberlin 1890, Platt 1964, de Kleer 1987, Hintikka 1999

Ten hypotheses, budget for three experiments. Which three? Chapters 4–6 formalized the abductive primitive (diff, bi-abduction, tri-abduction). The primitive generates hypotheses. This chapter selects among them.

Hypothesis generation is cheap. Testing is expensive. Every experiment costs time, compute, reagents, or attention. A single surprising observation can spawn dozens of candidate explanations. Running all of them is never feasible. Which subset do you test, in what order, and when do you stop?

Peirce formalized this in 1879. Chamberlin addressed the cognitive prerequisite in 1890. Platt operationalized it in 1964. De Kleer and Williams automated it in 1987. Hintikka gave it game-theoretic foundations in the 1990s. The machine-learning literature keeps reinventing the same criteria without connecting back to abduction.


Peirce: marginal return on research investment (1879)

In "Note on the Theory of the Economy of Research," Peirce made the allocation problem explicit: given a fixed budget, how do you distribute it across experiments to maximize what you learn?

His answer used marginal analysis. Each dollar spent on an experiment yields some increment of knowledge. That increment diminishes as you pour more into the same line of inquiry. The optimal allocation equalizes marginal return per dollar across all active experiments. If A yields more information per dollar than B, shift budget from B to A until the marginals equalize.
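That rule lends itself to a sketch. Assume a toy diminishing-returns model (information from line i after spending x is a_i * log(1 + x); the model is my assumption, not Peirce's) and spend the budget in small increments, each increment going to whichever line currently offers the highest marginal return:

def allocate(budget, coeffs, step=0.01):
    """Greedy marginal allocation. Under the assumed concave return
    a_i * log(1 + x), the marginal return is a_i / (1 + x); spending
    each increment on the current-best line equalizes marginals,
    which is Peirce's optimality condition."""
    spent = [0.0] * len(coeffs)
    for _ in range(int(budget / step)):
        best = max(range(len(coeffs)), key=lambda i: coeffs[i] / (1 + spent[i]))
        spent[best] += step
    return spent

# High-yield lines absorb most of the budget; low-yield ones get little.
print([round(x, 1) for x in allocate(10.0, [3.0, 2.0, 0.5])])  # ~[6.1, 3.7, 0.2]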

Peirce derived a specific result: optimal sample size depends on the ratio of information utility to observation cost. High-utility, low-cost experiments get large samples. Low-utility, high-cost experiments get small samples or none. Obvious in retrospect, but Peirce formalized it 68 years before Wald.

"The doctrine of economy, in general, treats of the relations between utility and cost. The economy of research is that particular application of it which gives the rules for the expenditure of money, energy, and time upon the different elements of an investigation."

— Peirce, "Note on the Theory of the Economy of Research" (1879)

Wald (1947) asked: given a stream of data, when should you stop collecting? Peirce asked the prior question: which stream should you open? Wald's work became a field. Peirce's note was ignored until historians rediscovered it in the 1960s.

Chamberlin: breadth before depth (1890)

Chamberlin's "Method of Multiple Working Hypotheses" addresses the cognitive prerequisite. You cannot select optimally among hypotheses you haven't generated. And you won't generate competitors to your favorite if you have one.

His argument is structural. A scientist holding one ruling theory unconsciously steers experiments toward confirmation. Data that fit the theory are amplified; data that contradict it are explained away. The remedy: hold multiple hypotheses simultaneously, without ranking, and let the data sort the field.

Chamberlin supplies the input to Peirce's economy. Before you can ask "which experiment next?" you need a hypothesis set broad enough that the true explanation is likely among them.

| Step | Who | Contribution |
|------|-----|--------------|
| Generate | Abductive primitive (Ch 4–6) | Produce candidate hypotheses from the diff between expected and observed |
| Diversify | Chamberlin 1890 | Hold all candidates without ranking; prevent premature convergence on a favorite |
| Select | Peirce 1879 | Allocate budget to maximize information gain per unit cost |

Platt: strong inference as a selection protocol (1964)

Platt operationalized Chamberlin into a loop: enumerate alternative hypotheses, design a crucial experiment that excludes at least one, run it, repeat. His contribution is the crucial experiment: one whose outcome eliminates hypotheses regardless of which way it goes.

Crucial experiments are maximally economical. A non-crucial experiment can confirm a hypothesis without eliminating alternatives; it spends budget without shrinking the hypothesis space. A crucial experiment guarantees the space shrinks every round. Platt observed that fields practicing strong inference (molecular biology, particle physics) moved faster than fields that didn't (psychology, ecology). The rate difference tracks the fraction of experiments that are crucial.
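Read strictly, crucialness is checkable. A minimal sketch, assuming hypotheses are represented by outcome likelihoods and that "eliminates" means assigning an outcome (near-)zero likelihood; the threshold is my assumption:

def is_crucial(likelihoods, eps=1e-9):
    """Platt's criterion, read strictly: every possible outcome must
    eliminate at least one hypothesis. likelihoods[i][k] = P(outcome_k | H_i);
    hypothesis i is eliminated by outcome k if likelihoods[i][k] is ~0."""
    n_outcomes = len(likelihoods[0])
    return all(
        any(likelihoods[i][k] < eps for i in range(len(likelihoods)))
        for k in range(n_outcomes)
    )

# A splitting experiment: H1, H2 predict outcome 0; H3, H4 predict outcome 1.
print(is_crucial([[1, 0], [1, 0], [0, 1], [0, 1]]))  # True: either way, two die
# A merely suggestive experiment: no outcome rules anything out.
print(is_crucial([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.25, 0.75]]))  # False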

But Platt's framework is qualitative. He tells you what a good experiment looks like, not how to rank several crucial alternatives. For ranking, you need de Kleer.

De Kleer & Williams: GDE and computed experiment selection (1987)

The General Diagnostic Engine (GDE) computed which measurement to take next from the structure of remaining hypotheses. The context was circuit diagnosis: a chip has a fault, multiple components could be responsible, and each probe has a cost.

GDE maintains candidate diagnoses: minimal sets of faulty components consistent with all observations so far. For each possible measurement, it computes how much probing that node reduces the entropy of the diagnosis distribution, then selects the measurement with the highest information gain per unit cost.
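The candidate-maintenance step can be sketched directly. Minimal diagnoses are the minimal hitting sets of the conflict sets (sets of components that cannot all be healthy given an observation). A brute-force version; GDE proper does this incrementally with an ATMS, and the component names below are illustrative:

from itertools import combinations

def minimal_diagnoses(components, conflicts):
    """Candidate diagnoses = minimal hitting sets of the conflict sets.
    Enumerate by increasing size, so any kept candidate is minimal."""
    diagnoses = []
    for size in range(len(components) + 1):
        for cand in combinations(components, size):
            s = set(cand)
            if all(s & c for c in conflicts):                  # hits every conflict
                if not any(set(d) <= s for d in diagnoses):    # no smaller diagnosis inside
                    diagnoses.append(cand)
    return diagnoses

# Two observations implicate overlapping component sets.
print(minimal_diagnoses(
    ["A1", "A2", "M1", "M2"],
    [{"A1", "M1"}, {"A1", "A2", "M2"}],
))
# [('A1',), ('A2', 'M1'), ('M1', 'M2')]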

The algorithm is straightforward:

  1. Enumerate remaining candidate diagnoses, each with a probability.
  2. For each possible measurement, compute the expected posterior entropy over diagnoses.
  3. Subtract from the current entropy to get expected information gain.
  4. Divide by measurement cost. Select the measurement with the highest ratio.

GDE realizes the same structure as Peirce's economy (marginal information per dollar), Platt's crucial experiments (maximize elimination), and Shannon's information theory (entropy reduction) in a single algorithm. The domain was circuit diagnosis, but the architecture generalizes. Replace "component" with "hypothesis" and "probe" with "experiment" and you have economy of research, automated.

Hintikka: inquiry as a game (1988/1999)

Hintikka recast investigation as a two-player game between the Inquirer and Nature. The Inquirer asks questions (runs experiments). Nature answers (provides data). The goal: reach a definite conclusion at minimum total cost.

Each question is an abductive act: "If H were true, what would I observe at point X?" The answer supports H or eliminates it. The optimal strategy is economy of research expressed as a game tree.

The game-theoretic framing captures what information theory misses: the next question depends on previous answers. GDE recomputes from scratch after each measurement. Hintikka encodes the dependency structure. The Inquirer's strategy is a policy (a function from observation history to the next question), not a ranking. This is the difference between a sorted list and a decision tree.

| Framework | Selection criterion | What it adds | Limitation |
|-----------|--------------------|--------------|------------|
| Peirce 1879 | Marginal utility / cost | Budget allocation across experiments | No formal measure of "utility of information" |
| Chamberlin 1890 | Breadth of hypothesis set | Cognitive debiasing; improves coverage and reduces fixation | No selection criterion; all hypotheses held equally |
| Platt 1964 | Crucial experiment (must eliminate ≥1) | Guarantees hypothesis space shrinks each round | Qualitative; no ranking among crucial experiments |
| GDE 1987 | Max entropy reduction / cost | Computable, information-theoretic, automated | Myopic (one-step lookahead); no sequential dependency |
| Hintikka 1999 | Optimal game-tree strategy | Sequential dependency; next question depends on previous answers | Computationally intractable for large hypothesis spaces |

Modern connections

Machine learning has reinvented economy of research under several names, each formalizing "which experiment next?" without citing Peirce or Hintikka.

Active learning. Which unlabeled point should an oracle label next? The standard criterion (uncertainty sampling, query-by-committee) selects the point the model is most uncertain about. This is GDE's entropy-reduction criterion restricted to a single model class.
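A minimal sketch of uncertainty sampling; the model's predictive distributions are hypothetical:

import math

def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical predictive distributions over two classes for three
# unlabeled points. Query the point the model is least sure about.
preds = [[0.9, 0.1], [0.5, 0.5], [0.7, 0.3]]
print(max(range(len(preds)), key=lambda i: entropy(preds[i])))  # 1: the 50/50 point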

Bayesian experimental design. Choose the experiment that maximizes expected information gain about model parameters. Lindley (1956) formalized this; Chaloner and Verdinelli (1995) surveyed the field. The criterion is GDE's, applied to continuous parameter spaces instead of discrete diagnostic candidates.

Multi-armed bandits. Allocate trials across K arms to maximize cumulative reward (exploration-exploitation) or identify the best arm (pure exploration). The pure-exploration variant is Peirce's economy of research: fixed budget, maximize what you learn.
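A sketch of the pure-exploration variant (successive elimination); the arm means, budget, and halving schedule are illustrative choices:

import random

def successive_elimination(arms, budget, rounds=4):
    """Fixed-budget best-arm identification: split the budget across
    rounds, sample the surviving arms equally, drop the worse half."""
    live = list(range(len(arms)))
    per_round = budget // rounds
    for _ in range(rounds):
        if len(live) == 1:
            break
        pulls = per_round // len(live)
        means = {a: sum(arms[a]() for _ in range(pulls)) / pulls for a in live}
        live.sort(key=lambda a: means[a], reverse=True)
        live = live[: max(1, len(live) // 2)]
    return live[0]

random.seed(0)
arms = [lambda m=m: random.gauss(m, 1.0) for m in (0.1, 0.5, 0.4, 0.9)]
print(successive_elimination(arms, budget=4000))  # 3 with near-certainty: the 0.9-mean arm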

None of these literatures connects experiment selection back to abduction. The hypothesis set is taken as given. This matters because the quality of the hypothesis set determines the ceiling of any selection strategy. Abduction is where that set comes from.


Code: information gain per unit cost

Given N hypotheses with prior probabilities and a set of possible experiments, each with a cost and a likelihood of each outcome under each hypothesis, compute which experiment maximally reduces uncertainty per dollar spent.

import math

def entropy(probs):
    """Shannon entropy of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def expected_posterior_entropy(priors, likelihoods):
    """Expected entropy after running an experiment.

    likelihoods[i][k] = P(outcome_k | hypothesis_i)
    """
    n_hypotheses = len(priors)
    n_outcomes = len(likelihoods[0])

    # P(outcome_k) = sum_i P(outcome_k | H_i) * P(H_i)
    p_outcomes = []
    for k in range(n_outcomes):
        p_k = sum(likelihoods[i][k] * priors[i] for i in range(n_hypotheses))
        p_outcomes.append(p_k)

    # For each outcome, compute posterior entropy
    expected_H = 0.0
    for k in range(n_outcomes):
        if p_outcomes[k] == 0:
            continue
        # Posterior: P(H_i | outcome_k) = P(outcome_k | H_i) * P(H_i) / P(outcome_k)
        posterior = [
            likelihoods[i][k] * priors[i] / p_outcomes[k]
            for i in range(n_hypotheses)
        ]
        expected_H += p_outcomes[k] * entropy(posterior)

    return expected_H

def select_experiment(priors, experiments):
    """Select the experiment with highest information gain per unit cost.

    experiments: list of (name, cost, likelihoods) tuples
    Returns sorted ranking.
    """
    current_H = entropy(priors)
    results = []

    for name, cost, likelihoods in experiments:
        post_H = expected_posterior_entropy(priors, likelihoods)
        info_gain = current_H - post_H
        ratio = info_gain / cost
        results.append((name, info_gain, cost, ratio))

    results.sort(key=lambda x: x[3], reverse=True)
    return results


# --- Example: 4 hypotheses, 3 possible experiments ---

priors = [0.50, 0.25, 0.15, 0.10]

experiments = [
    # (name, cost, likelihoods[hypothesis][outcome])
    # Experiment A: cheap but only discriminates H1 vs rest
    ("A: probe signal line", 1.0, [
        [0.9, 0.1],   # H1: 90% positive
        [0.2, 0.8],   # H2: 20% positive
        [0.3, 0.7],   # H3: 30% positive
        [0.25, 0.75],  # H4: 25% positive
    ]),
    # Experiment B: expensive but discriminates all four
    ("B: full spectrum analysis", 5.0, [
        [0.8, 0.1, 0.1],  # H1
        [0.1, 0.8, 0.1],  # H2
        [0.1, 0.1, 0.8],  # H3
        [0.05, 0.05, 0.9], # H4
    ]),
    # Experiment C: moderate cost, sharp on H2 vs H3
    ("C: thermal imaging", 2.0, [
        [0.5, 0.5],   # H1: uninformative
        [0.95, 0.05], # H2: hot
        [0.05, 0.95], # H3: cold
        [0.5, 0.5],   # H4: uninformative
    ]),
]

print(f"Prior entropy: {entropy(priors):.3f} bits\n")
print(f"{'Experiment':<30} {'IG (bits)':>10} {'Cost':>6} {'IG/Cost':>10}")
print("โ”€" * 60)

for name, ig, cost, ratio in select_experiment(priors, experiments):
    print(f"{name:<30} {ig:>10.3f} {cost:>6.1f} {ratio:>10.3f}")

Output:

Prior entropy: 1.743 bits

Experiment                         IG (bits)   Cost    IG/Cost
────────────────────────────────────────────────────────────
A: probe signal line                   0.358    1.0      0.358
C: thermal imaging                     0.280    2.0      0.140
B: full spectrum analysis              0.659    5.0      0.132

Experiment B has the highest absolute information gain (0.659 bits). A naive strategy picks B. But B costs five times as much as A, and per unit of cost A delivers about 2.7x more information. Run A first.

After running A and updating posteriors, the ranking changes. If A's outcome concentrates probability on H2 and H3, experiment C (which sharply discriminates exactly those two) jumps to the top. The optimal sequence is adaptive. This is Hintikka's point: the next question depends on the previous answer.
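The adaptivity is easy to demonstrate with the functions above. A sketch, assuming A comes back negative (outcome index 1), the outcome that shifts mass onto H2 and H3:

# Bayes update after observing experiment A's negative outcome (index 1).
a_lik = experiments[0][2]
unnorm = [a_lik[i][1] * priors[i] for i in range(len(priors))]
z = sum(unnorm)
posterior = [u / z for u in unnorm]   # ~[0.12, 0.47, 0.24, 0.17]

# Re-rank the remaining experiments under the new posterior.
for name, ig, cost, ratio in select_experiment(posterior, experiments[1:]):
    print(f"{name:<30} {ig:>10.3f} {cost:>6.1f} {ratio:>10.3f}")
# C: thermal imaging now leads (~0.24 bits per unit cost vs ~0.13 for B):
# with H2 and H3 dominant, the sharp H2-vs-H3 probe is the best buy.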


Three orderings, three failure modes

Three natural strategies for experiment selection, each producing a different sequence.

| Strategy | Order | Problem |
|----------|-------|---------|
| Test most likely first | H1, H2, H3, H4 | Confirms the favorite without eliminating alternatives; Chamberlin's failure mode |
| Max information gain | B, A, C | Ignores cost; burns budget on expensive experiments when cheap ones suffice |
| Max information gain / cost | A, C, B | Myopic; does not account for how the first result changes the value of later experiments |

The third strategy is the best single-step heuristic. GDE uses it. It is still myopic: it optimizes one step ahead without considering the trajectory. Hintikka's game tree optimizes the full sequence, but computing the optimal policy is exponential in the number of experiments. When the objective is submodular, as information gain is under independence assumptions, and costs are uniform, greedy selection stays within a constant factor (1 − 1/e) of the optimum. Correlated outcomes and variable costs weaken this guarantee, but greedy remains hard to beat in practice.
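For toy spaces, that game tree can be searched exhaustively. A sketch reusing entropy and the experiments list from the code above; the budget is an illustrative choice:

def build_policy(priors, experiments, budget):
    """Exhaustive game-tree search over experiment sequences.
    Returns (expected final entropy, policy). A policy maps each
    outcome to the next experiment: a decision tree, not a ranked
    list. Cost is exponential in depth; viable only for toy spaces."""
    best_h, best_plan = entropy(priors), None  # baseline: stop now
    for idx, (name, cost, lik) in enumerate(experiments):
        if cost > budget:
            continue
        exp_h, branches = 0.0, {}
        for k in range(len(lik[0])):
            p_k = sum(lik[i][k] * priors[i] for i in range(len(priors)))
            if p_k == 0:
                continue
            post = [lik[i][k] * priors[i] / p_k for i in range(len(priors))]
            rest = experiments[:idx] + experiments[idx + 1:]
            h, sub = build_policy(post, rest, budget - cost)
            exp_h += p_k * h
            branches[k] = sub
        if exp_h < best_h:
            best_h, best_plan = exp_h, (name, branches)
    return best_h, best_plan

h, plan = build_policy(priors, experiments, budget=3.0)
print(f"Expected final entropy: {h:.3f} bits")
# With budget 3.0 the policy runs both A and C (B never fits); each
# branch of the tree conditions the second choice on the first outcome.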


The gap: evidence trajectory

Economy of research selects which experiment to run. It says nothing about what to do with the evidence once it arrives. You run experiment A, observe the outcome, update your beliefs. But how do you know the update is accumulating toward a conclusion rather than oscillating between hypotheses?

A sequence of experiments might produce: evidence for H1, then against H1, then for H1 again. The posterior oscillates. Are you converging or chasing noise? Economy of research cannot answer this. It selects the next experiment but cannot read the trajectory of evidence across experiments.

That gap motivates Chapter 8. Evidence has a trajectory, and you need tools to read its shape.
