
🔬 The Scientific Method

Four centuries of arguing about what counts as knowledge. Each work below changed how we do science, often by showing that the previous way was broken. The vocabulary we inherited is still load-bearing.

Timeline: 1620 Bacon · 1637 Descartes · 1739 Hume · 1843 Mill · 1854 Boole · 1890 Chamberlin · 1934 Popper · 1962 Kuhn · 1974 Feynman · 2005 Ioannidis · 2018 Mayo · 2010→ Gwern. Each one responds to a failure of what came before.

The arc

Bacon says: observe, don't speculate. Descartes says: doubt everything, rebuild from certainty. Hume says: your observations don't prove anything. Mill says: here's how to extract causes anyway. Boole says: logic is algebra, make it mechanical. Chamberlin says: hold multiple hypotheses or you'll fool yourself. Popper says: you can't prove a theory, only disprove it. Kuhn says: scientists don't actually work that way. Feynman says: the first principle is that you must not fool yourself. Ioannidis says: you're fooling yourself. Mayo says: here's how to stop. Gwern says: any method can be gamed, so publish the trail. He calls it long content.

Each thinker responds to a failure of the previous method. The failures are the interesting part.


Works

| Work | What it changed |
| --- | --- |
| Bacon 1620 | Replaced Aristotelian deduction with systematic observation. Gave us induction. |
| Descartes 1637 | Doubt everything, rebuild from certainty. Decompose, solve from simplest to complex. |
| Hume 1739 | No amount of observation proves the next instance. Killed the certainty Descartes promised. |
| Mill 1843 | Five methods for extracting causes. Vary one thing, hold the rest constant. |
| Boole 1854 | Reduced logic to algebra. Made reasoning computable. Every if/then/and/or descends from this. |
| Peirce 1878 | Added abduction: inference to the best explanation. The meaning of a concept is its practical consequences. |
| Chamberlin 1890 | Hold multiple hypotheses or your favorite will blind you. Five pages that still fix confirmation bias. |
| Popper 1934 | You can't prove a theory, only disprove it. Falsifiability as the line between science and everything else. |
| Fisher 1935 | Randomization, significance tests, ANOVA. Built the machinery that modern science runs on. |
| Kuhn 1962 | Science doesn't accumulate. It breaks and rebuilds. Paradigm shifts happen when anomalies pile up. |
| Platt 1964 | Strong inference: alternative hypotheses, crucial experiments, repeat. Why some fields move faster. |
| Feynman 1974 | Cargo cult science. The first principle is that you must not fool yourself, and you are the easiest person to fool. |
| Ioannidis 2005 | Why most published research findings are false. The replication crisis, diagnosed before it had a name. |
| Mayo 2018 | Severe testing: a test only confirms if it had a real chance of catching you being wrong. |
| Gwern 2010→ | Long content: any method can be Goodharted, so publish the full trail (data, nulls, revisions) and let anyone audit it. |
| Gwern 2010→ | The self-experiment: Fisher's randomization at n=1. Self-blinding, power analysis, and published nulls, without the institution. |
| Gwern 2010→ | Registered prediction: assign a probability, set a deadline, publish it, score it. Falsification operationalized. |
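
The registered-prediction entry (assign a probability, set a deadline, publish it, score it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Prediction` record layout and the example claims are hypothetical, and the Brier score is one common scoring rule chosen here for simplicity, not one the works above prescribe.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Prediction:
    """A registered prediction (hypothetical record layout)."""
    claim: str
    probability: float              # stated probability that the claim is true
    deadline: date                  # date by which the claim must be judged
    outcome: Optional[bool] = None  # filled in once the prediction resolves


def brier_score(predictions):
    """Mean squared gap between stated probabilities and 0/1 outcomes.

    0.0 is perfect; always answering 0.5 scores 0.25.
    Unresolved predictions are skipped.
    """
    resolved = [p for p in predictions if p.outcome is not None]
    if not resolved:
        raise ValueError("no resolved predictions to score")
    total = sum((p.probability - float(p.outcome)) ** 2 for p in resolved)
    return total / len(resolved)


# Invented example entries, for illustration only.
log = [
    Prediction("Effect replicates in sample B", 0.75, date(2030, 1, 1), outcome=True),
    Prediction("Effect size exceeds d = 0.5", 0.25, date(2030, 1, 1), outcome=False),
    Prediction("Follow-up finishes on time", 0.75, date(2030, 6, 1)),  # still open
]
print(brier_score(log))  # 0.0625
```

The score only means something if the probability and deadline were published before the outcome was known, which is the point of registering the prediction.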

Vocabulary we inherited

Nearly every term in this table was coined or formalized by someone on the list above; Neyman and Pearson, who are not on it, are the exception. We use these terms without thinking about the arguments that produced them.

| Term | Source | What it settled |
| --- | --- | --- |
| Induction | Bacon | Generalize from observations, don't deduce from axioms |
| Method of doubt | Descartes | Accept nothing that can be doubted |
| Problem of induction | Hume | Observation never proves necessity |
| Method of difference | Mill | Vary one thing, hold the rest constant |
| Boolean logic | Boole | Reasoning is algebra |
| Abduction | Peirce | Inference to the best explanation |
| Null hypothesis, p-value | Fisher | Quantify how surprised you should be |
| Type I/II error, power | Neyman-Pearson | Hypothesis testing as a decision procedure |
| Falsifiability | Popper | The line between science and non-science |
| Paradigm shift | Kuhn | Science breaks and rebuilds, it doesn't just accumulate |
| Strong inference | Platt | Design experiments that exclude alternatives |
| Severe testing | Mayo | A test confirms only if it could have caught you being wrong |
| Long content | Gwern | Publish the full trail (data, nulls, revisions) so the method can't be Goodharted |
| Registered prediction | Gwern | Assign a probability, set a deadline, score it. Falsification with a timestamp. |
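
To make the "null hypothesis, p-value" row concrete, here is a minimal sketch of an exact randomization test in Python. The function name and the six data points are invented for illustration. The test assumes the labels "treated" and "control" were assigned at random, so under the null hypothesis every relabeling was equally likely.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(treated, control):
    """One-sided exact randomization test for a difference in means.

    Null hypothesis: the group labels are arbitrary, so every way of
    splitting the pooled values into groups of these sizes was equally
    likely. Returns the fraction of splits whose difference
    (treated - control) is at least as large as the observed one.
    """
    pooled = list(treated) + list(control)
    observed = mean(treated) - mean(control)
    indices = range(len(pooled))
    at_least_as_extreme = 0
    total = 0
    for chosen in combinations(indices, len(treated)):
        chosen_set = set(chosen)
        t = [pooled[i] for i in chosen_set]
        c = [pooled[i] for i in indices if i not in chosen_set]
        # Small tolerance so floating-point noise does not drop exact ties.
        if mean(t) - mean(c) >= observed - 1e-12:
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total


# Invented measurements: three treated days, three control days.
p = permutation_p_value([7, 8, 9], [1, 2, 3])
print(p)  # 0.05: only 1 of the 20 possible splits is this extreme
```

The p-value here is literally a count of how surprising the observed split is among all the splits that randomization could have produced, which is why the randomization step matters as much as the arithmetic.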

Failures that drove the next step

The collection reads differently when you see what broke.

| Failure | What broke | What it produced |
| --- | --- | --- |
| Aristotelian physics | Deduction from first principles without observation | Bacon's inductive method |
| Phlogiston theory | Fit the data until Lavoisier weighed the air | Quantitative experiment over plausible narrative |
| Luminiferous aether | Michelson-Morley found nothing. The most productive null result in history. | Einstein's relativity; Kuhn's paradigm shifts |
| Eugenics | Fisher's statistics weaponized by ideology. Sound method, monstrous application. | Research ethics, informed consent, institutional review |
| Lysenko affair | Political authority overriding evidence. Soviet genetics destroyed for decades. | Scientific autonomy as a value worth defending |
| Cargo cult science | Form without substance. Rituals of method without the honesty. | Feynman's integrity principle |
| Replication crisis | P-hacking, publication bias, garden of forking paths | Pre-registration, open data, Mayo's severe testing |
| Goodharting the method | Pre-registration gamed, severity selectively reported. The form of rigor without the trail. | Long content, registered prediction |
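
The replication-crisis row rests on simple arithmetic. The sketch below follows the simplest form of the argument in Ioannidis 2005, PPV = (1 − β)R / (R − βR + α), which ignores bias and multiple competing teams. The function name and the example parameter values are illustrative assumptions, not figures from any particular field.

```python
def positive_predictive_value(prior_odds, alpha=0.05, power=0.8):
    """Share of statistically significant findings that are actually true.

    prior_odds: ratio of true to false relationships among those tested (R)
    alpha:      Type I error rate
    power:      1 - Type II error rate (1 - beta)
    Bias and multiple competing teams are ignored in this simplified form.
    """
    true_positives = power * prior_odds  # proportional to true effects detected
    false_positives = alpha              # proportional to false effects flagged
    return true_positives / (true_positives + false_positives)


# Well-powered test of a 50/50 hypothesis: most positives are real.
print(round(positive_predictive_value(prior_odds=1.0, power=0.8), 3))  # 0.941

# Underpowered test of a long-shot hypothesis: most positives are false.
print(round(positive_predictive_value(prior_odds=0.1, power=0.2), 3))  # 0.286
```

Nothing in the second case requires misconduct. Low prior odds and low power are enough, and p-hacking or publication bias only push the number lower.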

The protocol

Each step exists because someone on the timeline proved it was necessary.

| Step | What you do | Who established it |
| --- | --- | --- |
| Pre-register | Write down your hypothesis, your analysis plan, and what would disprove you. Before you see the data. | Popper, Mayo |
| Red-team to convergence | Generate alternative hypotheses. Design experiments that discriminate between them. Keep generating until you can't anymore. | Chamberlin, Platt |
| Log work frequently | Record observations as they happen. Version your drafts. Tag your confidence. The log is the evidence that you didn't backfill. | Bacon, Gwern |
| Publish all | The positives, the nulls, the embarrassing stuff, the code. A curated trail can be Goodharted. A complete one can't. | Feynman, Gwern |
Artifacts: pre-register (Popper, Mayo) → registered prediction; red-team (Chamberlin, Platt) → alternatives list; work log (Bacon, Gwern) → timestamped log; publish all (Feynman, Gwern) → complete dataset. Together they form the trail.

Each step produces a public artifact. That trail is what counts as knowledge. Following this protocol outside the institution is discipline. Following it inside is career risk. Blame the game, not the player.

The integrity essay makes the full case.
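
As one illustration of the work-log step in the protocol above, here is a sketch of a log whose entries are chained by hash, so that a later edit to an earlier entry becomes detectable. Hash chaining is an assumption introduced here, not something the protocol specifies, and the function names and entry fields are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(entry):
    """SHA-256 over every field except the stored hash itself."""
    payload = {k: v for k, v in entry.items() if k != "hash"}
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log, text, confidence, timestamp=None):
    """Append an entry whose hash covers its content and the previous hash."""
    entry = {
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "text": text,
        "confidence": confidence,  # tagged confidence, 0.0 to 1.0
        "prev_hash": log[-1]["hash"] if log else "",
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify(log):
    """Return True if no entry was edited, reordered, or removed mid-chain."""
    prev_hash = ""
    for entry in log:
        if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(entry):
            return False
        prev_hash = entry["hash"]
    return True


# Invented entries, for illustration only.
work_log = []
append_entry(work_log, "Baseline measured; no effect visible yet", confidence=0.6)
append_entry(work_log, "Second run contradicts the first", confidence=0.4)
print(verify(work_log))  # True

work_log[0]["text"] = "Effect visible from the start"  # backfilling
print(verify(work_log))  # False
```

The chain only shows internal consistency. It counts as evidence against backfilling only if the latest hash is published somewhere public as the work proceeds, since otherwise the whole log could be rebuilt after the fact.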

📺 Video lectures: UPenn: Philosophy of Science (Coursera)

Neighbors