🔬 The Scientific Method

Four centuries of arguing about what counts as knowledge. Each work below changed how we do science, often by showing that the previous way was broken. The vocabulary we inherited is still load-bearing.

The arc

Bacon says: observe, don't speculate. Descartes says: doubt everything, rebuild from certainty. Hume says: your observations don't prove anything. Mill says: here's how to extract causes anyway. Boole says: logic is algebra, make it mechanical. Chamberlin says: hold multiple hypotheses or you'll fool yourself. Popper says: you can't prove a theory, only disprove it. Kuhn says: scientists don't actually work that way. Feynman says: the first principle is that you must not fool yourself. Ioannidis says: you're fooling yourself. Mayo says: here's how to stop. Gwern says: any method can be gamed, so publish the trail. He calls it long content.

Each thinker responds to a failure of the previous method. The failures are the interesting part.

Works

Work	What it changed
Bacon 1620	Replaced Aristotelian deduction with systematic observation. Gave us induction.
Descartes 1637	Doubt everything, rebuild from certainty. Decompose, solve from simplest to complex.
Hume 1739	No amount of observation proves the next instance. Killed the certainty Descartes promised.
Mill 1843	Five methods for extracting causes. Vary one thing, hold the rest constant.
Boole 1854	Reduced logic to algebra. Made reasoning computable. Every `if/then/and/or` descends from this.
Peirce 1878	Added abduction: inference to the best explanation. The meaning of a concept is its practical consequences.
Chamberlin 1890	Hold multiple hypotheses or your favorite will blind you. Five pages that still fix confirmation bias.
Popper 1934	You can't prove a theory, only disprove it. Falsifiability as the line between science and everything else.
Fisher 1935	Randomization, significance tests, ANOVA. Built the machinery that modern science runs on.
Kuhn 1962	Science doesn't accumulate. It breaks and rebuilds. Paradigm shifts happen when anomalies pile up.
Platt 1964	Strong inference: alternative hypotheses, crucial experiments, repeat. Why some fields move faster.
Feynman 1974	Cargo cult science. The first principle is that you must not fool yourself, and you are the easiest person to fool.
Ioannidis 2005	"Why Most Published Research Findings Are False." The replication crisis, diagnosed before it had a name.
Mayo 2018	Severe testing: a test only confirms if it had a real chance of catching you being wrong.
Gwern 2010→	Long content: any method can be Goodharted, so publish the full trail (data, nulls, revisions) and let anyone audit it.
Gwern 2010→	The self-experiment: Fisher's randomization at n=1. Self-blinding, power analysis, and published nulls, without the institution.
Gwern 2010→	Registered prediction: assign a probability, set a deadline, publish it, score it. Falsification operationalized.
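The self-experiment row can be made concrete. What follows is a minimal sketch of Fisher-style randomization at n=1, not anyone's published procedure: the function names, the half-and-half day split, and the difference-in-means statistic are all assumptions chosen for illustration. The schedule is drawn before any data exists, and the test asks how often a random relabeling of the days would produce a gap at least as large as the one observed.

```python
import random
from statistics import mean


def randomized_schedule(n_days, seed):
    """Assign days to treatment or placebo in a random order fixed in advance."""
    rng = random.Random(seed)
    n_treatment = n_days // 2
    schedule = ["treatment"] * n_treatment + ["placebo"] * (n_days - n_treatment)
    rng.shuffle(schedule)
    return schedule


def permutation_p_value(treated, control, n_resamples=10_000, seed=0):
    """Two-sided randomization test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    k = len(treated)
    hits = 0
    for _ in range(n_resamples):
        # Relabel the days at random, as the null hypothesis says we may.
        rng.shuffle(pooled)
        if abs(mean(pooled[:k]) - mean(pooled[k:])) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)
```

A run would call `randomized_schedule(60, seed=...)` once, have someone else prepare the capsules so the subject stays blind, record the outcome each day, then pass the two groups of measurements to `permutation_p_value`. The result gets published whether or not it is small.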

Vocabulary we inherited

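Nearly every term in this table was coined or formalized by someone on the list above. The one exception is Neyman and Pearson, who recast Fisher's significance tests as a decision procedure. We use these words without thinking about the argument that produced them.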

Term	Source	What it settled
Induction	Bacon	Generalize from observations, don't deduce from axioms
Method of doubt	Descartes	Accept nothing that can be doubted; rebuild from what survives
Problem of induction	Hume	Observation never proves necessity
Method of difference	Mill	Vary one thing, hold the rest constant
Boolean logic	Boole	Reasoning is algebra
Abduction	Peirce	Inference to the best explanation
Null hypothesis, p-value	Fisher	Quantify how surprised you should be
Type I/II error, power	Neyman-Pearson	Hypothesis testing as decision procedure
Falsifiability	Popper	The line between science and non-science
Paradigm shift	Kuhn	Science breaks and rebuilds, it doesn't just accumulate
Strong inference	Platt	Design experiments that exclude alternatives
Severe testing	Mayo	A test confirms only if it could have caught you being wrong
Long content	Gwern	Publish the full trail (data, nulls, revisions) so that any gaming of the method can be audited
Registered prediction	Gwern	Assign a probability, set a deadline, score it. Falsification with a timestamp.
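"Score it" needs a scoring rule. One common choice is the Brier score, the mean squared gap between the stated probability and what happened. The sketch below assumes that rule; the `Prediction` record and its field names are hypothetical, made up here to show the shape of a registered prediction.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Prediction:
    claim: str
    probability: float              # stated before the outcome is known
    deadline: date                  # when the claim must be judged
    outcome: Optional[bool] = None  # filled in once resolved


def brier_score(predictions):
    """Mean squared error of stated probabilities; 0 is perfect, lower is better."""
    resolved = [p for p in predictions if p.outcome is not None]
    if not resolved:
        raise ValueError("no resolved predictions to score")
    total = sum((p.probability - float(p.outcome)) ** 2 for p in resolved)
    return total / len(resolved)
```

A forecaster who answers 50% to everything scores 0.25, so that is the number to beat. Unresolved predictions are skipped rather than dropped from the record, which keeps the published list complete while the score covers only what has come due.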

Failures that drove the next step

The collection reads differently when you see what broke.

Failure	What broke	What it produced
Aristotelian physics	Deduction from first principles without observation	Bacon's inductive method
Phlogiston theory	Fit the data until Lavoisier weighed the air	Quantitative experiment over plausible narrative
Luminiferous aether	Michelson-Morley found nothing. The most productive null result in history.	Einstein's relativity; Kuhn's paradigm shifts
Eugenics	Fisher's statistics weaponized by ideology. Sound method, monstrous application.	Research ethics, informed consent, institutional review
Lysenko affair	Political authority overriding evidence. Soviet genetics destroyed for decades.	Scientific autonomy as a value worth defending
Cargo cult science	Form without substance. Rituals of method without the honesty.	Feynman's integrity principle
Replication crisis	P-hacking, publication bias, garden of forking paths	Pre-registration, open data, Mayo's severe testing
Goodharting the method	Pre-registration gamed, severity selectively reported. The form of rigor without the trail.	Long content, registered prediction
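The replication-crisis row rests on arithmetic that is easy to check. Ioannidis's 2005 paper starts from the positive predictive value of a significant result: with pre-study odds R that a tested relationship is true, power 1 − β, and significance level α, PPV = (1 − β)R / (R − βR + α). The sketch below implements only that base formula; the paper's extra terms for bias and multiple competing teams are left out, and the function and parameter names are my own.

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Share of significant findings that are true, ignoring bias.

    prior_odds: ratio of true to false relationships among those tested (R).
    power:      probability of detecting a true relationship (1 - beta).
    alpha:      probability of a false positive on a null relationship.
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


# A well-powered test of a plausible hypothesis (1:1 odds, 80% power).
confirmatory = positive_predictive_value(prior_odds=1.0, power=0.8)

# An underpowered test of a long shot (1:10 odds, 20% power).
exploratory = positive_predictive_value(prior_odds=0.1, power=0.2)
```

Under these assumed inputs the first case comes out above 0.9 and the second below 0.3, which is the paper's headline in miniature: low prior odds and low power are enough to make most significant results false, before any p-hacking is added.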

The protocol

Each step exists because someone on the timeline proved it was necessary.

Step	What you do	Who established it
Pre-register	Write down your hypothesis, your analysis plan, and what would disprove you. Before you see the data.	Popper, Mayo
Red-team to convergence	Generate alternative hypotheses. Design experiments that discriminate between them. Keep generating until you can't anymore.	Chamberlin, Platt
Work log frequently	Record observations as they happen. Version your drafts. Tag your confidence. The log is the evidence that you didn't backfill.	Bacon, Gwern
Publish all	The positives, the nulls, the embarrassing stuff, the code. A curated trail can be Goodharted. A complete one is far harder to game, because anyone can audit it.	Feynman, Gwern

Each step produces a public artifact. That trail is what counts as knowledge. Following this protocol outside the institution is discipline. Following it inside is career risk. Blame the game, not the player.
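One way to make a work log hard to backfill is to chain its entries by hash, so that editing an old entry invalidates everything after it. This is a minimal sketch of that idea under my own assumptions: the entry fields, the all-zero starting hash, and the function names are illustrative, not a format anyone on the list prescribes.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry
FIELDS = ("timestamp", "text", "confidence", "prev")


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log, text, confidence, timestamp):
    """Append an entry that commits to the hash of the one before it."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"timestamp": timestamp, "text": text,
            "confidence": confidence, "prev": prev_hash}
    log.append({**body, "hash": _digest(body)})
    return log


def verify(log):
    """Return True only if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {key: entry[key] for key in FIELDS}
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True
```

The chain alone only shows internal consistency. Someone who rewrites the whole log can rebuild every hash, so the latest hash has to be published somewhere you cannot edit afterward. The pre-registration itself can be the first entry.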

The integrity essay makes the full case.

📺 Video lectures: UPenn: Philosophy of Science (Coursera)

Neighbors