🔬 The Scientific Method
Four centuries of arguing about what counts as knowledge. Each work below changed how we do science, often by showing that the previous way was broken. The vocabulary we inherited is still load-bearing.
The arc
Bacon says: observe, don't speculate. Descartes says: doubt everything, rebuild from certainty. Hume says: your observations don't prove anything. Mill says: here's how to extract causes anyway. Boole says: logic is algebra, make it mechanical. Chamberlin says: hold multiple hypotheses or you'll fool yourself. Popper says: you can't prove a theory, only disprove it. Kuhn says: scientists don't actually work that way. Feynman says: the first principle is that you must not fool yourself. Ioannidis says: you're fooling yourself. Mayo says: here's how to stop. Gwern says: any method can be gamed, so publish the trail. He calls it long content.
Each thinker responds to a failure of the previous method. The failures are the interesting part.
Works
| Work | What it changed | |
|---|---|---|
| Bacon 1620 | Replaced Aristotelian deduction with systematic observation. Gave us induction. | 🔬 |
| Descartes 1637 | Doubt everything, rebuild from certainty. Decompose, solve from simplest to complex. | 🔬 |
| Hume 1739 | No amount of observation proves the next instance. Killed the certainty Descartes promised. | 🔬 |
| Mill 1843 | Five methods for extracting causes. Vary one thing, hold the rest constant. | 🔬 |
| Boole 1854 | Reduced logic to algebra. Made reasoning computable. Every if/then/and/or descends from this. | 🔬 |
| Peirce 1878 | Added abduction: inference to the best explanation. The meaning of a concept is its practical consequences. | 🔬 |
| Chamberlin 1890 | Hold multiple hypotheses or your favorite will blind you. Five pages that still fix confirmation bias. | 🔬 |
| Popper 1934 | You can't prove a theory, only disprove it. Falsifiability as the line between science and everything else. | 🔬 |
| Fisher 1935 | Randomization, significance tests, ANOVA. Built the machinery that modern science runs on. | 🔬 |
| Kuhn 1962 | Science doesn't just accumulate. It breaks and rebuilds. Paradigm shifts happen when anomalies pile up. | 🔬 |
| Platt 1964 | Strong inference: alternative hypotheses, crucial experiments, repeat. Why some fields move faster. | 🔬 |
| Feynman 1974 | Cargo cult science. The first principle is that you must not fool yourself, and you are the easiest person to fool. | 🔬 |
| Ioannidis 2005 | Why most published research findings are false. The replication crisis, diagnosed before it had a name. | 🔬 |
| Mayo 2018 | Severe testing: a test confirms only if it had a real chance of catching you being wrong. | 🔬 |
| Gwern 2010→ | Long content: any method can be Goodharted, so publish the full trail — data, nulls, revisions — and let anyone audit it. | 🔬 |
| Gwern 2010→ | The self-experiment: Fisher's randomization at n=1. Self-blinding, power analysis, and published nulls — without the institution. | 🔬 |
| Gwern 2010→ | Registered prediction: assign a probability, set a deadline, publish it, score it. Falsification operationalized. | 🔬 |
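The Fisher and self-experiment rows above rest on one idea: if a random shuffle decided which days got the treatment, then reshuffling the labels shows how often chance alone would produce a gap as large as the one observed. What follows is a minimal sketch of that permutation test, not a reconstruction of any listed author's procedure. The function names, the default of 10,000 resamples, and the data in the usage example are all illustrative assumptions.

```python
import random
from statistics import mean


def randomized_schedule(n_days, seed=0):
    """Assign half the days to treatment and the rest to placebo by shuffling."""
    rng = random.Random(seed)
    labels = ["treatment"] * (n_days // 2) + ["placebo"] * (n_days - n_days // 2)
    rng.shuffle(labels)
    return labels


def permutation_p_value(treated, control, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    extreme = 0
    for _ in range(n_resamples):
        # Relabel the pooled measurements at random, as the original
        # randomization could have done.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_treated]) - mean(pooled[n_treated:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_resamples + 1)


if __name__ == "__main__":
    # Hypothetical sleep-quality scores, invented for illustration only.
    on_days = [7.1, 6.8, 7.4, 7.0, 6.9]
    off_days = [6.9, 7.0, 6.7, 7.2, 6.8]
    print(randomized_schedule(10))
    print(permutation_p_value(on_days, off_days))
```

The p-value here is only as trustworthy as the randomization behind it. If the experimenter chose the treatment days, the reshuffled labels no longer describe what could have happened.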
Vocabulary we inherited
Every term in this table was coined or formalized by someone on the list above or, in the case of Neyman and Pearson, by statisticians responding directly to Fisher. We use them without thinking about the argument that produced them.
| Term | Source | What it settled |
|---|---|---|
| Induction | Bacon | Generalize from observations, don't deduce from axioms |
| Method of doubt | Descartes | Accept nothing that can be doubted |
| Problem of induction | Hume | Observation never proves necessity |
| Method of difference | Mill | Vary one thing, hold the rest constant |
| Boolean logic | Boole | Reasoning is algebra |
| Abduction | Peirce | Inference to the best explanation |
| Null hypothesis, p-value | Fisher | Quantify how surprised you should be |
| Type I/II error, power | Neyman-Pearson | Hypothesis testing as decision procedure |
| Falsifiability | Popper | The line between science and non-science |
| Paradigm shift | Kuhn | Science breaks and rebuilds, it doesn't just accumulate |
| Strong inference | Platt | Design experiments that exclude alternatives |
| Severe testing | Mayo | A test confirms only if it could have caught you being wrong |
| Long content | Gwern | Publish the full trail — data, nulls, revisions — so the method can't be Goodharted |
| Registered prediction | Gwern | Assign a probability, set a deadline, score it. Falsification with a timestamp. |
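The "score it" step of a registered prediction can be made concrete with the Brier score, the mean squared gap between the stated probability and what happened. This is one common scoring rule, chosen here as an assumption. The table does not say which rule to use, and the `Prediction` record and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Prediction:
    claim: str
    probability: float              # stated in advance, between 0 and 1
    deadline: date
    outcome: Optional[bool] = None  # filled in once the deadline has passed


def brier_score(predictions):
    """Mean squared error between stated probabilities and outcomes.

    0.0 is a perfect record; always answering 50% scores 0.25.
    Unresolved predictions (outcome is None) are skipped.
    """
    resolved = [p for p in predictions if p.outcome is not None]
    if not resolved:
        raise ValueError("no resolved predictions to score")
    total = sum((p.probability - float(p.outcome)) ** 2 for p in resolved)
    return total / len(resolved)


if __name__ == "__main__":
    # Hypothetical entries, invented for illustration only.
    record = [
        Prediction("Effect replicates in the follow-up", 0.8, date(2030, 1, 1), True),
        Prediction("Draft is finished by the deadline", 0.3, date(2030, 6, 1), False),
        Prediction("Still unresolved", 0.6, date(2031, 1, 1)),
    ]
    print(brier_score(record))
```

Lower is better, and the score only means something if the probability and deadline were published before the outcome was known.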
Failures that drove the next step
The collection reads differently when you see what broke.
| Failure | What broke | What it produced |
|---|---|---|
| Aristotelian physics | Deduction from first principles without observation | Bacon's inductive method |
| Phlogiston theory | Fit the data until Lavoisier weighed the air | Quantitative experiment over plausible narrative |
| Luminiferous aether | Michelson-Morley found nothing. The most productive null result in history. | Einstein's relativity; Kuhn's paradigm shifts |
| Eugenics | Statistics built by Galton, Pearson, and Fisher, put to work for ideology. Sound method, monstrous application. | Research ethics, informed consent, institutional review |
| Lysenko affair | Political authority overriding evidence. Soviet genetics destroyed for decades. | Scientific autonomy as a value worth defending |
| Cargo cult science | Form without substance. Rituals of method without the honesty. | Feynman's integrity principle |
| Replication crisis | P-hacking, publication bias, garden of forking paths | Pre-registration, open data, Mayo's severe testing |
| Goodharting the method | Pre-registration gamed, severity selectively reported. The form of rigor without the trail. | Long content, registered prediction |
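The replication-crisis row has a simple arithmetic core. In the simplest version of Ioannidis's argument, with no bias term, the chance that a statistically significant finding is true depends on the pre-study odds R, the significance level α, and the power 1 − β: PPV = (1 − β)R / ((1 − β)R + α). The sketch below only evaluates that formula. The function name and default values are assumptions made for illustration.

```python
def positive_predictive_value(prior_odds, alpha=0.05, power=0.8):
    """Probability that a significant finding is true, ignoring bias.

    prior_odds: ratio of true to false relationships among those tested (R).
    alpha:      Type I error rate.
    power:      1 - beta, the chance of detecting a true relationship.
    """
    if prior_odds < 0:
        raise ValueError("prior_odds must be non-negative")
    if not (0 < alpha < 1) or not (0 < power <= 1):
        raise ValueError("alpha must be in (0, 1) and power in (0, 1]")
    true_positives = power * prior_odds   # proportional to true hits
    false_positives = alpha               # proportional to false alarms
    return true_positives / (true_positives + false_positives)


if __name__ == "__main__":
    # Well-powered test of a plausible hypothesis (1:1 odds).
    print(positive_predictive_value(1.0, alpha=0.05, power=0.8))
    # Underpowered test of a long shot (1:10 odds).
    print(positive_predictive_value(0.1, alpha=0.05, power=0.2))
```

With long-shot hypotheses and low power, the false alarms outnumber the true hits even though every individual test passed at the conventional threshold.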
The protocol
Each step exists because someone on the timeline proved it was necessary.
| Step | What you do | Who established it |
|---|---|---|
| Pre-register | Write down your hypothesis, your analysis plan, and what would disprove you. Before you see the data. | Popper, Mayo |
| Red-team to convergence | Generate alternative hypotheses. Design experiments that discriminate between them. Keep generating until you can't anymore. | Chamberlin, Platt |
| Log work frequently | Record observations as they happen. Version your drafts. Tag your confidence. The log is the evidence that you didn't backfill. | Bacon, Gwern |
| Publish all | The positives, the nulls, the embarrassing stuff, the code. A curated trail can be Goodharted. A complete one can't. | Feynman, Gwern |
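One way to make a log hard to backfill is a hash chain, where each entry's hash covers the entry before it, so a quiet edit to an old note breaks every later link. This is an illustration, not something the protocol above prescribes. The entry fields, `append_entry`, and `verify` are hypothetical names, and the chain only proves internal consistency. To show when entries were written, the latest hash would still have to be published somewhere outside your control.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log, note, confidence):
    """Append an entry whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    body = {"note": note, "confidence": confidence, "prev": prev_hash}
    log.append({**body, "hash": _digest(body)})
    return log


def verify(log):
    """Return True if no entry has been altered, removed, or reordered."""
    prev_hash = GENESIS
    for entry in log:
        body = {
            "note": entry["note"],
            "confidence": entry["confidence"],
            "prev": entry["prev"],
        }
        if entry["prev"] != prev_hash or entry["hash"] != _digest(body):
            return False
        prev_hash = entry["hash"]
    return True


if __name__ == "__main__":
    log = []
    append_entry(log, "Hypothesis written down before data collection", 0.6)
    append_entry(log, "First run: null result", 0.4)
    print(verify(log))
    log[0]["note"] = "Hypothesis rewritten after seeing the data"
    print(verify(log))
```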
Each step produces a public artifact. That trail is what counts as knowledge. Following this protocol outside the institution is discipline. Following it inside is career risk. Blame the game, not the player.
The integrity essay makes the full case.
📺 Video lectures: UPenn: Philosophy of Science (Coursera)