The Logic of Scientific Discovery
Karl Popper · 1934 (German), 1959 (English) · Routledge · Internet Archive
You cannot prove a theory, only disprove it. Falsifiability is the line between science and non-science. A theory that forbids nothing explains nothing.
The argument
Hume demonstrated that induction has no logical foundation. Observing a thousand white swans does not prove the next one will be white. The inference from "all observed" to "all" is a habit of mind, not a logical entailment. Most philosophers treated this as a problem to solve. Popper treated it as a fact to accept.
If induction cannot prove theories, what can science do? Popper's answer: science advances by conjecture and refutation. Scientists propose bold theories that stick their necks out: they make specific predictions and forbid certain observations. Then they try to falsify those predictions. A theory that survives repeated attempts at falsification is "corroborated" but never confirmed. A theory that fails is discarded.
Falsifiability becomes the demarcation criterion. A statement is scientific if and only if it specifies what observations would refute it. "All swans are white" is scientific because a black swan refutes it. "Everything happens for a reason" is not scientific because no observation could refute it. The content of a theory is what it forbids. A theory that forbids nothing tells you nothing.
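The asymmetry behind this criterion can be written in predicate logic (a standard textbook rendering, not Popper's own notation):

```latex
% Universal claim: all swans are white.
\forall x\,\bigl(\mathrm{Swan}(x) \rightarrow \mathrm{White}(x)\bigr)

% Its negation is existential: a single counterexample suffices.
\exists x\,\bigl(\mathrm{Swan}(x) \wedge \neg\mathrm{White}(x)\bigr)
```

No finite set of observations entails the universal statement, but one observation entails the existential one. Verification of a universal law is therefore impossible in principle, while falsification is decisive; this is the logical asymmetry on which the whole demarcation criterion rests.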
Popper sharpened this into a measure of theoretical virtue. Bolder theories forbid more, so they are more falsifiable, so they are more scientific. Newton's theory is better than a vague claim about gravity because Newton's predicts specific trajectories that could be wrong. Einstein's is better still because it predicts specific deviations from Newton that could also be wrong. Science progresses toward theories that risk more.
Discussion
Hume is Popper's explicit starting point. The problem of induction appears in the first chapter, and Popper returns to it throughout. Where other responses to Hume tried to rehabilitate induction (through pragmatism, probability, or habit), Popper simply conceded: Hume was right. Then he reframed what science does. Science does not prove. It eliminates.
Kuhn (1962) showed that scientists do not actually behave the way Popper prescribed. Normal science does not consist of bold conjectures and severe tests. It consists of puzzle-solving within an accepted paradigm. Anomalies accumulate until a crisis triggers a paradigm shift. Kuhn's account is descriptive where Popper's is normative. They answer different questions: Kuhn asks how science works, Popper asks how it should work. The tension between them has never been fully resolved.
Lakatos tried to split the difference. His "methodology of scientific research programmes" preserves Popper's emphasis on criticism but softens the demand for instant falsification. A research programme has a "hard core" of unfalsifiable assumptions surrounded by a "protective belt" of auxiliary hypotheses that can be modified. A programme is progressive if its modifications lead to new predictions; degenerative if they only explain away anomalies. This captures what Kuhn observed (scientists protect their core commitments) while retaining what Popper demanded (there must be a way to tell progressive science from stagnation).
Mayo gave falsification statistical teeth. Popper's framework is qualitative: either an observation falsifies a theory or it doesn't. Mayo's severe testing says the strength of evidence depends on the probability that the test would have detected the error if the theory were false. This connects Popper to Fisher's p-values and to the Neyman-Pearson framework. A low p-value is evidence against the null precisely because the test would rarely produce so low a p-value if the null were true. Severity is quantified falsification.
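The p-value logic can be illustrated with a toy simulation (an illustrative sketch, not Mayo's formal severity calculation; the coin-flip example and the function name `p_value` are assumptions introduced here):

```python
import random

random.seed(0)

def p_value(heads, n=100, trials=10_000):
    """One-sided p-value by simulation: the probability of seeing at
    least `heads` heads in n flips of a fair coin (the null hypothesis)."""
    extreme = sum(
        1 for _ in range(trials)
        if sum(random.random() < 0.5 for _ in range(n)) >= heads
    )
    return extreme / trials

# Under the null (a fair coin), an extreme result is rare: the test
# would rarely yield 65+ heads if the coin were fair, so observing
# 65 heads is strong evidence against fairness -- a severe test.
print(p_value(65))  # small, roughly 0.002

# 52 heads is unsurprising for a fair coin: a test this weak cannot
# severely probe the fairness hypothesis.
print(p_value(52))  # large, roughly 0.38
```

The asymmetry mirrors Popper's: the simulation never proves the coin is biased, it only shows that the fair-coin hypothesis would almost never have produced this result, which is what "severe" means.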
Chamberlin (1890) anticipated the spirit of Popper's method without the formalization. Multiple working hypotheses serve the same function as bold conjectures: they give the scientist something to try to kill. Chamberlin's contribution was psychological (resist attachment), while Popper's was logical (specify what would count as refutation).
Failure mode
Popper's own examples of unfalsifiable pseudoscience were Freudian psychoanalysis, Adlerian psychology, and Marxist historicism. In each case the theory could accommodate any observation. A patient's behavior confirmed the Freudian interpretation regardless of what the behavior was. Historical events confirmed the Marxist narrative regardless of what happened. The theories did not forbid anything, so they explained everything, so they explained nothing.
The more insidious failure is a theory that starts falsifiable and becomes unfalsifiable through ad hoc modifications. Ptolemaic astronomy added epicycles to save geocentrism. Each epicycle was a patch that absorbed an anomaly without improving predictions. Lakatos's framework handles this better than Popper's: the question is not whether the modification saves the theory but whether it leads somewhere new.
String theory is a contemporary example that generates debate along Popperian lines. Critics argue that the landscape of possible string vacua is so vast that the theory accommodates any observation, making it unfalsifiable in practice. Defenders argue that the theory makes structural predictions that could in principle be tested. The debate turns on whether "in principle" falsifiability counts or whether science requires "in practice" falsifiability. Popper's framework raises the question but does not settle it.
Integrity
The core demand is specifying in advance what would disprove your theory. Before collecting data, before running the experiment, write down what results would make you abandon the hypothesis. This is harder than it sounds. Most scientists can articulate what would confirm their theory. Fewer can articulate what would falsify it. The asymmetry reveals where their commitment lies.
Pre-registration of hypotheses and analysis plans in clinical trials and psychology is the institutional version of this demand. When researchers specify their predictions before seeing the data, they cannot unconsciously adjust their hypotheses to fit what they find. The replication crisis in psychology showed what happens when this discipline breaks down: flexible analysis produces impressive-looking results that evaporate on replication.
Popper was asking scientists to practice a specific kind of honesty. Not the honesty of reporting data accurately (that is necessary but insufficient) but the honesty of committing to the conditions under which they would change their minds. The willingness to be refuted is not a personality trait. It is a practice, and it requires structure.
Neighbors
- Chamberlin 1890 — multiple hypotheses as psychological preparation for falsification
- Fisher 1935 — the statistical tools that make falsification quantitative
- Hume 1739 — the problem of induction that Popper sought to escape through falsifiability
- 🔑 Logic Ch.6 — predicate logic and quantification: Popper's falsifiability criterion is about existential vs. universal statements
- Mayo 2018 — severity testing as a rigorous version of corroboration: Popper gave the idea, Mayo gave it statistics
External
David Hume (Wikipedia) — the problem of induction that Popper accepted
Thomas Kuhn (Wikipedia) — the descriptive account that challenged Popper's normative one
Imre Lakatos (Wikipedia) — research programmes as a softened Popperian framework
Deborah Mayo (Wikipedia) — severe testing as quantified falsification
Falsifiability (Wikipedia) — the demarcation criterion