Bayesian Statistics

Base Rate

The base rate is the prior prevalence of a condition or event in a population, and base rate neglect — the systematic failure to account for it — is one of the most consequential reasoning errors in medicine, law, and everyday judgment.

P(Disease | +Test) = P(+Test | Disease) · P(Disease) / P(+Test)

The base rate is the unconditional probability of a condition, event, or category in a given population. In a medical context, it is the prevalence of a disease; in a legal context, the prior probability of guilt; in a signal detection context, the proportion of signals versus noise. The base rate is, in Bayesian terms, the prior probability — and its proper incorporation is essential for correct inference. Yet decades of research in cognitive psychology have shown that humans systematically underweight or ignore base rates, a bias known as base rate neglect or the base rate fallacy.

The practical consequences of base rate neglect are severe. It leads physicians to overestimate the probability of disease after a positive screening test, jurors to overvalue forensic match evidence, security analysts to overestimate the probability that an alert indicates a genuine threat, and individuals to misjudge the personal relevance of population-level statistics.

Base Rate in Bayesian Inference

P(H | E) = P(E | H) · P(H) / P(E)

where:

P(H)      →  Base rate (prior prevalence of the condition)
P(E | H)  →  Sensitivity (true positive rate)
P(E)      →  Overall positive rate (includes false positives)
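
In code, the formula is a one-line computation. Here is a minimal Python sketch (the function name bayes_posterior is ours, chosen for illustration):

    def bayes_posterior(prior, sensitivity, false_positive_rate):
        """Posterior P(H | E) from the base rate and the test's error rates."""
        # P(E) = P(E | H)·P(H) + P(E | ¬H)·P(¬H): the overall positive rate
        p_evidence = sensitivity * prior + false_positive_rate * (1 - prior)
        return sensitivity * prior / p_evidence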

The Classic Example: Medical Screening

Consider a screening test for a disease with 1% prevalence (base rate = 0.01). The test has 90% sensitivity (P(+Test | Disease) = 0.90) and 95% specificity (P(−Test | Healthy) = 0.95, so the false positive rate is 5%). A randomly selected person tests positive. What is the probability they actually have the disease?

Worked Example

P(Disease) = 0.01          P(Healthy) = 0.99
P(+Test | Disease) = 0.90  P(+Test | Healthy) = 0.05

P(+Test) = 0.90 × 0.01 + 0.05 × 0.99 = 0.009 + 0.0495 = 0.0585

P(Disease | +Test) = 0.009 / 0.0585 ≈ 15.4%

Interpretation

Despite 90% sensitivity and 95% specificity, the positive predictive value is only about 15%. The low base rate means most positive tests are false positives.
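
The arithmetic can be verified in a few lines of Python (a self-contained sketch of the same calculation):

    # Worked example: 1% prevalence, 90% sensitivity, 5% false positive rate
    prior = 0.01
    p_positive = 0.90 * prior + 0.05 * (1 - prior)   # P(+Test) = 0.0585
    posterior = 0.90 * prior / p_positive
    print(f"P(Disease | +Test) = {posterior:.1%}")   # -> 15.4%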

Most people — including many physicians — estimate the probability of disease at 80% or higher, anchoring on the test's 90% sensitivity and ignoring the 1% base rate. The correct answer of about 15% is startling because the large pool of healthy individuals (99% of the population) generates far more false positives than the small pool of diseased individuals generates true positives.

Base Rate Neglect: The Cognitive Bias

The systematic underweighting of base rates was documented by Daniel Kahneman and Amos Tversky in their foundational research on heuristics and biases in the 1970s. Their famous "taxi cab problem" and "Tom W." experiments demonstrated that people rely on representativeness — how well evidence matches a category — rather than properly combining evidence with prior probability.

1973

Kahneman and Tversky publish "On the Psychology of Prediction," demonstrating that people ignore base rates when given vivid, individuating information. The "engineer-lawyer" problem becomes a classic demonstration.

1974

"Judgment Under Uncertainty: Heuristics and Biases" appears in Science, systematically documenting base rate neglect alongside anchoring, availability, and other cognitive biases.

1978

Casscells, Schoenberger, and Graboys publish a study showing that only 18% of Harvard Medical School faculty and students correctly solve a base-rate problem about disease screening.

1995

Gerd Gigerenzer demonstrates that presenting information in natural frequencies rather than probabilities dramatically reduces base rate neglect. "1 out of 10 people who test positive has the disease" is easier to process than conditional probabilities.

Natural Frequencies: A Remedy for Base Rate Neglect

Gigerenzer showed that rephrasing Bayesian problems using natural frequencies nearly eliminates base rate neglect. Instead of "P(Disease) = 0.01, P(+Test | Disease) = 0.90," say: "Out of 1,000 people, 10 have the disease. Of those 10, 9 test positive. Of the 990 healthy people, 50 test positive. So of the 59 people who test positive, 9 actually have the disease — about 15%." This framing makes the base rate visible and intuitive. It is now recommended in medical communication guidelines and statistical literacy curricula.
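
The translation from probabilities to natural frequencies is mechanical, as this short sketch shows (the cohort size of 1,000 is the one used above):

    cohort = 1000
    diseased = round(cohort * 0.01)                 # 10 people have the disease
    true_pos = round(diseased * 0.90)               # 9 of them test positive
    false_pos = round((cohort - diseased) * 0.05)   # 50 healthy people test positive
    share = true_pos / (true_pos + false_pos)
    print(f"{true_pos} of {true_pos + false_pos} positives are real ({share:.0%})")
    # -> 9 of 59 positives are real (15%)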

Base Rates in Law

The legal system is especially vulnerable to base rate fallacies. Two notorious instances:

The prosecutor's fallacy. A forensic scientist testifies that a DNA profile match has a random match probability of 1 in 1,000,000. The prosecution argues this means there is a 1 in 1,000,000 chance the defendant is innocent. But in a city of 5 million people, roughly 5 would match by chance alone. Without accounting for the base rate — the prior probability that any given person committed the crime — the match probability is grossly misleading. Bayes' theorem provides the corrective framework.

The defense attorney's fallacy. Conversely, a defense attorney might argue that since 5 people in the city match, the probability of guilt is only 1 in 5 (20%). This ignores other evidence (motive, opportunity, witnesses) that should adjust the prior. Both fallacies arise from failing to integrate base rates with case-specific evidence — precisely the operation that Bayes' theorem performs.
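
A sketch of the corrective Bayesian calculation for the DNA example, assuming (purely for illustration) a uniform prior over the city's population and no other evidence:

    population = 5_000_000
    match_prob = 1e-6          # random match probability
    prior = 1 / population     # uniform prior: any resident equally likely
    # P(match | guilty) = 1, P(match | innocent) = match_prob
    posterior = prior / (prior + match_prob * (1 - prior))
    print(f"P(guilty | match) = {posterior:.0%}")   # -> 17%, not 99.9999%

With about 5 expected chance matches plus the true perpetrator, the uniform-prior posterior is roughly 1 in 6; any genuine case-specific evidence would shift the prior, and with it the posterior.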

Base Rates in Security and Intelligence

Post-9/11 screening systems face a fundamental base rate problem. If the base rate of terrorism among airline passengers is, say, 1 in 10,000,000, then even a highly accurate detection system will generate overwhelmingly more false positives than true positives. A system with 99.9% sensitivity and 99.9% specificity, applied to 800 million annual passengers in the US, would flag about 800,000 innocent travelers a year while detecting roughly 80 genuine threats: about 10,000 false alarms for every actual threat detected, a ratio that makes the system operationally useless without additional layers of intelligence.
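
The counts in this paragraph can be reproduced directly (a sketch using the stated rates; the passenger and threat figures are the hypothetical ones above):

    passengers = 800_000_000
    threats = passengers / 10_000_000            # base rate 1 in 10,000,000 -> 80
    true_pos = threats * 0.999                   # ~80 threats detected
    false_pos = (passengers - threats) * 0.001   # ~800,000 innocent travelers flagged
    print(f"{false_pos:,.0f} false alarms for {true_pos:,.0f} real threats "
          f"({false_pos / true_pos:,.0f} : 1)")
    # -> 800,000 false alarms for 80 real threats (10,010 : 1)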

The Bayesian Perspective

From a Bayesian standpoint, the base rate is simply the prior probability, and the "solution" to base rate neglect is simply to apply Bayes' theorem correctly. The theorem does not privilege the prior over the likelihood or vice versa — it gives each its proper weight. When the base rate is very low, the prior pulls the posterior strongly toward the null even in the face of seemingly strong evidence. When the base rate is high, even weak evidence can push the posterior to near certainty.

The lesson is general: the strength of evidence depends not only on the likelihood ratio (how much more probable the evidence is under one hypothesis than another) but also on the base rate. Evidence that is diagnostic in a high-prevalence setting may be nearly worthless in a low-prevalence setting. Bayes' theorem makes this dependence explicit and automatic.
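
The odds form of Bayes' theorem makes this weighting explicit: posterior odds = prior odds × likelihood ratio. A small sketch, holding the likelihood ratio fixed while the base rate varies (the helper posterior_from_odds is ours):

    def posterior_from_odds(prior, likelihood_ratio):
        """Posterior odds = prior odds × likelihood ratio, as a probability."""
        post_odds = (prior / (1 - prior)) * likelihood_ratio
        return post_odds / (1 + post_odds)

    lr = 0.90 / 0.05   # the screening test above: likelihood ratio of 18
    print(f"{posterior_from_odds(0.01, lr):.1%}")   # low base rate:  15.4%
    print(f"{posterior_from_odds(0.30, lr):.1%}")   # high base rate: 88.5%

The identical evidence that leaves the posterior at 15% in a 1% prevalence setting pushes it close to 90% when prevalence is 30%.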

"I beseech you, in the bowels of Christ, think it possible that you may be mistaken." — Oliver Cromwell, letter to the Church of Scotland (1650), quoted by Dennis Lindley in formulating Cromwell's Rule

The base rate is the humility term in Bayesian reasoning. It reminds the analyst that no matter how striking the evidence appears, its interpretation depends on what was already known — or believed — before the evidence arrived. Ignoring the base rate is not merely a statistical error; it is, as the Bayesian framework reveals, a failure to reason coherently under uncertainty.

Example: Airport Security and the Rare Threat

An airport security scanner is designed to detect concealed weapons. It has impressive specs: 99.5% detection rate (sensitivity) and 98% accuracy on safe passengers (specificity). On any given day, roughly 1 in 100,000 passengers carries a weapon. A passenger triggers the alarm. How worried should security be?

The Base Rate Changes Everything

Given

P(Weapon) = 1/100,000 = 0.00001   — the base rate
P(Alarm | Weapon) = 0.995
P(Alarm | No Weapon) = 0.02

Applying Bayes' Theorem

P(Weapon | Alarm) = (0.995 × 0.00001) / (0.995 × 0.00001 + 0.02 × 0.99999)
                  = 0.00000995 / 0.02000975
                  ≈ 0.050%

Despite the scanner's excellent performance, an alarm means only a 1-in-2,000 chance the passenger is actually armed. Why? Because the base rate is so low that the 2% false alarm rate — applied to 99,999 innocent passengers — generates roughly 2,000 false alarms for every true detection.
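
A short sweep over hypothetical base rates shows how strongly the scanner's positive predictive value depends on prevalence (scanner rates as given above):

    sens, fpr = 0.995, 0.02
    for prior in (1e-5, 1e-3, 1e-1):
        ppv = sens * prior / (sens * prior + fpr * (1 - prior))
        print(f"base rate {prior:.5f} -> P(Weapon | Alarm) = {ppv:.2%}")
    # -> 0.00001 -> 0.05%, 0.00100 -> 4.74%, 0.10000 -> 84.68%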

The Base Rate Fallacy in Action

If security officers ignore the base rate and focus only on the scanner's 99.5% detection rate, they might believe nearly every alarm is genuine. This is the base rate fallacy — and it has real consequences in medicine (rare disease screening), law (DNA evidence in cold cases), and cybersecurity (intrusion detection systems). In every domain, the base rate determines whether impressive-sounding test accuracy translates into useful predictions or mountains of false positives.
