
Cromwell's Rule

Cromwell's Rule is the principle that a rational agent should never assign probability 0 or 1 to any empirical proposition, because doing so renders the agent permanently unable to update that belief regardless of any evidence encountered.

0 < P(H) < 1 for any empirical hypothesis H

Cromwell's Rule, named by the statistician Dennis Lindley after Oliver Cromwell's famous plea to the Church of Scotland, states that no empirical proposition should be assigned a probability of exactly 0 or exactly 1. The reason is mathematical and devastating: Bayes' theorem cannot rescue a prior of 0 or 1. If P(H) = 0, then P(H | E) = 0 for any evidence E, no matter how strongly E supports H. The agent is locked into her certainty forever, unable to learn from experience. The same holds for P(H) = 1: no evidence can reduce absolute certainty.

This principle follows directly from the mechanics of Bayesian updating but carries profound epistemological implications. It counsels intellectual humility: no matter how confident you are, you should retain a sliver of doubt. It also exposes a critical design requirement for Bayesian models — priors must have full support over the parameter space if the model is to remain responsive to data.

Why Extremes Cannot Update

P(H | E) = P(E | H) · P(H) / P(E)

If P(H) = 0:   P(H | E) = P(E | H) · 0 / P(E) = 0   (for any E)
If P(H) = 1:   P(~H | E) = P(E | ~H) · 0 / P(E) = 0   ⇒  P(H | E) = 1
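
A few lines of Python make the lock-in concrete. This is a minimal sketch with made-up likelihood values; bayes_update is a hypothetical helper written for this illustration, not a library function.

def bayes_update(prior_h, lik_e_given_h, lik_e_given_not_h):
    """Return P(H | E) via Bayes' theorem, expanding P(E) by total probability."""
    p_e = lik_e_given_h * prior_h + lik_e_given_not_h * (1 - prior_h)
    return lik_e_given_h * prior_h / p_e

# A modest prior updates on strong evidence:
print(bayes_update(0.01, 0.99, 0.01))     # 0.5 -- the evidence moved the belief

# A prior of exactly 0 cannot update, no matter how strong the evidence:
print(bayes_update(0.0, 0.9999, 0.0001))  # 0.0 -- locked in forever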

The Historical Quotation

The name comes from a letter Oliver Cromwell wrote to the General Assembly of the Church of Scotland on August 3, 1650, before the Battle of Dunbar:

"I beseech you, in the bowels of Christ, think it possible that you may be mistaken." — Oliver Cromwell, letter to the General Assembly of the Church of Scotland (1650)

Dennis Lindley invoked this quotation in his 1991 book Making Decisions to crystallize the Bayesian point: dogmatic certainty is the enemy of rational learning. An agent who is certain — who assigns probability 0 or 1 — has placed herself beyond the reach of evidence. She has, in effect, decided that no possible observation could change her mind. For an empirical proposition, this is never warranted.

Mathematical Consequences

Cromwell's Rule has precise mathematical consequences in Bayesian inference. If a prior distribution assigns zero probability density to some region of the parameter space, then the posterior will also assign zero density to that region, regardless of the data. This means the model will never "discover" parameter values that the prior excluded.

Prior Support and Posterior Support

supp(π(θ | x)) ⊆ supp(π(θ))

Translation: the posterior can only assign positive probability to values that already had positive prior probability.

This property — that the support of the posterior is contained in the support of the prior — is a direct consequence of Bayes' theorem. It means that the prior does not merely influence the posterior; it constrains it. A prior that excludes a region of parameter space makes that region permanently invisible to inference.
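
A small grid approximation illustrates the containment. The numbers below are invented for illustration: a binomial likelihood, and a prior that (unwisely) excludes θ ≥ 0.5.

import numpy as np

theta = np.linspace(0.01, 0.99, 99)      # grid over the parameter space
prior = np.where(theta < 0.5, 1.0, 0.0)  # prior excludes theta >= 0.5
prior /= prior.sum()

# Data strongly suggesting theta near 0.8: 40 successes in 50 trials
likelihood = theta**40 * (1 - theta)**10
posterior = prior * likelihood
posterior /= posterior.sum()

print(posterior[theta >= 0.5].sum())  # 0.0 -- the excluded region stays excluded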

When Is Probability 0 or 1 Acceptable?

Cromwell's Rule applies to empirical propositions — claims about the world that could in principle be falsified by evidence. It does not apply to logical or mathematical truths: P(2 + 2 = 4) = 1 is perfectly fine, because no empirical evidence could refute it. Similarly, P(a married bachelor exists) = 0 is acceptable because the proposition is logically impossible. The rule targets precisely those beliefs that should remain responsive to evidence — and insists that they do.

Practical Implications

Prior Specification

In applied Bayesian statistics, Cromwell's Rule argues against priors with bounded support unless there are strong physical reasons for the bounds. If you are estimating a proportion with a Beta(a, b) prior, any a, b > 0 gives positive density across (0, 1); what you should generally avoid is adding point masses at 0 or 1. More broadly, if there is any possibility that the true parameter lies in some region, the prior should assign positive density there.
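
As a sketch of what a full-support prior buys, the conjugate Beta-binomial update below uses hypothetical counts; the posterior is Beta(a + successes, b + failures).

from scipy import stats

# Beta(1, 1) prior: positive density everywhere on (0, 1), per Cromwell's Rule
a, b = 1.0, 1.0

# Hypothetical data: 0 successes in 30 trials
successes, failures = 0, 30

# Conjugate update: posterior is Beta(a + successes, b + failures)
posterior = stats.beta(a + successes, b + failures)

# Even after 30 straight failures, the posterior leaves room for theta > 0:
print(posterior.mean())          # ~0.031 -- small, but nonzero
print(1 - posterior.cdf(0.05))   # ~0.20  -- theta > 5% is still on the table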

Model Comparison

When comparing models using Bayes factors, Cromwell's Rule applies at the model level. If you assign prior probability 0 to a model, no amount of data supporting that model will give it positive posterior probability. This argues for assigning nonzero prior probability to all models under consideration — even ones you find unlikely.
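
A minimal sketch (with invented marginal likelihoods) shows the model-level lock-in: posterior model probabilities are prior-weighted marginal likelihoods, so a zero prior weight zeroes the posterior regardless of fit.

def model_posteriors(prior_probs, marginal_likelihoods):
    """Posterior model probabilities: P(M_k | x) is proportional to P(x | M_k) P(M_k)."""
    weights = [p * m for p, m in zip(prior_probs, marginal_likelihoods)]
    total = sum(weights)
    return [w / total for w in weights]

# Model B fits the data 100x better, but was given prior probability 0:
print(model_posteriors([1.0, 0.0], [0.001, 0.1]))    # [1.0, 0.0] -- B can never win
# With even a small nonzero prior, the data can speak:
print(model_posteriors([0.99, 0.01], [0.001, 0.1]))  # B now gets ~0.50 posterior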

Hypothesis Testing

In Bayesian hypothesis testing, a "sharp null hypothesis" H₀: θ = θ₀ is often given positive prior probability (a point mass). This does not violate Cromwell's Rule because the hypothesis remains updatable — the prior probability of H₀ is strictly between 0 and 1. What would violate the rule is assigning P(H₀) = 0 or P(H₀) = 1, thereby deciding the test before seeing data.
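
The arithmetic of the sharp null is sketched below, with assumed marginal likelihoods m0 = P(x | H₀) and m1 = P(x | H₁) chosen purely for illustration.

def sharp_null_posterior(p_h0, m0, m1):
    """P(H0 | x) given a point-mass prior p_h0 on theta = theta_0.
    m0 = P(x | H0); m1 = marginal likelihood P(x | H1) under the alternative's prior."""
    return p_h0 * m0 / (p_h0 * m0 + (1 - p_h0) * m1)

# P(H0) = 0.5 is updatable in either direction:
print(sharp_null_posterior(0.5, 0.02, 0.2))  # ~0.09 -- data moved us off the null
# P(H0) = 1 violates Cromwell's Rule: the test is decided before any data arrive
print(sharp_null_posterior(1.0, 0.02, 0.2))  # 1.0 regardless of the likelihoods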

Connection to Other Principles

Cromwell's Rule is closely related to the concept of regularity in probability theory. A probability function is regular if it assigns positive probability to every logically possible proposition. Regularity is a stronger condition than Cromwell's Rule (which applies only to empirical propositions), but both express the same underlying intuition: rational agents should not be dogmatically closed to possibilities that experience might reveal.

The rule also connects to the merging-of-opinions theorem (Blackwell and Dubins, 1962). That theorem guarantees that two Bayesian agents with different priors will eventually agree — but only if their priors are mutually absolutely continuous, meaning neither assigns probability 0 to an event the other considers possible. Cromwell's Rule is essentially the prior condition that makes opinion merging possible.
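
The merging effect is easy to simulate. In the toy sketch below (an assumed true coin bias of 0.7 and conjugate Beta updating), two agents with very different full-support priors end up with nearly identical posterior means.

import random

random.seed(0)
true_p = 0.7
flips = [random.random() < true_p for _ in range(2000)]
heads = sum(flips)
n = len(flips)

# Two different full-support Beta priors (mutually absolutely continuous):
for a, b in [(1, 1), (20, 2)]:
    post_mean = (a + heads) / (a + b + n)
    print(f"Beta({a},{b}) prior -> posterior mean {post_mean:.3f}")
# Both means land near 0.7: different priors, merged opinions.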

Violations and Their Costs

History offers cautionary examples of Cromwell's Rule violations. Before the discovery of black swans in Australia, a European ornithologist who assigned P(black swan exists) = 0 would have been unable to update even upon seeing one — she would have been forced to doubt her senses rather than her prior. More prosaically, a clinical trial analyst who assigns zero prior probability to a drug having a negative effect will never detect harm, no matter how much adverse-event data accumulates.

In machine learning, Cromwell's Rule motivates techniques like Laplace smoothing in naive Bayes classifiers. Without smoothing, a single word that never appeared in the training data for a particular class drives that class's posterior to zero the moment the word shows up in a document being classified. Adding a small pseudocount — assigning nonzero prior probability to every word-class combination — is a direct application of Cromwell's Rule and dramatically improves robustness.

The rule is, ultimately, a counsel of epistemic humility. It reminds us that certainty is a luxury reserved for logic and mathematics. In the empirical world — the world of data, evidence, and surprise — wisdom lies in always leaving the door open, however slightly, to the possibility that we are wrong.

Example: A Spam Filter That Can Never Learn

A software engineer builds a naive Bayes spam classifier. During training, the word "cryptocurrency" appears in 50 spam emails but zero legitimate emails. The classifier estimates:

The Problem: Zero Probability

P("cryptocurrency" | Legitimate) = 0 / 500 = 0.00

Now an important email arrives from a financial advisor discussing cryptocurrency regulations. The classifier must compute:

The Consequence

P(Legitimate | "cryptocurrency", other words) ∝ P("cryptocurrency" | Legit) × ... = 0 × ... = 0

No matter what other evidence favors legitimacy, the posterior is zero.
The email is classified as spam with 100% certainty.

Cromwell's Rule Violated

By assigning P = 0 to a non-impossible event, the classifier became permanently incapable of learning that legitimate emails can contain "cryptocurrency." Even if 1,000 legitimate cryptocurrency emails arrive in the future, multiplying by zero always gives zero. The classifier is stuck.

The Fix: Laplace Smoothing

Applying Cromwell's Rule via Laplace Smoothing

Instead of P = 0/500, use P = (0 + 1)/(500 + |Vocabulary|)

P("cryptocurrency" | Legit) ≈ 0.001 — small, but not zero

This tiny nonzero probability is enough to let the classifier update when contradictory evidence arrives. Cromwell's Rule has been satisfied — the door is cracked open, and learning can resume.
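
For completeness, here is the same arithmetic in a few lines of Python. The vocabulary size of 500 is an assumption chosen so the numbers match the figures above.

# Hypothetical counts from the example: "cryptocurrency" in 0 of 500 legit words
count_in_legit = 0
total_legit_words = 500
vocab_size = 500  # assumed vocabulary size

# Unsmoothed estimate: exactly zero, and the posterior dies with it
p_unsmoothed = count_in_legit / total_legit_words
# Laplace smoothing: add a pseudocount of 1 to every word-class pair
p_smoothed = (count_in_legit + 1) / (total_legit_words + vocab_size)

# Multiply in strong (hypothetical) evidence from the email's other words:
other_evidence = 50.0
print(p_unsmoothed * other_evidence)  # 0.0   -- other evidence can never matter
print(p_smoothed * other_evidence)    # 0.05  -- updating can proceed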

The General Principle

Cromwell's Rule says: never assign probability 0 or 1 to any empirical proposition, because doing so makes Bayesian updating impossible for that proposition forever. In Bayesian statistics, this motivates the use of proper priors that assign positive density everywhere in the parameter space. In everyday reasoning, it's a reminder that "I'm 100% certain" is almost always an overstatement — and a dangerous one, because it means no evidence could ever change your mind.
