Bayesian Statistics

Thomas S. Ferguson

Thomas S. Ferguson defined the Dirichlet process in 1973, laying the mathematical foundation for Bayesian nonparametric statistics and enabling models whose complexity grows with the data.

G ~ DP(α, H) ⟹ G = Σₖ πₖ δ_{θₖ}, θₖ ~ H, π ~ GEM(α)

Thomas S. Ferguson is an American mathematical statistician at UCLA whose 1973 paper "A Bayesian Analysis of Some Nonparametric Problems" introduced the Dirichlet process as a prior distribution over probability measures. This single contribution created the field of Bayesian nonparametrics, providing a mathematically rigorous framework for Bayesian inference in problems where the underlying distribution is entirely unknown. The Dirichlet process and its extensions have become fundamental tools in modern statistics and machine learning.

Life and Career

1929

Born in the United States. Studies mathematics and statistics at UC Berkeley.

1956

Earns his Ph.D. from UC Berkeley and joins the UCLA faculty, beginning a long career in mathematical statistics.

1967

Publishes the graduate textbook "Mathematical Statistics: A Decision Theoretic Approach," establishing himself as a leading theoretical statistician.

1973

Publishes "A Bayesian Analysis of Some Nonparametric Problems" in the Annals of Statistics, defining the Dirichlet process and demonstrating its use as a nonparametric prior.

1983

Publishes "Bayesian Density Estimation by Mixtures of Normal Distributions," introducing the Dirichlet process mixture model that becomes the workhorse of Bayesian nonparametrics.

The Dirichlet Process

The Dirichlet process is a distribution over probability distributions. A draw G from a Dirichlet process DP(α, H) is itself a probability distribution, with the base measure H specifying the expected shape of G and the concentration parameter α controlling how closely G resembles H. Ferguson proved that the Dirichlet process has several remarkable properties that make it suitable as a Bayesian prior.

Dirichlet Process — Definition (Ferguson) G ~ DP(α, H) if for every finite measurable partition (A₁, ..., Aₖ):
(G(A₁), ..., G(Aₖ)) ~ Dirichlet(αH(A₁), ..., αH(Aₖ))

Stick-Breaking Construction (Sethuraman) G = Σₖ₌₁^∞ πₖ δ_{θₖ}
θₖ ~ H,   βₖ ~ Beta(1, α),   πₖ = βₖ ∏_{j=1}^{k-1} (1 − β_j)
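The stick-breaking construction translates directly into a sampler. Below is a minimal Python sketch (the function name and truncation level are illustrative choices, and H is taken to be N(0, 1)): the infinite sum is truncated, which discards only a negligible amount of mass when the truncation level is generous relative to α.

```python
import numpy as np

def stick_breaking_draw(alpha, truncation=1000, seed=None):
    """Approximate draw G from DP(alpha, H) with base measure H = N(0, 1).

    Returns (atoms, weights): theta_k ~ H and
    pi_k = beta_k * prod_{j<k}(1 - beta_j) with beta_k ~ Beta(1, alpha).
    The infinite sum is truncated at `truncation` atoms.
    """
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=truncation)            # beta_k ~ Beta(1, alpha)
    stick_left = np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights = betas * stick_left                             # pi_k: piece broken off the stick
    atoms = rng.standard_normal(truncation)                  # theta_k ~ H = N(0, 1)
    return atoms, weights

atoms, weights = stick_breaking_draw(alpha=2.0, seed=0)
print(weights.sum())  # very close to 1: the truncated sticks cover almost all mass
```

Small weights decay geometrically in expectation, which is why a fixed truncation gives a good approximation to the full infinite draw.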

Posterior G | x₁, ..., xₙ ~ DP(α + n, (αH + Σᵢ δ_{xᵢ})/(α + n))

Ferguson showed that the Dirichlet process has a conjugate posterior: given observations drawn from G, the posterior distribution on G is also a Dirichlet process, with the base measure updated as a weighted combination of the prior base measure and the empirical distribution of the data. This conjugacy makes the Dirichlet process analytically tractable despite being a distribution over an infinite-dimensional space.
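Because the updated base measure is the mixture (αH + Σᵢ δ_{xᵢ})/(α + n), sampling from the posterior predictive distribution is simple: a new value comes fresh from H with probability α/(α + n), and otherwise repeats one of the observed values uniformly at random. A minimal sketch of this rule, again assuming H = N(0, 1) (the function name is illustrative):

```python
import numpy as np

def posterior_predictive(data, alpha, size=1, seed=None):
    """Sample from the DP posterior predictive given observations `data`,
    assuming base measure H = N(0, 1).

    With probability alpha/(alpha + n) a value is drawn fresh from H;
    otherwise an existing observation is repeated, reflecting the atoms
    delta_{x_i} in the updated base measure.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    fresh = rng.random(size) < alpha / (alpha + n)   # draw fresh from H?
    repeats = data[rng.integers(n, size=size)]       # or repeat an observed atom
    return np.where(fresh, rng.standard_normal(size), repeats)

samples = posterior_predictive([1.5, 1.5, -0.7], alpha=1.0, size=5, seed=0)
```

Note how the data enter only through their atoms: repeated observations get proportionally more predictive weight, which is the seed of the clustering behavior discussed next.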

Almost Surely Discrete

One of the most surprising properties of the Dirichlet process, proved by Ferguson, is that draws from it are almost surely discrete: the random distribution G assigns all its mass to a countable number of atoms, even when the base measure H is continuous. This discreteness, which might initially seem like a limitation, is what enables the Dirichlet process to serve as a prior for clustering models. The shared atoms naturally group observations into clusters, with the number of clusters growing logarithmically with the sample size.

Dirichlet Process Mixture Models

While the discreteness of the Dirichlet process makes it unsuitable as a direct prior for continuous distributions, Ferguson and subsequent researchers showed that this limitation is overcome by using G as a mixing distribution. In a Dirichlet process mixture model, each observation is generated by first drawing a parameter from G and then drawing the observation from a parametric family (such as a normal distribution) indexed by that parameter. The resulting marginal distribution is continuous and infinitely flexible.
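The generative process just described can be sketched in a few lines using a truncated stick-breaking approximation for G; here the parametric family is a normal with fixed variance, and all hyperparameter values are illustrative:

```python
import numpy as np

def sample_dp_mixture(n, alpha, mu0=0.0, tau=3.0, sigma=0.5,
                      truncation=500, seed=None):
    """Generate n points from a DP mixture of normals (truncated sketch):
    G ~ DP(alpha, N(mu0, tau^2)); theta_i ~ G; x_i ~ N(theta_i, sigma^2).
    """
    rng = np.random.default_rng(seed)
    betas = rng.beta(1.0, alpha, size=truncation)
    weights = betas * np.concatenate(([1.0], np.cumprod(1.0 - betas)[:-1]))
    weights /= weights.sum()                        # renormalize the truncated sticks
    means = rng.normal(mu0, tau, size=truncation)   # atoms theta_k ~ H
    comp = rng.choice(truncation, size=n, p=weights)  # theta_i ~ G (discrete)
    return rng.normal(means[comp], sigma)           # x_i | theta_i: continuous marginal

x = sample_dp_mixture(n=500, alpha=2.0, seed=0)
```

Because many observations share the same atom of G, the data fall into a random, data-determined number of normal clusters, while the marginal density of each xᵢ is a smooth mixture of normals.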

Legacy

Ferguson's Dirichlet process is one of the most influential constructions in modern statistics. It spawned the entire field of Bayesian nonparametrics, leading to extensions such as the hierarchical Dirichlet process, the Pitman-Yor process, and the Indian buffet process, and it sits alongside Gaussian process models in the standard toolkit of priors over infinite-dimensional spaces. Every time a statistician fits a model in which the number of clusters is learned from the data, or places a prior on an infinite-dimensional function space, they are working within the framework that Ferguson created.

"The Dirichlet process provides a natural and elegant way to use probability distributions as unknowns in Bayesian inference, freeing the analyst from the constraint of finite-dimensional parametric models." — Thomas S. Ferguson
