Nested Sampling Algorithm

Nested Sampling, invented by John Skilling in 2004, computes the marginal likelihood (evidence) by transforming a challenging multi-dimensional integral into a one-dimensional sum over nested likelihood contours, producing posterior samples as a by-product.

Z = ∫ L(θ) π(θ) dθ ≈ ∑ᵢ Lᵢ · wᵢ, where wᵢ = ½(Xᵢ₋₁ − Xᵢ₊₁)

Computing the marginal likelihood Z = ∫ L(θ) π(θ) dθ is fundamental to Bayesian model comparison but notoriously difficult — the integral spans the entire parameter space and concentrates in a tiny posterior region. Nested Sampling elegantly transforms this integral by introducing the prior volume X(λ) = ∫_{L(θ)>λ} π(θ) dθ, the fraction of the prior with likelihood exceeding a threshold λ. As λ increases from 0 to L_max, X decreases monotonically from 1 to 0, and the evidence becomes a one-dimensional integral Z = ∫₀¹ L(X) dX.
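The transformation is an instance of the layer-cake identity for non-negative integrands; a brief sketch in LaTeX (notation as above, added here for completeness):

```latex
Z = \int L(\theta)\,\pi(\theta)\,\mathrm{d}\theta
  % layer cake: L(\theta) = \int_0^\infty \mathbf{1}[L(\theta) > \lambda]\,\mathrm{d}\lambda
  = \int_0^{\infty} \underbrace{\int_{L(\theta) > \lambda} \pi(\theta)\,\mathrm{d}\theta}_{X(\lambda)} \,\mathrm{d}\lambda
  % invert the monotone map \lambda \mapsto X(\lambda)
  = \int_0^{1} L(X)\,\mathrm{d}X
```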

Algorithm Overview

Nested Sampling Procedure

1. Draw N "live points" θ₁, …, θ_N from the prior π(θ)
2. Record the lowest-likelihood point: L* = min{L(θᵢ)}
3. Replace that point with a new draw from π(θ) constrained to L(θ) > L*
4. Repeat, accumulating discarded points and shrinking the prior volume
5. Estimate Xᵢ ≈ exp(−i/N) and Z ≈ ∑ᵢ Lᵢ · wᵢ

At each iteration, the prior volume shrinks by a random factor t whose expectation is N/(N+1) and whose expected logarithm is −1/N, so Xᵢ ≈ exp(−i/N) on average. The discarded points trace out the likelihood function L(X), and the evidence is approximated by numerical quadrature over this one-dimensional function. The simplicity of this scheme belies its power: the algorithm systematically "peels away" shells of likelihood, climbing from the prior toward the posterior.
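A minimal, self-contained sketch of this loop on a toy problem (uniform prior on the unit square, Gaussian likelihood; the problem, tolerances, and constants are illustrative choices, not from any particular package):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: uniform prior on the unit square, spherical Gaussian
# likelihood centred at 0.5 with sigma = 0.2 (illustrative only).
DIM, SIGMA = 2, 0.2
def log_likelihood(theta):
    return -0.5 * np.sum((theta - 0.5) ** 2) / SIGMA**2

N = 100                                    # live points
live = rng.random((N, DIM))                # step 1: draw from the prior
live_logL = np.apply_along_axis(log_likelihood, 1, live)

log_Z, X_prev = -np.inf, 1.0
for i in range(1, 100 * N):
    worst = int(np.argmin(live_logL))      # step 2: worst live point
    L_star = live_logL[worst]
    X_i = np.exp(-i / N)                   # step 5: deterministic volume track
    # Rectangle-rule shell weight X_{i-1} - X_i; the article's trapezoid
    # weights (X_{i-1} - X_{i+1})/2 are a refinement of the same idea.
    log_Z = np.logaddexp(log_Z, L_star + np.log(X_prev - X_i))
    X_prev = X_i

    # Step 3: replace the worst point with a prior draw satisfying L > L*.
    # Plain rejection sampling works for this toy but is hopeless in
    # general; see the constrained-sampling strategies below.
    while True:
        candidate = rng.random(DIM)
        if log_likelihood(candidate) > L_star:
            live[worst], live_logL[worst] = candidate, log_likelihood(candidate)
            break

    # Stop once the best remaining point cannot move log Z appreciably.
    if live_logL.max() + np.log(X_i) < log_Z - 6:
        break

# Spread the final volume X_prev evenly over the surviving live points.
log_Z = np.logaddexp(log_Z, np.logaddexp.reduce(live_logL) + np.log(X_prev / N))
print(f"log Z ~ {log_Z:.2f}")  # analytic value ~ log(2*pi*sigma^2) ~ -1.38
```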

The Constrained Prior Sampling Challenge

The practical difficulty lies in step 3: drawing uniformly from the prior subject to L(θ) > L*. As the algorithm progresses, this constraint becomes increasingly tight, defining smaller and potentially complex regions. Various strategies address this:

- Ellipsoidal decomposition (MultiNest; Feroz et al., 2009) encloses the live points in a set of overlapping ellipsoids and samples uniformly within them.
- Slice sampling (PolyChord; Handley et al., 2015) runs slice-sampling chains inside the constrained region, scaling better to high dimensions.
- Diffusive nested sampling (Brewer et al., 2011) embeds nested sampling within an MCMC framework.
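As a concrete illustration, here is a single-ellipsoid caricature of the bounding idea; real MultiNest-style samplers use multiple ellipsoids, clustering, and tuned enlargement factors, so treat this as a sketch only:

```python
import numpy as np

def sample_within_constraint(live, log_likelihood, L_star,
                             rng=np.random.default_rng()):
    """Single-ellipsoid sketch of ellipsoidal constrained sampling:
    fit an ellipsoid to the live points, enlarge it to cover them all,
    and rejection-sample uniformly inside it until L > L_star."""
    n, dim = live.shape
    mean = live.mean(axis=0)
    cov = np.cov(live, rowvar=False)
    # Mahalanobis radius that just encloses every live point, plus margin.
    diff = live - mean
    d2 = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(cov), diff)
    r = 1.1 * np.sqrt(d2.max())            # 10% enlargement (heuristic)
    A = np.linalg.cholesky(cov)
    while True:
        u = rng.standard_normal(dim)
        u *= rng.random() ** (1.0 / dim) / np.linalg.norm(u)  # uniform in unit ball
        candidate = mean + r * (A @ u)
        # A real sampler must also reject points outside the prior's support.
        if log_likelihood(candidate) > L_star:
            return candidate
```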

Evidence Estimation Accuracy

Nested sampling provides a direct estimate of log Z with a calculable uncertainty: the standard deviation on log Z is approximately √(H/N), where H is the posterior information (the KL divergence from prior to posterior) and N is the number of live points. This allows the user to set N to achieve a desired precision. Few other algorithms provide such principled error estimates for the marginal likelihood.
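Inverting σ(log Z) ≈ √(H/N) gives a quick rule for budgeting live points; a two-line sketch (H, in nats, must come from a guess or a pilot run):

```python
import math

# Invert sigma(log Z) ~ sqrt(H / N) to find the number of live points
# needed for a target evidence precision.
def live_points_for_precision(H, sigma_target):
    return math.ceil(H / sigma_target**2)

print(live_points_for_precision(H=20.0, sigma_target=0.1))  # 2000 live points
```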

Posterior Samples as a By-product

Although designed for evidence computation, nested sampling also yields posterior samples. Each discarded point θᵢ receives posterior weight pᵢ = Lᵢ · wᵢ / Z. These weighted samples can be resampled to produce an unweighted posterior sample. This dual output — evidence and posterior — makes nested sampling exceptionally efficient for Bayesian workflows that require both model comparison and parameter estimation.
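A minimal sketch of the resampling step, assuming the dead points and their log-weights were stored during the run (function and argument names are illustrative):

```python
import numpy as np

def posterior_resample(dead_points, dead_logL, dead_logw,
                       rng=np.random.default_rng()):
    """Convert nested sampling's dead points into an (approximately)
    unweighted posterior sample via importance resampling with
    p_i = L_i * w_i / Z."""
    logp = np.asarray(dead_logL) + np.asarray(dead_logw)
    logp -= np.logaddexp.reduce(logp)      # normalise: log(L_i * w_i / Z)
    p = np.exp(logp)
    ess = 1.0 / np.sum(p**2)               # effective sample size of the set
    idx = rng.choice(len(p), size=int(ess), p=p, replace=True)
    return np.asarray(dead_points)[idx]
```

Drawing only ESS points (rather than one per dead point) keeps the resampled set close to independent.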

Historical Development

2004

John Skilling presented nested sampling at the MaxEnt conference, introducing the prior-volume transformation; the seminal Bayesian Analysis paper followed in 2006.

2009

Feroz, Hobson, and Bridges released MultiNest, using multi-ellipsoidal decomposition, which became the standard tool in cosmology and astrophysics.

2015

Handley, Hobson, and Lasenby introduced PolyChord, using slice sampling to handle higher-dimensional and more complex likelihood surfaces.

2019–present

Dynamic nested sampling (Higson et al.) adaptively allocates live points to regions of greatest uncertainty, and packages like dynesty, UltraNest, and jaxns have broadened accessibility.

Applications

Nested sampling has had its greatest impact in astrophysics and cosmology, where model comparison is central (e.g., testing cosmological models, exoplanet detection). It is also widely used in gravitational-wave data analysis (the LALInference and Bilby packages), particle physics, and systems biology. Its ability to handle multimodal posteriors — a common feature of astrophysical likelihoods — gives it an edge over standard MCMC in these settings.

"Nested sampling converts the hard problem of multi-dimensional integration into a sequence of easy problems of sorting by likelihood, and that is its genius."— John Skilling, 2006

Worked Example: Evidence Estimation from Likelihood Shells

We estimate the Bayesian evidence Z from 10 nested sampling iterations, where each iteration records the likelihood at the current worst point and the remaining prior volume.

Given 10 likelihood shells sorted by decreasing prior volume:
(X=1.00, logL=−10.0), (X=0.80, logL=−7.2), (X=0.60, logL=−5.2),
(X=0.40, logL=−3.9), (X=0.20, logL=−3.0), (X=0.10, logL=−2.7),
(X=0.05, logL=−2.5), (X=0.02, logL=−2.4), (X=0.01, logL=−2.35),
(X=0.001, logL=−2.32)

Step 1: Trapezoid Integration

Z ≈ ∑ᵢ Lᵢ · (Xᵢ₋₁ − Xᵢ₊₁)/2, with the endpoint conventions X₀ = 1 and X₁₁ = 0.

Shell 1: L = e⁻¹⁰ = 4.54×10⁻⁵, w = (1.00 − 0.80)/2 = 0.10 → 4.54×10⁻⁶
Shell 5: L = e⁻³·⁰ = 0.0498, w = (0.40 − 0.10)/2 = 0.15 → 7.47×10⁻³
Shell 10: L = e⁻²·³² = 0.0983, w = (0.01 − 0)/2 = 0.005 → 4.91×10⁻⁴

Step 2: Evidence

Z ≈ ∑ (all ten contributions) ≈ 0.0243
log Z ≈ −3.72

The evidence Z ≈ 0.024 represents the average likelihood over the prior — effectively measuring how well the prior-predictive distribution matches the data. Most of the evidence comes from the middle shells (prior volume 0.1–0.4), where the likelihood is moderate and the volume is non-negligible. This illustrates nested sampling's key insight: converting a difficult multi-dimensional integral into a tractable 1D sum over prior volume shells.
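The arithmetic above can be checked in a few lines (a sketch using the same trapezoid weights and endpoint conventions):

```python
import numpy as np

# Shells from the worked example: prior volumes X_i and log-likelihoods.
X    = np.array([1.00, 0.80, 0.60, 0.40, 0.20, 0.10, 0.05, 0.02, 0.01, 0.001])
logL = np.array([-10.0, -7.2, -5.2, -3.9, -3.0, -2.7, -2.5, -2.4, -2.35, -2.32])

# Trapezoid weights w_i = (X_{i-1} - X_{i+1})/2 with X_0 = 1 and X_11 = 0.
X_pad = np.concatenate([[1.0], X, [0.0]])
w = (X_pad[:-2] - X_pad[2:]) / 2

Z = np.sum(np.exp(logL) * w)
print(f"Z ~ {Z:.4f}, log Z ~ {np.log(Z):.2f}")  # Z ~ 0.0243, log Z ~ -3.72
```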

Interactive Calculator

Each row has a log_likelihood and prior_volume (between 0 and 1). The calculator estimates the Bayesian evidence (marginal likelihood) Z by integrating the likelihood over the prior, using the nested sampling trapezoid rule. It also computes the information gain H and effective number of parameters.

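A sketch of the arithmetic such a calculator would perform, under the assumption that "effective number of parameters" means the Bayesian model dimensionality of Handley & Lemos (2019); this is illustrative, not the page's actual code:

```python
import numpy as np

def nested_sampling_stats(log_likelihood, prior_volume):
    """Evidence Z, information H, and an effective-dimensionality estimate
    from rows of (log L_i, X_i) sorted by decreasing prior volume."""
    logL = np.asarray(log_likelihood, dtype=float)
    X = np.asarray(prior_volume, dtype=float)
    X_pad = np.concatenate([[1.0], X, [0.0]])
    w = (X_pad[:-2] - X_pad[2:]) / 2            # trapezoid shell weights
    log_Z = np.logaddexp.reduce(logL + np.log(w))
    p = np.exp(logL + np.log(w) - log_Z)        # posterior shell weights
    H = np.sum(p * (logL - log_Z))              # KL(posterior || prior), nats
    # One common definition of "effective number of parameters": the
    # Bayesian model dimensionality d = 2 * Var_posterior(log L)
    # (Handley & Lemos 2019) -- an assumption about the page's choice.
    d_eff = 2.0 * np.sum(p * (logL - np.sum(p * logL)) ** 2)
    return np.exp(log_Z), H, d_eff
```

Applied to the ten shells of the worked example, this returns Z ≈ 0.0243 along with the corresponding H and d_eff.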
