Computing the marginal likelihood Z = ∫ L(θ) π(θ) dθ is fundamental to Bayesian model comparison but notoriously difficult: the integral runs over the entire parameter space, yet its integrand is concentrated in a tiny posterior region. Nested Sampling elegantly transforms this integral by introducing the prior volume X(λ) = ∫_{L(θ)>λ} π(θ) dθ, the fraction of the prior with likelihood exceeding a threshold λ. As λ increases from 0 to L_max, X decreases monotonically from 1 to 0, and the evidence becomes a one-dimensional integral Z = ∫₀¹ L(X) dX, where L(X) is the inverse of the function X(λ).
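To make the transformation concrete, here is a minimal numerical check on a one-dimensional toy problem (uniform prior on [0, 1], Gaussian-shaped likelihood; the grid size and sigma are arbitrary illustrative choices, not from the text above): sorting the likelihood values in decreasing order yields L as a function of X, and the two integrals agree.

```python
import numpy as np

# Toy check that Z = ∫ L(θ) π(θ) dθ equals ∫₀¹ L(X) dX.
# Uniform prior on [0, 1]; Gaussian-shaped likelihood (sigma arbitrary).
sigma = 0.05
theta = np.linspace(0.0, 1.0, 200_001)
L = np.exp(-0.5 * ((theta - 0.5) / sigma) ** 2)

def trapz(y, x):
    """Simple trapezoid rule."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Direct evidence over parameter space (π(θ) = 1 on [0, 1]).
Z_direct = trapz(L, theta)

# Prior-volume form: sorting L in decreasing order gives L(X) on a uniform X grid,
# because X(λ) is simply the fraction of prior points with likelihood above λ.
L_of_X = np.sort(L)[::-1]
X = np.linspace(0.0, 1.0, L_of_X.size)
Z_volume = trapz(L_of_X, X)

print(Z_direct, Z_volume)   # both ≈ sigma * sqrt(2π) ≈ 0.1253
```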
Algorithm Overview
1. Draw N "live" points independently from the prior π(θ)
2. Record the lowest-likelihood point: L* = min{L(θᵢ)}
3. Replace that point with a new draw from π(θ) constrained to L(θ) > L*
4. Repeat, accumulating discarded points and shrinking the prior volume
5. Estimate Xᵢ ≈ exp(−i/N) and Z ≈ ∑ᵢ Lᵢ · wᵢ, with quadrature weights wᵢ ≈ (X_{i−1} − X_{i+1})/2
At each iteration, the prior volume shrinks by a factor whose expectation is N/(N+1), so X_i ≈ exp(−i/N) on average. The discarded points trace out the likelihood function L(X), and the evidence is approximated by numerical quadrature over this one-dimensional function. The simplicity of this scheme belies its power: the algorithm systematically "peels away" shells of likelihood, climbing from the prior toward the posterior.
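The sketch below runs this loop end to end on a toy two-dimensional problem: a uniform prior on the unit square and a Gaussian likelihood. Everything here (n_live, n_iter, SIGMA, the brute-force rejection step) is an illustrative choice, not the method of any particular package; naive rejection from the prior only works because the toy problem is so small.

```python
import numpy as np

rng = np.random.default_rng(0)
SIGMA = 0.1   # width of the toy Gaussian likelihood (arbitrary choice)

def log_like(theta):
    """Toy 2-D Gaussian likelihood centred at (0.5, 0.5)."""
    return -0.5 * np.sum(((theta - 0.5) / SIGMA) ** 2, axis=-1)

n_live, n_iter = 100, 800
live = rng.uniform(size=(n_live, 2))            # step 1: N live points from the prior
live_logL = log_like(live)

Z, X_prev = 0.0, 1.0
for i in range(1, n_iter + 1):
    worst = np.argmin(live_logL)                # step 2: lowest-likelihood live point
    L_star = live_logL[worst]

    X_i = np.exp(-i / n_live)                   # expected prior volume after i iterations
    w_i = X_prev - X_i                          # shell width (simple, non-trapezoid)
    Z += np.exp(L_star) * w_i                   # step 5: accumulate the evidence
    X_prev = X_i

    # Step 3: replace the worst point by a prior draw with L > L*.
    # Naive rejection sampling: fine for this toy problem, hopeless in general.
    while True:
        candidate = rng.uniform(size=2)
        if log_like(candidate) > L_star:
            break
    live[worst], live_logL[worst] = candidate, log_like(candidate)

Z += np.exp(live_logL).mean() * X_prev          # contribution of the remaining live points
print(f"log Z ≈ {np.log(Z):.2f}")               # analytic value: log(2π·0.1²) ≈ -2.77
```

With these settings a single run should typically land within a few tenths of the analytic log Z, consistent with the √(H/N) error estimate discussed below.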
The Constrained Prior Sampling Challenge
The practical difficulty lies in step 3: drawing uniformly from the prior subject to L(θ) > L*. As the algorithm progresses, this constraint becomes increasingly tight, defining smaller and potentially complex regions. Various strategies address this:
Ellipsoidal decomposition (MultiNest, Feroz et al., 2009) encloses the live points in a set of overlapping ellipsoids and samples uniformly within them. Slice sampling (PolyChord, Handley et al., 2015) uses slice sampling within the constrained region, scaling better to high dimensions. Diffusive nested sampling (Brewer et al., 2011) embeds nested sampling within an MCMC framework.
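To make the ellipsoidal idea concrete, here is a minimal single-ellipsoid sketch (a simplification of MultiNest's multi-ellipsoid decomposition; the function name and enlargement factor are illustrative choices). The caller must still reject any draw that violates L(θ) > L* or leaves the prior support.

```python
import numpy as np

def sample_in_bounding_ellipsoid(live_points, enlarge=1.1, rng=None):
    """Draw one candidate uniformly from an ellipsoid enclosing the live points.

    Single-ellipsoid simplification of the MultiNest idea. The caller still has
    to reject the draw if it violates L(theta) > L* or falls outside the prior.
    """
    if rng is None:
        rng = np.random.default_rng()

    mean = live_points.mean(axis=0)
    cov = np.cov(live_points, rowvar=False)
    chol = np.linalg.cholesky(cov)

    # Smallest Mahalanobis radius that encloses every live point, then enlarge.
    deltas = live_points - mean
    radii2 = np.einsum("ij,jk,ik->i", deltas, np.linalg.inv(cov), deltas)
    scale = enlarge * np.sqrt(radii2.max())

    # Uniform point in the unit ball, mapped through the scaled ellipsoid.
    d = live_points.shape[1]
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)
    radius = rng.uniform() ** (1.0 / d)
    return mean + scale * (chol @ (radius * direction))
```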
Evidence Uncertainty
Nested sampling provides a direct estimate of log Z with a calculable uncertainty: the standard deviation on log Z is approximately √(H/N), where H is the posterior information (the KL divergence from prior to posterior) and N is the number of live points. This allows the user to set N to achieve a desired precision. Few other algorithms provide such principled error estimates for the marginal likelihood.
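As a sketch of how this estimate can be computed from a finished run (the helper name is mine; log_L and weights are assumed to be arrays of the dead points' log-likelihoods and shell widths):

```python
import numpy as np

def logz_uncertainty(log_L, weights, n_live):
    """Skilling's estimate std(log Z) ≈ sqrt(H / N) from the dead points.

    log_L: dead points' log-likelihoods; weights: their prior-volume widths w_i;
    n_live: number of live points N. For realistic problems, work in log space
    (e.g. a logsumexp) to avoid underflow; this sketch keeps it simple.
    """
    L = np.exp(log_L)
    Z = np.sum(L * weights)
    p = L * weights / Z                       # posterior mass in each shell
    H = np.sum(p * (log_L - np.log(Z)))       # information (prior-to-posterior KL), in nats
    return np.sqrt(H / n_live)
```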
Posterior Samples as a By-product
Although designed for evidence computation, nested sampling also yields posterior samples. Each discarded point θᵢ receives a normalized posterior weight pᵢ = Lᵢ · wᵢ / Z. These weighted samples can be resampled to produce an unweighted posterior sample. This dual output — evidence and posterior — makes nested sampling exceptionally efficient for Bayesian workflows that require both model comparison and parameter estimation.
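A minimal sketch of this resampling step (the function name and argument layout are illustrative; dead_points is assumed to be an array of shape (n_dead, n_dim)):

```python
import numpy as np

def resample_posterior(dead_points, log_L, weights, size, rng=None):
    """Multinomial resampling of dead points with p_i ∝ L_i * w_i,
    giving an approximately unweighted posterior sample."""
    if rng is None:
        rng = np.random.default_rng()
    p = np.exp(log_L - log_L.max()) * weights   # rescale to avoid underflow
    p /= p.sum()
    idx = rng.choice(len(dead_points), size=size, replace=True, p=p)
    return dead_points[idx]
```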
Historical Development
John Skilling presented nested sampling at the MaxEnt conference (2004) and in a seminal Bayesian Analysis paper (2006), introducing the prior-volume transformation.
Feroz, Hobson, and Bridges released MultiNest (2009), using multi-ellipsoidal decomposition, which became the standard tool in cosmology and astrophysics.
Handley, Hobson, and Lasenby introduced PolyChord (2015), using slice sampling to handle higher-dimensional and more complex likelihood surfaces.
Dynamic nested sampling (Higson et al.) adaptively allocates live points to regions of greatest uncertainty, and packages like dynesty, UltraNest, and jaxns have broadened accessibility.
Applications
Nested sampling has had its greatest impact in astrophysics and cosmology, where model comparison is central (e.g., testing cosmological models, exoplanet detection). It is also widely used in gravitational-wave data analysis (the LALInference and Bilby packages), particle physics, and systems biology. Its ability to handle multimodal posteriors — a common feature of astrophysical likelihoods — gives it an edge over standard MCMC in these settings.
"Nested sampling converts the hard problem of multi-dimensional integration into a sequence of easy problems of sorting by likelihood, and that is its genius."— John Skilling, 2006
Worked Example: Evidence Estimation from Likelihood Shells
We estimate the Bayesian evidence Z from 10 nested sampling iterations, where each iteration records the likelihood at the current worst point and the remaining prior volume.
(X=1.00, logL=−10.0), (X=0.80, logL=−7.2), (X=0.60, logL=−5.2),
(X=0.40, logL=−3.9), (X=0.20, logL=−3.0), (X=0.10, logL=−2.7),
(X=0.05, logL=−2.5), (X=0.02, logL=−2.4), (X=0.01, logL=−2.35),
(X=0.001, logL=−2.32)
Step 1: Trapezoid Integration Z ≈ Σ Lᵢ × (X_{i−1} − X_{i+1})/2, with the boundary conventions X₀ = 1 and X₁₁ = 0
Shell 1: L=e⁻¹⁰ × (1.00 − 0.80)/2 = 4.54×10⁻⁵ × 0.10 = 4.5×10⁻⁶
Shell 5: L=e⁻³·⁰ × (0.40 − 0.10)/2 = 0.0498 × 0.15 = 7.47×10⁻³
Shell 10: L=e⁻²·³² × (0.01 − 0)/2 = 0.0983 × 0.005 = 4.9×10⁻⁴
Step 2: Evidence Z ≈ Σ all contributions ≈ 0.0243
log Z ≈ −3.72
The evidence Z ≈ 0.024 represents the average likelihood over the prior — effectively measuring how well the prior-predictive distribution matches the data. Most of the evidence comes from the middle shells (prior volumes roughly 0.05–0.4), where the likelihood is moderate and the volume is non-negligible. This illustrates nested sampling's key insight: converting a difficult multi-dimensional integral into a tractable 1D sum over prior volume shells.
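For reference, a short NumPy snippet (an illustrative sketch, not part of the original example) that reproduces the shell sum above, including the X₀ = 1 and X₁₁ = 0 boundary conventions:

```python
import numpy as np

# Tabulated dead points from the worked example.
X    = np.array([1.00, 0.80, 0.60, 0.40, 0.20, 0.10, 0.05, 0.02, 0.01, 0.001])
logL = np.array([-10.0, -7.2, -5.2, -3.9, -3.0, -2.7, -2.5, -2.4, -2.35, -2.32])

# Pad with the boundary volumes X_0 = 1 and X_11 = 0, then form
# the trapezoid weights w_i = (X_{i-1} - X_{i+1}) / 2.
X_padded = np.concatenate(([1.0], X, [0.0]))
weights  = (X_padded[:-2] - X_padded[2:]) / 2.0

Z = np.sum(np.exp(logL) * weights)
print(f"Z ≈ {Z:.4f}, log Z ≈ {np.log(Z):.2f}")   # Z ≈ 0.0243, log Z ≈ -3.72
```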