Bayesian Statistics

Spike-and-Slab Regression

Spike-and-slab regression is a Bayesian variable selection method that places a mixture prior on each regression coefficient — a point mass ('spike') at zero representing exclusion and a diffuse distribution ('slab') representing inclusion — enabling simultaneous estimation and selection with full posterior uncertainty quantification.

βⱼ | γⱼ  ~  γⱼ · Slab(βⱼ) + (1 − γⱼ) · δ₀(βⱼ),   where γⱼ ~ Bernoulli(π)

The spike-and-slab prior is the canonical Bayesian approach to variable selection in regression. For each coefficient βⱼ, the prior is a two-component mixture: with probability 1 − π, the coefficient is exactly zero (the "spike," representing the variable's exclusion from the model), and with probability π, the coefficient is drawn from a diffuse distribution (the "slab," representing inclusion). This formulation treats variable selection as a problem of posterior inference over the space of all possible models — 2ᵖ models for p predictors — with automatic Occam's razor through the prior on the inclusion indicators.

Spike-and-Slab Prior
βⱼ | γⱼ  ~  γⱼ · N(0, τ²) + (1 − γⱼ) · δ₀

γⱼ  ~  Bernoulli(π)   independently for j = 1, …, p

where:
δ₀  →  Point mass at zero (spike)
N(0, τ²)  →  Diffuse normal prior (slab)
γⱼ  →  Inclusion indicator for variable j
π  →  Prior inclusion probability
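
As a minimal illustration of the generative model implied by this prior, the Python sketch below draws inclusion indicators, coefficients, and a response vector; the values chosen for π, τ, and the noise level are arbitrary examples, not tied to any real dataset.

import numpy as np

rng = np.random.default_rng(0)

n, p = 100, 10      # observations, candidate predictors
pi_incl = 0.2       # prior inclusion probability π (assumed value)
tau = 2.0           # slab standard deviation (assumed value)
sigma = 1.0         # noise standard deviation (assumed value)

X = rng.standard_normal((n, p))

# γⱼ ~ Bernoulli(π): which coefficients are "in" the model
gamma = rng.binomial(1, pi_incl, size=p)

# βⱼ is exactly zero (spike) unless γⱼ = 1, in which case βⱼ ~ N(0, τ²) (slab)
beta = np.where(gamma == 1, rng.normal(0.0, tau, size=p), 0.0)

y = X @ beta + rng.normal(0.0, sigma, size=n)

print("included variables:", np.nonzero(gamma)[0])
print("their coefficients:", beta[gamma == 1])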

Model Space Exploration

The vector γ = (γ₁, …, γₚ) encodes which variables are active. The posterior distribution p(γ | y) over the 2ᵖ possible models provides a complete summary of the evidence for each variable combination. The posterior inclusion probability (PIP) for variable j is:

Posterior Inclusion Probability
PIPⱼ  =  P(γⱼ = 1 | y)  =  Σ_{γ : γⱼ = 1} P(γ | y)

Median Probability Model
Include variable j iff PIPⱼ > 0.5

The median probability model (Barbieri and Berger, 2004) includes all variables with PIP > 0.5. Under certain conditions, this model is optimal for prediction under squared error loss. Alternatively, Bayesian model averaging (BMA) averages predictions over all models weighted by their posterior probabilities, avoiding the need to select a single model at all.
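
In code, turning a table of posterior model probabilities into PIPs and a median probability model is a small bookkeeping exercise. A Python sketch follows; the model probabilities below are placeholders chosen only for illustration, not the output of a fitted model.

# Posterior model probabilities, keyed by the set of included variable indices.
# These numbers are placeholders used only to illustrate the aggregation.
model_post = {
    frozenset(): 0.02,
    frozenset({0}): 0.55,
    frozenset({0, 1}): 0.25,
    frozenset({0, 2}): 0.12,
    frozenset({0, 1, 2}): 0.06,
}

p = 3  # number of candidate predictors

# PIPⱼ = total posterior probability of all models that include variable j
pip = [sum(prob for m, prob in model_post.items() if j in m) for j in range(p)]

# Median probability model: include variable j iff PIPⱼ > 0.5
median_model = [j for j in range(p) if pip[j] > 0.5]

for j in range(p):
    print(f"PIP(x{j + 1}) = {pip[j]:.2f}")
print("median probability model:", [f"x{j + 1}" for j in median_model])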

Variants of the Spike-and-Slab

The original spike-and-slab formulation, introduced by Mitchell and Beauchamp (1988), used a point mass spike and a flat slab. George and McCulloch (1993) proposed the stochastic search variable selection (SSVS) variant, replacing the point mass with a very narrow normal distribution (a "pseudo-spike"), which simplifies MCMC computation at the cost of not achieving exact zeros. Ishwaran and Rao (2005) developed the continuous spike-and-slab, using two normal distributions of different scales.

More recent developments include the horseshoe prior (Carvalho, Polson, and Scott, 2010), the Dirichlet-Laplace prior (Bhattacharya et al., 2015), and the R2-D2 prior (Zhang et al., 2022), which achieve spike-and-slab-like behavior through continuous shrinkage rather than discrete mixture components. These continuous alternatives avoid the combinatorial explosion of model space search but sacrifice the interpretability of explicit inclusion/exclusion indicators.
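
For reference, the horseshoe prior, the most widely used of these alternatives, places βⱼ | λⱼ, τ ~ N(0, λⱼ² τ²) with local scales λⱼ ~ C⁺(0, 1) and a global scale τ. The heavy-tailed half-Cauchy local scales allow individual coefficients to escape shrinkage (mimicking the slab), while the global scale pulls the remaining coefficients strongly toward zero (mimicking the spike).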

The Curse of Model Space Dimensionality

With p predictors, there are 2ᵖ possible models. For p = 30, this is over one billion models; for p = 100, it exceeds 10³⁰. Direct enumeration is impossible. MCMC methods (Gibbs sampling over γ, Metropolis-Hastings with variable addition/deletion moves) explore the model space stochastically, spending more time in high-posterior regions. Shotgun Stochastic Search (Hans, Dobra, and West, 2007) and adaptations of reversible-jump MCMC (Green, 1995) provide efficient exploration strategies. For very high-dimensional problems (p ≫ n), variational Bayes approximations to the spike-and-slab posterior have emerged as scalable alternatives.
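
The sketch below shows what a minimal stochastic search might look like: a Metropolis sampler over γ with single-variable flip proposals, where exp(−BIC/2) stands in for the marginal likelihood of each visited model. It is a toy illustration under those simplifying assumptions, not the conjugate Gibbs sampler used in practice, and the names bic_of, mcmc_model_search, and prior_incl are ours rather than from any particular package.

import numpy as np

def bic_of(X, y, included):
    """BIC of an ordinary least squares fit that uses only the listed columns."""
    n = len(y)
    if len(included) == 0:
        resid = y - y.mean()
        k = 1                                  # intercept-only model
    else:
        Xd = np.column_stack([np.ones(n), X[:, included]])
        beta_hat, *_ = np.linalg.lstsq(Xd, y, rcond=None)
        resid = y - Xd @ beta_hat
        k = Xd.shape[1]
    rss = float(resid @ resid)
    return n * np.log(rss / n) + k * np.log(n)

def mcmc_model_search(X, y, n_iter=5000, prior_incl=0.5, seed=0):
    """Metropolis sampler over the inclusion vector gamma with single-flip moves.
    exp(-BIC / 2) is used as a rough stand-in for each model's marginal likelihood."""
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    gamma = np.zeros(p, dtype=bool)

    def log_score(g):
        idx = np.flatnonzero(g)
        log_prior = idx.size * np.log(prior_incl) + (p - idx.size) * np.log(1 - prior_incl)
        return -0.5 * bic_of(X, y, idx) + log_prior

    current = log_score(gamma)
    visits = np.zeros(p)
    for _ in range(n_iter):
        j = rng.integers(p)                    # propose flipping one inclusion indicator
        proposal = gamma.copy()
        proposal[j] = ~proposal[j]
        cand = log_score(proposal)
        if np.log(rng.uniform()) < cand - current:   # Metropolis accept/reject
            gamma, current = proposal, cand
        visits += gamma                        # count how often each variable is "in"
    return visits / n_iter                     # Monte Carlo estimates of the PIPs

Single-flip proposals are the simplest possible move set; shotgun stochastic search and reversible-jump samplers use richer proposals to move through the model space more efficiently.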

Connection to Penalized Regression

The spike-and-slab prior has deep connections to frequentist penalized regression methods. The LASSO (L₁ penalty) can be interpreted as the MAP estimate under a Laplace (double-exponential) prior, which is a continuous approximation to the spike-and-slab. Ridge regression corresponds to a normal prior (the slab alone, without a spike). Elastic net combines L₁ and L₂ penalties, corresponding to a mixture of Laplace and normal priors. The spike-and-slab is the fully Bayesian version that achieves exact sparsity — coefficients are literally zero with positive posterior probability, not merely shrunk toward zero.
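
To make the LASSO correspondence explicit: with Gaussian errors of variance σ² and independent Laplace(0, b) priors on the coefficients, the negative log posterior is, up to an additive constant,

−log p(β | y)  =  ‖y − Xβ‖² / (2σ²) + (1/b) Σⱼ |βⱼ| + const,

so the MAP estimate minimizes ‖y − Xβ‖² + λ Σⱼ |βⱼ| with λ = 2σ²/b, exactly the LASSO objective.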

Historical Development

1988

Mitchell and Beauchamp introduce the spike-and-slab prior for variable selection in linear regression, using a point mass at zero and a uniform slab.

1993

George and McCulloch propose Stochastic Search Variable Selection (SSVS), using a narrow-and-wide normal mixture as a computationally convenient alternative.

2004

Barbieri and Berger establish the optimality of the median probability model for prediction, giving theoretical support to threshold-based variable selection from posterior inclusion probabilities.

2010

Carvalho, Polson, and Scott introduce the horseshoe prior, a continuous shrinkage prior that approximates spike-and-slab behavior while avoiding discrete mixture computation.

"The spike-and-slab prior is the gold standard for Bayesian variable selection. It does in a principled way what stepwise regression tries to do heuristically: identify which variables matter and estimate their effects simultaneously." — Edward I. George and Robert E. McCulloch, Journal of the American Statistical Association (1997)

Worked Example: Variable Selection with Three Predictors

We have 20 observations with three predictors (x₁, x₂, x₃) and want to determine which variables should be included in the model using spike-and-slab variable selection via BIC-based posterior model probabilities.

Given: y is strongly correlated with x₁ (r = 0.99), weakly correlated with x₂ (r = −0.21), and moderately correlated with x₃ (r = −0.40).
2³ = 8 possible models: {∅}, {x₁}, {x₂}, {x₃}, {x₁,x₂}, {x₁,x₃}, {x₂,x₃}, {x₁,x₂,x₃}

Step 1: Fit All Models (BIC)
{x₁}: R² = 0.985, BIC = −42.1
{x₁, x₃}: R² = 0.988, BIC = −40.5
{x₁, x₂, x₃}: R² = 0.989, BIC = −38.2
{∅}: R² = 0.000, BIC = 45.3

Step 2: Posterior Model Probabilities
P({x₁} | data) ≈ 0.72
P({x₁, x₃} | data) ≈ 0.19
P({x₁, x₂, x₃} | data) ≈ 0.05
All other models < 0.03

Step 3: Posterior Inclusion Probabilities
PIP(x₁) = 0.72 + 0.19 + 0.05 + ⋯ ≈ 0.99 → Include (strong)
PIP(x₂) = 0.05 + 0.01 + ⋯ ≈ 0.08 → Exclude
PIP(x₃) = 0.19 + 0.05 + 0.02 + ⋯ ≈ 0.27 → Uncertain

The spike-and-slab analysis clearly identifies x₁ as essential (PIP = 99%), x₂ as unnecessary (PIP = 8%), and x₃ as borderline (PIP = 27%). The highest posterior model {x₁} has 72% probability. This automatic Occam's razor effect — penalizing unnecessary complexity — is a key advantage of the Bayesian approach to variable selection over stepwise methods.
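
The mechanics of Steps 2 and 3 are easy to script. The Python sketch below uses the standard approximation P(M | data) ∝ exp(−BIC/2): given a BIC for each candidate model, it converts the BICs to normalized model probabilities and aggregates them into PIPs. Only the four BIC values reported in Step 1 are entered, so the printed numbers are rougher than the eight-model analysis summarized above; the helper names are ours, chosen for illustration.

import numpy as np

def model_probs_from_bic(bics):
    """Convert {model: BIC} into approximate posterior model probabilities,
    using P(M | data) proportional to exp(-BIC / 2)."""
    models = list(bics)
    b = np.array([bics[m] for m in models])
    w = np.exp(-(b - b.min()) / 2.0)   # subtract the minimum BIC for numerical stability
    return dict(zip(models, w / w.sum()))

def inclusion_probs(post, predictors):
    """PIP of each predictor: total probability of the models that contain it."""
    return {v: sum(p for m, p in post.items() if v in m) for v in predictors}

# BIC values from Step 1; a full analysis would also include the other four models.
bics = {
    ("x1",): -42.1,
    ("x1", "x3"): -40.5,
    ("x1", "x2", "x3"): -38.2,
    (): 45.3,
}
post = model_probs_from_bic(bics)
print({m: round(float(pr), 3) for m, pr in post.items()})
print(inclusion_probs(post, ["x1", "x2", "x3"]))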

Interactive Calculator

Each row has predictors x₁, x₂, x₃, and response y. The calculator implements spike-and-slab variable selection: for each predictor, it computes the posterior inclusion probability (PIP) — the probability that the variable's coefficient is non-zero. A high PIP means the variable should be included.

