
Watanabe–Akaike Information Criterion

The Watanabe–Akaike Information Criterion (WAIC), also called the Widely Applicable Information Criterion, is a fully Bayesian measure of predictive accuracy that uses the log pointwise predictive density and a data-driven penalty for effective model complexity.

WAIC = −2(lppd − p_WAIC), where lppd = ∑ᵢ log E_θ|y[p(yᵢ | θ)] and p_WAIC = ∑ᵢ Var_θ|y[log p(yᵢ | θ)]

The Watanabe–Akaike Information Criterion (WAIC) was developed by Sumio Watanabe as a Bayesian generalization of AIC that is valid even for singular statistical models — models where the Fisher information matrix is degenerate. Unlike DIC, which relies on a point estimate (the posterior mean), WAIC integrates over the full posterior distribution, providing a more principled measure of out-of-sample predictive accuracy.

Definition and Components

WAIC Formula

Log pointwise predictive density:
lppd = ∑ᵢ₌₁ⁿ log E_θ|y[p(yᵢ | θ)] = ∑ᵢ₌₁ⁿ log ∫ p(yᵢ | θ) p(θ | y) dθ

Effective number of parameters (variance form):
p_WAIC = ∑ᵢ₌₁ⁿ Var_θ|y[log p(yᵢ | θ)]

WAIC = −2(lppd − p_WAIC)

The lppd (log pointwise predictive density) measures how well the model predicts each observed data point, averaged over the posterior. The penalty p_WAIC captures the effective complexity by summing, for each observation, the posterior variance of its log predictive density. A large variance indicates that the model's predictions for that point are sensitive to the particular parameter values — a sign of overfitting.

Computation from MCMC Samples

Given S posterior draws {θ⁽¹⁾, …, θ⁽ˢ⁾}, WAIC is computed as:

lppd ≈ ∑ᵢ log(1/S ∑ₛ p(yᵢ | θ⁽ˢ⁾)) — the log of the average likelihood for each data point.

p_WAIC ≈ ∑ᵢ Var_s[log p(yᵢ | θ⁽ˢ⁾)] — the sample variance of the log-likelihood for each point across posterior draws.

This computation is straightforward and, like DIC, works directly from standard MCMC output. The key advantage is that it uses the full posterior, not just a point estimate.
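The two estimators above can be sketched in a few lines. The following is a minimal illustration using only the Python standard library; it assumes the pointwise log-likelihoods are already available as a nested list log_lik[s][i] = log p(yᵢ | θ⁽ˢ⁾). The function names and the input layout are choices made for this sketch, not part of any particular package.

```python
import math
from statistics import variance


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def waic(log_lik):
    """Compute WAIC from log_lik[s][i] = log p(y_i | theta^(s)).

    Returns a dict with lppd, p_waic and waic (deviance scale).
    The input layout (draws in rows, observations in columns) is an
    assumption of this sketch.
    """
    n_obs = len(log_lik[0])
    lppd = 0.0
    p_waic = 0.0
    for i in range(n_obs):
        column = [draw[i] for draw in log_lik]  # all draws for observation i
        lppd += log_mean_exp(column)            # log of the average likelihood
        p_waic += variance(column)              # sample variance (divides by S - 1)
    return {"lppd": lppd, "p_waic": p_waic, "waic": -2.0 * (lppd - p_waic)}


# Toy input: 3 posterior draws, 2 observations (made-up numbers).
toy_log_lik = [
    [-1.1, -0.7],
    [-1.3, -0.9],
    [-0.9, -0.8],
]
print(waic(toy_log_lik))
```

Averaging is done on the log scale (subtracting the maximum before exponentiating) because raw likelihood values can underflow when log-likelihoods are strongly negative.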

WAIC vs. DIC vs. LOO-CV

WAIC is asymptotically equivalent to Bayesian leave-one-out cross-validation (LOO-CV) and to the Bayesian predictive information criterion (BPIC). Vehtari, Gelman, and Gabry (2017) showed that Pareto-smoothed importance sampling LOO (PSIS-LOO) provides a more robust estimate than WAIC when individual terms in the lppd sum are highly variable, because importance sampling diagnostics can flag problematic observations. In current best practice (e.g., the loo R package), PSIS-LOO is generally preferred over WAIC, though both target the same quantity: expected log predictive density (elpd).
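To make the connection to LOO-CV concrete, the sketch below estimates elpd by plain importance sampling, using the identity p(yᵢ | y₋ᵢ) ≈ 1 / (1/S ∑ₛ 1/p(yᵢ | θ⁽ˢ⁾)). This is deliberately the raw, unsmoothed estimator, not PSIS-LOO: the Pareto smoothing of the importance weights and the associated diagnostics are omitted. The function names and the log_lik[s][i] input layout are assumptions of this illustration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log of the sum of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def elpd_is_loo(log_lik):
    """Raw importance-sampling LOO estimate of elpd (no Pareto smoothing).

    log_lik[s][i] = log p(y_i | theta^(s)); this layout is an assumption
    of the sketch.
    """
    n_draws = len(log_lik)
    n_obs = len(log_lik[0])
    total = 0.0
    for i in range(n_obs):
        neg = [-draw[i] for draw in log_lik]
        # log p(y_i | y_-i) ~= -log( (1/S) * sum_s 1 / p(y_i | theta_s) )
        total += math.log(n_draws) - log_sum_exp(neg)
    return total


# Toy input: 3 posterior draws, 2 observations (made-up numbers).
toy_log_lik = [
    [-1.1, -0.7],
    [-1.3, -0.9],
    [-0.9, -0.8],
]
print(elpd_is_loo(toy_log_lik))
```

Because a harmonic mean never exceeds the corresponding arithmetic mean, this estimate is never larger than lppd computed from the same draws; the gap plays a role comparable to the p_WAIC penalty. Multiplying the result by −2 puts it on the same deviance scale as WAIC.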

Theoretical Foundations

Watanabe's singular learning theory provides the mathematical underpinning. In regular models (where the Fisher information matrix is positive definite), WAIC agrees with AIC up to lower-order terms. In singular models, including mixture models, hidden Markov models, Bayesian neural networks, and many hierarchical models, the effective dimensionality need not be a whole number, and the asymptotic arguments behind AIC and BIC can fail. Watanabe showed that WAIC remains an asymptotically valid estimate of the generalization loss in this setting, where learning behavior is governed by geometric invariants such as the real log canonical threshold (RLCT), also called the learning coefficient.
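The regular-model case can be checked on a toy example. Assume yᵢ ~ N(μ, 1) with a flat prior on μ, so that μ | y ~ N(ȳ, 1/n); this model and prior are chosen here purely for illustration. A short calculation gives p_WAIC = s² + 1/(2n), where s² is the (biased) sample variance of y, so for data whose variance is close to 1 the penalty is close to 1, the single free parameter that AIC would count. The sketch below compares a Monte Carlo estimate with that closed form; all function names are hypothetical.

```python
import math
import random
from statistics import fmean, variance


def normal_logpdf(y, mu):
    """Log density of N(mu, 1) evaluated at y."""
    return -0.5 * math.log(2.0 * math.pi) - 0.5 * (y - mu) ** 2


def p_waic_normal_mean(y, n_draws=5000, seed=0):
    """Monte Carlo p_WAIC for y_i ~ N(mu, 1) with a flat prior on mu."""
    rng = random.Random(seed)
    n = len(y)
    y_bar = fmean(y)
    # Under the flat prior the posterior is mu | y ~ N(y_bar, 1/n).
    mus = [rng.gauss(y_bar, 1.0 / math.sqrt(n)) for _ in range(n_draws)]
    # Sum over observations of the posterior variance of the log-likelihood.
    return sum(variance([normal_logpdf(yi, mu) for mu in mus]) for yi in y)


def p_waic_closed_form(y):
    """Exact p_WAIC for the same model: s^2 + 1/(2n)."""
    n = len(y)
    y_bar = fmean(y)
    s2 = sum((yi - y_bar) ** 2 for yi in y) / n
    return s2 + 1.0 / (2.0 * n)


# Simulated data with true variance 1, so both numbers should be near 1.
data_rng = random.Random(1)
y = [data_rng.gauss(0.0, 1.0) for _ in range(20)]
print(p_waic_normal_mean(y), p_waic_closed_form(y))
```

The two printed values should agree up to Monte Carlo error. No such simple parameter count is available in singular models, which is where the distinction between WAIC and AIC matters.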

Historical Context

1973: Akaike's AIC established information-theoretic model selection for regular models.

2002: DIC was introduced for Bayesian model comparison via MCMC, but with known limitations for non-Gaussian posteriors.

2009–2010: Sumio Watanabe published WAIC as part of his broader theory of singular learning, proving its validity for singular models.

2014–2017: Gelman, Hwang, and Vehtari brought WAIC into mainstream applied statistics, comparing it to DIC and LOO-CV and advocating its use as a principled Bayesian model comparison tool.

Practical Recommendations

WAIC (and the closely related PSIS-LOO) should be used when the goal is to compare models' predictive performance, which is the most common model comparison objective. For comparing nested models with the same data, Bayes factors remain the gold standard for hypothesis testing. For non-nested models or when prior sensitivity is a concern, robust Bayesian analysis should complement any information criterion. WAIC is implemented in Stan (via the loo package), PyMC, and other modern Bayesian software.

"WAIC bridges information theory and Bayesian statistics, providing a model comparison criterion that respects the full posterior and remains valid even when classical regularity conditions fail." (Sumio Watanabe, 2010)

Worked Example: WAIC for Two Competing Models

We compute WAIC for two models fit to 10 observations, starting from each observation's log pointwise predictive density, log E_θ|y[p(yᵢ | θ)]. These values are already averaged over the posterior draws rather than evaluated at a point estimate.

Given (log pointwise predictive densities)
Model 1: −1.2, −0.8, −1.5, −0.9, −1.1, −1.3, −0.7, −1.4, −1.0, −0.6
Model 2: −1.5, −1.0, −1.3, −1.1, −1.4, −1.2, −1.0, −1.6, −1.1, −0.9

Step 1: lppd (log pointwise predictive density)
lppd₁ = Σᵢ log E_θ|y[p(yᵢ | θ)] = −10.5
lppd₂ = Σᵢ log E_θ|y[p(yᵢ | θ)] = −12.1

Step 2: p_WAIC (effective parameters)
p_WAIC = Σᵢ Var_θ|y[log p(yᵢ | θ)]. This requires the per-draw log-likelihoods, which are not listed here, so the values below are taken as given.
p_WAIC₁ ≈ 0.82
p_WAIC₂ ≈ 0.45

Step 3: WAIC = −2(lppd − p_WAIC)
WAIC₁ = −2(−10.5 − 0.82) = 22.64
WAIC₂ = −2(−12.1 − 0.45) = 25.10
ΔWAIC = WAIC₁ − WAIC₂ = −2.46

Model 1 has a lower WAIC (22.64 vs 25.10), indicating better expected out-of-sample predictive performance. The difference of 2.46 is modest. Model 1 achieves this despite having a higher effective parameter count (p_WAIC = 0.82 vs 0.45) because its log pointwise predictive density is substantially better (−10.5 vs −12.1). WAIC, unlike DIC, accounts for the full posterior distribution of each parameter, making it more reliable for non-Normal posteriors.
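The arithmetic of the worked example can be reproduced with a short script. It treats the ten listed values for each model as per-observation contributions to lppd and takes the two p_WAIC values as given, exactly as in the steps above; the helper name waic_from_pointwise is made up for this sketch.

```python
# Per-observation contributions to lppd, copied from the worked example.
model_1 = [-1.2, -0.8, -1.5, -0.9, -1.1, -1.3, -0.7, -1.4, -1.0, -0.6]
model_2 = [-1.5, -1.0, -1.3, -1.1, -1.4, -1.2, -1.0, -1.6, -1.1, -0.9]


def waic_from_pointwise(pointwise_lpd, p_waic):
    """WAIC = -2 * (lppd - p_WAIC), with lppd the sum of pointwise terms."""
    lppd = sum(pointwise_lpd)
    return -2.0 * (lppd - p_waic)


# The p_WAIC values are taken as given (they need per-draw log-likelihoods).
waic_1 = waic_from_pointwise(model_1, 0.82)
waic_2 = waic_from_pointwise(model_2, 0.45)
delta = waic_1 - waic_2

print(f"{waic_1:.2f} {waic_2:.2f} {delta:.2f}")  # prints: 22.64 25.10 -2.46
```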
