Introduced by David Spiegelhalter, Nicola Best, Brad Carlin, and Angelika van der Linde in 2002, the Deviance Information Criterion (DIC) was designed to fill a gap in the Bayesian toolkit: a practical, easily computed model comparison criterion that works directly with MCMC samples. Like AIC and BIC, DIC penalizes model complexity, but it uses a Bayesian notion of effective complexity that accounts for the shrinkage effect of priors.
Definition
The deviance is defined as D(θ) = −2 log p(y | θ) (up to an additive constant that cancels in comparisons). DIC is constructed from two quantities:
Posterior mean deviance: D̄ = E_θ|y[D(θ)]
Deviance at posterior mean: D(θ̄) = −2 log p(y | θ̄), where θ̄ = E_θ|y[θ]
Effective number of parameters: p_D = D̄ − D(θ̄)
DIC = D̄ + p_D = 2D̄ − D(θ̄)
The effective number of parameters p_D measures how much the data have informed the posterior beyond the prior. In a model with very informative priors, p_D can be much smaller than the nominal parameter count, reflecting the reduced freedom. Models with smaller DIC are preferred, analogous to AIC.
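The shrinkage effect on p_D can be made concrete with a conjugate normal model, where the posterior is available in closed form and can be sampled directly instead of via MCMC. This is a minimal sketch; the data, prior scales, and the `p_d` helper are illustrative, not any standard API:

```python
import numpy as np

# Sketch (made-up data and priors): an informative prior shrinks p_D below
# the nominal parameter count. Model: y_i ~ N(theta, 1) with conjugate prior
# theta ~ N(0, tau2), so the posterior is N(m, v) in closed form.
rng = np.random.default_rng(0)
y = rng.normal(1.0, 1.0, size=20)
n, ybar = len(y), y.mean()

def p_d(tau2, n_draws=200_000):
    v = 1.0 / (n + 1.0 / tau2)                 # posterior variance
    m = v * n * ybar                           # posterior mean
    theta = rng.normal(m, np.sqrt(v), size=n_draws)
    # Deviance up to an additive constant: D(theta) = sum_i (y_i - theta)^2
    dev = ((y[:, None] - theta) ** 2).sum(axis=0)
    return dev.mean() - ((y - theta.mean()) ** 2).sum()  # p_D = Dbar - D(theta_bar)

print(p_d(tau2=100.0))  # vague prior: p_D close to the nominal count of 1
print(p_d(tau2=0.01))   # tight prior: p_D well below 1
```

With a vague prior the single parameter is essentially free and p_D is near 1; under the tight prior the posterior is dominated by the prior and p_D drops to a fraction of a parameter.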
Computation from MCMC Output
DIC is straightforward to compute from MCMC samples {θ⁽¹⁾, …, θ⁽ᵐ⁾}: D̄ is the average deviance over samples, θ̄ is the sample mean of θ, and D(θ̄) is the deviance evaluated at θ̄. This simplicity — requiring no additional model runs beyond the standard MCMC — was a key design goal and contributed to DIC's rapid adoption, particularly in the WinBUGS/OpenBUGS ecosystem where it was immediately implemented.
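The recipe above reduces to a few lines of code. In this sketch, `dic_from_samples` is a hypothetical helper rather than a library function, the user supplies a `log_lik` callable, and draws from a known conjugate posterior stand in for real MCMC output:

```python
import numpy as np

def dic_from_samples(samples, log_lik):
    """DIC from posterior draws.
    samples: array of shape (n_draws, n_params); log_lik(theta) = log p(y | theta).
    Returns (DIC, p_D)."""
    dev = np.array([-2.0 * log_lik(th) for th in samples])  # D(theta) per draw
    d_bar = dev.mean()                                      # posterior mean deviance
    d_at_mean = -2.0 * log_lik(samples.mean(axis=0))        # D(theta_bar)
    p_d = d_bar - d_at_mean                                 # effective parameters
    return d_bar + p_d, p_d

# Toy usage: normal likelihood with known sigma = 1 and an effectively flat
# prior, so the posterior is N(ybar, 1/n) and we can draw from it directly.
rng = np.random.default_rng(1)
y = rng.normal(0.5, 1.0, size=50)

def log_lik(theta):
    return -0.5 * np.sum((y - theta[0]) ** 2) - 0.5 * len(y) * np.log(2 * np.pi)

draws = rng.normal(y.mean(), 1.0 / np.sqrt(len(y)), size=(10_000, 1))
dic, p_d = dic_from_samples(draws, log_lik)  # p_d should come out near 1
```

Note that no extra model evaluations are needed beyond one deviance computation per draw plus one at the posterior mean.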
Unlike the nominal parameter count, the effective number of parameters p_D is not guaranteed to be positive: it turns negative whenever the deviance at the posterior mean D(θ̄) exceeds the mean deviance D̄, which can happen when the posterior mean is a poor summary of the posterior (e.g., for highly non-Gaussian or multimodal posteriors). A negative p_D is a diagnostic signal that DIC may be unreliable for the model in question. An alternative definition, p_D = ½ Var_θ|y[D(θ)], is non-negative by construction.
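The variance-based alternative is a one-liner on the same deviance draws. In this sketch, `p_d_variance` and the deviance values are illustrative:

```python
import numpy as np

def p_d_variance(dev):
    """Alternative effective number of parameters: half the posterior
    variance of the deviance. Non-negative by construction."""
    return 0.5 * np.asarray(dev).var(ddof=1)

dev = np.array([102.4, 101.1, 104.9, 103.0, 101.8, 103.6])  # made-up deviance draws
print(p_d_variance(dev))  # 0.5 * sample variance of the draws
```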
Strengths and Criticisms
DIC's strengths are its computational simplicity and its natural integration with MCMC workflows. It has been enormously influential in applied Bayesian statistics, particularly in hierarchical modelling, spatial statistics, and epidemiology.
However, DIC has significant limitations. It is not invariant to reparameterization — different parameterizations of the same model can yield different DIC values. It relies on the posterior mean θ̄ as a point estimate, which is problematic for non-Gaussian posteriors. It does not account for posterior uncertainty in model comparison and is not based on a proper predictive criterion. These limitations motivated the development of WAIC (Watanabe–Akaike Information Criterion), which addresses several of DIC's shortcomings.
Historical Context
Akaike introduced AIC in 1973, establishing the paradigm of information-theoretic model selection that trades off fit against complexity.
Spiegelhalter, Best, Carlin, and van der Linde published the DIC paper in 2002 in the Journal of the Royal Statistical Society, Series B, accompanied by extensive discussion.
DIC became the default model comparison tool in WinBUGS and OpenBUGS, accumulating thousands of citations across applied statistics.
Watanabe's WAIC (2010) and the LOO-CV work of Vehtari, Gelman, and Gabry (2017) provided alternatives addressing DIC's theoretical limitations, gradually supplanting it in best practice.
"DIC brought model comparison to the MCMC masses — imperfect, yes, but enormously influential in making Bayesian model selection routine rather than heroic." — Brad Carlin, 2008
Worked Example: Comparing Two Regression Models with DIC
We compare a simple linear model (Model A) with a quadratic model (Model B) using the Deviance Information Criterion. Lower DIC is preferred.
Model A (linear): 8 posterior deviance draws
45.2, 44.8, 46.1, 45.5, 44.9, 45.8, 45.0, 45.3
p_D = 2.1 effective parameters
Model B (quadratic): 8 posterior deviance draws
42.0, 41.5, 43.2, 42.8, 41.8, 42.5, 41.9, 42.3
p_D = 4.3 effective parameters
Step 1: Posterior mean deviance
D̄_A = (45.2 + 44.8 + ⋯ + 45.3)/8 = 45.33
D̄_B = (42.0 + 41.5 + ⋯ + 42.3)/8 = 42.25
Step 2: DIC = D̄ + p_D
DIC_A = 45.33 + 2.1 = 47.43
DIC_B = 42.25 + 4.3 = 46.55
Step 3: Comparison
ΔDIC = DIC_A − DIC_B = 47.43 − 46.55 = 0.88
Model B (quadratic) has a lower DIC by 0.88, suggesting it is slightly preferred. However, since ΔDIC < 2, the difference is not substantial — both models fit comparably after accounting for complexity. The rule of thumb is that ΔDIC > 7 constitutes strong evidence against the higher-DIC model. Here, the simpler linear model remains competitive despite the quadratic model's better fit (D̄_B < D̄_A) because Model B's greater complexity (p_D = 4.3 vs 2.1) partially offsets its improved fit.
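The arithmetic in the worked example can be checked directly from the listed deviance draws, taking the stated p_D values as given:

```python
import numpy as np

# Posterior deviance draws from the worked example.
dev_a = np.array([45.2, 44.8, 46.1, 45.5, 44.9, 45.8, 45.0, 45.3])
dev_b = np.array([42.0, 41.5, 43.2, 42.8, 41.8, 42.5, 41.9, 42.3])
p_d_a, p_d_b = 2.1, 4.3            # effective parameters, as given

dic_a = dev_a.mean() + p_d_a       # 45.325 + 2.1 = 47.425 (47.43 rounded)
dic_b = dev_b.mean() + p_d_b       # 42.25  + 4.3 = 46.55
delta = dic_a - dic_b              # 0.875, i.e. 0.88 after rounding
```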