Variables & Notation
Core quantities and symbols used throughout Bayesian inference.
Prior Probability
The probability of a hypothesis before observing data. Encodes existing beliefs or knowledge about the parameter.
P(θ)
Before flipping a coin, you believe P(fair) = 0.8 based on experience.
Likelihood
The probability of observing the data given a specific hypothesis or parameter value.
P(D|θ) = ∏ᵢ P(xᵢ|θ) (for independent observations)
Given a fair coin (θ=0.5), the probability of seeing 7 heads in 10 flips.
Posterior Probability
The updated probability of a hypothesis after observing data. The central quantity in Bayesian inference.
P(θ|D) = P(D|θ)·P(θ) / P(D)
After seeing 7/10 heads, updated belief that the coin is fair.
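The coin example can be worked out numerically with a grid approximation: evaluate likelihood × prior on a grid of candidate biases and normalize. A minimal sketch (the flat prior and 101-point grid are arbitrary illustrative choices):

```python
from math import comb

# Grid approximation of the posterior over a coin's bias theta
# after observing 7 heads in 10 flips, with a flat prior.
thetas = [i / 100 for i in range(101)]                    # candidate biases 0.00 .. 1.00
prior = [1.0 for _ in thetas]                             # flat prior P(theta)
lik = [comb(10, 7) * t**7 * (1 - t)**3 for t in thetas]   # P(D|theta)

unnorm = [l * p for l, p in zip(lik, prior)]
evidence = sum(unnorm)                                    # proportional to P(D)
posterior = [u / evidence for u in unnorm]                # P(theta|D), sums to 1

# The posterior peaks at the empirical frequency 7/10
print(thetas[posterior.index(max(posterior))])            # -> 0.7
```

The normalizing sum plays the role of the marginal likelihood P(D) from the next entry, up to the grid spacing.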
Marginal Likelihood (Evidence)
The total probability of the data across all possible hypotheses. Acts as a normalizing constant.
P(D) = ∫ P(D|θ)·P(θ) dθ
The overall probability of getting 7/10 heads across all possible coin biases.
Posterior Predictive
The predicted probability of future observations given the observed data, integrating over parameter uncertainty.
P(x̃|D) = ∫ P(x̃|θ)·P(θ|D) dθ
Predicting the probability of the next coin flip being heads after observing data.
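For the coin, the integral has a closed form under a conjugate Beta prior: with a uniform Beta(1, 1) prior and 7 heads in 10 flips, the posterior is Beta(8, 4) and the predictive probability of heads is its mean. A minimal sketch (the counts are the running example, not real data):

```python
# Posterior predictive probability of heads on the next flip,
# using the Beta-Binomial conjugate pair: uniform Beta(1, 1) prior,
# 7 heads and 3 tails observed. Integrating over theta gives
# E[theta | D] = (a + heads) / (a + b + heads + tails).
a, b = 1, 1                 # Beta prior hyperparameters
heads, tails = 7, 3
p_next_head = (a + heads) / (a + b + heads + tails)
print(p_next_head)          # -> 0.666... (Laplace's rule of succession)
```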
Prior Predictive
The predicted probability of data before observing anything, based only on the prior.
P(x̃) = ∫ P(x̃|θ)·P(θ) dθ
Expected data distribution before running an experiment.
Bayes Factor
The ratio of marginal likelihoods for two competing models. Quantifies relative evidence.
BF₁₂ = P(D|M₁) / P(D|M₂)
BF₁₂ = 10 means the data are 10× more likely under model 1 than under model 2.
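As a sketch of the ratio in action, compare a "fair coin" model against an "unknown bias, uniform prior" model on the 7-of-10-heads data. Under a uniform prior, the Binomial marginal likelihood is 1/(n+1) for any outcome count, a standard closed-form result:

```python
from math import comb

# Bayes factor for 7 heads in 10 flips under two models:
#   M1: the coin is fair (theta fixed at 0.5)
#   M2: theta unknown, uniform prior on [0, 1]
p_d_m1 = comb(10, 7) * 0.5**10   # likelihood at the fixed theta
p_d_m2 = 1 / 11                  # integral of Binomial * uniform prior = 1/(n+1)
bf_12 = p_d_m1 / p_d_m2
print(round(bf_12, 2))           # -> 1.29: only weak evidence either way
```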
Hyperparameter
Parameters of the prior distribution. In hierarchical models, these may themselves have priors (hyperpriors).
θ ~ Prior(α, β)
The shape (α) and rate (β) of a Gamma prior on a Poisson rate parameter.
Latent Variable
An unobserved variable that influences the observed data. Must be inferred from observations.
P(z|x) ∝ P(x|z)·P(z)
Cluster assignments in a Gaussian mixture model.
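The mixture example can be made concrete: the posterior responsibility of each component for one observation is likelihood × mixing weight, normalized. A minimal one-dimensional sketch (the means, weights, and observation are made up for illustration):

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

# P(z|x) ∝ P(x|z)·P(z) for a two-component Gaussian mixture.
weights = [0.5, 0.5]           # prior P(z) over the latent cluster
mus, sigma = [-2.0, 2.0], 1.0  # component means, shared std dev
x = 1.0                        # one observed data point

joint = [w * normal_pdf(x, mu, sigma) for w, mu in zip(weights, mus)]
resp = [j / sum(joint) for j in joint]     # normalized posterior P(z|x)
print([round(r, 3) for r in resp])         # -> [0.018, 0.982]
```

The observation at x = 1 sits much closer to the second component's mean, so nearly all posterior mass lands on z = 2.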
Credible Interval
An interval in which the parameter lies with a given probability, according to the posterior distribution.
P(a ≤ θ ≤ b | D) = 0.95
95% credible interval: the true coin bias lies between 0.45 and 0.85.
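An equal-tailed credible interval falls out of the posterior CDF: take the 2.5% and 97.5% quantiles for a 95% interval. A minimal sketch using the grid posterior for the coin example (uniform prior, 7/10 heads; grid resolution is arbitrary):

```python
from math import comb

# 95% equal-tailed credible interval for the coin bias,
# from a grid posterior (uniform prior, 7 heads in 10 flips).
thetas = [i / 1000 for i in range(1001)]
unnorm = [comb(10, 7) * t**7 * (1 - t)**3 for t in thetas]
total = sum(unnorm)

cdf, running = [], 0.0
for u in unnorm:               # cumulative posterior mass up the grid
    running += u / total
    cdf.append(running)

lo = next(t for t, c in zip(thetas, cdf) if c >= 0.025)
hi = next(t for t, c in zip(thetas, cdf) if c >= 0.975)
print(lo, hi)                  # interval with P(lo <= theta <= hi | D) ≈ 0.95
```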
MAP Estimate
Maximum A Posteriori — the single most probable parameter value under the posterior.
θ̂_MAP = argmax_θ P(θ|D)
The most likely coin bias given the observed flips.
Posterior Mean
The expected value of the parameter under the posterior distribution. Minimizes squared error loss.
E[θ|D] = ∫ θ·P(θ|D) dθ
The average coin bias weighted by the posterior distribution.
Posterior Variance
The spread of uncertainty in the parameter estimate under the posterior.
Var(θ|D) = E[θ²|D] − (E[θ|D])²
How certain/uncertain we are about the coin's bias after observing data.
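For the running coin example, all three posterior summaries (MAP, mean, variance) have closed forms, since a uniform prior plus 7 heads and 3 tails gives a Beta(8, 4) posterior. A minimal sketch using the standard Beta-distribution formulas:

```python
# Point estimates and uncertainty for a Beta(8, 4) posterior
# (uniform prior + 7 heads, 3 tails), via closed-form Beta formulas.
a, b = 8, 4
map_est = (a - 1) / (a + b - 2)               # posterior mode (MAP)
mean = a / (a + b)                            # posterior mean E[theta|D]
var = a * b / ((a + b)**2 * (a + b + 1))      # posterior variance
print(map_est, round(mean, 3), round(var, 4)) # -> 0.7 0.667 0.0171
```

Note that the MAP (0.7) and the posterior mean (0.667) differ: the mode maximizes the density, while the mean minimizes expected squared error.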
Effective Sample Size
In MCMC, the number of independent-equivalent samples. Accounts for autocorrelation in chains.
n_eff = N / (1 + 2·Σ ρₖ)
A chain of 10,000 samples might have n_eff = 3,000 due to autocorrelation.
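The formula can be demonstrated on a synthetic autocorrelated chain. The sketch below simulates an AR(1) process standing in for MCMC draws, estimates the lag-k autocorrelations ρₖ, and truncates the sum at the first non-positive value (a common heuristic, not the only one):

```python
import random

# Effective sample size n_eff = N / (1 + 2 * sum_k rho_k)
# for a synthetic AR(1) chain standing in for correlated MCMC draws.
random.seed(0)
phi, N = 0.8, 10_000
x, chain = 0.0, []
for _ in range(N):
    x = phi * x + random.gauss(0, 1)   # AR(1): strong positive autocorrelation
    chain.append(x)

mean = sum(chain) / N
var = sum((v - mean) ** 2 for v in chain) / N

def rho(k):
    """Empirical lag-k autocorrelation of the chain."""
    cov = sum((chain[i] - mean) * (chain[i + k] - mean)
              for i in range(N - k)) / N
    return cov / var

s, k = 0.0, 1
while k < N:
    r = rho(k)
    if r <= 0:          # truncate at the first non-positive autocorrelation
        break
    s += r
    k += 1

n_eff = N / (1 + 2 * s)
print(round(n_eff))     # far fewer independent-equivalent samples than N
```

For an AR(1) chain the theoretical value is N·(1−φ)/(1+φ) ≈ 1,100 here, so 10,000 correlated draws carry roughly as much information as about a thousand independent ones.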
R-hat (Convergence Diagnostic)
Compares between-chain and within-chain variance to assess MCMC convergence. Values near 1 indicate convergence.
R̂ = √(V̂/W)
R̂ = 1.01 suggests chains have converged; R̂ > 1.1 signals problems.
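The diagnostic can be sketched directly from its definition: with W the mean within-chain variance and B the between-chain variance of the chain means, V̂ blends the two and R̂ = √(V̂/W). The sketch below uses well-mixed synthetic chains (independent draws from the same distribution), so R̂ should land near 1; real implementations typically also split each chain in half first:

```python
import random

# Gelman-Rubin R-hat from m chains of n draws each:
#   W = mean within-chain variance
#   B = n * variance of the chain means
#   V_hat = ((n-1)/n) * W + B/n,  R_hat = sqrt(V_hat / W)
random.seed(1)
m, n = 4, 2000
chains = [[random.gauss(0, 1) for _ in range(n)] for _ in range(m)]

means = [sum(c) / n for c in chains]
within = sum(sum((v - mu) ** 2 for v in c) / (n - 1)
             for c, mu in zip(chains, means)) / m
grand = sum(means) / m
between = n * sum((mu - grand) ** 2 for mu in means) / (m - 1)

v_hat = (n - 1) / n * within + between / n
r_hat = (v_hat / within) ** 0.5
print(round(r_hat, 3))   # close to 1.0: the chains agree
```

If the chains had been drawn from different distributions (e.g. stuck in different modes), `between` would dominate and push R̂ well above 1.1.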