Phylogenetics & Molecular Evolution

MrBayes, BEAST, and other Bayesian phylogenetic software infer evolutionary trees, divergence times, and rates of molecular evolution from DNA and protein sequence data, producing posterior distributions over tree topologies that capture phylogenetic uncertainty.

P(τ, θ | D) ∝ P(D | τ, θ) · P(τ) · P(θ), where D = sequence alignment, τ = tree (topology and branch lengths), θ = substitution model parameters

Phylogenetics — reconstructing the evolutionary relationships among organisms — has been transformed by Bayesian methods. Before the Bayesian revolution of the late 1990s, phylogeneticists used maximum parsimony or maximum likelihood to find the single best tree. Bayesian phylogenetics instead produces a posterior distribution over trees, capturing the fundamental uncertainty in evolutionary inference. This uncertainty matters: downstream analyses of trait evolution, biogeography, and divergence timing all depend on the tree, and ignoring tree uncertainty leads to overconfident conclusions.

Bayesian Phylogenetic Inference

The posterior distribution of phylogenetic trees given sequence data is computed using MCMC, which samples trees in proportion to their posterior probability. The likelihood of a tree is calculated using Felsenstein's pruning algorithm, which efficiently computes the probability of the observed alignment given the tree topology, branch lengths, and substitution model.

Bayesian Phylogenetic Posterior

P(τ, t, θ | D) ∝ P(D | τ, t, θ) · P(τ) · P(t | τ) · P(θ)

Where D = multiple sequence alignment
τ = tree topology (branching pattern)
t = branch lengths or divergence times
θ = substitution model parameters (rates, frequencies)
P(D | τ, t, θ) = phylogenetic likelihood (Felsenstein pruning)
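
The likelihood term can be illustrated with a minimal sketch of the pruning recursion. It assumes the Jukes-Cantor (JC69) substitution model with equal base frequencies, a rooted tree written as nested tuples of (child, branch length) pairs, and an alignment without gaps or ambiguity codes. The names `jc69_transition`, `partial_likelihoods`, and `log_likelihood` are illustrative and do not come from any phylogenetics package.

```python
import math

STATES = "ACGT"

def jc69_transition(t):
    """4x4 JC69 transition matrix for a branch of length t (expected substitutions per site)."""
    e = math.exp(-4.0 * t / 3.0)
    same = 0.25 + 0.75 * e
    diff = 0.25 - 0.25 * e
    return [[same if i == j else diff for j in range(4)] for i in range(4)]

def partial_likelihoods(node, site):
    """Return L[s] = P(tip states below node | node is in state s) for one alignment column.

    A tip is a taxon name (str) looked up in `site`.
    An internal node is a tuple of (child, branch_length) pairs.
    """
    if isinstance(node, str):
        return [1.0 if STATES[s] == site[node] else 0.0 for s in range(4)]
    result = [1.0] * 4
    for child, t in node:
        P = jc69_transition(t)
        child_L = partial_likelihoods(child, site)
        for s in range(4):
            # Sum over the unknown state x at the child end of the branch.
            result[s] *= sum(P[s][x] * child_L[x] for x in range(4))
    return result

def log_likelihood(tree, alignment):
    """Log P(D | tree, branch lengths) under JC69, assuming independent sites."""
    n_sites = len(next(iter(alignment.values())))
    total = 0.0
    for k in range(n_sites):
        site = {name: seq[k] for name, seq in alignment.items()}
        root_L = partial_likelihoods(tree, site)
        # Average over the root state using the JC69 stationary frequencies (1/4 each).
        total += math.log(sum(0.25 * value for value in root_L))
    return total

# Hypothetical three-taxon example: ((A:0.1, B:0.1):0.05, C:0.15)
inner = (("A", 0.1), ("B", 0.1))
tree = ((inner, 0.05), ("C", 0.15))
alignment = {"A": "ACGT", "B": "ACGA", "C": "ACTT"}
print(log_likelihood(tree, alignment))
```

Each node is visited once per site, so the cost grows linearly with the number of taxa instead of with the number of possible ancestral state assignments. Production software adds rate heterogeneity, richer substitution models, and numerical rescaling that this sketch omits.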

The key MCMC moves are topology proposals, namely nearest-neighbor interchange (NNI), subtree pruning and regrafting (SPR), and tree bisection and reconnection (TBR), combined with standard Metropolis-Hastings updates for continuous parameters such as branch lengths and substitution rates. The posterior probability of a clade (a common ancestor together with all of its descendants) is estimated as the fraction of post-burn-in MCMC samples containing that clade, a direct, intuitive measure of phylogenetic support.
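
Clade support can be computed from a sample of trees with a few lines of code. The sketch below assumes rooted trees written as nested tuples of taxon names and a hypothetical four-tree sample. For unrooted trees, bipartitions would be counted instead. The helper names `clades` and `clade_posteriors` are illustrative only.

```python
from collections import Counter

def clades(tree):
    """Return (tips of this subtree, list of every clade inside it) for a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    tips = frozenset()
    found = []
    for child in tree:
        child_tips, child_clades = clades(child)
        tips |= child_tips
        found.extend(child_clades)
    found.append(tips)  # the subtree rooted here is itself a clade
    return tips, found

def clade_posteriors(sampled_trees, burnin=0):
    """Fraction of retained samples that contain each clade."""
    kept = sampled_trees[burnin:]
    counts = Counter()
    for tree in kept:
        _, found = clades(tree)
        counts.update(set(found))  # count each clade at most once per tree
    return {clade: n / len(kept) for clade, n in counts.items()}

# Hypothetical MCMC sample of four rooted trees on taxa A, B, C, D.
samples = [
    (("A", "B"), ("C", "D")),
    (("A", "B"), ("C", "D")),
    ((("A", "B"), "C"), "D"),
    (("A", "C"), ("B", "D")),
]
support = clade_posteriors(samples)
for clade, p in sorted(support.items(), key=lambda item: -item[1]):
    print(sorted(clade), p)
```

The clade containing all taxa always has support 1 and is normally left out of reports. Real analyses read thousands of sampled trees from the sampler's output files and discard a burn-in fraction first.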

Software: MrBayes and BEAST

MrBayes, first released by Huelsenbeck and Ronquist in 2001, was the software that brought Bayesian phylogenetics to mainstream biology. Its implementation of Metropolis-coupled MCMC (MC³), which runs multiple chains at different temperatures and periodically swaps their states to improve mixing, made it practical to explore the vast space of tree topologies. BEAST (Bayesian Evolutionary Analysis Sampling Trees), developed by Drummond and colleagues, extended Bayesian phylogenetics to molecular clock analyses, estimating divergence times and evolutionary rates jointly with the tree.
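
The chain-swapping step of Metropolis-coupled MCMC can be sketched as follows. Chain i is assumed to target the posterior raised to a power beta_i, with beta_0 = 1 for the cold chain. The incremental heating formula and the default value 0.1 are assumptions made for illustration, and the function names are hypothetical and not MrBayes internals.

```python
import math
import random

def heats(n_chains, temp=0.1):
    """Incremental heating: beta_i = 1 / (1 + temp * i); chain 0 is the cold chain."""
    return [1.0 / (1.0 + temp * i) for i in range(n_chains)]

def log_swap_ratio(log_post_i, log_post_j, beta_i, beta_j):
    """Log Metropolis ratio for exchanging the current states of chains i and j.

    Chain i targets posterior**beta_i, so the ratio is
    [p(x_j)**beta_i * p(x_i)**beta_j] / [p(x_i)**beta_i * p(x_j)**beta_j].
    """
    return (beta_i - beta_j) * (log_post_j - log_post_i)

def attempt_swap(states, log_posts, betas, i, j, rng=random):
    """Swap the states of chains i and j in place with the Metropolis probability."""
    log_r = log_swap_ratio(log_posts[i], log_posts[j], betas[i], betas[j])
    if log_r >= 0.0 or rng.random() < math.exp(log_r):
        states[i], states[j] = states[j], states[i]
        log_posts[i], log_posts[j] = log_posts[j], log_posts[i]
        return True
    return False

# Hypothetical usage: the heated chain has found a better tree than the cold chain.
betas = heats(4)
states = ["tree_cold", "tree_hot1", "tree_hot2", "tree_hot3"]
log_posts = [-1000.0, -990.0, -1005.0, -1010.0]  # unnormalized log posteriors
print(attempt_swap(states, log_posts, betas, 0, 1), states[0])
```

Heated chains see a flattened posterior and cross valleys between topology islands more easily. Only samples from the cold chain are recorded, so the swap is what lets the cold chain benefit from that exploration.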

The Bayesian Phylogenetic Revolution

The adoption of Bayesian phylogenetics has been remarkably rapid. The original MrBayes paper has been cited over 50,000 times. The method's popularity stems from several advantages over maximum likelihood: posterior probabilities of clades are more intuitive than bootstrap support values, the framework naturally accommodates complex models (relaxed clocks, partition models, fossilized birth-death processes), and the MCMC output provides a sample from the joint posterior of all parameters, enabling integrated uncertainty quantification.

Molecular Clock and Divergence Dating

Bayesian molecular clock models estimate when lineages diverged by combining sequence data with calibration information from fossils or biogeographic events. The strict molecular clock (constant rate across all branches) is unrealistic for most datasets; relaxed clock models allow rates to vary across branches, drawn from a distribution (lognormal, exponential) whose parameters are estimated from the data. Fossil calibrations are incorporated as priors on node ages, and the posterior distribution of divergence times reflects uncertainty from the sequences, the clock model, and the calibrations.
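
A minimal sketch of how an uncorrelated lognormal relaxed clock links time to sequence change: each branch receives its own rate drawn from a shared lognormal distribution, and the expected number of substitutions on a branch is its rate multiplied by its duration. The branch durations, mean rate, and spread below are hypothetical values, and the function names are illustrative.

```python
import math
import random

def draw_branch_rates(n_branches, mean_rate, sigma, rng):
    """Draw one rate per branch from a lognormal with the given mean and log-scale sd."""
    # The mean of a lognormal is exp(mu + sigma**2 / 2), so solve for mu.
    mu = math.log(mean_rate) - 0.5 * sigma ** 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_branches)]

def branch_lengths_in_substitutions(durations, rates):
    """Expected substitutions per site on each branch: rate * elapsed time."""
    return [rate * duration for rate, duration in zip(rates, durations)]

rng = random.Random(1)
durations = [5.0, 5.0, 12.0, 7.0]  # hypothetical branch durations in Myr
rates = draw_branch_rates(len(durations), mean_rate=0.01, sigma=0.3, rng=rng)
lengths = branch_lengths_in_substitutions(durations, rates)
for duration, rate, length in zip(durations, rates, lengths):
    print(f"{duration:5.1f} Myr  rate={rate:.4f}  length={length:.4f}")
```

Setting `sigma` to 0 makes every rate equal to `mean_rate`, which recovers the strict clock as a special case. In a Bayesian analysis the durations, the rates, and the lognormal parameters are all sampled jointly, and fossil calibrations constrain the node ages that determine the durations.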

"Bayesian phylogenetics has changed the question from 'What is the best tree?' to 'What trees are consistent with the data, and how probable is each?' This shift from a point estimate to a distribution is the fundamental advance." — John Huelsenbeck, co-developer of MrBayes

Model Selection and Averaging

Bayesian methods enable model selection in phylogenetics: choosing among substitution models, clock models, tree priors, and partitioning schemes. Bayes factors compare the marginal likelihoods of competing models. Reversible-jump MCMC (implemented in MrBayes 3.2) can instead sample across substitution models during the MCMC run, averaging over model uncertainty rather than committing to a single model. Stepping-stone sampling and path sampling estimate marginal likelihoods for model comparison and are considerably more accurate than the older harmonic mean estimator.
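
The stepping-stone estimator can be sketched as below. It assumes that log-likelihood values have already been sampled from a series of power posteriors, proportional to prior × likelihood^β, on an increasing grid of β values from 0 to 1. The function names and the input layout are assumptions for illustration and do not reflect the interface of any particular program.

```python
import math

def log_mean_exp(values):
    """Numerically stable log of the arithmetic mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))

def stepping_stone_log_ml(betas, loglik_samples):
    """Stepping-stone estimate of the log marginal likelihood.

    betas: increasing powers, betas[0] == 0.0 and betas[-1] == 1.0.
    loglik_samples[k]: log-likelihoods of states sampled from the power
        posterior at betas[k], for k = 0 .. len(betas) - 2.
    """
    total = 0.0
    for k in range(len(betas) - 1):
        step = betas[k + 1] - betas[k]
        # Each ratio Z(betas[k+1]) / Z(betas[k]) is estimated by the mean of L**step.
        total += log_mean_exp([step * ll for ll in loglik_samples[k]])
    return total

def log_bayes_factor(log_ml_1, log_ml_0):
    """Log Bayes factor in favor of model 1 over model 0."""
    return log_ml_1 - log_ml_0

# Hypothetical log-likelihood samples for two competing models on a three-step grid.
betas = [0.0, 0.25, 0.5, 1.0]
model_a = [[-1040.0, -1035.0], [-1012.0, -1010.0], [-1004.0, -1003.0]]
model_b = [[-1050.0, -1046.0], [-1020.0, -1019.0], [-1011.0, -1010.0]]
log_bf = log_bayes_factor(
    stepping_stone_log_ml(betas, model_a),
    stepping_stone_log_ml(betas, model_b),
)
print(log_bf)
```

With a single step from β = 0 to β = 1 the estimator reduces to averaging the likelihood over draws from the prior. The intermediate steps keep each ratio close to 1, which is what makes the estimate stable in practice.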

Applications Beyond Species Trees

Bayesian phylogenetic methods have been applied far beyond traditional species-level systematics. Viral phylodynamics uses BEAST to reconstruct the evolutionary and epidemiological dynamics of pathogens — HIV, influenza, SARS-CoV-2 — from genomic sequences sampled through time. Bayesian phylogeography estimates the spatial spread of lineages. Gene tree-species tree methods account for incomplete lineage sorting by modeling gene trees as realizations within a species tree, with both inferred jointly in a Bayesian framework.
