Bayesian Statistics

Genome-Wide Association Studies

Bayesian fine-mapping, polygenic risk scores, and variant annotation integration enable geneticists to identify causal variants from genome-wide association signals and predict disease risk from the combined effects of thousands of genetic variants.

P(causal | GWAS signal, LD, annotations) ∝ P(GWAS signal | causal, LD) · P(causal | annotations)

Genome-wide association studies (GWAS) test millions of genetic variants for association with diseases and traits, identifying genomic regions that harbor causal variants. But association is not causation: linkage disequilibrium (LD) means that the statistically significant variant may be a neighbor of the true causal variant rather than the causal variant itself. Bayesian methods address this ambiguity through fine-mapping — determining which variant(s) in an associated region are most likely causal — and through polygenic modeling, which estimates the joint effect of thousands of variants on disease risk.

Bayesian Fine-Mapping

Bayesian fine-mapping computes the posterior probability that each variant in an associated region is causal, given the pattern of association signals and the LD structure. The key quantity is the posterior inclusion probability (PIP) — the probability that a variant is in the causal set — computed by summing over all causal configurations that include that variant.

Bayesian fine-mapping posterior:

P(γ | z, Σ) ∝ P(z | γ, Σ) · P(γ)

where
γ = binary vector indicating which variants are causal
z = vector of GWAS z-scores
Σ = LD correlation matrix
P(γ) = prior on causal configurations (sparse: few variants are causal)

PIPⱼ = Σ_{γ: γⱼ=1} P(γ | z, Σ)     [posterior inclusion probability]
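The sum over configurations is easiest to see in the simplest special case. The following is a minimal sketch, not the algorithm used by any particular tool, and it rests on three assumptions that are not stated in the text above: exactly one variant in the region is causal, every variant is equally likely a priori, and the causal variant's z-score has a normal prior with variance `prior_var` (a hypothetical tuning value). Under the single-causal-variant assumption each configuration γ is a one-hot vector, so its Bayes factor depends only on that variant's own z-score and the PIP reduces to a normalized Bayes factor.

```python
import math


def log_bayes_factor(z, prior_var=25.0):
    """Log Bayes factor for 'this variant is causal' versus 'no association'.

    Assumes z ~ N(lam, 1) with lam ~ N(0, prior_var) if the variant is causal,
    and z ~ N(0, 1) otherwise. prior_var is an illustrative choice.
    """
    shrink = prior_var / (1.0 + prior_var)
    return -0.5 * math.log(1.0 + prior_var) + 0.5 * z * z * shrink


def single_causal_pips(z_scores, prior_var=25.0):
    """PIPs assuming exactly one causal variant and a uniform prior over variants."""
    log_bfs = [log_bayes_factor(z, prior_var) for z in z_scores]
    largest = max(log_bfs)  # subtract the maximum for numerical stability
    weights = [math.exp(lb - largest) for lb in log_bfs]
    total = sum(weights)
    return [w / total for w in weights]


# Toy z-scores for four variants in one region (invented for illustration).
z_scores = [5.2, 4.9, 1.1, 0.3]
pips = single_causal_pips(z_scores)
for j, pip in enumerate(pips):
    print(f"variant {j}: z = {z_scores[j]:.1f}, PIP = {pip:.3f}")
```

With several causal variants the sum runs over exponentially many configurations and the LD matrix Σ enters each term, which is why dedicated tools rely on stochastic search or variational approximations rather than this direct normalization.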

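Software tools such as FINEMAP, SuSiE (Sum of Single Effects), and CAVIAR implement Bayesian fine-mapping with varying assumptions about the number of causal variants, the prior on effect sizes, and the handling of LD. Their results are usually summarized as 95% credible sets. The exact definition depends on the method: CAVIAR reports the smallest set of variants that contains all causal variants with at least 95% posterior probability, whereas SuSiE reports one set per independent signal, each containing at least one causal variant with at least 95% posterior probability. Either way, credible sets give experimentalists a concrete shortlist for follow-up.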
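As an illustration of how a credible set can be read off from PIPs, here is a short sketch that assumes a single causal variant in the region, so that the PIPs sum to one. The function name `credible_set` and the PIP values are hypothetical. Variants are added in order of decreasing PIP until the accumulated probability reaches the requested coverage.

```python
def credible_set(pips, coverage=0.95):
    """Smallest set of variant indices whose PIPs sum to at least `coverage`.

    Assumes one causal variant in the region, so the PIPs sum to 1.
    Returns the chosen indices (highest PIP first) and their total probability.
    """
    order = sorted(range(len(pips)), key=lambda j: pips[j], reverse=True)
    chosen = []
    total = 0.0
    for j in order:
        chosen.append(j)
        total += pips[j]
        if total >= coverage:
            break
    return chosen, total


# Toy PIPs for five variants (invented for illustration).
pips = [0.62, 0.30, 0.05, 0.02, 0.01]
members, mass = credible_set(pips)
print(members, round(mass, 2))
```

A greedy pass is enough here because taking the largest PIPs first always gives the smallest set for a given coverage.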

Functional Annotation Integration

Not all variants are equally likely to be causal a priori: variants in coding regions, regulatory elements, or conserved sequences are more likely to affect gene function. Bayesian fine-mapping methods like PolyFun and fGWAS incorporate functional annotations as prior information, upweighting variants in functional regions. The posterior probability of causality then reflects both the statistical signal and the biological plausibility of each variant.
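The effect of an annotation-informed prior can be sketched as follows. This is not the PolyFun or fGWAS procedure, both of which estimate annotation weights from genome-wide data. The annotation columns, the weights, and `prior_var` below are hypothetical values chosen for illustration, and the sketch again assumes a single causal variant per region. The prior for each variant is proportional to the exponential of a weighted sum of its annotations, and the posterior multiplies that prior by the variant's Bayes factor.

```python
import math


def softmax(log_values):
    """Normalize log-scale scores into probabilities that sum to 1."""
    largest = max(log_values)
    exps = [math.exp(v - largest) for v in log_values]
    total = sum(exps)
    return [e / total for e in exps]


def annotation_informed_pips(z_scores, annotations, weights, prior_var=25.0):
    """PIPs with prior P(causal_j) proportional to exp(sum_k weights[k] * annotations[j][k]).

    Assumes one causal variant per region; weights are supplied, not estimated.
    """
    shrink = prior_var / (1.0 + prior_var)
    log_posterior = []
    for z, row in zip(z_scores, annotations):
        log_bf = -0.5 * math.log(1.0 + prior_var) + 0.5 * z * z * shrink
        log_prior = sum(w * a for w, a in zip(weights, row))
        log_posterior.append(log_bf + log_prior)
    return softmax(log_posterior)


# Two variants with identical z-scores; only the second is annotated.
z_scores = [4.8, 4.8, 1.0]
annotations = [[0, 0], [1, 1], [0, 0]]  # hypothetical columns: coding, conserved
weights = [1.5, 0.5]                    # hypothetical log-scale enrichments

flat = annotation_informed_pips(z_scores, annotations, [0.0, 0.0])
informed = annotation_informed_pips(z_scores, annotations, weights)
print("flat prior:    ", [round(p, 3) for p in flat])
print("informed prior:", [round(p, 3) for p in informed])
```

With a flat prior the two variants in perfect statistical tie split the posterior evenly; with the annotation prior the tie is broken in favor of the annotated variant by a factor of exp(1.5 + 0.5).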

From Association to Function

GWAS has identified thousands of genomic regions associated with complex diseases, but most associated variants lie in non-coding regions with unknown function. Bayesian fine-mapping narrows the search from hundreds of correlated variants to a handful of likely causal ones, guiding expensive functional experiments (CRISPR knockout, massively parallel reporter assays) toward the highest-probability targets. Without Bayesian prioritization, this experimental bottleneck would make translating GWAS findings into biological mechanism prohibitively slow.

Polygenic Risk Scores

Most complex diseases and traits are influenced by thousands of genetic variants, each with a small effect. Polygenic risk scores (PRS) aggregate these effects into a single measure of genetic risk. Bayesian PRS methods — LDpred, PRS-CS, SBayesR — model the joint distribution of effect sizes across all variants, accounting for LD and incorporating prior distributions on effect sizes that reflect the polygenic architecture (most effects are near zero, a few are larger).

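Bayesian polygenic risk score (LDpred):

β | causal ~ N(0, h²/(M·p))     [effect sizes of causal variants]
P(causal) = p     [proportion of causal variants]
PRS_i = Σⱼ X_ij · E[βⱼ | GWAS, LD]     [posterior mean effects]

where h² is the heritability explained by the variants, M is the number of variants, and X_ij is the genotype of individual i at variant j.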
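To make the shrinkage concrete, the sketch below computes posterior mean effects under the point-normal prior above for the special case of unlinked variants, where the posterior has a closed form. This is a simplification and not the full LDpred algorithm, which accounts for LD between variants. It assumes standardized genotypes and phenotype, so that each marginal estimate has sampling variance of roughly 1/N. The values of `n`, `m`, `h2`, and `p`, the effect estimates, and the genotypes are all invented for illustration.

```python
import math


def normal_pdf(x, var):
    """Density of N(0, var) evaluated at x."""
    return math.exp(-0.5 * x * x / var) / math.sqrt(2.0 * math.pi * var)


def posterior_mean_effect(beta_hat, n, m, h2, p):
    """E[beta | beta_hat] under a point-normal prior, ignoring LD.

    n: GWAS sample size, m: number of variants, h2: heritability,
    p: prior proportion of causal variants.
    """
    tau2 = h2 / (m * p)   # prior variance of a causal effect
    noise = 1.0 / n       # approximate sampling variance of beta_hat
    like_causal = normal_pdf(beta_hat, tau2 + noise)
    like_null = normal_pdf(beta_hat, noise)
    post_causal = p * like_causal / (p * like_causal + (1.0 - p) * like_null)
    shrink = tau2 / (tau2 + noise)
    return post_causal * shrink * beta_hat


def polygenic_score(genotypes, weights):
    """PRS_i = sum_j X_ij * weight_j for one individual."""
    return sum(x * w for x, w in zip(genotypes, weights))


# Hypothetical settings; only three of the m variants are shown for brevity.
n, m, h2, p = 10_000, 1_000, 0.5, 0.01
beta_hats = [0.060, 0.002, -0.015]
weights = [posterior_mean_effect(b, n, m, h2, p) for b in beta_hats]

standardized_genotypes = [1.2, -0.4, 0.7]  # one individual's X_ij values
print("posterior mean effects:", weights)
print("PRS:", polygenic_score(standardized_genotypes, weights))
```

A strongly associated variant keeps almost all of its estimated effect, while a weak estimate is shrunk nearly to zero because its posterior probability of being causal is small.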

Bayesian PRS methods typically outperform simple clumping and p-value thresholding in out-of-sample prediction because they model LD explicitly and shrink effect sizes toward zero, reducing noise. The posterior distribution of each effect size reflects the uncertainty in that variant's contribution, and the overall PRS prediction carries a posterior predictive interval that communicates the precision of individual-level risk prediction.

"GWAS gives us regions; fine-mapping gives us variants; functional annotation gives us mechanisms. At every step, Bayesian inference is the glue that combines statistical evidence with biological knowledge." — Hilary Finucane, Broad Institute

Multi-Ancestry and Trans-Ethnic Analysis

Bayesian methods enable fine-mapping and PRS estimation across multiple ancestries by leveraging differences in LD structure: a causal variant may be in LD with different tag variants in different populations, and joint Bayesian analysis across ancestries narrows the credible set. Bayesian meta-analysis of GWAS across populations, with ancestry-specific LD patterns, improves both fine-mapping resolution and PRS portability — addressing the critical equity challenge that most GWAS participants have been of European ancestry.
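The narrowing effect can be illustrated with a deliberately simple sketch. It assumes a single causal variant shared by both ancestries, independent effect sizes in each ancestry, and a uniform prior over variants, so the joint log Bayes factor for a variant is the sum of its ancestry-specific log Bayes factors. The ancestry labels, z-scores, and `prior_var` are invented for illustration, and this is not the model of any specific multi-ancestry tool.

```python
import math


def log_bayes_factor(z, prior_var=25.0):
    """Log Bayes factor for a causal variant given its z-score (normal prior)."""
    shrink = prior_var / (1.0 + prior_var)
    return -0.5 * math.log(1.0 + prior_var) + 0.5 * z * z * shrink


def pips_from_log_bfs(log_bfs):
    """Normalize log Bayes factors into PIPs (single causal variant, flat prior)."""
    largest = max(log_bfs)
    weights = [math.exp(lb - largest) for lb in log_bfs]
    total = sum(weights)
    return [w / total for w in weights]


# Ancestry A: variants 0 and 1 are in tight LD, so both show strong signals.
# Ancestry B: that LD is weaker, so only variant 0 keeps a strong signal.
z_ancestry_a = [5.0, 4.9, 0.5]
z_ancestry_b = [4.0, 0.8, 0.4]

log_bf_a = [log_bayes_factor(z) for z in z_ancestry_a]
log_bf_b = [log_bayes_factor(z) for z in z_ancestry_b]
log_bf_joint = [a + b for a, b in zip(log_bf_a, log_bf_b)]

pips_a = pips_from_log_bfs(log_bf_a)
pips_joint = pips_from_log_bfs(log_bf_joint)
print("ancestry A only:", [round(p, 3) for p in pips_a])
print("joint analysis: ", [round(p, 3) for p in pips_joint])
```

In the single-ancestry analysis the two linked variants share the posterior, whereas the joint analysis concentrates it on the variant that is supported in both populations.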
