
Forensic Statistics & DNA Evidence

Likelihood ratios for forensic evidence evaluation, DNA mixture analysis, and probabilistic genotyping provide a rigorous Bayesian framework for quantifying the evidential value of forensic findings in criminal investigations.

LR = P(evidence | H_prosecution) / P(evidence | H_defense)

Forensic science evaluates physical evidence — DNA profiles, fingerprints, glass fragments, fibers — to assess whether it links a suspect to a crime. The correct framework for this evaluation is fundamentally Bayesian: the forensic scientist computes a likelihood ratio measuring how much more probable the evidence is under the prosecution hypothesis (the suspect is the source) than under the defense hypothesis (someone else is the source). This likelihood ratio is then combined with the prior odds — based on the other evidence in the case — to produce the posterior odds for the prosecution hypothesis.

The Likelihood Ratio Framework

The Association of Forensic Science Providers and the European Network of Forensic Science Institutes have adopted the likelihood ratio as the logically correct measure of evidential value. The forensic scientist's role is to evaluate the evidence, not to determine guilt — and the likelihood ratio cleanly separates the scientist's domain (evidence evaluation) from the court's domain (combining all evidence into a verdict).

Likelihood Ratio for Evidence Evaluation
LR = P(E | H₁) / P(E | H₂)

Where H₁ = prosecution hypothesis (e.g., the suspect is the source of the DNA)
H₂ = defense hypothesis (e.g., an unknown person is the source)
E = observed forensic evidence

Posterior Odds
P(H₁|E) / P(H₂|E) = LR × P(H₁) / P(H₂)
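
Written in the odds form above, the update is simple to carry out. The Python sketch below is a minimal illustration with hypothetical numbers: it combines prior odds of 1 to 1,000 against the prosecution hypothesis with a likelihood ratio of 50,000.

    # Odds-form Bayes update: posterior odds = LR x prior odds.
    # The prior odds and likelihood ratio are hypothetical illustrative values.

    def posterior_odds(prior_odds: float, lr: float) -> float:
        """Combine prior odds with a likelihood ratio (Bayes' theorem in odds form)."""
        return lr * prior_odds

    def odds_to_probability(odds: float) -> float:
        """Convert odds on H1 into P(H1)."""
        return odds / (1.0 + odds)

    prior = 1 / 1000          # other case evidence: 1-to-1000 against H1
    lr = 5.0e4                # evidence is 50,000 times more probable under H1
    post = posterior_odds(prior, lr)
    print(f"posterior odds = {post:.1f} : 1")                    # 50.0 : 1
    print(f"P(H1 | E)      = {odds_to_probability(post):.4f}")   # 0.9804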

For a full single-source DNA profile matching the suspect, the likelihood ratio can exceed 10 billion — the evidence is 10 billion times more probable if the suspect is the source than if an unrelated person is. But for partial profiles, mixtures, and degraded samples, the likelihood ratio requires sophisticated probabilistic modeling.
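
The sketch below shows, under simplifying assumptions, where such numbers come from. For a single-source profile and an unrelated-donor defense hypothesis, the likelihood ratio reduces to the reciprocal of the random match probability, obtained by the product rule across loci. The allele frequencies and the three-locus profile are hypothetical and no subpopulation (theta) correction is applied; multiplying across the twenty or so loci of a full profile is what drives the likelihood ratio into the billions.

    # Single-source profile LR under
    #   H1: the suspect is the source,  H2: an unrelated person is the source.
    # LR = 1 / RMP, where RMP is the random match probability from the product rule.
    # Allele frequencies are hypothetical; no theta correction is applied.

    # (locus, frequency of allele 1, frequency of allele 2, homozygous?)
    loci = [
        ("D3S1358", 0.26, 0.21, False),
        ("vWA",     0.11, 0.20, False),
        ("FGA",     0.14, 0.14, True),
    ]

    rmp = 1.0
    for locus, p, q, homozygous in loci:
        geno_freq = p * p if homozygous else 2 * p * q   # Hardy-Weinberg genotype frequency
        rmp *= geno_freq                                 # product rule across independent loci

    print(f"random match probability (3 loci) = {rmp:.2e}")
    print(f"likelihood ratio                  = {1.0 / rmp:.2e}")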

Probabilistic Genotyping

DNA evidence from crime scenes is often a mixture of biological material from multiple contributors — victim, suspect, and unknown persons. Probabilistic genotyping software (STRmix, TrueAllele) uses Bayesian inference to deconvolve these mixtures. The model treats the contributors' genotypes as latent variables, accounts for stutter artifacts, degradation, and allele dropout, and uses Markov chain Monte Carlo to explore the possible genotype combinations and model parameters, from which the likelihood ratio for the hypothesis that a specific person contributed to the mixture is computed.
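
The single-locus Python sketch below is a drastically simplified version of this idea: it ignores peak heights, stutter, degradation, dropout, and drop-in, and it enumerates genotypes exhaustively rather than by MCMC, but it shows the core structure of marginalizing over an unknown contributor's latent genotype. The allele frequencies and genotypes are hypothetical.

    # Toy single-locus, two-person mixture:
    #   H1: victim + suspect are the contributors,
    #   H2: victim + one unknown, unrelated person are the contributors.
    # No peak heights, stutter, dropout or drop-in: the contributors' alleles
    # must exactly account for the observed allele set.
    from itertools import combinations_with_replacement

    freqs = {"A": 0.10, "B": 0.25, "C": 0.05, "D": 0.30}   # hypothetical frequencies
    observed = {"A", "B", "C"}      # alleles detected in the crime-scene sample
    victim = ("A", "B")
    suspect = ("B", "C")

    def genotype_probability(g):
        a, b = g
        return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

    def explains(observed_alleles, *genotypes):
        """True if the genotypes jointly yield exactly the observed allele set."""
        return {a for g in genotypes for a in g} == observed_alleles

    # Numerator: the evidence is certain if victim + suspect explain it, impossible otherwise.
    p_e_h1 = 1.0 if explains(observed, victim, suspect) else 0.0

    # Denominator: marginalize over the unknown contributor's latent genotype.
    p_e_h2 = sum(
        genotype_probability(g)
        for g in combinations_with_replacement(sorted(freqs), 2)
        if explains(observed, victim, g)
    )

    print(f"P(E|H1) = {p_e_h1:.3f}, P(E|H2) = {p_e_h2:.4f}, LR = {p_e_h1 / p_e_h2:.1f}")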

From Binary Match to Continuous LR

Traditional forensic DNA analysis used a binary framework: does the suspect's profile "match" the crime scene evidence? This approach discards information and fails for complex mixtures. Probabilistic genotyping replaces the binary match with a continuous likelihood ratio that captures all the information in the evidence. A mixture with allele dropout that would have been declared "inconclusive" under binary interpretation may yield a likelihood ratio of millions under probabilistic genotyping, correctly quantifying the strong but imperfect evidence.
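
The sketch below is a minimal illustration of how dropout can enter such a likelihood, assuming a single-source sample, a fixed per-allele dropout probability, and no drop-in; the genotype, frequencies, and dropout rate are hypothetical, and real semi-continuous and fully continuous models are considerably richer.

    # Semi-continuous single-source model with allele dropout and no drop-in.
    # Dropout is applied per distinct allele for brevity; real models treat
    # homozygotes, drop-in, and peak heights far more carefully.
    from itertools import combinations_with_replacement

    freqs = {"A": 0.10, "B": 0.25, "C": 0.30, "D": 0.35}   # hypothetical
    observed = {"A"}         # partial profile: only allele A was detected
    suspect = ("A", "B")     # suspect's genotype at this locus
    dropout = 0.2            # hypothetical per-allele dropout probability

    def genotype_probability(g):
        a, b = g
        return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

    def likelihood(observed_alleles, genotype, d):
        """P(observed alleles | genotype), allowing dropout but not drop-in."""
        alleles = set(genotype)
        if not observed_alleles <= alleles:
            return 0.0                       # an observed allele has no donor
        p = 1.0
        for a in alleles:
            p *= (1 - d) if a in observed_alleles else d
        return p

    numerator = likelihood(observed, suspect, dropout)    # H1: the suspect is the source
    denominator = sum(                                    # H2: an unknown person is the source
        likelihood(observed, g, dropout) * genotype_probability(g)
        for g in combinations_with_replacement(sorted(freqs), 2)
    )
    print(f"LR = {numerator / denominator:.2f}")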

Non-DNA Forensic Evidence

The likelihood ratio framework extends beyond DNA to all forms of forensic evidence. For glass fragments, the LR compares the probability of observing the measured refractive index under the hypothesis that the glass came from the crime scene window versus a random source. For handwriting, speaker recognition, and facial comparison, Bayesian score-based methods calibrate the output of automated systems into proper likelihood ratios. For fiber evidence, footwear marks, and tool marks, Bayesian methods quantify the rarity of observed characteristics against reference databases.
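
A minimal sketch of such a calculation for glass is shown below, assuming Gaussian measurement error around the crime-scene window's mean refractive index and a Gaussian background population of glass; both assumptions and all numbers are hypothetical, and casework models use hierarchical, non-Gaussian backgrounds.

    # Feature-based LR for a glass fragment's refractive index (RI).
    # Numerator: density of the measurement if the fragment came from the
    # crime-scene window (centred on the window's mean RI, spread = measurement error).
    # Denominator: density under a background population of glass RIs.
    from statistics import NormalDist

    x = 1.51832                                          # recovered fragment's measured RI
    window = NormalDist(mu=1.51830, sigma=0.00004)       # crime-scene window source
    background = NormalDist(mu=1.51800, sigma=0.00400)   # population of glass objects

    lr = window.pdf(x) / background.pdf(x)
    print(f"LR = {lr:.1f}")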

"The value of evidence is not a property of the evidence alone but depends on the competing propositions being considered. The likelihood ratio captures this relativity, and Bayes' theorem shows how it updates belief." — Colin Aitken, Statistics and the Evaluation of Evidence for Forensic Scientists

Activity-Level Propositions

Modern forensic evaluation increasingly considers not just source-level questions (who left the DNA?) but activity-level questions (how was the DNA deposited?). Bayesian networks model the chain from activity to transfer, persistence, and recovery of trace evidence, allowing the likelihood ratio to address the forensically relevant question: is the DNA presence more probable under the prosecution's account of events or the defense's?
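
A toy version of such a network is sketched below, with a single serial path from transfer through persistence to recovery; the probabilities are hypothetical placeholders for values that in practice come from transfer and persistence studies.

    # Toy activity-level evaluation: the chance of recovering the suspect's DNA
    # depends on transfer, persistence, and recovery, which in turn depend on
    # the alleged activity. All probabilities are hypothetical placeholders.

    def p_dna_recovered(p_transfer: float, p_persist: float, p_recover: float) -> float:
        """Serial path of a simple Bayesian network: transfer -> persistence -> recovery."""
        return p_transfer * p_persist * p_recover

    # Hp: the suspect handled the weapon during the offence (direct, forceful contact).
    p_e_hp = p_dna_recovered(p_transfer=0.80, p_persist=0.70, p_recover=0.60)

    # Hd: the suspect only shook hands with the person who handled it (secondary transfer).
    p_e_hd = p_dna_recovered(p_transfer=0.05, p_persist=0.40, p_recover=0.60)

    print(f"LR for recovering the suspect's DNA = {p_e_hp / p_e_hd:.1f}")   # 28.0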

Validation and Error Rates

Bayesian methods also address the validation of forensic methods themselves. The posterior probability of error — given validation study results, proficiency test performance, and quality control data — provides a more nuanced assessment of method reliability than simple error counts. And Bayesian decision theory provides a framework for setting reporting thresholds that balance the costs of false inclusions and false exclusions.
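
A minimal sketch of the first point, assuming a uniform Beta(1, 1) prior and hypothetical validation counts: the conjugate Beta posterior over the method's false-inclusion rate is summarized by Monte Carlo sampling using only the standard library.

    # Posterior over a method's false-inclusion rate from validation data.
    # With a Beta(1, 1) prior and x errors in n trials, the posterior is
    # Beta(1 + x, 1 + n - x). The counts are hypothetical.
    import random

    x, n = 2, 500                           # hypothetical validation outcomes
    a, b = 1 + x, 1 + (n - x)               # conjugate Beta-Binomial update

    random.seed(0)
    samples = sorted(random.betavariate(a, b) for _ in range(100_000))
    posterior_mean = sum(samples) / len(samples)
    upper_95 = samples[int(0.95 * len(samples))]     # one-sided 95% credible bound

    print(f"posterior mean error rate = {posterior_mean:.4f}")
    print(f"95% credible upper bound  = {upper_95:.4f}")
    # A raw count (2/500 = 0.004) hides this upper tail; the posterior makes the
    # residual uncertainty explicit when setting reporting thresholds.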
