WorldPop Project

The WorldPop project uses Bayesian geostatistical models to produce high-resolution estimates of population distributions in regions with incomplete census data, combining satellite imagery, survey data, and spatial random effects to map where people live across the globe.

log(popᵢ) = β₀ + Σⱼ βⱼ xᵢⱼ + f(sᵢ) + εᵢ

Accurate, spatially detailed population data are essential for public health planning, disaster response, infrastructure development, and the monitoring of Sustainable Development Goals. Yet in much of the world, particularly sub-Saharan Africa, South and Southeast Asia, and parts of Latin America, recent census data are unavailable, unreliable, or out of date. The WorldPop project, based at the University of Southampton, addresses this gap by using Bayesian statistical models to estimate population distributions at fine spatial resolution (typically 100 m grid cells), combining incomplete census data with remotely sensed covariates.

The project's methods exemplify the power of Bayesian spatial modeling: they propagate uncertainty from data-sparse regions into the final estimates, quantify the precision of population figures at every location, and provide principled predictions for areas where no census enumeration has occurred.

Modeling Framework

WorldPop models typically relate observed population counts in enumerated areas to a set of geospatial covariates — satellite-derived indicators of settlement, land cover, nighttime lights, road networks, and topography — plus a spatial random effect that captures residual spatial variation.

WorldPop Population Model (Log-Linear Geostatistical Form)

log(popᵢ) = β₀ + Σⱼ βⱼ xᵢⱼ + f(sᵢ) + εᵢ

Where:
popᵢ      →  Population count (or density) in area i
β₀        →  Intercept
xᵢⱼ       →  Geospatial covariate j at location i (nightlights, land cover, etc.)
βⱼ        →  Regression coefficient for covariate j
f(sᵢ)     →  Spatial random effect (Gaussian process or ICAR model)
εᵢ        →  Independent error term

The spatial random effect f(s) captures population variation not explained by the covariates — local factors such as cultural settlement patterns, informal economies, or small-scale environmental features. Bayesian inference estimates all parameters and the spatial field jointly, producing posterior predictive distributions at every grid cell, including those with no direct census data.
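
The log-linear form above can be made concrete with a small forward simulation. The sketch below is illustrative only and is not WorldPop's code: the function name, the parameter values, and the exponential covariance used for f(s) are all assumptions chosen to keep the example short.

```python
import numpy as np

def simulate_log_linear_population(coords, X, beta0, beta, sigma_f,
                                   length_scale, sigma_eps, rng):
    """Draw one realisation of log(pop_i) = beta0 + X @ beta + f(s_i) + eps_i.

    Hypothetical helper for illustration; not part of any WorldPop library.
    """
    n = len(coords)
    # Pairwise Euclidean distances between area centroids.
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Exponential covariance for f(s): an assumed stand-in for the
    # spatial correlation structure.
    K = sigma_f ** 2 * np.exp(-dist / length_scale)
    # Small jitter keeps the Cholesky factorisation numerically stable.
    L = np.linalg.cholesky(K + 1e-9 * np.eye(n))
    f = L @ rng.standard_normal(n)             # spatial random effect
    eps = sigma_eps * rng.standard_normal(n)   # independent error term
    log_pop = beta0 + X @ beta + f + eps
    return np.exp(log_pop), f

rng = np.random.default_rng(0)
n = 50
coords = rng.uniform(0, 10, size=(n, 2))   # synthetic locations
X = rng.standard_normal((n, 2))            # two synthetic covariates
pop, f = simulate_log_linear_population(
    coords, X, beta0=4.0, beta=np.array([0.8, -0.3]),
    sigma_f=0.5, length_scale=2.0, sigma_eps=0.1, rng=rng,
)
```

Fitting the model reverses this process: given observed counts and covariates, Bayesian inference recovers a posterior over the coefficients, the spatial field, and the variances.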

Bayesian Geostatistics

The spatial random effect is typically modeled either as a Gaussian process (the geostatistical model) or, for areal data, with the Besag-York-Mollié (BYM) model, which combines an ICAR component with unstructured noise. In the geostatistical formulation, spatial correlation is specified through a Matérn covariance function, and inference is carried out with the Integrated Nested Laplace Approximation (INLA) or with MCMC.
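
As a rough illustration of how a Matérn covariance turns distance into correlation, the sketch below implements the closed form for smoothness ν = 3/2. The choice of ν, the parameter names `sigma` and `rho`, and this particular range parameterisation are assumptions; software such as INLA uses its own parameterisation.

```python
import numpy as np

def matern32(dist, sigma=1.0, rho=1.0):
    """Matérn covariance with smoothness nu = 3/2 (assumed for illustration).

    k(d) = sigma^2 * (1 + sqrt(3) d / rho) * exp(-sqrt(3) d / rho)
    """
    scaled = np.sqrt(3.0) * np.asarray(dist, dtype=float) / rho
    return sigma ** 2 * (1.0 + scaled) * np.exp(-scaled)

# Covariance at increasing separation distances (arbitrary units).
dists = np.array([0.0, 0.5, 1.0, 5.0])
cov = matern32(dists, sigma=2.0, rho=1.0)
```

At zero distance the covariance equals the marginal variance `sigma ** 2`, and it decays smoothly toward zero as locations move apart, which is what lets nearby enumerated areas inform predictions at unobserved cells.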

Why Bayesian?

Bayesian methods are essential for this application for three reasons. First, they produce full posterior distributions, not just point estimates — enabling honest uncertainty quantification in data-sparse regions. Second, priors can incorporate expert knowledge about plausible population densities and spatial correlation structures. Third, the hierarchical modeling framework naturally handles the multi-scale structure of the problem: national census totals constrain regional estimates, which constrain local grid-cell predictions.

Data Sources

WorldPop integrates diverse data streams, each contributing different information about population distribution.

Census Data

Official census counts at administrative units (districts, provinces) provide the ground truth that anchors the model. These data are available for most countries, though at different dates and spatial resolutions. The model disaggregates these areal counts to fine grid cells using the covariate relationships.

Remote Sensing

Satellite imagery provides information on settlement extent (building footprints, urban areas), land cover type (cropland, forest, water), elevation and slope, and nighttime light intensity — all correlated with population density. These covariates are available globally at high resolution and are updated regularly.

Survey and Ancillary Data

Demographic and Health Surveys (DHS), household surveys, and mobile phone call detail records provide additional information about population characteristics and distribution, particularly in areas between census years.

Applications and Impact

Infectious Disease Mapping

WorldPop data are used extensively in malaria control: estimating the number of people at risk, planning bed net distribution, and targeting indoor residual spraying. During the 2014–2016 West Africa Ebola epidemic, WorldPop estimates helped responders understand population density in affected areas and plan vaccination campaigns.

Disaster Response

When natural disasters strike, responders need to know how many people are in the affected area. WorldPop provides pre-disaster population estimates that, combined with damage assessments, support rapid estimation of affected populations and resource needs.

Sustainable Development Goals

Monitoring progress toward SDG targets — such as universal access to clean water, education, and healthcare — requires knowing where people live. WorldPop data underpin many of the spatial analyses used by the United Nations, World Bank, and national governments to track and plan for development.

"You cannot plan for people you cannot see. High-resolution population maps are not a luxury — they are the foundation of evidence-based policy in the developing world." — Andrew Tatem, Director of WorldPop, University of Southampton

Uncertainty Quantification

A distinguishing feature of WorldPop's Bayesian approach is the provision of uncertainty estimates alongside population figures. Each grid cell comes with a posterior distribution — typically summarized as a mean estimate and 95% credible interval. In data-rich areas near census enumeration zones, uncertainty is small. In remote or recently changed areas, uncertainty is larger, honestly reflecting the limits of the available data.
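
In practice, a mean and 95% credible interval of this kind are usually computed from posterior draws rather than analytically. The sketch below shows one way to do that for a single grid cell; the draws are synthetic placeholders standing in for the output of a fitted model, and the function name is hypothetical.

```python
import numpy as np

def summarize_posterior_predictive(log_mean_draws, sigma_draws, rng):
    """Monte Carlo summary of the predictive distribution for one grid cell.

    log_mean_draws: posterior draws of the linear predictor (log scale).
    sigma_draws:    posterior draws of the error standard deviation.
    """
    # One predictive draw per posterior draw: add error on the log scale.
    noise = sigma_draws * rng.standard_normal(log_mean_draws.shape)
    pop = np.exp(log_mean_draws + noise)
    lower, upper = np.percentile(pop, [2.5, 97.5])
    return pop.mean(), lower, upper

rng = np.random.default_rng(1)
n_draws = 4000
sigma = np.full(n_draws, 0.1)

# Synthetic draws: a well-observed cell has a tight posterior on the
# linear predictor, a data-sparse cell has a diffuse one.
well_observed = summarize_posterior_predictive(
    rng.normal(5.0, 0.05, n_draws), sigma, rng)
data_sparse = summarize_posterior_predictive(
    rng.normal(5.0, 0.6, n_draws), sigma, rng)
```

The two summaries share a similar centre on the log scale, but the data-sparse cell has a much wider interval, mirroring the behaviour described above.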

Posterior Predictive Distribution

p(pop_new | data) = ∫ p(pop_new | θ, x_new) · p(θ | data) dθ

Where:
pop_new     →  Population at an unobserved grid cell
θ           →  Model parameters (regression coefficients, spatial field, variance)
x_new       →  Covariate values at the new location
p(θ | data) →  Posterior distribution from observed areas

Constrained Population Mapping

WorldPop uses a "top-down" disaggregation approach: the model predictions at fine resolution are constrained to sum to known totals at coarser administrative levels. This "pycnophylactic" (mass-preserving) constraint ensures consistency with official statistics while distributing population within administrative units according to the model's spatial predictions. The Bayesian framework handles this constraint naturally through the hierarchical model structure.
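
The constraint itself amounts to a proportional rescaling within each administrative unit. The following is a minimal sketch, assuming each grid cell already has a positive model-derived weight and an integer unit index; the function and variable names are hypothetical, and every unit is assumed to contain at least one cell with nonzero weight.

```python
import numpy as np

def disaggregate_to_total(weights, unit_ids, unit_totals):
    """Distribute each unit's known total across its cells in proportion
    to the cell weights, so cell values sum exactly to the unit total."""
    weights = np.asarray(weights, dtype=float)
    unit_ids = np.asarray(unit_ids)
    unit_totals = np.asarray(unit_totals, dtype=float)
    # Sum of weights within each administrative unit.
    weight_sums = np.bincount(unit_ids, weights=weights,
                              minlength=len(unit_totals))
    # Each cell's share of its unit, scaled by that unit's census total.
    return weights / weight_sums[unit_ids] * unit_totals[unit_ids]

weights = np.array([2.0, 1.0, 1.0, 3.0, 1.0])   # predicted relative densities
unit_ids = np.array([0, 0, 0, 1, 1])            # admin unit of each cell
unit_totals = np.array([1000.0, 400.0])         # census totals per unit
cells = disaggregate_to_total(weights, unit_ids, unit_totals)
# cells holds 500, 250, 250 for unit 0 and 300, 100 for unit 1
```

Summing `cells` within each unit recovers the census totals exactly, while the split between cells follows the model's predicted weights.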

Related Projects and Evolution

WorldPop evolved from the AfriPop and AsiaPop projects, which focused on specific continents. It is now a global dataset, with gridded population estimates available for every country. Related initiatives include Meta's High Resolution Settlement Layer (using deep learning on satellite imagery) and the Gridded Population of the World (GPW) from CIESIN. WorldPop's Bayesian modeling approach remains distinctive for its principled uncertainty quantification and integration of multiple data sources.
