Bayesian Statistics

Gaussian Process Emulator

A Gaussian process emulator is a statistical surrogate that replaces an expensive computer simulation with a fast, probabilistic approximation, providing both predictions and calibrated uncertainty estimates across the input space.

ŷ(x) = k(x, X)[K(X,X) + σ²I]⁻¹y

Modern science and engineering rely on computer simulations — climate models, finite-element analyses, computational fluid dynamics codes — that can take hours or days per run. When the goal is calibration, sensitivity analysis, or optimization, thousands of evaluations may be needed. A Gaussian process (GP) emulator addresses this bottleneck by fitting a probabilistic interpolator to a small set of simulation runs, then predicting the output at untested inputs along with a credible interval that honestly reflects the emulator's uncertainty.

The emulator is not merely a curve fit. Because it is grounded in Bayesian probability, it provides a full posterior distribution over the simulator output at every point in the design space. Where training runs are dense the emulator is confident; where they are sparse the predictive variance widens, automatically flagging regions that need further exploration.
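
As a concrete illustration, the sketch below (assuming Python with scikit-learn, and a cheap toy function standing in for the expensive simulator) fits a GP emulator to a handful of runs and queries its predictive mean and standard deviation at untested inputs.

```python
# Minimal sketch of a GP emulator, assuming scikit-learn is available.
# The "simulator" here is a cheap stand-in for an expensive code.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def simulator(x):
    # Toy deterministic simulator; in practice this is the expensive code.
    return np.sin(3.0 * x) + 0.5 * x

# A small set of training runs (the "design").
X_train = np.linspace(0.0, 2.0, 6).reshape(-1, 1)
y_train = simulator(X_train).ravel()

# RBF kernel with a signal-variance term; alpha adds a tiny nugget for
# numerical stability even though the simulator is deterministic.
kernel = ConstantKernel(1.0) * RBF(length_scale=0.5)
emulator = GaussianProcessRegressor(kernel=kernel, alpha=1e-10, normalize_y=True)
emulator.fit(X_train, y_train)

# Predictive mean and standard deviation at untested inputs.
X_test = np.linspace(0.0, 2.0, 50).reshape(-1, 1)
mean, std = emulator.predict(X_test, return_std=True)
print(mean[:3], std[:3])  # uncertainty shrinks near the training runs
```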

Mathematical Framework

Let f(x) denote the simulator output at input vector x. A GP prior is placed over f:

GP Prior f(x) ~ GP(m(x), k(x, x′))

Posterior Predictive Mean μ*(x) = m(x) + k(x, X)[K(X,X) + σ²I]⁻¹(y − m(X))

Posterior Predictive Variance σ*²(x) = k(x, x) − k(x, X)[K(X,X) + σ²I]⁻¹k(X, x)

Here m(x) is the mean function (often a low-order polynomial or simply zero), k(x, x′) is the covariance or kernel function encoding assumptions about smoothness and length-scale, X is the matrix of training inputs, y is the vector of observed outputs, and σ² captures observation noise (set to zero for deterministic simulators). The kernel hyperparameters — length-scales, signal variance, and smoothness — are typically estimated by maximizing the marginal likelihood or given a fully Bayesian treatment via MCMC.
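
The equations above translate almost line-for-line into code. The following is a minimal NumPy transcription, assuming a zero mean function and a squared-exponential kernel with fixed hyperparameters; hyperparameter estimation is omitted.

```python
# Direct NumPy transcription of the posterior equations above, assuming a
# zero mean function and a squared-exponential kernel (a common default).
import numpy as np

def sq_exp_kernel(A, B, length_scale=1.0, signal_var=1.0):
    # k(x, x') = signal_var * exp(-||x - x'||^2 / (2 * length_scale^2))
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_posterior(X, y, X_star, length_scale=1.0, signal_var=1.0, noise_var=1e-10):
    # K(X,X) + sigma^2 I, factored once with Cholesky for numerical stability.
    K = sq_exp_kernel(X, X, length_scale, signal_var) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))       # [K + sigma^2 I]^{-1} y

    K_s = sq_exp_kernel(X_star, X, length_scale, signal_var)  # k(x*, X)
    mean = K_s @ alpha                                        # posterior predictive mean
    v = np.linalg.solve(L, K_s.T)
    var = sq_exp_kernel(X_star, X_star, length_scale, signal_var).diagonal() - np.sum(v**2, axis=0)
    return mean, var                                          # posterior predictive variance

# Example usage with a toy 1-D simulator:
X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
y = np.sin(6.0 * X).ravel()
mu, var = gp_posterior(X, y, np.array([[0.25], [0.75]]), length_scale=0.2)
```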

Historical Development

1989

Sacks, Welch, Mitchell, and Wynn publish their landmark paper on the "Design and Analysis of Computer Experiments," proposing kriging (GP regression) as a systematic approach to emulating deterministic computer codes.

2001

Kennedy and O'Hagan develop the Bayesian calibration framework, combining a GP emulator of the simulator with a discrepancy term for model inadequacy, enabling principled calibration of computer models to physical observations.

2000s

The MUCM (Managing Uncertainty in Complex Models) project at Sheffield systematizes GP emulation methodology, producing widely used guidelines and the GEM-SA software.

2010s–present

Deep GP emulators, multi-fidelity emulation, and high-dimensional output emulation extend the framework. Tools such as GPy, GPflow, and BoTorch make GP emulators accessible within Python ecosystems.

Design of Experiments

Because each simulator run is expensive, the choice of training inputs matters enormously. Space-filling designs — Latin hypercube samples, Sobol sequences, maximin designs — are standard. Sequential design strategies use the emulator's predictive variance to choose the next input point, concentrating runs where uncertainty is greatest or where an optimization criterion is most promising. This active-learning perspective connects GP emulation to Bayesian optimization.
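
A minimal version of this sequential strategy, assuming SciPy's quasi-Monte Carlo module (scipy.stats.qmc, SciPy 1.7+) and scikit-learn, might start from a Latin hypercube design and repeatedly add the candidate input with the largest predictive variance:

```python
# Sketch of variance-driven sequential design; the simulator below is a
# cheap stand-in for an expensive code.
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def simulator(X):
    return np.sin(3.0 * X[:, 0]) * np.cos(2.0 * X[:, 1])

# Space-filling initial design: a Latin hypercube in [0, 1]^2.
sampler = qmc.LatinHypercube(d=2, seed=0)
X = sampler.random(n=10)
y = simulator(X)

candidates = sampler.random(n=500)  # dense candidate pool for the next run

for _ in range(5):  # add five runs sequentially
    gp = GaussianProcessRegressor(kernel=RBF(0.3), alpha=1e-10, normalize_y=True)
    gp.fit(X, y)
    _, std = gp.predict(candidates, return_std=True)
    x_next = candidates[np.argmax(std)]            # point of largest predictive uncertainty
    X = np.vstack([X, x_next])
    y = np.append(y, simulator(x_next[None, :]))
```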

Emulators vs. Reduced-Order Models

GP emulators are data-driven surrogates: they treat the simulator as a black box. Reduced-order models, by contrast, exploit the simulator's internal structure (e.g., proper orthogonal decomposition). The two approaches are complementary. GP emulators excel when the simulator is truly opaque or when the input dimension is moderate (typically no more than 20–30 inputs). For very high-dimensional inputs, dimensionality reduction or structured kernels become essential.

Sensitivity Analysis and Calibration

Once an emulator is built, Sobol sensitivity indices can be computed analytically from the GP posterior, decomposing output variance into contributions from each input factor and their interactions. This is far cheaper than running the simulator itself thousands of times for a Monte Carlo sensitivity analysis.
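
In practice many analysts also simply run a Monte Carlo estimator on the emulator's cheap predictive mean rather than using the closed-form GP expressions. The sketch below takes that route with a Saltelli-style pick-freeze estimator; emulator_mean is a placeholder for any fitted emulator's prediction function, not the analytic GP calculation referred to above.

```python
# Sketch of first-order Sobol indices estimated by Monte Carlo on a cheap
# emulator mean (a Saltelli/pick-freeze estimator). `emulator_mean` is a
# placeholder standing in for a fitted emulator's predict function.
import numpy as np

def emulator_mean(X):
    # Placeholder (Ishigami-like toy function) standing in for emulator.predict(X).
    return np.sin(X[:, 0]) + 7.0 * np.sin(X[:, 1])**2 + 0.1 * X[:, 2]**4 * np.sin(X[:, 0])

def first_order_sobol(f, d, n=100_000, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.uniform(-np.pi, np.pi, size=(n, d))   # two independent input samples
    B = rng.uniform(-np.pi, np.pi, size=(n, d))
    fA, fB = f(A), f(B)
    var = np.var(np.concatenate([fA, fB]))
    S = np.empty(d)
    for i in range(d):
        AB = A.copy()
        AB[:, i] = B[:, i]                         # "freeze" every column except i
        S[i] = np.mean(fB * (f(AB) - fA)) / var    # Saltelli (2010) estimator
    return S

print(first_order_sobol(emulator_mean, d=3))
```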

For calibration — adjusting simulator parameters so that outputs match physical observations — the Kennedy–O'Hagan framework embeds the emulator within a hierarchical Bayesian model. The posterior over calibration parameters accounts for observation error, emulator uncertainty, and systematic model discrepancy. History matching, developed by Craig, Goldstein, and colleagues, offers an alternative that iteratively rules out implausible regions of parameter space using emulator-based bounds.
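
A single history-matching wave for a scalar output can be sketched as follows; the emulator predictions and variance budgets are illustrative placeholders, and the cutoff of 3 is the conventional rule of thumb.

```python
# Sketch of one history-matching wave for a scalar output: rule out candidate
# parameter settings whose implausibility exceeds 3. The emulator call and the
# variance terms below are illustrative placeholders.
import numpy as np

def implausibility(mu_emul, var_emul, z_obs, var_obs, var_discrepancy):
    # I(theta) = |z - mu*(theta)| / sqrt(var_obs + var_discrepancy + var_emul(theta))
    return np.abs(z_obs - mu_emul) / np.sqrt(var_obs + var_discrepancy + var_emul)

# Candidate calibration parameters and (placeholder) emulator predictions.
theta = np.random.default_rng(0).uniform(0.0, 1.0, size=(1000, 2))
mu_emul = np.sin(6.0 * theta[:, 0]) + theta[:, 1]   # stand-in for the emulator mean
var_emul = 0.02 * np.ones(len(theta))               # stand-in for the emulator variance

I = implausibility(mu_emul, var_emul, z_obs=0.8, var_obs=0.01, var_discrepancy=0.04)
not_ruled_out = theta[I < 3.0]                      # the "non-implausible" region
```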

"The emulator is not a model of reality. It is a model of a model — a fast probabilistic mirror of the simulator, through which we can see both what the simulator predicts and where our knowledge of its behavior runs thin." — Tony O'Hagan, summarizing the emulation philosophy

Challenges and Extensions

Standard GP emulators face the cubic cost O(n³) of inverting the covariance matrix, limiting training sets to a few thousand points. Sparse approximations (inducing-point methods), local GP models, and methods that exploit structural regularity in the simulator help scale to larger problems. For multivariate outputs — such as a spatial field produced at each run — approaches include independent emulation of principal components, multi-output GPs with cross-covariance functions, and deep GP architectures.
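
One simple version of the principal-component strategy, sketched below under the assumption that scikit-learn is available, projects each run's output field onto a few leading components and fits an independent GP emulator to each score.

```python
# Sketch of emulating a high-dimensional output field by fitting independent
# GPs to its leading principal components (assuming scikit-learn).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 3))                       # 40 runs, 3 inputs
grid = np.linspace(0, 1, 200)
Y = np.sin(4 * X[:, [0]] * grid) + X[:, [1]]        # each run outputs a 200-point field

pca = PCA(n_components=3)
scores = pca.fit_transform(Y)                       # reduce each field to 3 coefficients

# One independent GP emulator per retained component.
gps = [GaussianProcessRegressor(kernel=RBF(0.3), normalize_y=True).fit(X, scores[:, j])
       for j in range(scores.shape[1])]

def predict_field(X_new):
    # Emulate each component, then map back to the full output field.
    new_scores = np.column_stack([gp.predict(X_new) for gp in gps])
    return pca.inverse_transform(new_scores)

field_pred = predict_field(rng.uniform(size=(5, 3)))  # 5 new runs, 200-point fields
```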

Multi-fidelity emulation exploits a hierarchy of simulators at different resolutions. A cheap, coarse model provides global coverage, while a few expensive high-fidelity runs anchor the emulator near the true output. The autoregressive co-kriging model of Kennedy and O'Hagan, and its extensions by Le Gratiet and others, fuse information across fidelity levels in a principled Bayesian way.
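
A deliberately simplified two-fidelity sketch of this idea, estimating the scale factor ρ by least squares and modeling the residual with a second GP rather than fitting the full co-kriging model jointly, might look like this (assuming scikit-learn):

```python
# Much-simplified two-fidelity sketch of the autoregressive idea
# f_high(x) ≈ rho * f_low(x) + delta(x): emulate the cheap code, estimate rho
# at the shared design points, then emulate the residual. The full
# Kennedy–O'Hagan co-kriging model treats all of this jointly.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def low_fidelity(x):  return np.sin(8.0 * x).ravel()                       # cheap, coarse model
def high_fidelity(x): return 1.2 * np.sin(8.0 * x).ravel() + 0.3 * x.ravel()

X_low = np.linspace(0, 1, 30).reshape(-1, 1)     # many cheap runs
X_high = np.linspace(0, 1, 6).reshape(-1, 1)     # few expensive runs

gp_low = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True)
gp_low.fit(X_low, low_fidelity(X_low))

# Scale factor rho from the high-fidelity design points, then a GP on the residual.
low_at_high = gp_low.predict(X_high)
y_high = high_fidelity(X_high)
rho = float(np.dot(low_at_high, y_high) / np.dot(low_at_high, low_at_high))

gp_delta = GaussianProcessRegressor(kernel=RBF(0.2), normalize_y=True)
gp_delta.fit(X_high, y_high - rho * low_at_high)

def predict_high(x_new):
    # Combine the scaled low-fidelity emulator with the residual emulator.
    return rho * gp_low.predict(x_new) + gp_delta.predict(x_new)
```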

Related Topics