Computer vision transforms raw pixel arrays into semantic understanding: detecting objects, recognizing faces, segmenting regions, estimating depth, and interpreting scenes. These tasks are inherently ill-posed — a 2D image is consistent with infinitely many 3D scenes — and Bayesian inference provides the principled framework for resolving these ambiguities by combining image evidence (likelihood) with prior knowledge about the visual world (prior) to produce the most probable interpretation (posterior).
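In symbols, for a candidate scene interpretation S and observed image I, the posterior is P(S | I) ∝ P(I | S) P(S): the likelihood P(I | S) measures how well the interpretation explains the observed pixels, and the prior P(S) measures how plausible the interpretation is before seeing the image.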
Bayesian Object Detection and Recognition
Object detection involves both locating objects within an image and classifying them. Bayesian approaches maintain posterior distributions over object identity, location, scale, and pose, naturally handling the uncertainty that arises from occlusion, viewpoint variation, and illumination changes. Deformable part models with Bayesian parameter estimation were state-of-the-art before the deep learning era, and Bayesian ideas continue to influence modern detectors through uncertainty-aware bounding box regression and probabilistic class predictions.
The prior P(class) encodes object frequency; P(box | class) encodes expected locations and scales for each object category.
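As a concrete illustration, the sketch below applies Bayes' rule to a single candidate box, combining an assumed class prior with assumed appearance likelihoods. The category names and numbers are placeholders, not measurements from any real detector.

```python
# A minimal sketch of Bayes' rule for classifying one candidate box.
# The prior and likelihood values below are illustrative, not real data.
import numpy as np

classes = ["car", "pedestrian", "cyclist"]

# Prior P(class): how often each category appears in the training distribution.
prior = np.array([0.70, 0.20, 0.10])

# Likelihood P(evidence | class) for this box, e.g. from an appearance model
# evaluated at the box's location and scale.
likelihood = np.array([0.02, 0.09, 0.05])

# Posterior P(class | evidence) is proportional to likelihood * prior,
# normalized over the candidate classes.
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()

for c, p in zip(classes, posterior):
    print(f"P({c} | evidence) = {p:.3f}")
```

Note how the prior shifts the decision: the pedestrian likelihood is highest, but the posterior still assigns substantial probability to the far more common car class.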
Image Segmentation
Image segmentation assigns a label to every pixel — foreground versus background, or fine-grained semantic categories. Bayesian segmentation models combine pixel-level appearance likelihoods with spatial priors that encourage coherent regions. Markov random fields (MRFs) and conditional random fields (CRFs) encode the prior belief that neighboring pixels are likely to share the same label. Graph cuts provide efficient MAP inference for binary MRFs, while mean-field variational inference handles multi-class segmentation.
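To make the mean-field route concrete, here is a minimal sketch for a binary MRF with a 4-connected smoothness prior. The per-pixel log-likelihoods and the smoothness weight beta are assumed inputs, not values from any particular model.

```python
# A minimal mean-field sketch for binary MRF segmentation on a 4-connected
# grid. Inputs are per-pixel log-likelihoods for foreground and background.
import numpy as np

def mean_field_segment(log_lik_fg, log_lik_bg, beta=2.0, n_iters=20):
    """Approximate P(labels | image) with independent per-pixel Bernoulli
    factors q[i, j] = q(pixel (i, j) is foreground)."""
    q = 1.0 / (1.0 + np.exp(-(log_lik_fg - log_lik_bg)))  # init from likelihood only
    for _ in range(n_iters):
        # Sum of neighboring foreground beliefs (up, down, left, right).
        nb = np.zeros_like(q)
        nb[1:, :] += q[:-1, :]
        nb[:-1, :] += q[1:, :]
        nb[:, 1:] += q[:, :-1]
        nb[:, :-1] += q[:, 1:]
        # Number of neighbors for each pixel (3 on edges, 2 at corners).
        nb_count = np.zeros_like(q)
        nb_count[1:, :] += 1
        nb_count[:-1, :] += 1
        nb_count[:, 1:] += 1
        nb_count[:, :-1] += 1
        # Mean-field update: unary evidence plus smoothness from neighbor beliefs.
        logit_fg = log_lik_fg + beta * nb
        logit_bg = log_lik_bg + beta * (nb_count - nb)
        q = 1.0 / (1.0 + np.exp(-(logit_fg - logit_bg)))
    return q
```

Thresholding q at 0.5 gives a segmentation; pixels whose belief stays near 0.5 are exactly the ones where the appearance evidence and the spatial prior disagree.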
Uncertainty Quantification in Safety-Critical Systems
In safety-critical vision systems like autonomous driving, knowing when the perception system is uncertain is as important as getting the right answer. Bayesian deep learning methods — Monte Carlo dropout, deep ensembles, and evidential neural networks — provide per-pixel or per-object uncertainty estimates that enable the vehicle to slow down, request human assistance, or take conservative action when visual understanding is ambiguous. Fog, unusual lighting, and novel object types all produce high epistemic uncertainty that a well-calibrated Bayesian system can detect.
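One common way to obtain such per-pixel uncertainty is Monte Carlo dropout. The sketch below assumes a PyTorch segmentation network named seg_model (a placeholder) whose layers include dropout and whose forward pass returns class logits of shape (batch, classes, height, width); the number of samples is illustrative.

```python
# A hedged sketch of Monte Carlo dropout for per-pixel uncertainty.
# `seg_model` is an assumed PyTorch segmentation network with dropout layers.
import torch

def mc_dropout_uncertainty(seg_model, image, n_samples=30):
    seg_model.train()  # keep dropout active at test time (in a real system,
                       # switch only the dropout layers, not batch norm)
    probs = []
    with torch.no_grad():
        for _ in range(n_samples):
            logits = seg_model(image)                  # (B, C, H, W)
            probs.append(torch.softmax(logits, dim=1))
    probs = torch.stack(probs)                         # (T, B, C, H, W)
    mean_probs = probs.mean(dim=0)                     # predictive distribution
    # Predictive entropy per pixel: high under fog, unusual lighting,
    # or object types the model has never seen.
    entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=1)
    return mean_probs, entropy
```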
3D Reconstruction and Depth Estimation
Recovering 3D structure from 2D images — structure from motion, multi-view stereo, and monocular depth estimation — is a central problem in computer vision. Bayesian approaches maintain posterior distributions over 3D point positions, camera parameters, and scene geometry. Bayesian bundle adjustment provides uncertainty estimates on reconstructed 3D points, indicating which parts of the reconstruction are reliable and which are poorly constrained. Gaussian process priors on depth maps encourage smooth surfaces while allowing sharp discontinuities at object boundaries.
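Those bundle-adjustment uncertainty estimates typically come from a Gaussian (Laplace) approximation at the optimum, where the covariance of a reconstructed point is roughly sigma^2 (J^T J)^{-1}, with J the Jacobian of the reprojection residuals. A minimal sketch, treating the Jacobian and residuals for one point as given:

```python
# A minimal sketch of per-point uncertainty from bundle adjustment, under a
# Gaussian approximation at the optimum. J and the residuals are assumed
# inputs (for a point seen by k cameras, residuals has length 2k, J is 2k x 3).
import numpy as np

def point_uncertainty(J, residuals):
    """Return the 3x3 covariance of one reconstructed 3D point."""
    dof = max(residuals.size - 3, 1)
    sigma2 = residuals @ residuals / dof     # estimate of reprojection noise variance
    info = J.T @ J                           # Gauss-Newton approximation to the Hessian
    cov = sigma2 * np.linalg.inv(info)       # approximate posterior covariance
    return cov
```

A point observed by many cameras across wide baselines yields a small covariance; a point seen only from nearby viewpoints is poorly constrained along the depth direction, and the covariance makes that visible.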
Generative Models for Vision
Variational autoencoders (VAEs) are explicitly Bayesian generative models that learn a latent representation of images by maximizing a variational lower bound on the marginal likelihood. The encoder approximates the posterior distribution over latent variables, while the decoder generates images from samples. VAEs and their extensions enable image generation, inpainting, super-resolution, and anomaly detection within a coherent Bayesian framework.
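A minimal sketch of the VAE training objective (the negative of that variational lower bound), assuming placeholder encoder and decoder networks, a standard normal prior on the latent variables, and images with pixel values in [0, 1]:

```python
# A hedged sketch of the negative ELBO for a VAE on images. `encoder` is
# assumed to return (mu, log_var) for q(z | x); `decoder` is assumed to
# return per-pixel Bernoulli logits of the same shape as x.
import torch
import torch.nn.functional as F

def vae_loss(encoder, decoder, x):
    mu, log_var = encoder(x)                     # q(z | x) = N(mu, diag(exp(log_var)))
    std = torch.exp(0.5 * log_var)
    z = mu + std * torch.randn_like(std)         # reparameterization trick
    recon_logits = decoder(z)
    # Reconstruction term: log-likelihood of the pixels under p(x | z).
    recon = F.binary_cross_entropy_with_logits(recon_logits, x, reduction="sum")
    # KL(q(z | x) || p(z)) against a standard normal prior, in closed form.
    kl = -0.5 * torch.sum(1 + log_var - mu.pow(2) - log_var.exp())
    return recon + kl                            # negative ELBO, to be minimized
```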
"Vision is inference. The eye receives a 2D projection, and the brain reconstructs a 3D world — this is Bayes' theorem in biological hardware." — David Marr's computational theory of vision, reinterpreted through the Bayesian lens
Current Frontiers
Neural Radiance Fields (NeRFs) with Bayesian uncertainty enable view synthesis with confidence estimates. Bayesian active learning selects the most informative images for annotation, reducing labeling costs. Open-world detection using Bayesian nonparametrics handles novel object categories. And the integration of vision-language models with Bayesian uncertainty offers paths to visual reasoning systems that know what they do not see.
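As one concrete instance of the active-learning idea, the sketch below scores unlabeled images by a BALD-style mutual information computed from posterior samples (for example, the stochastic forward passes from the dropout sketch above); the array shapes and variable names are assumptions.

```python
# A hedged sketch of a BALD-style acquisition score for Bayesian active
# learning: images whose predictions disagree across posterior samples are
# the most informative to label. `probs` is assumed to have shape (T, N, C):
# T posterior samples, N candidate images, C classes.
import numpy as np

def bald_scores(probs, eps=1e-12):
    mean_p = probs.mean(axis=0)                                             # (N, C) predictive distribution
    predictive_entropy = -(mean_p * np.log(mean_p + eps)).sum(-1)           # total uncertainty
    expected_entropy = -(probs * np.log(probs + eps)).sum(-1).mean(axis=0)  # aleatoric part
    return predictive_entropy - expected_entropy                            # epistemic part (mutual information)

# Annotate the highest-scoring images first, e.g.:
# query_indices = np.argsort(-bald_scores(probs))[:budget]
```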