Bayesian Statistics

Yee Whye Teh

Yee Whye Teh co-invented the hierarchical Dirichlet process and made fundamental contributions to Bayesian nonparametric methods for grouped data, topic models, and deep generative models.

G₀ | α, H ~ DP(α, H);   G_j | γ, G₀ ~ DP(γ, G₀)

Yee Whye Teh is a Malaysian-British computer scientist and statistician at the University of Oxford and a research scientist at DeepMind whose work on Bayesian nonparametrics has introduced some of the most widely used models for grouped and sequential data with unknown complexity. His development of the hierarchical Dirichlet process with Michael I. Jordan provided a principled Bayesian framework for sharing statistical strength across groups when the number of mixture components is unknown, with applications spanning topic modeling, natural language processing, population genetics, and beyond.

Life and Career

1976

Born in Malaysia. Studies computer science and mathematics before pursuing graduate work in machine learning.

2003

Earns his Ph.D. from the University of Toronto under Geoffrey Hinton, working on energy-based models and approximate inference.

2006

Publishes "Hierarchical Dirichlet Processes" with Michael I. Jordan, Yee Whye Teh, and David Blei in the Journal of the American Statistical Association, establishing a foundational model for Bayesian nonparametric grouped data analysis.

2006

Co-develops the hierarchical Pitman-Yor process, extending the HDP framework to power-law distributions common in natural language.

2010s

Joins DeepMind as a research scientist while maintaining his academic position at Oxford, contributing to deep generative models and reinforcement learning.

Hierarchical Dirichlet Processes

The hierarchical Dirichlet process (HDP) addresses a fundamental limitation of the standard Dirichlet process mixture model: when data come in groups (documents in a corpus, patients in different hospitals, genes in different organisms), how can we allow each group to have its own mixture distribution while sharing mixture components across groups?

Hierarchical Dirichlet Process
Base measure:   G₀ | α, H ~ DP(α, H)
Group-level:   G_j | γ, G₀ ~ DP(γ, G₀)   for each group j
Observations:   θ_{ji} | G_j ~ G_j,   x_{ji} | θ_{ji} ~ F(θ_{ji})

Key Property: G₀ is discrete (being drawn from a DP), so all G_j share the same atoms but with different weights, enabling shared clusters across groups.

The elegant structure of the HDP ensures that all groups share a common set of mixture components (the atoms of G₀) but with group-specific mixing weights. This means that a topic discovered in one document can appear in other documents, and the data from all documents contribute to estimating each topic's word distribution. The number of components is determined automatically by the data, growing as more data arrive, with the concentration parameters controlling the rate of growth.
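
To make the generative structure concrete, the following Python sketch simulates a truncated HDP mixture. Everything specific in it is an illustrative assumption rather than a detail from Teh's papers: a Gaussian base measure H, a Gaussian observation model F, the truncation level K, and the particular concentration values; under truncation, each group's weights are drawn as a finite Dirichlet approximation to DP(γ, G₀).

import numpy as np

rng = np.random.default_rng(0)

def stick_breaking(concentration, K, rng):
    # Truncated GEM(concentration) stick-breaking weights of length K.
    v = rng.beta(1.0, concentration, size=K)
    w = v * np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return w / w.sum()  # renormalise the truncated stick

K, alpha, gamma = 50, 5.0, 2.0          # truncation level, concentrations
tau, sigma = 5.0, 0.5                   # base-measure and likelihood scales

# Global measure G0: shared atoms phi_k (drawn from H) with weights beta.
beta = stick_breaking(alpha, K, rng)
phi = rng.normal(0.0, tau, size=K)

# Each group j reuses the SAME atoms phi, but with its own weights pi_j.
# Under truncation, pi_j ~ Dirichlet(gamma * beta) approximates DP(gamma, G0).
n_groups, n_per_group = 3, 200
data, used = [], []
for j in range(n_groups):
    pi_j = rng.dirichlet(gamma * beta)
    z = rng.choice(K, size=n_per_group, p=pi_j)   # component assignments
    data.append(rng.normal(phi[z], sigma))        # x_ji ~ F(theta_ji)
    used.append(set(z.tolist()))

print("components used per group:", [len(u) for u in used])
print("components shared by all groups:", len(set.intersection(*used)))

Because every group's weights are placed on the same K atoms, running the sketch typically shows substantial overlap between the component sets used by different groups, which is exactly the sharing of clusters described above.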

From HDP to Power Laws

Teh recognized that the Dirichlet process generates components whose frequencies decay exponentially, while many natural phenomena exhibit power-law behavior. By adopting the Pitman-Yor process (introduced by Pitman and Yor) as an alternative to the Dirichlet process, and developing the hierarchical Pitman-Yor process for language modeling, he produced models that better capture the heavy-tailed frequency distributions observed in natural language (Zipf's law), leading to state-of-the-art Bayesian language models before the neural era.
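
The contrast is easy to see by simulation. The sketch below compares the two processes through their sequential Chinese-restaurant-style sampling schemes; the concentration and discount values are illustrative assumptions, not parameters taken from any particular paper.

import numpy as np

def crp_cluster_sizes(n, concentration, discount, rng):
    # Seat n customers with the two-parameter (Pitman-Yor) Chinese
    # restaurant process; discount = 0 recovers the ordinary DP case.
    counts = []                                        # customers per table
    for i in range(n):                                 # i already seated
        probs = np.array([c - discount for c in counts]
                         + [concentration + discount * len(counts)])
        probs /= probs.sum()                           # total equals i + concentration
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)                           # open a new table
        else:
            counts[k] += 1
    return np.array(counts)

rng = np.random.default_rng(1)
n = 5000
dp_sizes = crp_cluster_sizes(n, concentration=10.0, discount=0.0, rng=rng)
py_sizes = crp_cluster_sizes(n, concentration=10.0, discount=0.8, rng=rng)

# The DP yields relatively few clusters whose sorted sizes fall off quickly;
# the Pitman-Yor run yields many more clusters with a heavy, Zipf-like tail.
print("DP clusters:", len(dp_sizes), " largest:", sorted(dp_sizes)[-5:])
print("PY clusters:", len(py_sizes), " largest:", sorted(py_sizes)[-5:])

With a discount d > 0 the expected number of clusters grows like n^d rather than logarithmically, which is the power-law behavior referred to above.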

Bayesian Nonparametric Language Models

Teh applied Bayesian nonparametric ideas to n-gram language modeling, showing that a hierarchical Pitman-Yor process could produce language models that outperformed state-of-the-art smoothed n-gram models. This work demonstrated that Bayesian nonparametrics was not merely a theoretical curiosity but could achieve practical performance gains on large-scale prediction tasks. The connection between smoothing techniques developed by the NLP community and formal Bayesian priors provided a unifying perspective on language modeling.
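
The flavor of that connection can be sketched in a few lines. The toy implementation below uses a Pitman-Yor-style predictive rule under the simplifying assumption of one "table" per observed word type in each context, a restriction under which the recursion reduces to interpolated Kneser-Ney smoothing; the tiny corpus, the single shared discount, and the helper names are illustrative choices rather than details from Teh's model.

from collections import defaultdict

def train_counts(tokens, order):
    # counts[n][context][word] = number of times `word` follows `context`,
    # for every context length n = 0 .. order-1.
    counts = [defaultdict(lambda: defaultdict(int)) for _ in range(order)]
    for i, word in enumerate(tokens):
        for n in range(order):
            if i - n < 0:
                break
            counts[n][tuple(tokens[i - n:i])][word] += 1
    return counts

def prob(word, context, counts, vocab_size, discount=0.75, theta=1.0):
    # Recursive Pitman-Yor-style predictive probability P(word | context),
    # backing off to the shorter context (and finally a uniform base).
    n = len(context)
    backoff = (prob(word, context[1:], counts, vocab_size, discount, theta)
               if n > 0 else 1.0 / vocab_size)
    table = counts[n].get(tuple(context), {})
    c_total = sum(table.values())
    if c_total == 0:
        return backoff
    t_total = len(table)                    # one "table" per observed type
    c_w = table.get(word, 0)
    t_w = 1 if c_w > 0 else 0
    return ((max(c_w - discount * t_w, 0.0)
             + (theta + discount * t_total) * backoff) / (theta + c_total))

tokens = "the cat sat on the mat the cat sat".split()
counts = train_counts(tokens, order=3)
print(prob("sat", ("the", "cat"), counts, vocab_size=len(set(tokens))))
print(prob("mat", ("the", "cat"), counts, vocab_size=len(set(tokens))))

The full hierarchical Pitman-Yor language model instead maintains explicit seating arrangements and learns per-depth discount and strength parameters by sampling, which this sketch deliberately omits.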

Deep Generative Models

At DeepMind, Teh has contributed to the development of deep generative models that combine the flexibility of neural networks with probabilistic reasoning. His work bridges the Bayesian nonparametric tradition with modern deep learning, exploring how Bayesian ideas about uncertainty, model complexity, and hierarchical structure can improve the reliability and interpretability of neural network systems.

"Bayesian nonparametric models allow the complexity of the model to grow with the data, providing a principled way to learn structure that we cannot specify in advance." — Yee Whye Teh

Legacy

Teh's hierarchical Dirichlet process is one of the most cited models in Bayesian nonparametrics, used across machine learning, computational biology, and the social sciences. His work on hierarchical Pitman-Yor processes demonstrated that careful Bayesian modeling could compete with engineering-heavy approaches on practical tasks. Together with his contributions to deep generative models, Teh has helped maintain the relevance of principled probabilistic reasoning in an era increasingly dominated by deep learning.
