Bayesian Statistics

Indian Buffet Process

The Indian buffet process is a stochastic process that generates sparse binary matrices of unbounded column dimension, providing a Bayesian nonparametric prior for latent feature models where each observation can possess any subset of a potentially infinite set of features.

P(Z | α) = α^K₊ / (∏ᵢ₌₁ᴺ K₁⁽ⁱ⁾!) · exp(−α·Hₙ) · ∏ₖ₌₁^K₊ [(N − mₖ)!(mₖ − 1)!/N!]

Here K₊ is the number of nonzero columns of Z (the features actually used), K₁⁽ⁱ⁾ is the number of new features introduced by observation i, mₖ is the number of observations possessing feature k, and Hₙ = ∑ᵢ₌₁ᴺ 1/i is the Nth harmonic number.
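To make the formula concrete, the following Python sketch evaluates log P(Z | α) for a given binary matrix, working in log space for numerical stability. It is a minimal illustration, not a reference implementation: the function name ibp_log_prob is hypothetical, NumPy and SciPy are assumed, and Z is assumed to contain no all-zero columns (so the first observation to use each feature is well defined).

import numpy as np
from scipy.special import gammaln

def ibp_log_prob(Z, alpha):
    """Log-probability of a binary feature matrix under the IBP formula above.

    Z     : (N, K_plus) binary array with no all-zero columns.
    alpha : IBP concentration parameter.
    """
    Z = np.asarray(Z, dtype=int)
    N, K_plus = Z.shape
    m = Z.sum(axis=0)                          # m_k: observations with feature k
    H_N = np.sum(1.0 / np.arange(1, N + 1))    # N-th harmonic number
    first = Z.argmax(axis=0)                   # first observation using each feature
    K_new = np.bincount(first, minlength=N)    # K_1^(i): new features per observation
    return (K_plus * np.log(alpha)
            - np.sum(gammaln(K_new + 1))       # minus sum_i log K_1^(i)!
            - alpha * H_N
            + np.sum(gammaln(N - m + 1) + gammaln(m) - gammaln(N + 1)))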

In many modeling problems, observations are not well described by belonging to a single cluster. A document may discuss multiple topics; an image may contain multiple objects; a person may exhibit multiple personality traits. Latent feature models capture this by associating each observation with a binary vector indicating which features are present. The Indian buffet process (IBP) provides a prior over such binary matrices that allows the number of features to grow with the data, avoiding the need to fix the feature count in advance.

The Culinary Metaphor

The process is named after a sequential construction, analogous to the Chinese restaurant process for the Dirichlet process. Imagine N customers at an Indian buffet:

Indian Buffet Process — Sequential Construction
Customer 1 samples Poisson(α) dishes.
Customer i (for i = 2, …, N):
  • For each previously sampled dish k, takes it with probability mₖ / i
    where mₖ = number of previous customers who took dish k
  • Then samples Poisson(α / i) new dishes

The result is a binary matrix Z with N rows (customers) and a random, potentially unbounded number of columns (dishes/features). The parameter α controls the expected total number of features, which grows as α · Hₙ ≈ α · log N. Features sampled by many customers are likely to be sampled by subsequent ones, producing a "rich get richer" dynamic reminiscent of the Chinese restaurant process — but for features rather than clusters.
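The sequential construction translates directly into code. The sketch below is a minimal illustration assuming NumPy; the function name sample_ibp and the list-based bookkeeping are choices made here, not part of the original description.

import numpy as np

def sample_ibp(N, alpha, rng=None):
    """Sample a binary feature matrix Z via the IBP sequential construction."""
    rng = np.random.default_rng(rng)
    dishes = []            # dishes[k] = m_k, number of customers who took dish k
    rows = []              # per-customer lists of dish indices
    for i in range(1, N + 1):
        taken = []
        # revisit each previously sampled dish k with probability m_k / i
        for k, m_k in enumerate(dishes):
            if rng.random() < m_k / i:
                taken.append(k)
                dishes[k] += 1
        # then sample Poisson(alpha / i) new dishes
        n_new = rng.poisson(alpha / i)
        taken.extend(range(len(dishes), len(dishes) + n_new))
        dishes.extend([1] * n_new)
        rows.append(taken)
    Z = np.zeros((N, len(dishes)), dtype=int)
    for i, taken in enumerate(rows):
        Z[i, taken] = 1
    return Z

For example, sample_ibp(100, 5.0) returns a 100-row binary matrix whose expected number of columns is 5 · H₁₀₀ ≈ 26.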

2005

Thomas Griffiths and Zoubin Ghahramani introduce the Indian buffet process, deriving its properties and demonstrating its use as a prior in latent feature models. Their paper establishes the combinatorial structure and exchangeability properties of the IBP.

2007

Teh, Görür, and Ghahramani derive the stick-breaking construction of the IBP, connecting it to the Beta process and enabling a broader class of inference algorithms.

2009

The three-parameter IBP and the Beta process framework are developed, generalizing the IBP and connecting it to the theory of completely random measures.

2011–present

IBP-based models are applied to matrix factorization, network analysis, causal discovery, and deep latent feature models. Scalable variational inference methods make IBP models practical for large datasets.

Connection to the Beta Process

Just as the Dirichlet process is the de Finetti mixing measure underlying the Chinese restaurant process, the Beta process is the de Finetti measure underlying the IBP. A Beta process B ~ BP(c, B₀) is a completely random measure on a space Ω, where each atom has a mass in [0, 1]. Given B, each observation draws binary features independently: zₖ ~ Bernoulli(πₖ) for each atom πₖ of B. Marginalizing over B yields the IBP. This connection opens the door to dependent and hierarchical extensions analogous to those available for the DP.
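The stick-breaking construction of Teh, Görür, and Ghahramani (2007) makes this picture easy to simulate: for the one-parameter IBP (c = 1), the atom masses πₖ can be generated as cumulative products of Beta(α, 1) draws, which decay toward zero. The sketch below uses a finite truncation of that representation; the truncation level K_trunc and the function name are illustrative assumptions.

import numpy as np

def ibp_stick_breaking(N, alpha, K_trunc=100, rng=None):
    """Truncated stick-breaking sketch of the IBP (one-parameter case, c = 1).

    The feature probabilities pi_k are cumulative products of Beta(alpha, 1)
    draws; each observation includes feature k independently with
    probability pi_k, i.e. Bernoulli draws given the Beta-process atoms.
    """
    rng = np.random.default_rng(rng)
    nu = rng.beta(alpha, 1.0, size=K_trunc)   # stick-breaking fractions
    pi = np.cumprod(nu)                       # decreasing atom masses in [0, 1]
    Z = (rng.random((N, K_trunc)) < pi).astype(int)
    return Z, pi

Because E[πₖ] = (α/(α + 1))ᵏ decays geometrically, a moderate truncation level typically captures essentially all of the prior mass.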

Features vs. Clusters: IBP vs. CRP

The Chinese restaurant process assigns each observation to exactly one cluster. The Indian buffet process assigns each observation to a subset of features. This distinction — partition vs. covering — is fundamental. Many real-world phenomena require the covering perspective: a gene participates in multiple pathways, a pixel belongs to multiple visual layers, a patient has multiple comorbidities. The IBP provides the Bayesian nonparametric machinery for such settings, just as the DP does for clustering.

Latent Feature Models

The most common use of the IBP is as a prior in linear-Gaussian latent feature models. Let Z be the N × K binary feature matrix drawn from IBP(α), and let A be a K × D matrix of feature values. The observed data matrix X is modeled as:

Linear-Gaussian Latent Feature Model
X = Z · A + ε,    ε ~ N(0, σ²ₙI)
Z ~ IBP(α),    A ~ N(0, σ²ₐI)

This is a Bayesian nonparametric analogue of factor analysis or principal component analysis. The number of latent factors (columns of Z) is inferred from the data rather than fixed. Inference can be performed via Gibbs sampling, where each entry of Z is updated conditional on the rest, or via variational methods that truncate the feature space at a large but finite K.
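As a generative illustration only, the sketch below draws Z using the sequential-construction sketch from earlier, samples the feature values A, and produces noisy observations X. The parameter names sigma_a and sigma_n mirror σₐ and σₙ above; the function names are hypothetical.

import numpy as np

def sample_linear_gaussian_ibp(N, D, alpha, sigma_a=1.0, sigma_n=0.1, rng=None):
    """Generative sketch: Z ~ IBP(alpha), A ~ N(0, sigma_a^2 I), X = Z A + noise."""
    rng = np.random.default_rng(rng)
    Z = sample_ibp(N, alpha, rng=rng)           # binary feature matrix (sketch above)
    K = Z.shape[1]                              # realized number of features
    A = sigma_a * rng.standard_normal((K, D))   # latent feature values
    X = Z @ A + sigma_n * rng.standard_normal((N, D))
    return X, Z, A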

Applications

IBP models have been applied to image denoising and source separation (each image is a superposition of binary-selected feature images), collaborative filtering (users have latent feature vectors drawn from an IBP), network modeling (nodes possess latent features that govern edge probabilities), and computational biology (genes participate in multiple latent pathways). The ability to learn the number of features from data, while maintaining a coherent probabilistic framework, makes the IBP a foundational tool in Bayesian nonparametrics.

"The Indian buffet process does for features what the Chinese restaurant process did for clusters — it opens up the possibility of learning the dimensionality of the latent space from data." — Zoubin Ghahramani, on the significance of the IBP

Related Topics