Chinese restaurant process: Difference between revisions

Revision as of 07:08, 15 January 2016

For other uses, see Chinese restaurant (disambiguation).

In probability theory, the Chinese restaurant process is a discrete-time stochastic process, analogous to seating customers at tables in a Chinese restaurant. Imagine a Chinese restaurant with an infinite number of circular tables, each with infinite capacity. Customer 1 is seated at an unoccupied table with probability 1. At time n + 1, a new customer chooses uniformly at random to sit at one of the following n + 1 places: directly to the left of one of the n customers already sitting at an occupied table, or at a new, unoccupied table.

At time n, the value of the process is a partition of the set of n customers, where the tables are the blocks of the partition. Mathematicians are interested in the probability distribution of this random partition.

David J. Aldous attributes the restaurant analogy to Jim Pitman and Lester Dubins in his 1983 lecture notes, published in 1985.^[1]

Formal definition

At any positive-integer time n, the value of the process is a partition B_n of the set {1, 2, 3, ..., n}, whose probability distribution is determined as follows. At time n = 1, the trivial partition { {1} } is obtained with probability 1. At time n + 1 the element n + 1 is either:

added to one of the blocks of the partition B_n, where each block b is chosen with probability |b|/(n + 1), with |b| denoting the size of the block, or
added to the partition B_n as a new singleton block, with probability 1/(n + 1).
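The seating rule above can be simulated directly. The following is a minimal sketch, not a reference implementation; the function name `sample_crp` and the optional `rng` argument are illustrative choices.

```python
import random


def sample_crp(n, rng=None):
    """Sample a partition of {1, ..., n} (n >= 1) by the seating rule above."""
    rng = rng or random.Random()
    blocks = [[1]]  # customer 1 always opens the first table
    for customer in range(2, n + 1):
        seated = customer - 1  # customers already seated
        # Pick one of seated + 1 equally likely places: indices 0..seated-1
        # mean "next to an existing customer", index `seated` means "new table".
        r = rng.randrange(seated + 1)
        if r == seated:
            blocks.append([customer])
            continue
        # Block b owns |b| of the places, so it is chosen with
        # probability |b| / (seated + 1).
        for block in blocks:
            if r < len(block):
                block.append(customer)
                break
            r -= len(block)
    return blocks


if __name__ == "__main__":
    print(sample_crp(10, random.Random(0)))
```

Passing a seeded `random.Random` makes a run reproducible; the returned list of blocks is the partition B_n.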

The random partition so generated has some special properties. It is exchangeable in the sense that relabeling {1, ..., n} does not change the distribution of the partition, and it is consistent in the sense that the law of the partition of n − 1 obtained by removing the element n from the random partition at time n is the same as the law of the random partition at time n − 1.

The probability assigned to any particular partition (ignoring the order in which customers sit around any particular table) is

\Pr(B_{n}=B)={\dfrac {\prod _{b\in B}(|b|-1)!}{n!}}

where b is a block in the partition B and |b| is the size (i.e. number of elements) of b.
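One way to sanity-check this formula is to sum it over every partition of a small set and confirm that the total is 1. The sketch below does this with exact rational arithmetic; the helper names `partition_probability` and `set_partitions` are illustrative.

```python
from fractions import Fraction
from math import factorial


def partition_probability(blocks):
    """Pr(B_n = B) = prod_b (|b| - 1)! / n! for a partition given as a list of blocks."""
    n = sum(len(b) for b in blocks)
    numerator = 1
    for b in blocks:
        numerator *= factorial(len(b) - 1)
    return Fraction(numerator, factorial(n))


def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for partition in set_partitions(rest):
        # `first` forms its own block ...
        yield [[first]] + partition
        # ... or joins one of the existing blocks.
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]


total = sum(partition_probability(p) for p in set_partitions([1, 2, 3, 4]))
print(total)  # the probabilities of all 15 partitions of {1, 2, 3, 4} sum to 1
```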

Generalization

This construction can be generalized to a model with two parameters, α and θ,^[2]^[3] commonly called the discount and strength (or concentration) parameters. At time n + 1, the next customer to arrive finds |B| occupied tables and decides to sit at an empty table with probability

{\dfrac {\theta +|B|\alpha }{n+\theta }},

or at an occupied table b of size |b| with probability

{\dfrac {|b|-\alpha }{n+\theta }}.

For the construction to define a valid probability measure, it is necessary that either α < 0 and θ = −Lα for some L ∈ {1, 2, ...}, or 0 ≤ α < 1 and θ > −α.
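The two seating probabilities and the parameter constraints can be written out as follows. This is a sketch under stated assumptions: the function name `seating_probabilities` is hypothetical, the first customer is given a new table with probability 1 (which avoids a 0/0 when θ = 0), and the integrality of L is tested with a small floating-point tolerance.

```python
def seating_probabilities(block_sizes, alpha, theta):
    """Return (probabilities of joining each occupied table, probability of a new table)."""
    # Parameter constraints stated above.
    if alpha < 0:
        L = -theta / alpha
        valid = L >= 1 and abs(L - round(L)) < 1e-12
    else:
        valid = alpha < 1 and theta > -alpha
    if not valid:
        raise ValueError("(alpha, theta) is outside the admissible range")

    n = sum(block_sizes)  # customers already seated
    k = len(block_sizes)  # occupied tables, |B|
    if n == 0:
        return [], 1.0  # assumption: the first customer always opens a table
    existing = [(size - alpha) / (n + theta) for size in block_sizes]
    new_table = (theta + k * alpha) / (n + theta)
    return existing, new_table


existing, new_table = seating_probabilities([3, 1], alpha=0.5, theta=1.0)
print(existing, new_table)
```

With α < 0 and θ = −Lα, the new-table probability is zero once L tables are occupied, so the partition never has more than L blocks.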

Under this model the probability assigned to any particular partition B of n, in terms of the Pochhammer k-symbol, is

\Pr(B_{n}=B)={\dfrac {(\theta +\alpha )_{|B|-1,\alpha }}{(\theta +1)_{n-1,1}}}\prod _{b\in B}(1-\alpha )_{|b|-1,1}

where, by convention, $(a)_{0,c}=1$ , and for $b>0$

(a)_{b,c}=\prod _{i=0}^{b-1}(a+ic)={\begin{cases}a^{b}&{\text{if }}c=0,\\\\{\dfrac {c^{b}\,\Gamma (a/c+b)}{\Gamma (a/c)}}&{\text{otherwise}}.\end{cases}}
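The Pochhammer k-symbol and the partition probability built from it are short to compute. The sketch below evaluates the product form directly and compares it with the Gamma-function form; the function names are illustrative, and the partition is represented only by its list of block sizes.

```python
from math import gamma, prod  # math.prod requires Python 3.8+


def pochhammer(a, b, c):
    """(a)_{b,c} = prod_{i=0}^{b-1} (a + i*c); the empty product (b = 0) is 1."""
    return prod(a + i * c for i in range(b))


def pochhammer_gamma(a, b, c):
    """Same quantity via the closed form; needs a/c away from the poles of Gamma."""
    if c == 0:
        return a ** b
    return c ** b * gamma(a / c + b) / gamma(a / c)


def partition_probability(block_sizes, alpha, theta):
    """Pr(B_n = B) for a partition with the given block sizes."""
    n = sum(block_sizes)
    k = len(block_sizes)
    p = pochhammer(theta + alpha, k - 1, alpha) / pochhammer(theta + 1, n - 1, 1)
    for size in block_sizes:
        p *= pochhammer(1 - alpha, size - 1, 1)
    return p


print(pochhammer(2.5, 3, 0.5), pochhammer_gamma(2.5, 3, 0.5))
print(partition_probability([2, 1], alpha=0.5, theta=1.0))
```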

Thus, when $\theta >0$ and $\alpha >0$, the partition probability can be expressed in terms of the Gamma function as

\Pr(B_{n}=B)={\dfrac {\Gamma (\theta )}{\Gamma (\theta +n)}}{\dfrac {\alpha ^{|B|}\,\Gamma (\theta /\alpha +|B|)}{\Gamma (\theta /\alpha )}}\prod _{b\in B}{\dfrac {\Gamma (|b|-\alpha )}{\Gamma (1-\alpha )}}.

In the one-parameter case, where $\alpha$ is zero, this simplifies to

\Pr(B_{n}=B)={\dfrac {\Gamma (\theta )\,\theta ^{|B|}}{\Gamma (\theta +n)}}\prod _{b\in B}\Gamma (|b|).

Or, when $\theta$ is zero,

\Pr(B_{n}=B)={\dfrac {\alpha ^{|B|-1}\,\Gamma (|B|)}{\Gamma (n)}}\prod _{b\in B}{\dfrac {\Gamma (|b|-\alpha )}{\Gamma (1-\alpha )}}.
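As a consistency check, setting α = 0 and θ = 1 in the one-parameter formula recovers the partition probability of the basic process given in the formal definition. The step uses only Γ(m) = (m − 1)! for positive integers m:

```latex
\Pr(B_{n}=B)\Big|_{\alpha =0,\ \theta =1}
={\dfrac {\Gamma (1)\,1^{|B|}}{\Gamma (1+n)}}\prod _{b\in B}\Gamma (|b|)
={\dfrac {\prod _{b\in B}(|b|-1)!}{n!}}
```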

As in the basic process, the probability assigned to any particular partition depends only on the block sizes, so the random partition is exchangeable in the sense described above. The consistency property also still holds, by construction.

If α = 0, the probability distribution of the random partition of the integer n thus generated is the Ewens distribution with parameter θ, used in population genetics and the unified neutral theory of biodiversity.

Derivation

Here is one way to derive this partition probability. For i = 1, 2, 3, ..., let C_i be the random block to which the number i is added. Then

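\Pr(C_{i}=c\mid C_{1},\ldots ,C_{i-1})={\begin{cases}{\dfrac {\theta +|B|\alpha }{\theta +i-1}}&{\text{if }}c{\text{ is a new block}},\\\\{\dfrac {|b|-\alpha }{\theta +i-1}}&{\text{if }}c=b{\text{ is an existing block}},\end{cases}}

where |B| is the number of blocks and |b| is the size of block b just before element i is added.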

The probability that B_n is any particular partition of the set {1, ..., n} is the product of these probabilities as i runs from 1 to n. Now consider the size of block b: it increases by 1 each time an element is added to it, so when the last element of block b is added, the block has size |b| − 1. For example, consider this sequence of choices: (generate a new block b)(join b)(join b)(join b). In the end, block b has 4 elements; if b is the first block created, the product of the numerators in the above equation is θ(1 − α)(2 − α)(3 − α), which reduces to θ · 1 · 2 · 3 when α = 0. Following this logic, we obtain Pr(B_n = B) as above.
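The product of conditional probabilities in this derivation can be evaluated step by step. The sketch below does so for a given sequence of block choices and shows that two different arrival orders producing the same block sizes receive the same probability. The function name `sequence_probability` and the use of string labels for blocks are illustrative assumptions.

```python
def sequence_probability(assignments, alpha, theta):
    """Multiply the conditional probabilities Pr(C_i = c | C_1, ..., C_{i-1}).

    assignments[i - 1] is the label of the block chosen by element i;
    a label that has not appeared before means "open a new block".
    """
    sizes = {}  # current size of each block, keyed by label
    prob = 1.0
    for i, label in enumerate(assignments, start=1):
        if i == 1:
            sizes[label] = 1  # the first element forms a block with probability 1
            continue
        denominator = theta + i - 1
        if label in sizes:
            prob *= (sizes[label] - alpha) / denominator
            sizes[label] += 1
        else:
            prob *= (theta + len(sizes) * alpha) / denominator
            sizes[label] = 1
    return prob


# Two arrival orders that both end with block sizes {3, 2}.
p1 = sequence_probability(["a", "a", "a", "b", "b"], alpha=0.5, theta=1.0)
p2 = sequence_probability(["a", "b", "a", "b", "a"], alpha=0.5, theta=1.0)
print(p1, p2)

# The four-element example from the text, with alpha = 0 and theta = 1.
print(sequence_probability(["b", "b", "b", "b"], alpha=0.0, theta=1.0))
```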

Expected number of tables

For the one-parameter case, with α = 0 and 0 < θ < ∞, the expected number of tables, given that there are $n$ seated customers, is^[4]

{\begin{aligned}\sum _{k=1}^{n}{\frac {\theta }{\theta +k-1}}=\theta \cdot (\Psi (\theta +n)-\Psi (\theta ))\end{aligned}}

where $\Psi$ is the digamma function. In the general case (α > 0) the expected number of occupied tables is^[3]

{\begin{aligned}{\frac {\Gamma (\theta +n+\alpha )\Gamma (\theta +1)}{\alpha \Gamma (\theta +n)\Gamma (\theta +\alpha )}}-{\frac {\theta }{\alpha }}.\end{aligned}}
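Both expressions can be checked numerically against a recursion. The recursion used here is not stated in the text; it is an assumption derived from the seating rule by linearity of expectation: the expected number of tables grows by (θ + α · E[tables]) / (n + θ) when customer n + 1 arrives. Function names are illustrative, and log-Gamma is used to avoid overflow for large n.

```python
from math import exp, lgamma


def expected_tables_one_param(n, theta):
    """Sum form for alpha = 0: sum_{k=1}^{n} theta / (theta + k - 1)."""
    return sum(theta / (theta + k - 1) for k in range(1, n + 1))


def expected_tables_two_param(n, alpha, theta):
    """Closed form for alpha > 0, evaluated through log-Gamma for stability."""
    log_ratio = (lgamma(theta + n + alpha) + lgamma(theta + 1)
                 - lgamma(theta + n) - lgamma(theta + alpha))
    return exp(log_ratio) / alpha - theta / alpha


def expected_tables_recursive(n, alpha, theta):
    """Recursion E_{m+1} = E_m + (theta + alpha * E_m) / (m + theta), with E_1 = 1."""
    expected = 1.0
    for m in range(1, n):
        expected += (theta + alpha * expected) / (m + theta)
    return expected


print(expected_tables_two_param(50, 0.5, 1.0), expected_tables_recursive(50, 0.5, 1.0))
print(expected_tables_one_param(50, 1.0), expected_tables_recursive(50, 0.0, 1.0))
```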

The Indian buffet process

It is possible to adapt the model such that each data point is no longer uniquely associated with a class (i.e. we are no longer constructing a partition), but may be associated with any combination of the classes. This strains the restaurant-tables analogy and so is instead likened to a process in which a series of diners samples from some subset of an infinite selection of dishes on offer at a buffet. The probability that a particular diner samples a particular dish is proportional to the popularity of the dish among diners so far, and in addition the diner may sample from the untested dishes. This has been named the Indian buffet process and can be used to infer latent features in data.^[5]
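A simulation sketch of the buffet process follows. The text above gives only the qualitative rule, so the specifics here are assumptions taken from the usual formulation of the process: diner i samples each previously tried dish k with probability m_k / i, where m_k is the number of earlier diners who took it, and then tries a Poisson-distributed number of new dishes with mean `new_dish_rate / i`. The names `sample_ibp`, `poisson` and `new_dish_rate` are illustrative.

```python
import math
import random


def poisson(rate, rng):
    """Draw a Poisson variate by Knuth's multiplication method (fine for small rates)."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_ibp(num_diners, new_dish_rate, rng=None):
    """Return, for each diner, the sorted list of dish indices that diner sampled."""
    rng = rng or random.Random()
    dish_counts = []  # dish_counts[k] = number of diners so far who took dish k
    choices = []
    for i in range(1, num_diners + 1):
        taken = set()
        # Existing dishes: chosen in proportion to their popularity so far.
        for k, m in enumerate(dish_counts):
            if rng.random() < m / i:
                taken.add(k)
        # Previously untried dishes.
        for _ in range(poisson(new_dish_rate / i, rng)):
            taken.add(len(dish_counts))
            dish_counts.append(0)
        for k in taken:
            dish_counts[k] += 1
        choices.append(sorted(taken))
    return choices


if __name__ == "__main__":
    for diner, dishes in enumerate(sample_ibp(5, 2.0, random.Random(1)), start=1):
        print(diner, dishes)
```

Unlike the partition produced by the restaurant process, each diner here may hold several dishes (features) at once, or none.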

Applications

The Chinese restaurant process is closely connected to Dirichlet processes and Pólya's urn scheme, and is therefore useful in nonparametric Bayesian statistics. The generalized Chinese restaurant process is closely related to the Pitman–Yor process. These processes have been used in many applications, including modeling text, clustering biological microarray data,^[6] biodiversity modelling and detecting objects in images.^[citation needed]

References

[1] Aldous, D. J. (1985). "Exchangeability and related topics". École d'Été de Probabilités de Saint-Flour XIII — 1983. Lecture Notes in Mathematics. Vol. 1117. pp. 1–198. doi:10.1007/BFb0099421. ISBN 978-3-540-15203-3.
[2] Pitman, Jim (1995). "Exchangeable and Partially Exchangeable Random Partitions". Probability Theory and Related Fields. 102 (2): 145–158. doi:10.1007/BF01213386. MR 1337249.
[3] Pitman, Jim (2006). Combinatorial Stochastic Processes. Berlin: Springer-Verlag.
[4] Zhang, Xinhua (September 2008). "A Very Gentle Note on the Construction of Dirichlet Process". The Australian National University, Canberra. Online: http://users.cecs.anu.edu.au/~xzhang/pubDoc/notes/dirichlet_process.pdf Archived 2011-04-11 at the Wayback Machine.
[5] Griffiths, T. L.; Ghahramani, Z. (2005). "Infinite Latent Feature Models and the Indian Buffet Process". Gatsby Unit Technical Report GCNU-TR-2005-001.
[6] Qin, Zhaohui S. (2006). "Clustering microarray gene expression data using weighted Chinese restaurant process". Bioinformatics. 22 (16): 1988–1997.

External links

Introduction to the Dirichlet Distribution and Related Processes by Frigyik, Kapila and Gupta

A talk by Michael I. Jordan on the CRP:
- http://videolectures.net/icml05_jordan_dpcrp/
