Phase-type distribution

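Phase-type
Parameters	$S$, $m\times m$ subgenerator matrix; $\boldsymbol{\alpha}$, probability row vector
Support	$x\in [0,\infty )$
PDF	$\boldsymbol{\alpha}e^{xS}\mathbf{S}^{0}$; see article for details
CDF	$1-\boldsymbol{\alpha}e^{xS}\mathbf{1}$
Mean	$-\boldsymbol{\alpha}S^{-1}\mathbf{1}$
Median	no simple closed form
Mode	no simple closed form
Variance	$2\boldsymbol{\alpha}S^{-2}\mathbf{1}-(\boldsymbol{\alpha}S^{-1}\mathbf{1})^{2}$
MGF	$-\boldsymbol{\alpha}(tI+S)^{-1}\mathbf{S}^{0}+\alpha_{0}$
CF	$-\boldsymbol{\alpha}(itI+S)^{-1}\mathbf{S}^{0}+\alpha_{0}$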

A phase-type distribution is a probability distribution constructed by a convolution or mixture of exponential distributions.^[1] It results from a system of one or more inter-related Poisson processes occurring in sequence, or phases. The sequence in which each of the phases occurs may itself be a stochastic process. The distribution can be represented by a random variable describing the time until absorption of a Markov process with one absorbing state. Each of the states of the Markov process represents one of the phases.

It has a discrete-time equivalent – the discrete phase-type distribution.

The set of phase-type distributions is dense in the class of all positive-valued distributions; that is, any positive-valued distribution can be approximated arbitrarily closely by a phase-type distribution.

Definition

Consider a continuous-time Markov process with m + 1 states, where m ≥ 1, such that the states 1,...,m are transient states and state 0 is an absorbing state. Further, let the process have an initial probability of starting in any of the m + 1 phases given by the probability vector (α₀,α) where α₀ is a scalar and α is a 1 × m vector.

The continuous phase-type distribution is the distribution of time from the above process's starting until absorption in the absorbing state.

This process can be written in the form of a transition rate matrix,

{Q}=\left[{\begin{matrix}0&\mathbf {0} \\\mathbf {S} ^{0}&{S}\\\end{matrix}}\right],

where S is an m × m matrix and S⁰ = –S1. Here 1 represents an m × 1 column vector with every element being 1.
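As a minimal sketch of this construction, the exit-rate vector S⁰ and the full matrix Q can be assembled from S alone. The function names and the two-phase example matrix below are illustrative assumptions, not part of the definition.

```python
def exit_rates(S):
    """Return S0 = -S·1, i.e. minus the row sums of the subgenerator S."""
    return [-sum(row) for row in S]


def generator(S):
    """Assemble Q with the absorbing state 0 in the first row and column."""
    m = len(S)
    s0 = exit_rates(S)
    Q = [[0.0] * (m + 1)]                 # absorbing state: no outgoing rates
    for i in range(m):
        Q.append([s0[i]] + list(S[i]))    # [rate into state 0 | row i of S]
    return Q


# Hypothetical two-phase subgenerator, chosen only for illustration.
S = [[-3.0, 1.0],
     [0.0, -2.0]]
Q = generator(S)
for row in Q:
    print(row)
```

Every row of Q sums to zero, as required of a transition rate matrix.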

Characterization

The distribution of time X until the process reaches the absorbing state is said to be phase-type distributed and is denoted PH(α,S).

The distribution function of X is given by,

F(x)=1-{\boldsymbol {\alpha }}\exp({S}x)\mathbf {1} ,

and the density function,

f(x)={\boldsymbol {\alpha }}\exp({S}x)\mathbf {S^{0}} ,

for all x > 0, where exp( · ) is the matrix exponential. It is usually assumed that the probability of the process starting in the absorbing state is zero (i.e. α₀ = 0). The moments of the distribution are given by

E[X^{n}]=(-1)^{n}n!{\boldsymbol {\alpha }}{S}^{-n}\mathbf {1} .

The Laplace transform of the phase-type distribution is given by

M(s)=\alpha _{0}+{\boldsymbol {\alpha }}(sI-S)^{-1}\mathbf {S^{0}} ,

where I is the identity matrix.
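The formulas for F(x), f(x) and E[Xⁿ] can be evaluated numerically. The following is a sketch in pure Python, assuming α₀ = 0 and small, well-scaled matrices. The helper names (`mat_exp`, `solve`, `ph_cdf`, `ph_pdf`, `ph_moment`) are hypothetical, and the matrix exponential uses a simple scaling-and-squaring Taylor scheme; a library routine such as `scipy.linalg.expm` would normally be preferred.

```python
import math


def mat_mul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def mat_exp(A, squarings=8, terms=20):
    """Approximate exp(A) by scaling and squaring a truncated Taylor series."""
    n = len(A)
    scaled = [[a / 2 ** squarings for a in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[t / k for t in row] for row in mat_mul(term, scaled)]
        result = [[r + t for r, t in zip(r_row, t_row)]
                  for r_row, t_row in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def ph_cdf(alpha, S, x):
    """F(x) = 1 - alpha exp(Sx) 1."""
    E = mat_exp([[s * x for s in row] for row in S])
    return 1.0 - sum(a * sum(row) for a, row in zip(alpha, E))


def ph_pdf(alpha, S, x):
    """f(x) = alpha exp(Sx) S0, with S0 = -S 1."""
    s0 = [-sum(row) for row in S]
    E = mat_exp([[s * x for s in row] for row in S])
    return sum(a * sum(e * s for e, s in zip(row, s0))
               for a, row in zip(alpha, E))


def ph_moment(alpha, S, n):
    """E[X^n] = (-1)^n n! alpha S^(-n) 1, via n linear solves (no inverse)."""
    v = [1.0] * len(S)
    for _ in range(n):
        v = solve(S, v)
    return (-1) ** n * math.factorial(n) * sum(a * x for a, x in zip(alpha, v))
```

For instance, with α = (1, 0) and S = [[-3, 3], [0, -3]] (two identical phases of rate 3 in sequence), `ph_moment(alpha, S, 1)` should agree with the known mean 2/3 up to rounding.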

Special cases

The following probability distributions are all considered special cases of a continuous phase-type distribution:

Degenerate distribution, point mass at zero or the empty phase-type distribution – 0 phases.
Exponential distribution – 1 phase.
Erlang distribution – 2 or more identical phases in sequence.
Deterministic distribution (or constant) – The limiting case of an Erlang distribution, as the number of phases becomes infinite while the time in each state becomes zero.
Coxian distribution – 2 or more (not necessarily identical) phases in sequence, with a probability of transitioning to the terminating/absorbing state after each phase.
Hyperexponential distribution (also called a mixture of exponentials) – 2 or more non-identical phases, each of which has a probability of occurring in a mutually exclusive, or parallel, manner. (Note: The exponential distribution is the degenerate situation when all the parallel phases are identical.)
Hypoexponential distribution – 2 or more phases in sequence, which can be non-identical or a mixture of identical and non-identical phases; generalises the Erlang distribution.

Because the set of phase-type distributions is dense in the class of all positive-valued distributions, any positive-valued distribution can be approximated by a phase-type distribution. However, every phase-type distribution is light-tailed. A phase-type representation of a heavy-tailed distribution is therefore only an approximation, although the approximation can be made as precise as desired.

Examples

In all the following examples it is assumed that there is no probability mass at zero, that is α₀ = 0.

Exponential distribution

The simplest non-trivial example of a phase-type distribution is the exponential distribution with parameter λ. The parameters of the phase-type distribution are S = -λ and α = 1.

Hyperexponential or mixture of exponential distribution

The mixture of exponentials, or hyperexponential distribution, with rates λ₁,λ₂,...,λ_n > 0 can be represented as a phase-type distribution with

{\boldsymbol {\alpha }}=(\alpha _{1},\alpha _{2},\dots ,\alpha _{n})

with $\sum _{i=1}^{n}\alpha _{i}=1$ and

{S}=\left[{\begin{matrix}-\lambda _{1}&0&\dots &0\\0&-\lambda _{2}&\dots &0\\\vdots &\vdots &\ddots &\vdots \\0&0&\dots &-\lambda _{n}\\\end{matrix}}\right].

This mixture of exponential densities can be characterized through its density function

f(x)=\sum _{i=1}^{n}\alpha _{i}\lambda _{i}e^{-\lambda _{i}x}=\sum _{i=1}^{n}\alpha _{i}f_{X_{i}}(x),

or its cumulative distribution function

F(x)=1-\sum _{i=1}^{n}\alpha _{i}e^{-\lambda _{i}x}=\sum _{i=1}^{n}\alpha _{i}F_{X_{i}}(x),

with $X_{i}\sim \operatorname {Exp} (\lambda _{i})$.
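The closed forms above translate directly into code. The sketch below builds the diagonal representation and evaluates f and F; the function names and the numeric weights and rates are illustrative assumptions.

```python
import math


def hyperexp_representation(alpha, rates):
    """Return (alpha, S) with S = diag(-rate_1, ..., -rate_n)."""
    n = len(rates)
    S = [[-rates[i] if i == j else 0.0 for j in range(n)] for i in range(n)]
    return list(alpha), S


def hyperexp_pdf(x, alpha, rates):
    """f(x) = sum_i alpha_i * rate_i * exp(-rate_i * x)."""
    return sum(a * r * math.exp(-r * x) for a, r in zip(alpha, rates))


def hyperexp_cdf(x, alpha, rates):
    """F(x) = 1 - sum_i alpha_i * exp(-rate_i * x)."""
    return 1.0 - sum(a * math.exp(-r * x) for a, r in zip(alpha, rates))


# Hypothetical two-branch mixture, for illustration only.
alpha = [0.25, 0.75]
rates = [1.0, 5.0]
_, S = hyperexp_representation(alpha, rates)
exit_vector = [-sum(row) for row in S]   # S0 = -S·1, equal to the rates here
print(S, exit_vector)
print(hyperexp_cdf(1.0, alpha, rates), hyperexp_pdf(1.0, alpha, rates))
```

Because S is diagonal, each phase leads straight to absorption, so the exit-rate vector coincides with the list of rates.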

Erlang distribution

The Erlang distribution has two parameters: the shape, an integer k > 0, and the rate λ > 0. It is sometimes denoted E(k,λ). The Erlang distribution can be written in the form of a phase-type distribution by making S a k×k matrix with diagonal elements -λ and super-diagonal elements λ, with the probability of starting in state 1 equal to 1. For example, for E(5,λ),

{\boldsymbol {\alpha }}=(1,0,0,0,0),

and

{S}=\left[{\begin{matrix}-\lambda &\lambda &0&0&0\\0&-\lambda &\lambda &0&0\\0&0&-\lambda &\lambda &0\\0&0&0&-\lambda &\lambda \\0&0&0&0&-\lambda \\\end{matrix}}\right].

For a given number of phases, the Erlang distribution is the phase-type distribution with the smallest coefficient of variation.^[2]

The hypoexponential distribution is a generalisation of the Erlang distribution by having different rates for each transition (the non-homogeneous case).
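A short sketch of both representations follows, assuming the phases are visited strictly in order. The function names are hypothetical. The squared coefficient of variation is computed from the fact that the total time is a sum of independent exponential phase durations, so means and variances add.

```python
def hypoexponential_representation(rates):
    """Return (alpha, S) for phases visited in order with the given rates."""
    k = len(rates)
    alpha = [1.0] + [0.0] * (k - 1)       # always start in the first phase
    S = [[0.0] * k for _ in range(k)]
    for i, r in enumerate(rates):
        S[i][i] = -r                      # leave phase i at rate r
        if i + 1 < k:
            S[i][i + 1] = r               # and move on to phase i + 1
    return alpha, S


def erlang_representation(k, lam):
    """E(k, lam) is the hypoexponential case with k equal rates."""
    return hypoexponential_representation([lam] * k)


def squared_cv(rates):
    """Variance / mean^2 for a sum of independent exponential phases."""
    mean = sum(1.0 / r for r in rates)
    var = sum(1.0 / r ** 2 for r in rates)
    return var / mean ** 2


alpha, S = erlang_representation(5, 2.0)
print(alpha)
for row in S:
    print(row)
print(squared_cv([2.0] * 5))                      # Erlang with 5 phases
print(squared_cv([1.0, 2.0, 4.0, 8.0, 16.0]))     # unequal rates, same size
```

With five equal rates the squared coefficient of variation is 1/5, and the unequal-rate example gives a larger value, in line with the Erlang distribution being the least variable.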

Mixture of Erlang distribution

The mixture of two Erlang distributions E(3,β₁) and E(3,β₂) with mixing weights (α₁,α₂) (such that α₁ + α₂ = 1 and α_i ≥ 0 for each i) can be represented as a phase-type distribution with

{\boldsymbol {\alpha }}=(\alpha _{1},0,0,\alpha _{2},0,0),

and

{S}=\left[{\begin{matrix}-\beta _{1}&\beta _{1}&0&0&0&0\\0&-\beta _{1}&\beta _{1}&0&0&0\\0&0&-\beta _{1}&0&0&0\\0&0&0&-\beta _{2}&\beta _{2}&0\\0&0&0&0&-\beta _{2}&\beta _{2}\\0&0&0&0&0&-\beta _{2}\\\end{matrix}}\right].
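The block-diagonal pattern above extends to any number of Erlang components. The following sketch, with hypothetical function names and example values, places each component's matrix on the diagonal and puts its mixing weight on the first phase of its block.

```python
def erlang_block(k, rate):
    """k x k subgenerator of E(k, rate): -rate on the diagonal, rate above it."""
    S = [[0.0] * k for _ in range(k)]
    for i in range(k):
        S[i][i] = -rate
        if i + 1 < k:
            S[i][i + 1] = rate
    return S


def erlang_mixture(weights, shapes, rates):
    """Return (alpha, S) for a mixture of Erlang distributions."""
    size = sum(shapes)
    alpha = [0.0] * size
    S = [[0.0] * size for _ in range(size)]
    offset = 0
    for w, k, r in zip(weights, shapes, rates):
        alpha[offset] = w                     # enter the block at its first phase
        block = erlang_block(k, r)
        for i in range(k):
            for j in range(k):
                S[offset + i][offset + j] = block[i][j]
        offset += k
    return alpha, S


# Illustrative values: weights (0.4, 0.6), both shapes 3, rates 1 and 2.
alpha, S = erlang_mixture([0.4, 0.6], [3, 3], [1.0, 2.0])
print(alpha)
for row in S:
    print(row)
```

The third row has no super-diagonal entry, which is where the first block exits to the absorbing state instead of leaking into the second block.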

Coxian distribution

The Coxian distribution is a generalisation of the Erlang distribution. Instead of entering the absorbing state only from state k, the process can reach it from any phase. The phase-type representation is given by,

S=\left[{\begin{matrix}-\lambda _{1}&p_{1}\lambda _{1}&0&\dots &0&0\\0&-\lambda _{2}&p_{2}\lambda _{2}&\ddots &0&0\\\vdots &\ddots &\ddots &\ddots &\ddots &\vdots \\0&0&\ddots &-\lambda _{k-2}&p_{k-2}\lambda _{k-2}&0\\0&0&\dots &0&-\lambda _{k-1}&p_{k-1}\lambda _{k-1}\\0&0&\dots &0&0&-\lambda _{k}\end{matrix}}\right]

and

{\boldsymbol {\alpha }}=(1,0,\dots ,0),

where 0 < p₁,...,p_{k−1} ≤ 1. In the case where all p_i = 1 we have the Erlang distribution. The Coxian distribution is extremely important as any acyclic phase-type distribution has an equivalent Coxian representation.

The generalised Coxian distribution relaxes the condition that requires starting in the first phase.
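As a sketch of the Coxian representation, the code below builds S from the rates λ_i and continuation probabilities p_i and derives the exit rates S⁰ = -S1. The function name and numeric values are assumptions made for illustration.

```python
def coxian_representation(rates, probs):
    """Return (alpha, S, exit_rates) for a Coxian distribution.

    rates: lambda_1..lambda_k; probs: p_1..p_{k-1}, the probability of
    continuing to the next phase rather than being absorbed.
    """
    k = len(rates)
    if len(probs) != k - 1:
        raise ValueError("need exactly k - 1 continuation probabilities")
    alpha = [1.0] + [0.0] * (k - 1)           # start in the first phase
    S = [[0.0] * k for _ in range(k)]
    for i in range(k):
        S[i][i] = -rates[i]
        if i < k - 1:
            S[i][i + 1] = probs[i] * rates[i]
    exit_rates = [-sum(row) for row in S]     # (1 - p_i) * lambda_i, last: lambda_k
    return alpha, S, exit_rates


# Illustrative three-phase example.
alpha, S, s0 = coxian_representation([2.0, 4.0, 1.0], [0.5, 0.25])
for row in S:
    print(row)
print(s0)
```

Setting every p_i to 1 makes all exit rates zero except the last, which recovers the sequential structure of the Erlang and hypoexponential cases.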

Properties

Minima of independent PH random variables

Like the exponential distribution, the class of PH distributions is closed under minima of independent random variables.
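The standard construction behind this closure property uses the Kronecker product ⊗ and the Kronecker sum ⊕. The symbols β, T and n below are introduced here for illustration, and both variables are assumed to have no probability mass at zero.

```latex
% Assumption: X ~ PH(alpha, S) with m phases and Y ~ PH(beta, T) with n phases,
% X and Y independent, alpha_0 = beta_0 = 0.
\min(X,Y)\sim \operatorname{PH}\left({\boldsymbol{\alpha}}\otimes{\boldsymbol{\beta}},\; S\oplus T\right),
\qquad S\oplus T = S\otimes I_{n} + I_{m}\otimes T .

% Check with two exponentials, S = -\lambda and T = -\mu:
S\oplus T = -(\lambda+\mu), \qquad \min(X,Y)\sim \operatorname{Exp}(\lambda+\mu).
```

The product chain tracks both underlying processes at once and is absorbed as soon as either of them is, which is why its subgenerator is the Kronecker sum.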

Generating samples from phase-type distributed random variables

BuTools includes methods for generating samples from phase-type distributed random variables.^[3]
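The BuTools routines are not reproduced here. As a minimal sketch of one straightforward approach, a sample can be drawn by simulating the underlying Markov process from its definition: pick an initial phase from (α₀, α), wait an exponential holding time in each phase, then jump or absorb. The function name `sample_ph` is hypothetical, and this direct simulation is not claimed to be the method of the cited paper.

```python
import random


def sample_ph(alpha, S, rng=random):
    """Draw one absorption time by simulating the process phase by phase."""
    m = len(S)
    # Index m stands for the absorbing state; alpha0 is any mass at zero.
    alpha0 = max(0.0, 1.0 - sum(alpha))
    state = rng.choices(range(m + 1), weights=list(alpha) + [alpha0])[0]
    t = 0.0
    while state != m:
        rate = -S[state][state]            # total rate of leaving this phase
        t += rng.expovariate(rate)         # exponential holding time
        # Jump to phase j with probability S[state][j] / rate, else absorb.
        weights = [S[state][j] if j != state else 0.0 for j in range(m)]
        weights.append(max(0.0, rate - sum(weights)))
        state = rng.choices(range(m + 1), weights=weights)[0]
    return t


# Illustrative use: three identical phases of rate 2 in sequence (mean 1.5).
rng = random.Random(2024)
S = [[-2.0, 2.0, 0.0],
     [0.0, -2.0, 2.0],
     [0.0, 0.0, -2.0]]
samples = [sample_ph([1.0, 0.0, 0.0], S, rng) for _ in range(10000)]
print(sum(samples) / len(samples))
```

The sample mean should be close to 1.5 for this example, up to Monte Carlo error.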

Approximating other distributions

Any positive-valued distribution can be approximated arbitrarily well by a phase-type distribution.^[4]^[5] In practice, however, approximations can be poor when the size of the approximating process is fixed. For example, approximating a deterministic distribution of time 1 with 10 phases, each of average length 0.1, gives a variance of 0.1, and no phase-type distribution with 10 phases and mean 1 has a smaller variance, because the Erlang distribution is the least variable.^[2]
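The arithmetic behind the figure 0.1 can be sketched as follows: an Erlang distribution with k phases and mean 1 has rate λ = k per phase and variance k/λ² = 1/k. The function name is a hypothetical helper.

```python
def erlang_variance(k, mean=1.0):
    """Variance of an Erlang distribution with k phases and the given mean."""
    rate = k / mean            # each phase lasts mean / k on average
    return k / rate ** 2       # sum of k independent exponential variances


for k in (1, 10, 100, 1000):
    print(k, erlang_variance(k))
```

The variance shrinks only like 1/k, so a close approximation of a constant needs a large number of phases.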

BuTools is a MATLAB and Mathematica script for fitting phase-type distributions to 3 specified moments.
momentmatching is a MATLAB script to fit a minimal phase-type distribution to 3 specified moments.^[6]
KPC-toolbox is a library of MATLAB scripts to fit empirical datasets to Markovian arrival processes and phase-type distributions.^[7]

Fitting a phase-type distribution to data

Methods to fit a phase-type distribution to data can be classified as maximum likelihood methods or moment matching methods.^[8] Fitting a phase-type distribution to heavy-tailed distributions has been shown to be practical in some situations.^[9]

PhFit is a C script for fitting discrete and continuous phase-type distributions to data.^[10]
EMpht is a C script for fitting phase-type distributions to data or parametric distributions using an expectation–maximization algorithm.^[11]
HyperStar is a tool with a graphical user interface, designed to make phase-type fitting simple and user-friendly and to require little user interaction.^[12]
jPhase is a Java library which can also compute metrics for queues using the fitted phase-type distribution.^[13]

References

[1] Harchol-Balter, M. (2012). "Real-World Workloads: High Variability and Heavy Tails". Performance Modeling and Design of Computer Systems. pp. 347–348. doi:10.1017/CBO9781139226424.026. ISBN 9781139226424.
[2] Aldous, David; Shepp, Larry (1987). "The least variable phase type distribution is erlang". Stochastic Models. 3 (3): 467. doi:10.1080/15326348708807067.
[3] Horváth, G. B.; Reinecke, P.; Telek, M. S.; Wolter, K. (2012). "Efficient Generation of PH-Distributed Random Variates". Analytical and Stochastic Modeling Techniques and Applications. Lecture Notes in Computer Science. Vol. 7314. p. 271. doi:10.1007/978-3-642-30782-9_19. ISBN 978-3-642-30781-2.
[4] Bolch, Gunter; Greiner, Stefan; de Meer, Hermann; Trivedi, Kishor S. (1998). "Steady-State Solutions of Markov Chains". Queueing Networks and Markov Chains. pp. 103–151. doi:10.1002/0471200581.ch3. ISBN 0471193666.
[5] Cox, D. R. (2008). "A use of complex probabilities in the theory of stochastic processes". Mathematical Proceedings of the Cambridge Philosophical Society. 51 (2): 313–319. doi:10.1017/S0305004100030231. S2CID 122768319.
[6] Osogami, T.; Harchol-Balter, M. (2006). "Closed form solutions for mapping general distributions to quasi-minimal PH distributions". Performance Evaluation. 63 (6): 524. doi:10.1016/j.peva.2005.06.002.
[7] Casale, G.; Zhang, E. Z.; Smirni, E. (2008). "KPC-Toolbox: Simple Yet Effective Trace Fitting Using Markovian Arrival Processes". 2008 Fifth International Conference on Quantitative Evaluation of Systems. p. 83. doi:10.1109/QEST.2008.33. ISBN 978-0-7695-3360-5. S2CID 252444.
[8] Lang, Andreas; Arthur, Jeffrey L. (1996). "Parameter approximation for Phase-Type distributions". In Chakravarthy, S.; Alfa, Attahiru S. (eds.). Matrix Analytic methods in Stochastic Models. CRC Press. ISBN 0824797663.
[9] Ramaswami, V.; Poole, D.; Ahn, S.; Byers, S.; Kaplan, A. (2005). "Ensuring Access to Emergency Services in the Presence of Long Internet Dial-Up Calls". Interfaces. 35 (5): 411. doi:10.1287/inte.1050.0155.
[10] Horváth, András S.; Telek, Miklós S. (2002). "PhFit: A General Phase-Type Fitting Tool". Computer Performance Evaluation: Modelling Techniques and Tools. Lecture Notes in Computer Science. Vol. 2324. p. 82. doi:10.1007/3-540-46029-2_5. ISBN 978-3-540-43539-6.
[11] Asmussen, Søren; Nerman, Olle; Olsson, Marita (1996). "Fitting Phase-Type Distributions via the EM Algorithm". Scandinavian Journal of Statistics. 23 (4): 419–441. JSTOR 4616418.
[12] Reinecke, P.; Krauß, T.; Wolter, K. (2012). "Cluster-based fitting of phase-type distributions to empirical data". Computers & Mathematics with Applications. 64 (12): 3840. doi:10.1016/j.camwa.2012.03.016.
[13] Pérez, J. F.; Riaño, G. N. (2006). "jPhase: an object-oriented tool for modeling phase-type distributions". Proceeding from the 2006 workshop on Tools for solving structured Markov chains (SMCtools '06). doi:10.1145/1190366.1190370. ISBN 1595935061. S2CID 7863948.

M. F. Neuts (1975), Probability distributions of phase type, In Liber Amicorum Prof. Emeritus H. Florin, Pages 173-206, University of Louvain.
M. F. Neuts. Matrix-Geometric Solutions in Stochastic Models: an Algorithmic Approach, Chapter 2: Probability Distributions of Phase Type; Dover Publications Inc., 1981.
G. Latouche, V. Ramaswami. Introduction to Matrix Analytic Methods in Stochastic Modelling, 1st edition. Chapter 2: PH Distributions; ASA SIAM, 1999.
C. A. O'Cinneide (1990). Characterization of phase-type distributions. Communications in Statistics: Stochastic Models, 6(1), 1-57.
C. A. O'Cinneide (1999). Phase-type distribution: open problems and a few properties. Communications in Statistics: Stochastic Models, 15(4), 731-757.
