# Probabilistic automaton

In mathematics and computer science, the probabilistic automaton (PA) is a generalization of the nondeterministic finite automaton; it includes the probability of a given transition into the transition function, turning it into a transition matrix. Thus, the probabilistic automaton generalizes the concept of a Markov chain or subshift of finite type. The languages recognized by probabilistic automata are called stochastic languages; these include the regular languages as a subset. The number of stochastic languages is uncountable.

The concept was introduced by Michael O. Rabin in 1963; a certain special case is sometimes known as the Rabin automaton. In recent years, a variant has been formulated in terms of quantum probabilities, the quantum finite automaton.

## Definition

The probabilistic automaton may be defined as an extension of a non-deterministic finite automaton $(Q,\Sigma ,\delta ,q_{0},F)$ , together with two probabilities: the probability $P$ of a particular state transition taking place, and with the initial state $q_{0}$ replaced by a stochastic vector giving the probability of the automaton being in a given initial state.

For the ordinary non-deterministic finite automaton, one has

• a finite set of states $Q$ • a finite set of input symbols $\Sigma$ • a transition function $\delta :Q\times \Sigma \to P(Q)$ • a set of states $F$ distinguished as accepting (or final) states $F\subset Q$ .

Here, $P(Q)$ denotes the power set of $Q$ .

By use of currying, the transition function $\delta :Q\times \Sigma \to P(Q)$ of a non-deterministic finite automaton can be written as a membership function

$\delta :Q\times \Sigma \times Q\to \{0,1\}$ so that $\delta (q,a,q^{\prime })=1$ if $q^{\prime }\in \delta (q,a)$ and $\delta (q,a,q^{\prime })=0$ if $q^{\prime }\notin \delta (q,a)$ . The curried transition function can be understood to be a matrix with matrix entries

$\left[\theta _{a}\right]_{qq^{\prime }}=\delta (q,a,q^{\prime })$ The matrix $\theta _{a}$ is then a square matrix, whose entries are zero or one, indicating whether a transition $q{\stackrel {a}{\rightarrow }}q^{\prime }$ is allowed by the NFA. Such a transition matrix is always defined for a non-deterministic finite automaton.

The probabilistic automaton replaces these matrices by a family of stochastic matrices $P_{a}$ , for each symbol a in the alphabet $\Sigma$ so that the probability of a transition is given by

$\left[P_{a}\right]_{qq^{\prime }}$ A state change from some state to any state must occur with probability one, of course, and so one must have

$\sum _{q^{\prime }}\left[P_{a}\right]_{qq^{\prime }}=1$ for all input letters $a$ and internal states $q$ . The initial state of a probabilistic automaton is given by a row vector $v$ , whose components are the probabilities of the individual initial states $q$ , that add to 1:

$\sum _{q}\left[v\right]_{q}=1$ The transition matrix acts on the right, so that the state of the probabilistic automaton, after consuming the input string $abc$ , would be

$vP_{a}P_{b}P_{c}$ In particular, the state of a probabilistic automaton is always a stochastic vector, since the product of any two stochastic matrices is a stochastic matrix, and the product of a stochastic vector and a stochastic matrix is again a stochastic vector. This vector is sometimes called the distribution of states, emphasizing that it is a discrete probability distribution.

Formally, the definition of a probabilistic automaton does not require the mechanics of the non-deterministic automaton, which may be dispensed with. Formally, a probabilistic automaton PA is defined as the tuple $(Q,\Sigma ,P,v,F)$ . A Rabin automaton is one for which the initial distribution $v$ is a coordinate vector; that is, has zero for all but one entries, and the remaining entry being one.

## Stochastic languages

The set of languages recognized by probabilistic automata are called stochastic languages. They include the regular languages as a subset.

Let $F=Q_{\text{accept}}\subset Q$ be the set of "accepting" or "final" states of the automaton. By abuse of notation, $Q_{\text{accept}}$ can also be understood to be the column vector that is the membership function for $Q_{\text{accept}}$ ; that is, it has a 1 at the places corresponding to elements in $Q_{\text{accept}}$ , and a zero otherwise. This vector may be contracted with the internal state probability, to form a scalar. The language recognized by a specific automaton is then defined as

$L_{\eta }=\{s\in \Sigma ^{*}\vert vP_{s}Q_{\text{accept}}>\eta \}$ where $\Sigma ^{*}$ is the set of all strings in the alphabet $\Sigma$ (so that * is the Kleene star). The language depends on the value of the cut-point $\eta$ , normally taken to be in the range $0\leq \eta <1$ .

A language is called η-stochastic if and only if there exists some PA that recognizes the language, for fixed $\eta$ . A language is called stochastic if and only if there is some $0\leq \eta <1$ for which $L_{\eta }$ is η-stochastic.

A cut-point is said to be an isolated cut-point if and only if there exists a $\delta >0$ such that

$\vert vP(s)Q_{\text{accept}}-\eta \vert \geq \delta$ for all $s\in \Sigma ^{*}$ ## Properties

Every regular language is stochastic, and more strongly, every regular language is η-stochastic. A weak converse is that every 0-stochastic language is regular; however, the general converse does not hold: there are stochastic languages that are not regular.

Every η-stochastic language is stochastic, for some $0<\eta <1$ .

Every stochastic language is representable by a Rabin automaton.

If $\eta$ is an isolated cut-point, then $L_{\eta }$ is a regular language.

The p-adic languages provide an example of a stochastic language that is not regular, and also show that the number of stochastic languages is uncountable. A p-adic language is defined as the set of strings in the letters $0,1,2,\ldots ,(p-1)$ such that
$L_{\eta }(p)=\{0.n_{1}n_{2}n_{3}\ldots \vert 0\leq n_{k}\eta \}$ That is, a p-adic language is merely the set of real numbers, written in base-p, such that they are greater than $\eta$ . It is straightforward to show that all p-adic languages are stochastic. However, a p-adic language is regular if and only if $\eta$ is rational. In particular, this implies that the number of stochastic languages is uncountable.