= Dependability state model =

A dependability state diagram is a method for modelling a system as a Markov chain. It is used in reliability engineering for availability and reliability analysis.

It consists of creating a finite-state machine that represents the different states a system may be in. Transitions between states happen as a result of events from underlying Poisson processes with different intensities.

== Example ==

A redundant computer system consists of two identical compute nodes, each of which fails with an intensity of $\lambda$. When failed, they are repaired one at a time by a single repairman, with negative exponentially distributed repair times with expectation $\mu^{-1}$.

- state 0: 0 failed units, normal state of the system.
- state 1: 1 failed unit, system operational.
- state 2: 2 failed units, system not operational.

The intensity from state 0 to state 1 is $2\lambda$, since each of the two working compute nodes has a failure intensity of $\lambda$. The intensity from state 1 to state 2 is $\lambda$, since only one working node remains.
Transitions from state 2 to state 1 and from state 1 to state 0 represent the repairs of the compute nodes and have the intensity $\mu$, since only a single unit is repaired at a time.
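
The transitions described above can be collected in a generator (transition rate) matrix. The following is a minimal sketch, not part of the original model description: the function name `build_generator` and the numeric rates are hypothetical, and the matrix uses the common row convention in which entry (i, j) is the intensity from state i to state j and every row sums to zero.

```python
# Hypothetical example rates (per hour); the article gives no numeric values.
FAILURE_RATE = 0.001   # lambda: failure intensity of one compute node
REPAIR_RATE = 0.1      # mu: repair intensity of the single repairman


def build_generator(lam, mu):
    """Return the 3x3 generator matrix Q of the two-node example.

    Q[i][j] (i != j) is the transition intensity from state i to state j;
    each diagonal entry is minus the sum of the other entries in its row.
    """
    transitions = {
        (0, 1): 2 * lam,  # either of the two working nodes fails
        (1, 2): lam,      # the remaining working node fails
        (1, 0): mu,       # the single failed node is repaired
        (2, 1): mu,       # one of the two failed nodes is repaired
    }
    q = [[0.0] * 3 for _ in range(3)]
    for (src, dst), rate in transitions.items():
        q[src][dst] = rate
    for i in range(3):
        # The diagonal is still 0.0 here, so this sums the off-diagonal rates.
        q[i][i] = -sum(q[i])
    return q


Q = build_generator(FAILURE_RATE, REPAIR_RATE)
for row in Q:
    print(row)
```

Keeping the transitions in a dictionary keyed by (source, destination) mirrors the arrows of the state diagram, so adding a state or a transition only requires a new entry.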

=== Availability ===

The asymptotic availability, i.e. the availability over a long period, of the system is equal to the probability that the model is in state 0 or state 1, the two states in which the system is operational.

This is calculated by setting up a set of linear balance equations from the state transitions and solving the resulting linear system.

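The matrix is constructed with one row for each state. In each row, the intensity of every transition into that state is entered, with a negative sign, in the column of the state the transition comes from.

 $\mathbf{A_0} = \begin{bmatrix}
0 & -\mu & 0 \\
-2\lambda & 0 & -\mu \\
0 & -\lambda & 0
\end{bmatrix}.$

The diagonal cells are then set to the total intensity out of the corresponding state, so that each column sums to 0. The balance equations are $\mathbf{A_1}\mathbf{P} = \mathbf{0}$, where $\mathbf{P} = (P_0, P_1, P_2)^T$ is the vector of state probabilities:

 $\mathbf{A_1} = \begin{bmatrix}
(2\lambda) & -\mu & 0 \\
-2\lambda & (\lambda+\mu) & -\mu \\
0 & -\lambda & (\mu)
\end{bmatrix}.$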

In addition, the normalization condition must be taken into account, since the state probabilities sum to one:

$\sum_n P_n = 1.$

By solving this system of equations, the probabilities of being in state 0 and in state 1 can be found; their sum is equal to the long-term availability of the service.
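
As an illustration, the steady-state probabilities of the two-node example can be computed numerically. This is a minimal sketch under stated assumptions: the helper names `solve_linear` and `steady_state` and the numeric rates are hypothetical, the balance equations use the intensities $2\lambda$, $\lambda$ and $\mu$ from the example, and one redundant balance equation is replaced by the normalization condition.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def steady_state(lam, mu):
    """Return [P0, P1, P2] for the two-node example."""
    balance = [
        [2 * lam, -mu, 0.0],        # state 0: outflow minus inflow
        [-2 * lam, lam + mu, -mu],  # state 1: outflow minus inflow
        [0.0, -lam, mu],            # state 2: redundant, replaced below
    ]
    # The balance equations are linearly dependent, so one of them is
    # replaced by the normalization condition P0 + P1 + P2 = 1.
    balance[2] = [1.0, 1.0, 1.0]
    return solve_linear(balance, [0.0, 0.0, 1.0])


lam, mu = 0.001, 0.1   # hypothetical rates, not taken from the article
p0, p1, p2 = steady_state(lam, mu)
availability = p0 + p1  # the system is operational in states 0 and 1
print(availability)
```

For a model this small the result can be checked by hand: the balance equations give $P_1 = (2\lambda/\mu) P_0$ and $P_2 = (\lambda/\mu) P_1$, so the availability is $(\mu^2 + 2\lambda\mu) / (\mu^2 + 2\lambda\mu + 2\lambda^2)$.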

=== Reliability ===

The reliability of the system is found by making the failure states absorbing, i.e. removing all outgoing state transitions.

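For this system, starting in state 0 at $t = 0$ and with state 2 made absorbing, the function is:

 $R(t) = \frac{s_1 e^{s_2 t} - s_2 e^{s_1 t}}{s_1 - s_2}, \quad s_{1,2} = \frac{-(3\lambda+\mu) \pm \sqrt{\lambda^2 + 6\lambda\mu + \mu^2}}{2}.$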
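
The reliability of the two-node example can also be evaluated numerically. The sketch below is an illustration under stated assumptions: the system starts in state 0, state 2 is absorbing, and the function names are hypothetical. It integrates the transient equations for states 0 and 1 with the classical fourth-order Runge-Kutta scheme and compares the result with the closed-form solution of the same two equations.

```python
import math


def reliability_closed_form(lam, mu, t):
    """R(t) with state 2 absorbing, starting in state 0 at t = 0."""
    root = math.sqrt(lam * lam + 6 * lam * mu + mu * mu)
    s1 = (-(3 * lam + mu) + root) / 2
    s2 = (-(3 * lam + mu) - root) / 2
    return (s1 * math.exp(s2 * t) - s2 * math.exp(s1 * t)) / (s1 - s2)


def reliability_numeric(lam, mu, t, steps=10000):
    """Integrate dP0/dt and dP1/dt with RK4 and return P0(t) + P1(t)."""
    def deriv(p0, p1):
        # No repair inflow from state 2, because state 2 is absorbing.
        return (-2 * lam * p0 + mu * p1, 2 * lam * p0 - (lam + mu) * p1)

    h = t / steps
    p0, p1 = 1.0, 0.0  # the system starts with both nodes working
    for _ in range(steps):
        k1 = deriv(p0, p1)
        k2 = deriv(p0 + h / 2 * k1[0], p1 + h / 2 * k1[1])
        k3 = deriv(p0 + h / 2 * k2[0], p1 + h / 2 * k2[1])
        k4 = deriv(p0 + h * k3[0], p1 + h * k3[1])
        p0 += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        p1 += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return p0 + p1


# Hypothetical rates and mission time, not taken from the article.
print(reliability_closed_form(0.001, 0.1, 1000.0))
print(reliability_numeric(0.001, 0.1, 1000.0))
```

Without repair ($\mu = 0$) the closed form reduces to $2e^{-\lambda t} - e^{-2\lambda t}$, the reliability of two independent units in parallel, which gives a quick sanity check.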

== Criticism ==

Finite-state models of systems are subject to state explosion. A realistic model of a system often requires so many states that it becomes infeasible to draw or solve.
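
The growth can be illustrated with a minimal sketch, assuming each component is tracked individually and is either up or down (both assumptions, as is the hypothetical helper `enumerate_states`). The number of states then doubles with every added component.

```python
from itertools import product


def enumerate_states(n_components):
    """List every up/down combination of n distinguishable components."""
    return list(product(("up", "down"), repeat=n_components))


print(len(enumerate_states(2)))  # the four combinations of two components

# For larger systems only the count is printed; it grows as 2 ** n.
for n in (10, 20, 30):
    print(n, 2 ** n)
```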
