Memorylessness

For the use of the term in materials science, see hysteresis.

In probability and statistics, memorylessness is a property of certain probability distributions: the exponential distributions on the non-negative real numbers and the geometric distributions on the non-negative integers.

The property is most easily explained in terms of "waiting times." Suppose that a random variable X is the time elapsed in a shop from 9 am on a certain day until the arrival of the first customer; thus X is the time a server waits for the first customer. The "memoryless" property compares two probability distributions: that of the time the server has to wait from 9 am onwards for the first customer, and that of the time the server still has to wait for the first customer on those occasions when no customer has arrived by some given later time. Memorylessness means that these distributions of "time from now to the next customer" are exactly the same.

As another example, suppose X is the lifetime of a car engine, measured in miles driven. If the engine has already lasted 200,000 miles, intuition suggests that the probability that it lasts another 100,000 miles is not the same as the probability that a newly built engine lasts its first 100,000 miles. If the lifetime were memoryless, however, the two probabilities would be the same. In essence, we 'forget' what state the car is in: the probabilities are not influenced by how much of the lifetime has already elapsed. [1]

The terms "memoryless" and "memorylessness" are used in a very different way to refer to Markov processes in which the underlying assumption of the Markov property implies that the properties of random variables related to the future depend only on relevant information about the current time, not on information from further in the past. The present article describes the use outside the Markov property, limited to conditional probability distributions.

Discrete memorylessness

Suppose X is a discrete random variable whose values lie in the set { 0, 1, 2, ... }. The probability distribution of X is memoryless precisely if for any m, n in { 0, 1, 2, ... }, we have

$\Pr(X>m+n \mid X > m)=\Pr(X>n).$

Here, Pr(X > m + n | X > m) denotes the conditional probability that the value of X is larger than m + n, given that it is larger than m.

The only memoryless discrete probability distributions are the geometric distributions, which describe the number of independent Bernoulli trials needed to get one "success," with a fixed probability p of "success" on each trial. In other words, they are the distributions of the waiting time in a Bernoulli process.
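The defining identity can be checked directly. The following is a minimal sketch, not a proof: it tests the identity on a finite grid of m and n using exact rational arithmetic. The function names, the success probability 1/6, and the fair-die comparison are all assumptions introduced for illustration. It takes X to count trials up to and including the first success, so that Pr(X > n) = (1 − p)^n.

```python
from fractions import Fraction

def geometric_tail(p, n):
    """Pr(X > n) when X counts trials up to and including the first success."""
    return (1 - p) ** n

def uniform_die_tail(n):
    """Pr(X > n) for a fair six-sided die value (a non-memoryless contrast)."""
    return Fraction(max(0, 6 - n), 6)

def is_memoryless(tail, max_m=20, max_n=20):
    """Check Pr(X > m+n | X > m) == Pr(X > n) on a finite grid of m, n."""
    for m in range(max_m + 1):
        if tail(m) == 0:
            continue  # conditioning on an impossible event is undefined
        for n in range(max_n + 1):
            if tail(m + n) / tail(m) != tail(n):
                return False
    return True

p = Fraction(1, 6)  # hypothetical success probability
print(is_memoryless(lambda n: geometric_tail(p, n)))  # geometric: identity holds
print(is_memoryless(uniform_die_tail))                # uniform: identity fails
```

Using `Fraction` avoids floating-point rounding, so the equality test is exact on the grid that is checked.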

A frequent misunderstanding

"Memorylessness" of the probability distribution of the number of trials X until the first success means that

$\Pr(X>40 \mid X>30)=\Pr(X>10).$

It does not mean that

$\Pr(X>40 \mid X>30)=\Pr(X>40),$

which would be true only if the events X > 40 and X > 30 were independent. However, the following rearrangement is valid, because the event X > 40 implies the event X > 30:

$\Pr(X>40 \mid X>30) = \frac{\Pr(X>40,\ X>30)}{\Pr(X>30)} = \frac{\Pr(X>40)}{\Pr(X>30)}.$
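As a worked illustration, take a hypothetical success probability p = 0.1 (a value assumed here only for concreteness). When X counts the trials up to and including the first success, Pr(X > n) = (1 − p)^n, so

$\Pr(X>40 \mid X>30) = \frac{0.9^{40}}{0.9^{30}} = 0.9^{10} \approx 0.349,$

whereas

$\Pr(X>40) = 0.9^{40} \approx 0.0148.$

The conditional probability therefore agrees with Pr(X > 10), not with Pr(X > 40).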

Continuous memorylessness

Suppose X is a continuous random variable whose values lie in the non-negative real numbers [0, ∞). The probability distribution of X is memoryless precisely if for any non-negative real numbers t and s, we have

$\Pr(X>t+s \mid X>t)=\Pr(X>s).\,$

This is similar to the discrete version except that s and t are constrained only to be non-negative real numbers instead of integers. Rather than counting trials until the first "success," for example, we may be marking time until the arrival of the first phone call at a switchboard.

The geometric distributions and the exponential distributions are the discrete and continuous analogs of each other.
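The continuous identity can be checked numerically in the same spirit. The sketch below is illustrative only: the rate 0.5, the sample values of t and s, and the Weibull contrast with shape 2 are all assumptions, not values from the text. It compares Pr(X > t + s | X > t) with Pr(X > s) for an exponential survival function, and then for a Weibull survival function that models wear-out and is not memoryless.

```python
import math

def exponential_tail(rate, t):
    """Survival function Pr(X > t) = exp(-rate * t) for t >= 0."""
    return math.exp(-rate * t)

def weibull_tail(shape, scale, t):
    """Survival function Pr(X > t) = exp(-(t / scale) ** shape) for t >= 0."""
    return math.exp(-((t / scale) ** shape))

def conditional_tail(tail, t, s):
    """Pr(X > t + s | X > t), from the definition of conditional probability."""
    return tail(t + s) / tail(t)

rate = 0.5  # hypothetical arrival rate
exp_tail = lambda t: exponential_tail(rate, t)
wear_tail = lambda t: weibull_tail(2.0, 1.0, t)  # hypothetical wear-out model

# Exponential: the two columns agree up to floating-point rounding.
for t, s in [(0.0, 1.0), (2.5, 1.0), (10.0, 3.7)]:
    print(t, s, conditional_tail(exp_tail, t, s), exp_tail(s))

# Weibull with shape 2: surviving a further unit of time is less likely
# once the item has already aged, so the identity fails.
print(conditional_tail(wear_tail, 2.0, 1.0), wear_tail(1.0))
```

The Weibull case mirrors the car-engine intuition: for a lifetime that wears out, the elapsed time does change the remaining-life distribution.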

The memoryless distributions are the exponential distributions

The only memoryless continuous probability distributions are the exponential distributions, so memorylessness completely characterizes the exponential distributions among all continuous ones.

To see this, first define the survival function, G, as

$G(t) = \Pr(X > t).\,$

Note that G(t) is then monotonically decreasing. From the relation

$\Pr(X > t + s \mid X > t) = \Pr(X > s)\,$

and the definition of conditional probability, it follows that

$\frac{\Pr(X > t + s)}{\Pr(X > t)} = \Pr(X > s).$

This gives the functional equation

$G(t + s) = G(t) G(s)\,$

and solutions of this can be sought under the condition that G is a monotone decreasing function (meaning that for times $x\leq y$, we have $G(x)\geq G(y)$).

The functional equation alone implies that G, restricted to rational multiples of any particular number, is an exponential function. Combined with the fact that G is monotone, this implies that G is an exponential function over its whole domain.
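The two steps can be sketched as follows. This is an outline under the assumption that $0 < G(1) < 1$, which excludes the degenerate cases; the symbol $\lambda$ is introduced here for convenience. Applying the functional equation repeatedly gives, for any positive integers m and n and any $t \geq 0$,

$G(nt) = G(t)^n, \qquad G\left(\tfrac{t}{n}\right) = G(t)^{1/n}, \qquad G\left(\tfrac{m}{n}\,t\right) = G(t)^{m/n}.$

Taking $t = 1$ and writing $\lambda = -\ln G(1) > 0$, every positive rational q satisfies

$G(q) = G(1)^q = e^{-\lambda q}.$

For an arbitrary real $t > 0$, choose rationals $q \leq t \leq r$. Monotonicity gives

$e^{-\lambda r} = G(r) \leq G(t) \leq G(q) = e^{-\lambda q},$

and letting q and r approach t yields $G(t) = e^{-\lambda t}$, the survival function of an exponential distribution with rate $\lambda$.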