Memorylessness

In probability and statistics, memorylessness is a property of certain probability distributions. It usually refers to cases in which the distribution of a "waiting time" until a certain event does not depend on how much time has already elapsed. To model memoryless situations accurately, we must constantly "forget" the past of the system: the probabilities are not influenced by the history of the process.^[1]

Only two kinds of distributions are memoryless: the geometric distributions on the non-negative integers and the exponential distributions on the non-negative real numbers.

In the context of Markov processes, memorylessness refers to the Markov property,^[2] an even stronger assumption which implies that the properties of random variables related to the future depend only on relevant information about the current time, not on information from further in the past. The present article describes the use outside the Markov property.

Waiting time examples

With memory

Most phenomena are not memoryless, which means that observers will obtain information about them over time. For example, suppose that $X$ is a random variable, the lifetime of a car engine, expressed in terms of "number of miles driven until the engine breaks down". It is clear, based on our intuition, that an engine which has already been driven for 300,000 miles will have a much lower expected remaining lifetime than would a second (equivalent) engine which has only been driven for 1,000 miles. Hence, this random variable would not have the memorylessness property.

Without memory

In contrast, let us examine a situation which would exhibit memorylessness. Imagine a long hallway, lined on one wall with thousands of safes. Each safe has a dial with 500 positions, and each has been assigned an opening position at random. Imagine that an eccentric person walks down the hallway, stopping once at each safe to make a single random attempt to open it. We might define the random variable $X$ as the lifetime of their search, expressed in terms of "number of attempts the person must make until they successfully open a safe". Here, the expected number of remaining attempts is always $E[X] = 500$, regardless of how many attempts have already been made. Each new attempt has a 1/500 chance of succeeding, so on average the person opens one safe in any run of 500 attempts, but with each new failure they make no "progress" toward ultimately succeeding. Even if the safe-cracker has just failed 499 consecutive times (or 4,999 times), we expect to wait 500 more attempts until we observe the next success. If, instead, this person focused their attempts on a single safe, and "remembered" their previous attempts to open it, they would be guaranteed to open the safe after, at most, 500 attempts (and, in fact, at the outset would expect to need only about 250 attempts, not 500).
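The figures in the safe example can be checked with exact arithmetic. The following is a minimal sketch, assuming each attempt succeeds independently with probability 1/500; the function names are illustrative and not taken from any library.

```python
from fractions import Fraction

P_OPEN = Fraction(1, 500)  # chance that one random attempt opens a safe (from the text)


def expected_attempts_memoryless(p):
    """Mean of a geometric waiting time counted in attempts (support 1, 2, ...)."""
    return 1 / p


def prob_success_within(p, k):
    """Probability of at least one success in k independent attempts."""
    return 1 - (1 - p) ** k


def expected_attempts_single_safe(positions):
    """Mean attempts when each dial position of one safe is tried once, without repeats.

    The opening position is equally likely to be the 1st, 2nd, ..., last one tried,
    so the mean is (1 + 2 + ... + positions) / positions.
    """
    return Fraction(sum(range(1, positions + 1)), positions)


print(expected_attempts_memoryless(P_OPEN))     # 500
print(float(prob_success_within(P_OPEN, 500)))  # roughly 0.63
print(expected_attempts_single_safe(500))       # 501/2
```

Under these assumptions, the chance of at least one success within the next 500 attempts is only about 63%, even though the expected wait is 500 attempts. For the single safe with remembered attempts, the exact mean is 250.5, close to the figure of 250 quoted in the text.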

Real-life examples of memorylessness include the universal law of radioactive decay, which describes the time until a given radioactive particle decays, and, potentially, the time until the discovery of a new Bitcoin block, though this has been called into question.^[3] An often used (theoretical) example of memorylessness in queueing theory is the time a storekeeper must wait before the arrival of the next customer.

Discrete memorylessness

Suppose $X$ is a discrete random variable whose values lie in the set $\{0, 1, 2, \ldots\}$. The probability distribution of $X$ is memoryless precisely if for any $m$ and $n$ in $\{0, 1, 2, \ldots\}$, we have

\Pr(X>m+n\mid X\geq m)=\Pr(X>n).

Here, $\Pr(X > m + n \mid X \geq m)$ denotes the conditional probability that the value of $X$ is greater than $m + n$ given that it is greater than or equal to $m$.

The only memoryless discrete probability distributions are the geometric distributions, which count the number of independent, identically distributed Bernoulli trials needed to get one "success". In other words, these are the distributions of waiting time in a Bernoulli process.

Note that the above definition applies to the geometric distribution with support $\{0, 1, 2, \ldots\}$. The alternative parameterization with support $\{1, 2, \ldots\}$ corresponds to a slightly different definition of discrete memorylessness: namely, that $\Pr(X>m+n\mid X>m)=\Pr(X>n).$
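Both forms of the discrete definition can be checked numerically. The sketch below assumes a success probability of 0.2 chosen purely for illustration, and uses the closed-form survival functions of the two parameterizations; the helper names are hypothetical.

```python
import math


def sf_failures(p, n):
    """P(X > n) when X counts failures before the first success (support 0, 1, 2, ...)."""
    return (1 - p) ** (n + 1)


def sf_trials(p, n):
    """P(Y > n) when Y counts trials up to and including the first success (support 1, 2, ...)."""
    return (1 - p) ** n


def memoryless_failures(p, m, n):
    # P(X > m + n | X >= m) = P(X > m + n) / P(X >= m), and P(X >= m) = P(X > m - 1).
    lhs = sf_failures(p, m + n) / sf_failures(p, m - 1)
    return math.isclose(lhs, sf_failures(p, n))


def memoryless_trials(p, m, n):
    # P(Y > m + n | Y > m) = P(Y > m + n) / P(Y > m).
    lhs = sf_trials(p, m + n) / sf_trials(p, m)
    return math.isclose(lhs, sf_trials(p, n))


p = 0.2  # assumed success probability, for illustration only
print(all(memoryless_failures(p, m, n) for m in range(20) for n in range(20)))  # True
print(all(memoryless_trials(p, m, n) for m in range(20) for n in range(20)))    # True
```

The only difference between the two checks is whether the conditioning event is $X \geq m$ or $X > m$, which mirrors the shift in support between the two parameterizations.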

A common misunderstanding

"Memorylessness" of the probability distribution of the number of failures $X$ before the first success means that, for example,

\Pr(X>40\mid X\geq 30)=\Pr(X>10).

It does not mean that

\Pr(X>40\mid X\geq 30)=\Pr(X>40),

which would be true only if the events $X > 40$ and $X \geq 30$ were independent. Because the first event is contained in the second, this requires $\Pr(X\geq 30)=1$ (or the trivial case $\Pr(X>40)=0$).
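A small numerical illustration of the distinction, assuming a success probability of 0.05 per trial (a value picked only for this sketch):

```python
import math

p = 0.05  # assumed success probability per trial, for illustration only
q = 1 - p


def prob_more_than(n):
    """P(X > n): the first n + 1 trials all fail."""
    return q ** (n + 1)


def prob_at_least(n):
    """P(X >= n): the first n trials all fail."""
    return q ** n


conditional = prob_more_than(40) / prob_at_least(30)  # P(X > 40 | X >= 30)

print(math.isclose(conditional, prob_more_than(10)))  # True
print(math.isclose(conditional, prob_more_than(40)))  # False
```

The conditional probability matches $\Pr(X>10)$, the chance of more than 10 further failures, and not the unconditional $\Pr(X>40)$.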

Continuous memorylessness

Suppose $X$ is a continuous random variable whose values lie in the non-negative real numbers $[0, \infty)$ . The probability distribution of $X$ is memoryless precisely if for any non-negative real numbers $t$ and $s$ , we have

\Pr(X>t+s\mid X>t)=\Pr(X>s).

This is similar to the discrete version, except that $s$ and $t$ are constrained only to be non-negative real numbers instead of integers. Rather than counting trials until the first "success", for example, we may be marking time until the arrival of the first phone call at a switchboard.
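The continuous condition can be tested at individual points using a survival function $\Pr(X>t)$. The sketch below checks it for an exponential distribution and, as a contrast, for a uniform distribution on $[0, 10]$; the rate, the interval, and the test points are assumptions made for illustration.

```python
import math


def exponential_sf(t, rate):
    """Survival function P(X > t) of an exponential distribution with the given rate."""
    return math.exp(-rate * t)


def uniform_sf(t, upper):
    """Survival function P(X > t) of a uniform distribution on [0, upper]."""
    return max(0.0, 1.0 - t / upper)


def is_memoryless_at(sf, t, s):
    """Check P(X > t + s | X > t) == P(X > s) at one pair (t, s); needs sf(t) > 0."""
    return math.isclose(sf(t + s) / sf(t), sf(s))


rate = 0.5       # assumed rate parameter
t, s = 3.0, 2.0  # assumed elapsed time and additional time

print(is_memoryless_at(lambda x: exponential_sf(x, rate), t, s))  # True
print(is_memoryless_at(lambda x: uniform_sf(x, 10.0), t, s))      # False
```

Passing at a single pair $(t, s)$ does not prove memorylessness, but one failing pair, as for the uniform distribution here, is enough to rule it out.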

The memoryless distribution is an exponential distribution

The only memoryless continuous probability distribution is the exponential distribution, so memorylessness completely characterizes the exponential distribution among all continuous ones.

To see this, first define the survival function, $S$, as

S(t)=\Pr(X>t).

Note that $S(t)$ is then monotonically decreasing in $t$. From the relation

\Pr(X>t+s\mid X>t)=\Pr(X>s)

and the definition of conditional probability, it follows that

{\frac {\Pr(X>t+s)}{\Pr(X>t)}}=\Pr(X>s).

This gives the functional equation (which is a result of the memorylessness property):

S(t+s)=S(t)S(s)

From this, we must have, for example:

S(2)=S(1)^{2},

S(1)=S(1/2)^{2},{\text{ i.e. }}S(1/2)=S(1)^{1/2}.

In general, for every positive rational $a$:

S(a)=S(1)^{a}.

The only continuous function that satisfies this equation for every positive rational $a$ is:

S(a)=S(1)^{a}=e^{\ln(S(1))a}=e^{-\lambda a},

where $\lambda =-\ln(S(1)).$

Since $S(1)$ is a probability with $0<S(1)<1$, we have $\lambda >0$; hence the survival function of any memoryless continuous distribution must be exponential.

Put a different way, $S$ is a monotone decreasing function (meaning that for times $x\leq y$, we have $S(x)\geq S(y)$).

The functional equation alone will imply that $S$ restricted to rational multiples of any particular number is an exponential function. Combined with the fact that $S$ is monotone, this implies that $S$ over its whole domain is an exponential function.
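The two steps of this argument can be written out as follows. This is a sketch that uses only the functional equation $S(t+s)=S(t)S(s)$ and the monotonicity of $S$; the symbols $m$, $n$ (positive integers) and $q$, $r$ (positive rationals) are introduced here for illustration.

```latex
\begin{aligned}
S(1) &= S\!\left(\tfrac{1}{n}+\cdots+\tfrac{1}{n}\right) = S\!\left(\tfrac{1}{n}\right)^{n}
  \quad\Longrightarrow\quad S\!\left(\tfrac{1}{n}\right) = S(1)^{1/n}, \\
S\!\left(\tfrac{m}{n}\right) &= S\!\left(\tfrac{1}{n}\right)^{m} = S(1)^{m/n}, \\
q \leq a \leq r \quad&\Longrightarrow\quad S(1)^{r} = S(r) \leq S(a) \leq S(q) = S(1)^{q}.
\end{aligned}
```

Letting the rationals $q$ and $r$ approach a real number $a>0$ from below and above squeezes $S(a)$ to $S(1)^{a}$, which extends the formula from the rationals to all positive reals.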

Notes

[1] "Notes on Memoryless Random Variables" (PDF).
[2] "Markov Chains and Random Walks" (PDF).
[3] Bowden, Rory; Keeler, Holger Paul; Krzezinski, Anthony E.; Taylor, Peter G. (2018). "Block arrivals in the Bitcoin blockchain". arXiv:1801.07447 [cs.CR].

References

Feller, W. (1971) Introduction to Probability Theory and Its Applications, Vol II (2nd edition),Wiley. Section I.3 ISBN 0-471-25709-5
