# Moving average

An example of two moving average curves

In statistics, a moving average (rolling average or running average) is a calculation to analyze data points by creating a series of averages of different subsets of the full data set. It is also called a moving mean (MM)[1] or rolling mean and is a type of finite impulse response filter. Variations include simple, cumulative, and weighted forms (described below).

Given a series of numbers and a fixed subset size, the first element of the moving average is obtained by taking the average of the initial fixed subset of the number series. Then the subset is modified by "shifting forward"; that is, excluding the first number of the series and including the next value in the subset.

A moving average is commonly used with time series data to smooth out short-term fluctuations and highlight longer-term trends or cycles. The threshold between short-term and long-term depends on the application, and the parameters of the moving average will be set accordingly. For example, it is often used in technical analysis of financial data, like stock prices, returns or trading volumes. It is also used in economics to examine gross domestic product, employment or other macroeconomic time series. Mathematically, a moving average is a type of convolution and so it can be viewed as an example of a low-pass filter used in signal processing. When used with non-time series data, a moving average filters higher frequency components without any specific connection to time, although typically some kind of ordering is implied. Viewed simplistically it can be regarded as smoothing the data.

## Generic Approach to Moving Average

An element ${\displaystyle v\in V}$ moves in an additive group or vector space ${\displaystyle V}$. In a generic approach, we have a moving probability distribution ${\displaystyle P_{v}}$ that defines how the values in the neighbourhood of ${\displaystyle v\in V}$ have an impact on the moving average.

### Discrete/continuous Moving Average

According to the type of probability distribution we have to distinguish between a

• discrete (probability mass function ${\displaystyle p_{v}}$) and
• continuous (probability density function ${\displaystyle p_{v}}$)

moving average. The terminology is borrowed from probability theory: the probability mass/density function describes the distribution of weights around the value ${\displaystyle v\in V}$. In the discrete setting, ${\displaystyle p_{v}(x)=0.2}$ means that ${\displaystyle x}$ has a 20% impact on the moving average ${\displaystyle MA(v)}$ for ${\displaystyle v}$.

### Moving/Shift Distributions

The probability distributions are shifted by ${\displaystyle v}$ in ${\displaystyle V}$. This means that the probability mass functions ${\displaystyle p_{v}}$ resp. probability density functions ${\displaystyle p_{v}}$ are generated from a probability distribution ${\displaystyle p_{0}}$ centred at the zero element of the additive group resp. the zero vector of the vector space. Due to the nature of the collected data, ${\displaystyle f(x)}$ exists only for a subset ${\displaystyle T\subseteq V}$; in many cases ${\displaystyle T}$ is the set of points in time for which data is collected. The shift of a distribution is defined by the following property:

• discrete: for all ${\displaystyle x\in V}$ the probability mass function fulfills ${\displaystyle p_{v}(x):=p_{0}(x-v)}$ for ${\displaystyle v\in V}$
• continuous: for all ${\displaystyle x\in V}$ the probability density function fulfills ${\displaystyle p_{v}(x):=p_{0}(x-v)}$ for ${\displaystyle v\in V}$

The moving average is defined by:

• discrete: (probability mass function ${\displaystyle p_{v}}$)
${\displaystyle MA(v):=\sum _{x\in T}p_{v}(x)\cdot f(x)}$

Remark: ${\displaystyle p_{v}(x)>0}$ holds only for a countable subset of ${\displaystyle V}$

• continuous: (probability density function ${\displaystyle p_{v}}$)
${\displaystyle MA(v):=\int _{T}p_{v}(x)\cdot f(x)\,dx}$

It is important for the definition of the probability mass functions resp. probability density functions ${\displaystyle p_{v}}$ that the support of ${\displaystyle p_{v}}$ is a subset of ${\displaystyle T}$. This assures that 100% of the probability mass is assigned to collected data. The support of ${\displaystyle p_{v}}$ is defined as:

${\displaystyle \mathrm {supp} (p_{v}):={\overline {\{x\in V\mid p_{v}(x)>0\}}}\subset T.}$
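As a minimal sketch of the discrete definition above (the names `moving_average`, `f`, and `p_v` are illustrative, not from the article):

```python
def moving_average(f, p_v, T):
    """Discrete moving average: the expected value of f under the pmf p_v,
    summed over the set T of points for which data was collected."""
    return sum(p_v(x) * f(x) for x in T)

# A uniform pmf over four data points acts like a plain mean:
T = [0, 1, 2, 3]
moving_average(lambda x: float(x), lambda x: 0.25, T)  # -> 1.5
```
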

## Simple moving average - discrete

In financial applications a simple moving average (SMA) is the unweighted mean of the previous n data points. However, in science and engineering the mean is normally taken from an equal number of data points on either side of a central value. This ensures that variations in the mean are aligned with the variations in the data rather than being shifted in time. An example of a simple equally weighted running mean for an n-day sample of closing prices is the mean of the previous n days' closing prices.

${\displaystyle p_{0}(0)=p_{0}(-1)=\ldots =p_{0}(-(n-1))={\frac {1}{n}}}$

and ${\displaystyle p_{0}(x)=0}$ for ${\displaystyle x\notin \{-n+1,\dots ,-1,0\}}$ with ${\displaystyle V=\mathbb {Z} }$ as additive group.

Let ${\displaystyle C(t)}$ be the cost/price of a product at time ${\displaystyle t\in T}$. If those prices are ${\displaystyle C(0),C(1),\dots ,C(97),C(98),C(99),C(100),C(101),\dots }$ and we want to create a simple moving average at day ${\displaystyle t=100}$ looking back over a time span of ${\displaystyle n=5}$ days, then the formula is

{\displaystyle {\begin{aligned}SMA(100)&={\frac {1}{5}}\cdot C(100)+{\frac {1}{5}}\cdot C(99)+{\frac {1}{5}}\cdot C(98)+{\frac {1}{5}}\cdot C(97)+{\frac {1}{5}}\cdot C(96)\\&={\frac {1}{5}}\sum _{i=0}^{4}C(100-i)\\&=\sum _{i=0}^{n-1}p_{100}(100-i)\cdot C(100-i)\end{aligned}}}

When calculating successive values for other days/time ${\displaystyle t\in V=\mathbb {Z} }$, a new value comes into the sum and an old value drops out, meaning a full summation each time is unnecessary for this simple case,

{\displaystyle {\begin{aligned}SMA(101)&={\frac {1}{5}}\cdot C(101)+{\frac {1}{5}}\cdot C(100)+{\frac {1}{5}}\cdot C(99)+{\frac {1}{5}}\cdot C(98)+{\frac {1}{5}}\cdot C(97)\end{aligned}}}

${\displaystyle {\textit {SMA}}(t)={\frac {1}{n}}\sum _{i=0}^{n-1}C(t-i)=\sum _{i=0}^{n-1}p_{t}(t-i)\cdot C(t-i)=\sum _{i=0}^{n-1}p_{0}(-i)\cdot C(t-i)}$
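The SMA formula above can be sketched in Python (illustrative names; assumes the prices are stored in a list indexed by day):

```python
def sma(C, n, t):
    """Simple moving average: unweighted mean of C[t-n+1], ..., C[t]."""
    if t < n - 1:
        raise ValueError("need at least n data points up to index t")
    return sum(C[t - n + 1 : t + 1]) / n

prices = [10.0, 11.0, 12.0, 13.0, 14.0]
sma(prices, 5, 4)  # mean of the five prices -> 12.0
```
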

The period selected depends on the type of movement of interest, such as short, intermediate, or long-term. In financial terms, moving-average levels can be interpreted as support in a falling market, or resistance in a rising market. If you draw the graphs of ${\displaystyle {\textit {SMA}}(t)}$ and the cost function ${\displaystyle C(t)}$, you will see that the graph of ${\displaystyle {\textit {SMA}}}$ runs smoother over the time ${\displaystyle t\in V}$.

If the data used are not centered around the mean, a simple moving average lags behind the latest datum point by half the sample width. An SMA can also be disproportionately influenced by old datum points dropping out or new data coming in. One characteristic of the SMA is that if the data have a periodic fluctuation, then applying an SMA of that period will eliminate that variation (the average always containing one complete cycle). But a perfectly regular cycle is rarely encountered.[2]

For a number of applications, it is advantageous to avoid the shifting induced by using only 'past' data. Hence a central moving average can be computed, using data equally spaced on either side of the point in the series where the mean is calculated.[3] This requires using an odd number of datum points in the sample window.

${\displaystyle p_{0}(-n)=p_{0}(-n+1)=\ldots =p_{0}(-1)=p_{0}(0)=p_{0}(1)=\dots =p_{0}(n-1)=p_{0}(n)={\frac {1}{2n+1}}}$

and ${\displaystyle p_{0}(x)=0}$ for ${\displaystyle x\notin \{-n,\dots ,-1,0,1,\dots ,n\}}$ with ${\displaystyle V=\mathbb {Z} }$ as additive group.

${\displaystyle {\textit {CMA}}(t)={\frac {1}{2n+1}}\sum _{i=-n}^{n}C(t+i)=\sum _{i=-n}^{n}p_{t}(t+i)\cdot C(t+i)=\sum _{i=-n}^{n}p_{0}(i)\cdot C(t+i)}$
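A hedged sketch of the central moving average (illustrative names; assumes n data points exist on either side of index t):

```python
def cma_central(C, n, t):
    """Central moving average over the 2n+1 values centred at index t."""
    return sum(C[t - n : t + n + 1]) / (2 * n + 1)

data = [1.0, 2.0, 3.0, 4.0, 5.0]
cma_central(data, 1, 2)  # (2 + 3 + 4) / 3 -> 3.0
```
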

A major drawback of the SMA is that it lets through a significant amount of the signal shorter than the window length. Worse, it actually inverts it. This can lead to unexpected artifacts, such as peaks in the smoothed result appearing where there were troughs in the data. It also leads to the result being less smooth than expected since some of the higher frequencies are not properly removed.

## Simple moving average - continuous

If we consider a continuous measurement of a value, e.g. a force ${\displaystyle f(t)}$ at time ${\displaystyle t}$, the objective is to smooth the values ${\displaystyle f(t)}$ with a continuous simple moving average. We look back over a time span ${\displaystyle n>0}$ into the past. As probability distribution we use a uniform distribution on the interval ${\displaystyle [-n,0]}$. The density function is:

${\displaystyle p_{0}(x)={\begin{cases}{\frac {1}{n}}&\mathrm {for} \ -n\leq x\leq 0,\\[8pt]0&\mathrm {for} \ x<-n\ \mathrm {or} \ x>0\end{cases}}}$ and ${\displaystyle p_{t}(x):=p_{0}(x-t)}$

Applying the moving average definition for continuous probability distributions, we get:

${\displaystyle SMA(t):=\int _{\mathbb {R} }p_{t}(x)\cdot f(x)\,dx=\int _{t-n}^{t}p_{0}(x-t)\cdot f(x)\,dx={\frac {1}{n}}\int _{t-n}^{t}f(x)\,dx}$
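The continuous SMA can be approximated numerically, here with a midpoint Riemann sum (a sketch with illustrative names, not a definitive implementation):

```python
def continuous_sma(f, t, n, steps=10_000):
    """Approximate (1/n) * integral of f over [t-n, t] by a midpoint Riemann sum."""
    h = n / steps
    return sum(f(t - n + (k + 0.5) * h) for k in range(steps)) * h / n

# For f(x) = x the exact value is t - n/2:
continuous_sma(lambda x: x, 10.0, 4.0)  # ≈ 8.0
```
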

## Cumulative moving average

### Cumulative moving average - discrete

In a cumulative moving average, the data arrive in an ordered datum stream with ${\displaystyle t\in \mathbb {N} _{0}=\{0,1,2,3,\dots \}}$, and the user would like to get the average of all of the data up until the current datum point ${\displaystyle t}$. For example, an investor may want the average price of all of the stock transactions for a particular stock up until the current time ${\displaystyle t}$. The starting point of data collection is ${\displaystyle t=0}$. As each new transaction occurs, the average price at the time of the transaction can be calculated for all of the transactions up to that point using the cumulative average, typically an equally weighted average of the sequence of t+1 values ${\displaystyle x_{0},x_{1}\ldots ,x_{t}}$ up to the current time ${\displaystyle t}$:

${\displaystyle {\textit {CMA}}(t)={\frac {x_{0}+x_{1}+\cdots +x_{t}}{t+1}}\,.}$

The brute-force method to calculate this would be to store all of the data and calculate the sum and divide by the number of datum points every time a new datum point arrived. However, it is possible to simply update the cumulative average as a new value ${\displaystyle x_{t}}$ becomes available, using the formula:

${\displaystyle {\textit {CMA}}(t)={\frac {x_{t}+t\cdot {\textit {CMA}}(t-1)}{t+1}}}$

Thus the current cumulative average ${\displaystyle {\textit {CMA}}(t)}$ for a new datum point ${\displaystyle x_{t}}$ is equal to the previous cumulative average ${\displaystyle {\textit {CMA}}(t-1)}$ at time t-1, times t, plus the latest datum point, all divided by the number of points received so far, t+1. When all of the datum points have arrived, the cumulative average equals the final average. It is also possible to store a running total of the datum points as well as the number of points, dividing the total by the number of datum points to get the CMA each time a new datum point arrives.

The derivation of the cumulative average formula is straightforward. Using

${\displaystyle x_{0}+x_{1}+\cdots +x_{t}=(t+1)\cdot {\textit {CMA}}(t)}$

and similarly for t + 1, it is seen that

{\displaystyle {\begin{aligned}x_{t}&=(x_{0}+x_{1}+\cdots +x_{t})-(x_{0}+x_{1}+\cdots +x_{t-1})\\[6pt]&=(t+1)\cdot {\textit {CMA}}(t)-t\cdot {\textit {CMA}}(t-1)\end{aligned}}}

Solving this equation for ${\displaystyle {\textit {CMA}}(t)}$ results in:

{\displaystyle {\begin{aligned}{\textit {CMA}}(t)&={\frac {x_{t}+t\cdot {\textit {CMA}}(t-1)}{t+1}}\end{aligned}}}
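The recursive update derived above can be sketched as a small generator (illustrative names):

```python
def cma_stream(values):
    """Yield CMA(t) after each new value x_t, using the recursive update
    CMA(t) = (x_t + t * CMA(t-1)) / (t + 1)."""
    cma = 0.0
    for t, x in enumerate(values):
        cma = (x + t * cma) / (t + 1)
        yield cma

list(cma_stream([2.0, 4.0, 6.0]))  # -> [2.0, 3.0, 4.0]
```
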

### Cumulative moving average - continuous

If we consider a continuous measurement of values, e.g. a force ${\displaystyle f(t)}$ at time ${\displaystyle t}$, the objective is to smooth the values ${\displaystyle f(t)}$ with a continuous cumulative moving average over the whole time span ${\displaystyle t>0}$ since the start of data collection. As probability distribution we use a uniform distribution on the interval ${\displaystyle [0,t]}$. The density function is:

${\displaystyle p_{t}(x)={\begin{cases}{\frac {1}{t}}&\mathrm {for} \ 0\leq x\leq t,\\[8pt]0&\mathrm {for} \ x<0\ \mathrm {or} \ x>t\end{cases}}}$.

Applying the cumulative moving average definition for continuous probability distributions, we get:

${\displaystyle CMA(t):=\int _{\mathbb {R} }p_{t}(x)\cdot f(x)\,dx=\int _{0}^{t}p_{t}(x)\cdot f(x)\,dx={\frac {1}{t}}\int _{0}^{t}f(x)\,dx}$

## Moving average applied on images

Pixelization was used to anonymize this photograph

A weighted average is an average that has multiplying factors to give different weights to data at different positions in the sample window. Mathematically, the moving average is the convolution of the datum points with a fixed weighting function. One application is creating a pixelisation of a digital image. In the example image, pixelisation is applied to several squares: all pixels in a square are replaced by the color average of all pixels in that square. A color is encoded by three integer values between 0 and 255 in the RGB color model (e.g. rgb(255, 153, 102) for light orange). Because the intensities for red, green and blue (RGB) are integers, the real values of the moving average must be rounded as a technical constraint.

The image I with m pixels height and n pixels width is a matrix ${\displaystyle I\in Mat(m\times n,RGB)}$ where all components of the matrix are RGB triples of integer values between 0 and 255, i.e. ${\displaystyle RGB:=\{0,1,\ldots ,255\}^{3}}$.

A single pixel at row r and column c is denoted as ${\displaystyle I_{(r,c)}}$. If we define ${\displaystyle I_{(r,c)}:=(255,153,102)}$ then the

• intensity of red is ${\displaystyle I_{(r,c)}.R=255}$,
• intensity of green is ${\displaystyle I_{(r,c)}.G=153}$,
• intensity of blue is ${\displaystyle I_{(r,c)}.B=102}$

If we calculate an average of colors, we calculate the average of red, green and blue separately. As an example, we calculate the average of a ${\displaystyle 2\times 2}$ sub-matrix of the image I with the four pixels:

• ${\displaystyle I_{(r,c)}:=(250,103,21)}$ ${\displaystyle I_{(r,c+1)}:=(230,153,102)}$
• ${\displaystyle I_{(r+1,c)}:=(255,50,12)}$ ${\displaystyle I_{(r+1,c+1)}:=(151,30,20)}$

The calculated moving average for this square is:

• Red: ${\displaystyle A_{red}:=round\left({\frac {250+230+255+151}{4}}\right)=222}$
• Green: ${\displaystyle A_{green}:=round\left({\frac {103+153+50+30}{4}}\right)=84}$
• Blue: ${\displaystyle A_{blue}:=round\left({\frac {21+102+12+20}{4}}\right)=39}$

The calculated moving average for the ${\displaystyle 2\times 2}$ sub-matrix of the image I replaces all original colors of the square. Let ${\displaystyle IMA\in Mat(m\times n,RGB)}$ be the image with the moving average applied to all ${\displaystyle 2\times 2}$ sub-matrices; then the selected sub-matrix above in IMA looks like this:

• ${\displaystyle IMA_{(r,c)}:=(222,84,39)}$ ${\displaystyle IMA_{(r,c+1)}:=(222,84,39)}$
• ${\displaystyle IMA_{(r+1,c)}:=(222,84,39)}$ ${\displaystyle IMA_{(r+1,c+1)}:=(222,84,39)}$

The last step assigns the calculated average color rgb(222, 84, 39) to all pixels of the 2x2-square submatrix.
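The worked ${\displaystyle 2\times 2}$ example can be reproduced with a short sketch (illustrative names; Python's built-in round is used for the rounding step):

```python
def block_average(pixels):
    """Average a list of RGB triples channel-wise, rounding to integers."""
    n = len(pixels)
    return tuple(round(sum(p[c] for p in pixels) / n) for c in range(3))

square = [(250, 103, 21), (230, 153, 102), (255, 50, 12), (151, 30, 20)]
block_average(square)  # -> (222, 84, 39), as in the worked example
```
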

Looking at the example image on the right, the effects of the moving average are clearly visible, because it is applied to large sub-matrices of the image.

For the image processing we use ${\displaystyle V:=\mathbb {Z} \times \mathbb {Z} }$ with the neutral element ${\displaystyle 0_{V}:=(0,0)}$ as the additive group with addition:

${\displaystyle (v_{1},v_{2})+(w_{1},w_{2}):=(v_{1}+w_{1},v_{2}+w_{2})}$ and ${\displaystyle T:=\{1,\ldots ,m\}\times \{1,\ldots ,n\}\subset V}$

T is the set of all row and column indices of the pixels. The image is decomposed into squares or, more generally, rectangles ${\displaystyle R_{i}}$. The moving average is calculated for all pixels in the rectangle ${\displaystyle R_{i}}$, similar to the ${\displaystyle 2\times 2}$ example mentioned above. The calculated moving average from the original image I is assigned to all pixels of the square/rectangle ${\displaystyle R_{i}}$ in IMA. The width and height of the rectangles ${\displaystyle R_{i}}$ in general have a default size; close to the borders of the image, the sizes of these rectangles have to be adapted to the remaining pixels at the right and bottom border of the image I.

## Weighted moving average

In technical analysis of financial data, a weighted moving average (WMA) has the specific meaning of weights that decrease in arithmetical progression.[4] In an n-day WMA the latest day has weight n, the second latest n − 1, etc., down to one. These weights create a discrete probability distribution with:

${\displaystyle s(n):=n+(n-1)+\dots +1={\frac {n\cdot (n+1)}{2}}}$ and ${\displaystyle p_{t}(x)={\begin{cases}{\frac {n-(t-x)}{s(n)}}&\mathrm {for} \ t-n<x\leq t,\\[8pt]0&\mathrm {for} \ x\leq t-n\ \mathrm {or} \ x>t\end{cases}}}$

The weighted moving average can be calculated for ${\displaystyle t\geq n}$ with the discrete probability mass function ${\displaystyle p_{t}}$ at time ${\displaystyle t\in \mathbb {N} _{0}:=\{0,1,2,\dots \}}$, where ${\displaystyle t=0}$ is the initial day on which the collection of the financial data begins and ${\displaystyle C(0)}$ is the price/cost of a product at day ${\displaystyle t=0}$. More generally, ${\displaystyle C(x)}$ is the price/cost of a product at an arbitrary day ${\displaystyle x\in \mathbb {N} _{0}}$.

${\displaystyle {\text{WMA}}(t):=\sum _{x\in T=\mathbb {N} _{0}}p_{t}(x)\cdot C(x)=\sum _{x=t-n+1}^{t}p_{t}(x)\cdot C(x)={\frac {n\cdot C(t)+(n-1)\cdot C(t-1)+\cdots +2\cdot C(t-n+2)+1\cdot C(t-n+1)}{n+(n-1)+\cdots +2+1}}}$
WMA weights n = 15

The denominator is a triangle number equal to ${\displaystyle {\frac {n(n+1)}{2}}}$ which creates a discrete probability distribution by:

${\displaystyle {\frac {1}{s(n)}}+{\frac {2}{s(n)}}+\ldots +{\frac {n}{s(n)}}={\frac {1+2+\ldots +n}{s(n)}}={\frac {s(n)}{s(n)}}=1}$

The graph at the right shows how the weights decrease, from highest weight at day t for the most recent datum points, down to zero at day t-n.

In the more general case with weights ${\displaystyle w_{0},\ldots ,w_{n}}$ the denominator will always be the sum of the individual weights, i.e.:

${\displaystyle s(n):=\sum _{k=0}^{n}w_{k}}$ with ${\displaystyle w_{0}}$ as the weight for the most recent datum point at day t and ${\displaystyle w_{n}}$ as the weight for day ${\displaystyle t-n}$, which is the n-th day before the most recent day ${\displaystyle t}$.

The discrete probability distribution ${\displaystyle p_{t}}$ is defined by:

${\displaystyle p_{t}(x)={\begin{cases}{\frac {w_{t-x}}{s(n)}}&\mathrm {for} \ 0\leq t-n\leq x\leq t,\\[8pt]0&\mathrm {for} \ x<t-n\ \mathrm {or} \ x>t\end{cases}}}$

The weighted moving average with arbitrary weights is calculated by:

${\displaystyle {\text{WMA}}(t):=\sum _{x\in T=\mathbb {N} _{0}}p_{t}(x)\cdot C(x)=\sum _{x=t-n}^{t}p_{t}(x)\cdot C(x)={\frac {w_{0}\cdot C(t)+w_{1}\cdot C(t-1)+\cdots +w_{n-1}\cdot C(t-n+1)+w_{n}\cdot C(t-n)}{w_{0}+\cdots +w_{n-1}+w_{n}}}}$
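The general weighted moving average can be sketched as follows (illustrative names; `weights[0]` is the weight of the most recent value, matching the convention above):

```python
def wma(C, weights, t):
    """Weighted moving average at index t: weights[i] multiplies C[t-i],
    and the result is divided by the sum of the weights."""
    total = sum(w * C[t - i] for i, w in enumerate(weights))
    return total / sum(weights)

prices = [1.0, 2.0, 3.0, 4.0]
wma(prices, [3, 2, 1], 3)  # (3*4 + 2*3 + 1*2) / 6 = 10/3
```
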

This general approach can be compared to the weights in the exponential moving average in the following section.

## Exponential moving average

Further information: EWMA chart and Exponential smoothing
EMA weights N = 200

An exponential moving average (EMA), also known as an exponentially weighted moving average (EWMA),[5] is a type of infinite impulse response filter that applies weighting factors which decrease exponentially. The weighting for each older datum decreases exponentially, never reaching zero. The graph at right shows an example of the weight decrease.

Consider the EMA for a series ${\displaystyle C:=(C(0),C(1),\ldots )}$ of collected data with a set of dates ${\displaystyle T:=\mathbb {N} _{0}}$, where ${\displaystyle C(t)}$ is the collected data at time index ${\displaystyle t\in T:=\mathbb {N} _{0}}$.

• First of all we have to define a value ${\displaystyle \alpha }$ with ${\displaystyle 0<\alpha <1}$ that represents the degree of weighting decrease, a constant smoothing factor between 0 and 1. A lower α discounts older observations faster.
• The weights are defined by
${\displaystyle w_{t}=(1-\alpha )\cdot \alpha ^{t}}$ for all ${\displaystyle t\in T:=\mathbb {N} _{0}}$ with ${\displaystyle \sum _{t=0}^{\infty }w_{t}=1}$ (geometric series).
• The sum of weights from 0 to the time index ${\displaystyle t\in T}$ is defined by:
${\displaystyle s(t)=\sum _{k=0}^{t}w_{k}=\sum _{k=0}^{t}(1-\alpha )\cdot \alpha ^{k}=(1-\alpha )\cdot \underbrace {\sum _{k=0}^{t}\alpha ^{k}} _{={\frac {1-\alpha ^{t+1}}{1-\alpha }}}=(1-\alpha )\cdot {\frac {1-\alpha ^{t+1}}{1-\alpha }}=1-\alpha ^{t+1}}$
• The discrete probability mass function is defined by:
${\displaystyle p_{t}(x)={\begin{cases}{\frac {w_{t-x}}{s(t)}}={\frac {(1-\alpha )\cdot \alpha ^{t-x}}{1-\alpha ^{t+1}}}&\mathrm {for} \ 0\leq x\leq t,\\[8pt]0&\mathrm {for} \ x<0\ \mathrm {or} \ x>t\end{cases}}}$

The definition above creates the exponential moving average EMA with discrete probability mass function ${\displaystyle p_{t}}$ by

${\displaystyle EMA(t):=\sum _{k\in T}p_{t}(k)\cdot C(k)=\sum _{k=0}^{t}p_{t}(k)\cdot C(k)=\sum _{k=0}^{t}{\frac {w_{t-k}}{s(t)}}\cdot C(k)=\sum _{k=0}^{t}{\frac {(1-\alpha )\cdot \alpha ^{t-k}}{1-\alpha ^{t+1}}}\cdot C(k)}$

The ${\displaystyle EMA}$ at time index ${\displaystyle t\in T:=\mathbb {N} _{0}}$ may be calculated recursively:

${\displaystyle EMA(0):=C(0)}$ and
${\displaystyle EMA(t+1):={\frac {1-\alpha }{1-\alpha ^{t+2}}}\cdot C(t+1)+\alpha \cdot {\frac {1-\alpha ^{t+1}}{1-\alpha ^{t+2}}}\cdot EMA(t)}$ for all ${\displaystyle t\in T=\mathbb {N} _{0}=\{0,1,2,3,...\}}$

Where:

• The coefficient ${\displaystyle \alpha }$ represents the degree of weighting decrease from ${\displaystyle EMA(t)}$ to ${\displaystyle EMA(t+1)}$. This implements the aging of data from ${\displaystyle t}$ to time index ${\displaystyle t+1}$.
• the fraction ${\displaystyle {\frac {1-\alpha ^{t+1}}{1-\alpha ^{t+2}}}}$ adjusts the normalization of ${\displaystyle EMA(t)}$ to that of ${\displaystyle EMA(t+1)}$.
• the coefficient ${\displaystyle {\frac {1-\alpha }{1-\alpha ^{t+2}}}=p_{t+1}(t+1)={\frac {w_{(t+1)-(t+1)}}{s(t+1)}}}$ is the weight of the most recent value ${\displaystyle C(t+1)}$ in the EMA at time index t+1.
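The recursion above can be sketched in Python (illustrative names; a sketch, not a definitive implementation):

```python
def ema(C, alpha):
    """Normalized EMA via the recursion EMA(0) = C(0) and
    EMA(t+1) = (1-a)/(1-a^(t+2)) * C(t+1) + a*(1-a^(t+1))/(1-a^(t+2)) * EMA(t)."""
    out = [C[0]]  # EMA(0) := C(0)
    for t in range(len(C) - 1):
        denom = 1 - alpha ** (t + 2)
        nxt = (1 - alpha) / denom * C[t + 1] \
            + alpha * (1 - alpha ** (t + 1)) / denom * out[-1]
        out.append(nxt)
    return out

# A constant series stays constant at 5.0 (up to floating point):
ema([5.0, 5.0, 5.0, 5.0], alpha=0.5)
```
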

### Initialization of EMA and elimination of the impact of old data

${\displaystyle EMA(0)}$ may be initialized in a number of different ways, most commonly by setting ${\displaystyle EMA(0)}$ to the first collected datum ${\displaystyle C(0)}$ at time index 0 as shown above, though other techniques exist, such as starting the calculation of the moving average after the first 4 or 5 observations. Furthermore, only the most recent subset of the collected data before the time index ${\displaystyle t}$ from the total history of collected data might be used for ${\displaystyle EMA(t)}$. The discrete probability mass function puts weights on the most recent ${\displaystyle m+1}$ values of the collected data by:

${\displaystyle p_{t}(x)={\begin{cases}{\frac {w_{t-x}}{s(t)}}={\frac {(1-\alpha )\cdot \alpha ^{t-x}}{1-\alpha ^{t+1}}}&\mathrm {for} \ 0\leq x\leq t\ \mathrm {and} \ t\leq m,\\[8pt]{\frac {w_{t-x}}{s(m)}}={\frac {(1-\alpha )\cdot \alpha ^{t-x}}{1-\alpha ^{m+1}}}&\mathrm {for} \ t-m\leq x\leq t\ \mathrm {and} \ t>m,\\[8pt]0&\mathrm {otherwise} \end{cases}}}$

The denominator ${\displaystyle s(m)=\sum _{k=0}^{m}w_{k}=1-\alpha ^{m+1}}$ is the sum of the ${\displaystyle m+1}$ retained weights, so the truncated weights again sum to one.

The limitation to the most recent ${\displaystyle m+1}$ values of the collected data eliminates the impact of very old data on the resulting moving average completely. Without this limitation, choosing a small ${\displaystyle \alpha }$ makes old data less important than recent data and discounts older observations faster, but even the oldest data still has an impact on the calculation of ${\displaystyle EMA(t)}$ at time index ${\displaystyle t}$.
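A sketch of the truncated EMA, renormalizing the retained weights directly (illustrative names):

```python
def truncated_ema(C, alpha, m, t):
    """EMA(t) restricted to the most recent m+1 values, renormalized so
    that the retained exponential weights sum to one."""
    lo = max(0, t - m)
    weights = [(1 - alpha) * alpha ** (t - x) for x in range(lo, t + 1)]
    total = sum(weights)
    return sum(w * C[x] for w, x in zip(weights, range(lo, t + 1))) / total
```

Data older than the window has no effect at all: changing a value before index t-m leaves the result unchanged.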

The initialization of ${\displaystyle EMA(t)}$ could incorporate something about values prior to the available data, i.e. history before ${\displaystyle t=0}$. Such an initialization could introduce an error into ${\displaystyle EMA(t)}$. In view of this, the early results should be regarded as unreliable until the iterations have had time to converge. This is sometimes called a 'spin-up' interval.

This formulation of the EMA is designed as an application of the expected value, which is a standard notion in probability theory.

According to Hunter (1986),[6] this can be written as a repeated application of the recursive formula for different times ${\displaystyle t}$ without standardisation, i.e. without requiring

${\displaystyle \sum _{x\in T}p_{t}(x)=1}$.

An alternative approach defined by Roberts (1959)[7] likewise omits the standardisation of the probability distribution, while the basic principle of the exponential moving average remains the same.

### Application to measuring computer performance

Some computer performance metrics, e.g. the average process queue length, or the average CPU utilization, use a form of exponential moving average with the recursive definition.

${\displaystyle S_{n}=\alpha (t_{n}-t_{n-1})\times Y_{n}+(1-\alpha (t_{n}-t_{n-1}))\times S_{n-1}.}$

Here α is defined as a function of time between two readings. An example of a coefficient giving bigger weight to the current reading, and smaller weight to the older readings is

${\displaystyle \alpha (t_{n}-t_{n-1})=1-\exp \left({-{{t_{n}-t_{n-1}} \over {W\times 60}}}\right)}$

where exp() is the exponential function, time for readings tn is expressed in seconds, and W is the period of time in minutes over which the reading is said to be averaged (the mean lifetime of each reading in the average). Given the above definition of α, the moving average can be expressed as

${\displaystyle S_{n}=\left(1-\exp \left(-{{t_{n}-t_{n-1}} \over {W\times 60}}\right)\right)\times Y_{n}+\exp \left(-{{t_{n}-t_{n-1}} \over {W\times 60}}\right)\times S_{n-1}}$

For example, a 15-minute average L of a process queue length Q, measured every 5 seconds (time difference is 5 seconds), is computed as

{\displaystyle {\begin{aligned}L_{n}&=\left(1-\exp \left({-{5 \over {15\times 60}}}\right)\right)\times Q_{n}+e^{-{5 \over {15\times 60}}}\times L_{n-1}\\[6pt]&=\left(1-\exp \left({-{1 \over {180}}}\right)\right)\times Q_{n}+e^{-1/180}\times L_{n-1}\\[6pt]&=Q_{n}+e^{-1/180}\times (L_{n-1}-Q_{n})\end{aligned}}}
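The recursive definition with an interval-dependent α can be sketched as follows (illustrative names; this sketch assumes equally spaced readings, so α is computed once):

```python
import math

def perf_ema(readings, w_minutes, dt_seconds):
    """Exponential average of periodic readings: the smoothing factor is
    derived from the sampling interval and the averaging window W."""
    a = 1 - math.exp(-dt_seconds / (w_minutes * 60))
    s = readings[0]  # initialize with the first reading
    for y in readings[1:]:
        s = a * y + (1 - a) * s
    return s
```
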

## Probability distribution as distribution of importance

The definition of expected value provides the mathematical foundation for moving averages in the discrete and continuous setting, and the mathematical theory is just an application of basic principles of probability theory. Nevertheless, the notion of probability is a bit misleading, because the semantics of the moving average do not refer to the probability of events. The probability distribution should instead be regarded as a distribution of importance. In time series, for example, less importance is assigned to older data, which does not mean that older data is less likely than recent data. In general, the events that create the collected data are not considered from a probability perspective.

Importance can be defined for moving averages by:

• proximity in time (old and recent data)
• proximity in space (see application of the moving average on images above)

To quantify this proximity, a metric or norm on the underlying vector space ${\displaystyle V}$ can be used. A greater distance to a reference point ${\displaystyle v_{0}\in V}$ leads to less importance, e.g. by

${\displaystyle \displaystyle w_{v}:={\frac {1}{1+\|v-v_{0}\|}}\leq 1}$.

The weight for the importance is 1 for ${\displaystyle v=v_{0}}$. With increasing distance, measured by the norm ${\displaystyle \|\cdot \|}$, the weight decreases towards 0. Standardization with ${\displaystyle s(n)}$ as the sum of all weights for discrete moving averages (as mentioned for the EMA) leads to the defining property of probability distributions:

${\displaystyle \sum _{x\in T}p_{t}(x)=1}$.
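A small sketch of such distance-based importance weights, standardized to sum to one (illustrative names):

```python
def importance_weights(points, v0):
    """Weights w_v = 1 / (1 + |v - v0|), standardized to sum to one."""
    raw = [1.0 / (1.0 + abs(v - v0)) for v in points]
    s = sum(raw)
    return [w / s for w in raw]

# The reference point v0 = 1 receives the largest weight:
importance_weights([0, 1, 2], 1)  # -> [0.25, 0.5, 0.25]
```
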

Furthermore, there are other moving averages that incorporate negative weights, so that

${\displaystyle \sum _{x\in T}p_{t}(x)\not =1}$.

This can happen when the positive/negative impact ${\displaystyle I(t)\in \mathbb {R} }$ of the collected data ${\displaystyle C(t)}$ is folded into the weights of the probability mass function. Assigning impact factors of the collected data to the probability/importance values mixes two different properties. This should be avoided, and the impact ${\displaystyle I(t)}$ on ${\displaystyle C(t)}$ should be kept separate for a transparent definition of the moving average, i.e.

${\displaystyle MA(t):=\sum _{k\in T}p_{t}(k)\cdot I(k)\cdot C(k)}$ with ${\displaystyle \sum _{x\in T}p_{t}(x)=1}$.

## Other weightings

Other weighting systems are used occasionally – for example, in share trading a volume weighting will weight each time period in proportion to its trading volume.

A further weighting, used by actuaries, is Spencer's 15-Point Moving Average[8] (a central moving average). The symmetric weight coefficients are −3, −6, −5, 3, 21, 46, 67, 74, 67, 46, 21, 3, −5, −6, −3.

Outside the world of finance, weighted running means have many forms and applications. Each weighting function or "kernel" has its own characteristics. In engineering and science the frequency and phase response of the filter is often of primary importance in understanding the desired and undesired distortions that a particular filter will apply to the data.

A mean does not just "smooth" the data. A mean is a form of low-pass filter. The effects of the particular filter used should be understood in order to make an appropriate choice. On this point, the French version of this article discusses the spectral effects of 3 kinds of means (cumulative, exponential, Gaussian).

## Moving median

From a statistical point of view, the moving average, when used to estimate the underlying trend in a time series, is susceptible to rare events such as rapid shocks or other anomalies. A more robust estimate of the trend is the simple moving median over n time points:

${\displaystyle {\textit {SMM}}={\text{Median}}(p_{M},p_{M-1},\ldots ,p_{M-n+1})}$

where the median is found by, for example, sorting the values inside the brackets and finding the value in the middle. For larger values of n, the median can be efficiently computed by updating an indexable skiplist.[9]
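A minimal sketch of the trailing moving median (illustrative names; `statistics.median` sorts each window):

```python
from statistics import median

def moving_median(data, n):
    """Simple moving median over trailing windows of length n."""
    return [median(data[i - n + 1 : i + 1]) for i in range(n - 1, len(data))]

# The single shock value 100 never dominates any window:
moving_median([1, 2, 100, 3, 4], 3)  # -> [2, 3, 4]
```
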

Statistically, the moving average is optimal for recovering the underlying trend of the time series when the fluctuations about the trend are normally distributed. However, the normal distribution does not place high probability on very large deviations from the trend which explains why such deviations will have a disproportionately large effect on the trend estimate. It can be shown that if the fluctuations are instead assumed to be Laplace distributed, then the moving median is statistically optimal.[10] For a given variance, the Laplace distribution places higher probability on rare events than does the normal, which explains why the moving median tolerates shocks better than the moving mean.

When the simple moving median above is central, the smoothing is identical to the median filter which has applications in, for example, image signal processing.

## Moving average regression model

Main article: Moving-average model

In a moving average regression model, a variable of interest is assumed to be a weighted moving average of unobserved independent error terms; the weights in the moving average are parameters to be estimated.

These two concepts are often confused due to their names, but while they share many similarities, they represent distinct methods and are used in very different contexts.