# Mean absolute percentage error

The mean absolute percentage error (MAPE), also known as mean absolute percentage deviation (MAPD), is a measure of the prediction accuracy of a forecasting method in statistics, for example in trend estimation. It is also used as a loss function for regression problems in machine learning. It usually expresses the accuracy as a ratio defined by the formula:

${\displaystyle {\mbox{M}}={\frac {1}{n}}\sum _{t=1}^{n}\left|{\frac {A_{t}-F_{t}}{A_{t}}}\right|,}$

where ${\displaystyle A_{t}}$ is the actual value and ${\displaystyle F_{t}}$ is the forecast value. The difference between ${\displaystyle A_{t}}$ and ${\displaystyle F_{t}}$ is divided by the actual value ${\displaystyle A_{t}}$. The absolute value of this ratio is summed over every forecasted point in time and divided by the number of fitted points ${\displaystyle n}$. The MAPE is also sometimes reported as a percentage, which is the above expression multiplied by 100.
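As a minimal sketch of the formula, the MAPE can be computed in a few lines of plain Python. The function name `mape` and the sample values are illustrative assumptions, not part of any particular library.

```python
def mape(actual, forecast):
    """Mean absolute percentage error as a ratio (multiply by 100 for a percentage)."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    if len(actual) == 0:
        raise ValueError("at least one observation is required")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    # Average of |(A_t - F_t) / A_t| over all points.
    return sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Illustrative data (assumed for this example).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]

ratio = mape(actual, forecast)  # (0.10 + 0.10 + 0.00) / 3
percent = 100 * ratio           # the same quantity reported as a percentage
```

The explicit check for zero actual values is a design choice of this sketch: it raises an error rather than returning an infinite or undefined result.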

## MAPE Regressions

Mean absolute percentage error is commonly used as a loss function for regression problems and in model evaluation, because of its very intuitive interpretation in terms of relative error.

### Definition

Consider a standard regression setting in which the data are fully described by a random pair ${\displaystyle Z=(X,Y)}$ with values in ${\displaystyle \mathbb {R} ^{d}\times \mathbb {R} }$, and ${\displaystyle n}$ i.i.d. copies ${\displaystyle (X_{1},Y_{1}),\ldots ,(X_{n},Y_{n})}$ of ${\displaystyle (X,Y)}$. Regression models aim at finding a good model for the pair, that is, a measurable function ${\displaystyle g}$ from ${\displaystyle \mathbb {R} ^{d}}$ to ${\displaystyle \mathbb {R} }$ such that ${\displaystyle g(X)}$ is “close to” ${\displaystyle Y}$.

In the classical regression setting, the closeness of ${\displaystyle g(X)}$ to ${\displaystyle Y}$ is measured via the ${\displaystyle L_{2}}$ risk, also called the mean squared error (MSE). In the MAPE regression context[1], the closeness of ${\displaystyle g(X)}$ to ${\displaystyle Y}$ is measured via the MAPE, and the aim of MAPE regression is to find a model ${\displaystyle g_{MAPE}}$ such that:

${\displaystyle g_{MAPE}(x)=\arg \min _{g\in {\mathcal {G}}}\mathbb {E} \left[\left|{\frac {g(X)-Y}{Y}}\right||X=x\right]}$

where ${\displaystyle {\mathcal {G}}}$ is the class of models considered (e.g. linear models).

### In practice

In practice, ${\displaystyle g_{MAPE}(x)}$ can be estimated by the empirical risk minimization strategy, leading to

${\displaystyle {\widehat {g}}_{MAPE}(x)=\arg \min _{g\in {\mathcal {G}}}\sum _{i=1}^{n}\left|{\frac {g(X_{i})-Y_{i}}{Y_{i}}}\right|}$

From a practical point of view, using the MAPE as a quality function for a regression model is equivalent to doing weighted mean absolute error (MAE) regression, also known as median regression, which is the special case of quantile regression at the median. This follows directly from the definition, since

${\displaystyle {\widehat {g}}_{MAPE}(x)=\arg \min _{g\in {\mathcal {G}}}\sum _{i=1}^{n}\omega (Y_{i})\left|g(X_{i})-Y_{i}\right|{\mbox{ with }}\omega (Y_{i})=\left|{\frac {1}{Y_{i}}}\right|}$

As a consequence, MAPE regression is easy to carry out in practice, for example with existing quantile regression libraries that accept observation weights.
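The weighted form can be illustrated on the simplest model class, the constant models ${\displaystyle g(x)=c}$. For that class, minimizing a weighted sum of absolute errors is solved by a weighted median of the ${\displaystyle Y_{i}}$, here with weights ${\displaystyle \omega (Y_{i})=|1/Y_{i}|}$. The sketch below assumes this restricted model class; the helper names `weighted_median` and `mape_loss` and the sample data are illustrative only.

```python
def weighted_median(values, weights):
    """Return a value m that minimises sum(w * |m - v|) over the pairs (v, w)."""
    pairs = sorted(zip(values, weights))
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        # The first point at which the cumulative weight reaches half the
        # total weight is a minimiser of the weighted absolute loss.
        if cumulative >= half:
            return value
    raise ValueError("values must be non-empty")


def mape_loss(constant, ys):
    """Empirical MAPE of the constant prediction `constant` on targets `ys`."""
    return sum(abs((constant - y) / y) for y in ys) / len(ys)


# Illustrative targets (assumed for this example); all non-zero.
ys = [1.0, 2.0, 4.0, 8.0, 100.0]
weights = [1.0 / abs(y) for y in ys]  # omega(Y_i) = |1 / Y_i|

best_constant = weighted_median(ys, weights)
plain_median = sorted(ys)[len(ys) // 2]  # unweighted median, for comparison

# Sanity check: no observed value achieves a lower empirical MAPE.
assert all(mape_loss(best_constant, ys) <= mape_loss(c, ys) + 1e-12 for c in ys)
```

On this data the weighted median lies well below the plain median, because the weights ${\displaystyle |1/Y_{i}|}$ give small targets more influence. Richer model classes such as linear models need a weighted quantile regression solver instead of this closed-form shortcut.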

### Consistency

The use of the MAPE as a loss function for regression analysis is feasible from both a practical and a theoretical point of view, since the existence of an optimal model and the consistency of empirical risk minimization can be proved.[1]

## Alternative MAPE definitions

Problems can occur when calculating the MAPE for a series with small denominators. A zero actual value produces a division by zero, and actual values close to zero make the absolute percentage error change by very large amounts in response to a small change in the error.

As an alternative, each actual value (${\displaystyle A_{t}}$) of the series in the original formula can be replaced by the average of all actual values (${\displaystyle {\bar {A}}_{t}}$) of that series. This alternative is still used to measure the performance of models that forecast spot electricity prices.[2]

Note that this is the same as dividing the sum of absolute differences by the sum of actual values, and is sometimes referred to as WAPE (weighted absolute percentage error).
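The equivalence between the two forms can be checked numerically. The following is a minimal sketch; the function names and the sample series (which deliberately contains a zero actual value) are assumptions made for illustration.

```python
def wape(actual, forecast):
    """Sum of absolute errors divided by the sum of actual values."""
    total_actual = sum(actual)
    if total_actual == 0:
        raise ValueError("the sum of actual values must be non-zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total_actual


def mape_with_mean_denominator(actual, forecast):
    """MAPE formula with every denominator replaced by the mean actual value."""
    mean_actual = sum(actual) / len(actual)
    if mean_actual == 0:
        raise ValueError("the mean of actual values must be non-zero")
    return sum(abs(a - f) / mean_actual for a, f in zip(actual, forecast)) / len(actual)


# Illustrative series (assumed); the first actual value is zero, so the
# ordinary MAPE would be undefined here.
actual = [0.0, 10.0, 30.0]
forecast = [2.0, 8.0, 33.0]

wape_value = wape(actual, forecast)                      # 7 / 40
mean_form = mape_with_mean_denominator(actual, forecast)
```

Both functions return the same number up to floating-point rounding, since dividing each term by the mean and then averaging is the same as dividing the total absolute error by the total of the actual values.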

## Issues

Although the concept of MAPE sounds very simple and convincing, it has major drawbacks in practical application [3], and there are many studies on shortcomings and misleading results from MAPE.[4][5]

• It cannot be used if there are zero actual values (which sometimes happens, for example, in demand data), because there would be a division by zero.
• For forecasts which are too low the percentage error cannot exceed 100%, but for forecasts which are too high there is no upper limit to the percentage error.
• MAPE puts a heavier penalty on negative errors, ${\displaystyle A_{t}<F_{t}}$, than on positive errors.[6] As a consequence, when MAPE is used to compare the accuracy of prediction methods, it is biased in that it will systematically select a method whose forecasts are too low. This little-known but serious issue can be overcome by using an accuracy measure based on the logarithm of the accuracy ratio (the ratio of the predicted to actual value), given by ${\displaystyle \log \left({\frac {\text{predicted}}{\text{actual}}}\right)}$. This approach has superior statistical properties and leads to predictions that can be interpreted in terms of the geometric mean.[3]
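The asymmetry described in the list above can be seen in a small numerical example. The sketch below compares a forecast that is half the actual value with one that is double it; the factor of two, the function names, and the values are assumptions chosen for illustration. The log accuracy ratio requires strictly positive actual and predicted values.

```python
import math


def absolute_percentage_error(actual, forecast):
    """Single-point absolute percentage error |(A - F) / A|, as a ratio."""
    return abs((actual - forecast) / actual)


def log_accuracy_ratio(actual, forecast):
    """log(predicted / actual); both arguments must be strictly positive."""
    return math.log(forecast / actual)


actual = 100.0

# Forecasts that miss by the same factor of two in opposite directions.
too_low = absolute_percentage_error(actual, 50.0)    # under-forecast
too_high = absolute_percentage_error(actual, 200.0)  # over-forecast

log_low = log_accuracy_ratio(actual, 50.0)
log_high = log_accuracy_ratio(actual, 200.0)
```

Here the under-forecast has an absolute percentage error of 0.5 while the over-forecast has 1.0, so the percentage error penalizes the over-forecast twice as heavily. The two log accuracy ratios have equal magnitude and opposite sign, which is the symmetry that makes this measure attractive.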

To overcome these issues with MAPE, other measures have been proposed in the literature: