# Talk:Exponential smoothing

WikiProject Statistics (Rated C-class, Low-importance)

This article is within the scope of the WikiProject Statistics, a collaborative effort to improve the coverage of statistics on Wikipedia. If you would like to participate, please visit the project page or join the discussion.


## The original version

The whole thing reads like a student essay. --A bit iffy 15:17, 27 April 2006 (UTC)

I do not have access to recent research papers, etc., but I can give the following reference: Montgomery, Douglas C. (1976). Forecasting and Time Series Analysis. McGraw-Hill. martin.wagner@nestle.com

It's pretty messy. I've seen worse, though. One of the early formulas seems right, but I haven't looked closely at the text yet. Michael Hardy 20:59, 23 October 2006 (UTC)

For such a widely adopted forecasting method, this is extremely poor. At the very least, standard notation should be adopted, see Makridakis, S., Wheelwright, S. C., & Hyndman, R. J. (1998). Forecasting: Methods and applications (3rd ed.). New Jersey: John Wiley & Sons. State-space notation would also be a useful addition. Dr Peter Catt 02:29, 15 December 2006 (UTC)

I feel really sorry to see poor work like this on Wiki.

## A complete rewrite

OK, nobody liked this article very much, and it even came up over on this talk page. So I've rewritten it in its entirety. I'll try to lay my hands on a few reference books in the next month or so, so that I can verify standard notation, etc. Additional information about double and triple exponential smoothing should also go in this article, but at least I've made a start. DavidCBryant 00:45, 10 February 2007 (UTC)

## Problems with weighting

Now that the article has been re-written to be much clearer, I see that there are some problems which I had not previously noticed.

1. Currently, the way the average is initialized gives much too much weight to the first observation. If α is 0.05 (corresponding to a moving average of about 20 values), when you initialize and then input a second observation, the average is (1*x1 + 19*x0)/20 which gives the first observation 19 times the weight of the second when it would be more appropriate to give the first observation 19/20 of the weight of the second observation.
2. No provision is made for the practical problem of missing data. If an observation is not made at some time, or it is made but lost, then what do we do?

Perhaps these difficulties could be addressed, in part, by separately computing a normalization factor which could be done by forming a sum in the same way using always 1 as the data and then dividing that into the sum of the actual observations. JRSpriggs 03:45, 12 February 2007 (UTC)

Do you see this as a problem with the article itself, or a problem with the statistical technique described in the article? —David Eppstein 08:06, 12 February 2007 (UTC)
Well, that is the question, is it not? I do not know enough about statistics to know whether the technique has been described incorrectly or whether we should point out that the technique has these limitations. JRSpriggs 09:59, 12 February 2007 (UTC)
When I learned about this technique, I think I remember learning that either of the two methods could be used to initialize it (either copy the first data point enough times to fill in the array, or copy the most recent data point enough times). But I have no idea where I learned about this and I have no references on it, so I can't say what is done in practice. The article on moving average also has no discussion of how to initialize the array. It's hardly a limitation on the method because the method is intended for large data sets, not tiny ones. There seem to be some references at moving average like this one. CMummert · talk 13:27, 12 February 2007 (UTC)

Perhaps we should consider merging this article into Moving average#Exponential moving average. Otherwise, I think that the weighting should be more like this:

$s_t = \frac {\sum_k \exp (-\alpha (t - t_k)) x_k}{\sum_k \exp (-\alpha (t - t_k))}$

where the sum is over observations with $t_k \leq t$. What do you-all think? JRSpriggs 04:35, 13 February 2007 (UTC)

I agree on both counts, particularly the merge. MisterSheik 18:47, 16 February 2007 (UTC)
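For what it's worth, the normalized weighting proposed above is easy to sketch in Python. This is only an illustration (the function name and sample data are made up, not from any source); the point is that dividing by the sum of the weights handles irregular or missing observations and avoids over-weighting the first data point:

```python
import math

def normalized_exp_smooth(times, values, alpha, t):
    """Normalized exponentially weighted average at time t, as suggested
    above: each observation x_k gets weight exp(-alpha * (t - t_k)), and
    the weight sum is divided out, so gaps in the data are handled
    naturally and early observations are not over-weighted."""
    num = 0.0
    den = 0.0
    for t_k, x_k in zip(times, values):
        if t_k <= t:  # only past (and current) observations contribute
            w = math.exp(-alpha * (t - t_k))
            num += w * x_k
            den += w
    return num / den

# A constant series stays (approximately) constant, which is the sanity
# check that the unnormalized initialization fails.
print(normalized_exp_smooth([0, 1, 2, 3], [10.0, 10.0, 10.0, 10.0], 0.05, 3))
```

With evenly spaced data this reduces to an exponential moving average whose weights are explicitly normalized.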

## The exponential moving average

The article says: "For example, the method of least squares might be used to determine the value of α for which the sum of the quantities $(s_n - x_n)^2$ is minimized." Uh? I think such an $\alpha$ would be one! Albmont 11:14, 28 February 2007 (UTC)

Thanks. It should have been $s_{n-1}$ instead of $s_n$. I changed it. JRSpriggs 12:25, 28 February 2007 (UTC)
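To make the corrected criterion concrete, here is a rough Python sketch that picks α by grid search, using the smoothed value $s_{n-1}$ as the forecast of $x_n$. The data series is made up purely for illustration:

```python
def sse(alpha, xs):
    """Sum of squared one-step-ahead errors: the smoothed value s_{n-1}
    is used as the forecast of x_n, per the corrected formula above."""
    s = xs[0]                  # initialize with the first observation
    total = 0.0
    for x in xs[1:]:
        total += (s - x) ** 2  # forecast error, measured before updating
        s = alpha * x + (1 - alpha) * s
    return total

# Simple grid search over alpha in (0, 1]; hypothetical sample data.
data = [3.0, 4.1, 3.8, 5.2, 4.9, 6.0, 5.7]
best = min((i / 100 for i in range(1, 101)), key=lambda a: sse(a, data))
print(best)
```

Note that with this (forecasting) form of the criterion, α = 1 is no longer trivially optimal: it just forecasts each value by the previous one.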

## Is $x_t$ right?

Should it be

$s_t = \alpha x_t + (1-\alpha)s_{t-1}$

or

$s_t = \alpha x_{t-1} + (1-\alpha)s_{t-1}$?

This indicates the first one, but this PowerPoint presentation implies the second one.

If the formula is used for forecasting purposes, then it looks to me like the second one is the only usable one (and also looks more natural somehow). Or am I missing something? (I'm very, very rusty on this now!). Hope someone can clear this up.--A bit iffy 11:18, 3 March 2007 (UTC)

The choice is a matter of convention (for strictly periodic data), but I think that the first one (which we use here) is more natural, since people would compute the smoothed value as soon as possible and would naturally want to label it with the time that they computed it. JRSpriggs 11:27, 4 March 2007 (UTC)

The textbooks that I use for teaching time series use the second one (with the lagged value of the series). See Moore, McCabe, Duckworth and Sclove, The Practice of Business Statistics. In Minitab, on the other hand, the smoothed value at time t is defined using the first formula, while the fitted value at time t is the smoothed value at time (t-1). EconProf86 16:20, 28 May 2007 (UTC)

In my view it should be X_t and not X_{t-1}. The simple exponential filter is analogous to a single-pole low-pass IIR filter. In both cases you are saying that the current output is formed from the sum of a fraction of the new input plus a fraction of the old output. — Preceding unsigned comment added by 212.77.61.18 (talk) 08:09, 9 September 2011 (UTC)

Both are correct. As said above, X_t is used when the focus is smoothing a series and X_{t-1} is used when prediction is the focus. As the article mostly follows the NIST handbook we use X_{t-1}.--Muhandes (talk) 16:55, 10 September 2011 (UTC)

Another related point: the following paragraph states "In the limiting case with α = 1 the output series is just the same as the original series". This will only be the case with X_t. Using X_{t-1} means adding a lag to the time series. Eric thiebaut (talk) 17:28, 24 December 2011 (UTC)

Corrected. --Muhandes (talk) 11:45, 25 December 2011 (UTC)

looks like it is back to t-1. I like it better as t. That is how you would program it. — Preceding unsigned comment added by 128.114.150.27 (talk) 22:59, 17 March 2014 (UTC)
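A quick numerical check of the two conventions discussed in this thread, as a Python sketch with made-up data. It confirms the point above: only the X_t (smoothing) form reproduces the series at α = 1; the X_{t-1} (forecasting) form lags it by one step:

```python
def smooth_current(xs, alpha):
    """s_t = alpha*x_t + (1-alpha)*s_{t-1}  (the smoothing convention)."""
    s = [xs[0]]
    for x in xs[1:]:
        s.append(alpha * x + (1 - alpha) * s[-1])
    return s

def smooth_lagged(xs, alpha):
    """s_t = alpha*x_{t-1} + (1-alpha)*s_{t-1}  (the forecasting convention)."""
    s = [xs[0]]
    for t in range(1, len(xs)):
        s.append(alpha * xs[t - 1] + (1 - alpha) * s[-1])
    return s

xs = [1.0, 3.0, 2.0, 5.0]
print(smooth_current(xs, 1.0))  # [1.0, 3.0, 2.0, 5.0] -- reproduces the series
print(smooth_lagged(xs, 1.0))   # [1.0, 1.0, 3.0, 2.0] -- lagged by one step
```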

## Negative values for smoothing factor α

All references I have looked at suggest that the value of α must be chosen between 0 and 1. However, none offer any reason for this. Although such a range may be "intuitive", I have worked with datasets for which the optimal value for α (in a least-squares sense, as described in the article) is negative. Why would this be wrong? koochak 10:30, 5 March 2008 (UTC)

Look at the meaning of α: it determines what percentage of the new smoothed value is carried over from the previous smoothed value. You cannot have a negative percentage. JLT 15:03, 16 Dec 2009 (CST)

## Corrected an Error

I removed an inaccuracy that stated that simple exponential smoothing is the same as Brown exponential smoothing. This is not the case; Brown's method is double exponential smoothing. JLT 14:51, 16 Dec 2009 (CST) —Preceding unsigned comment added by 131.10.254.62 (talk)

## Unsatisfying Derivation of Alpha

The statement that there is no simple way to choose α is very unsatisfying.

If one considers the impulse-response of this method, then the time delay of the response (mean) is 1/α data points and the rms width of the response is also on the order of (but not exactly) 1/α data points. Thus the method smooths with a smoothing width of 1/α data points, and this is a perfectly good way to choose an α.

208.252.219.2 (talk) 16:01, 18 August 2010 (UTC)

WP:SOFIXIT --Muhandes (talk) 10:03, 20 August 2010 (UTC)
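The 1/α rule of thumb described above is easy to verify numerically. A Python sketch (my own, truncating the infinite impulse response at 2000 terms, which is more than enough for α = 0.05): the filter's impulse-response weights are α(1−α)^k, whose mean lag works out to (1−α)/α, i.e. on the order of 1/α data points:

```python
def impulse_response(alpha, n):
    """Response of the filter s_t = alpha*x_t + (1-alpha)*s_{t-1} to a
    unit impulse x_0 = 1: the weights are alpha*(1-alpha)^k."""
    return [alpha * (1 - alpha) ** k for k in range(n)]

alpha = 0.05
h = impulse_response(alpha, 2000)  # truncation; the tail is negligible here
mean_lag = sum(k * w for k, w in enumerate(h)) / sum(h)
print(mean_lag)  # close to (1 - alpha)/alpha = 19, i.e. on the order of 1/alpha
```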

## Least squares optimisation of alpha

I do not understand exactly why optimizing alpha using least-squares methods should work. The sum of squared differences is minimized for alpha = 1, where it equals 0. By continuity, I suppose the optimization problem has no other non-trivial solutions. Please give some citation/reference. —Preceding unsigned comment added by 149.156.82.207 (talk) 18:25, 15 December 2010 (UTC)

I don't follow. For alpha=1 s_t=x_{t-1}, i.e. the estimate is always the last measure. This minimizes the sum of square only when x_{t-1}=x_t, i.e. only for a constant series. If you need a source on this, check section 6.4.3.1 of the NIST/SEMATECH e-Handbook of Statistical Methods, which is the source of most of the article. --Muhandes (talk) 19:15, 15 December 2010 (UTC)
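The rebuttal above can be checked with a few lines of Python (a sketch with made-up numbers): using the prediction convention, alpha = 1 forecasts each value by the previous one, so the error is the sum of squared first differences, which is zero only for a constant series:

```python
def sse_forecast(alpha, xs):
    """Sum of squared errors when s_t = alpha*x_{t-1} + (1-alpha)*s_{t-1}
    is used as the forecast of x_t (the prediction convention)."""
    s = xs[0]
    total = 0.0
    for t in range(1, len(xs)):
        s = alpha * xs[t - 1] + (1 - alpha) * s
        total += (xs[t] - s) ** 2
    return total

# For alpha = 1 the forecast is just the previous observation, so the
# error is the sum of squared first differences of the series.
xs = [1.0, 2.0, 4.0, 3.0]
print(sse_forecast(1.0, xs))  # (2-1)^2 + (4-2)^2 + (3-4)^2 = 6.0
```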

## Double exponential smoothing != double exponential smoothing

After a lot of confusion and searching, I noticed that there are at least three approaches to calculating double exponential smoothing.

1. possibly the Holt method
This one just calculates a single exponential smoothing; the results $s_{t}$ are used as the starting values for the estimation line (i.e. $F_{t+0}$). Additionally, the trend itself $b_{t}$ is calculated and used as the gradient of the estimation line. As a result, the only difference from single exponential smoothing is that a trend is assumed, calculated, and used, with the result of the simple exponential smoothing as the starting point.
Sources:
2. the Brown method
This one first calculates a single exponential smoothing $S'_{t}$ over the data and then calculates another exponential smoothing $S''_{t}$ over that smoothed line, resulting in a double-smoothing. For both times, the same α is being used. The estimation line has the starting value $2 \cdot S'_{t}-S''_{t}$, the line gradient is described as $\frac{\alpha}{1-\alpha} \cdot (S'_{t}-S''_{t})$
Sources:
3. allegedly the linear exponential smoothing by Holt/Winters (the one talked about in the article)
This one works similarly to the Brown method but instead of just taking the previous result of the single smoothing it takes into account the previously forecasted trend $S''_{t-1}$ by adding it to the previously forecasted level $S'_{t-1}$. Also, the new variable β is used to adjust the influence of the trend on the forecast.
Sources:

My point is: this must be clarified and explained properly.

I’m sorry for the lack of English resources, I hope you can find better ones than me. --Uncle Pain (talk) 14:29, 23 September 2011 (UTC)

A small addition after some comparison: the methods 1 and 2 are indeed very similar, as the German PDF implies by combining them in a connected row. The only difference seems to be the $\frac{\alpha}{1-\alpha}$ factor in method 2 which should be $\frac{1}{1-\alpha}$ to make it match the results of method 1. Both calculate the same gradient of the estimation line. --Uncle Pain (talk) 15:59, 23 September 2011 (UTC)

Very good points. I'll try to work it all out into the article tomorrow, thanks for the resources. --Muhandes (talk) 20:35, 24 September 2011 (UTC)
I added the second method. I never met the first method before, and I'm still trying to figure out if it is different from the second. If you are confident in your analysis, I suggest you add it yourself. --Muhandes (talk) 14:58, 25 September 2011 (UTC)
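To make the distinction between the methods concrete, here is a rough Python sketch of methods 1 (Holt) and 2 (Brown) as described above. The initializations are just one common choice, not canonical, and the function names are my own:

```python
def brown_double(xs, alpha):
    """Brown's method: smooth the series (S'), then smooth the smoothed
    series (S'') with the same alpha. The forecast line has starting value
    2*S' - S'' and gradient alpha/(1-alpha) * (S' - S'')."""
    s1 = s2 = xs[0]
    for x in xs[1:]:
        s1 = alpha * x + (1 - alpha) * s1   # first smoothing S'
        s2 = alpha * s1 + (1 - alpha) * s2  # second smoothing S''
    level = 2 * s1 - s2
    trend = alpha / (1 - alpha) * (s1 - s2)
    return level, trend

def holt(xs, alpha, beta):
    """Holt's linear method: separate smoothing constants for the level
    (alpha) and the trend (beta)."""
    level = xs[0]
    trend = xs[1] - xs[0]  # one common way to initialize the trend
    for x in xs[1:]:
        prev_level = level
        level = alpha * x + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level, trend

# In both cases the m-step-ahead forecast is level + m * trend.
print(brown_double([1.0, 2.0, 3.0, 4.0, 5.0], 0.5))
print(holt([1.0, 2.0, 3.0, 4.0, 5.0], 0.5, 0.5))
```

On a perfectly linear series, Holt's method recovers the level and slope exactly, while Brown's single-alpha double smoothing only approaches them, which illustrates why the two are not interchangeable.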

## Text from Initial value problem

I have removed the following text from the page Initial value problem which is about ODE theory. If somebody with knowledge of the domain thinks it belongs here, please integrate it. --138.38.106.191 (talk) 14:25, 10 May 2013 (UTC)

Exponential smoothing is a general method for removing noise from a data series, or producing a short term forecast of time series data.
Single exponential smoothing is equivalent to computing an exponential moving average. The smoothing parameter is determined automatically, by minimizing the squared difference between the actual and the forecast values. Double exponential smoothing introduces a linear trend, and so has two parameters. There are several methods for estimating the initial values; for example, these two formulas:
$y'_0=\left(\frac{\alpha}{1-\alpha}\right)a_t+b_t$
$y''_0=\left(\frac{\alpha}{1-\alpha}\right)a_t+2b_t$

## redundant "Exponential moving average"

Two articles present similar content:

Sorry, I do not have enough motivation/time to check further and to manage the potential merge.

Oliver H (talk) 09:06, 6 March 2014 (UTC)

## Division by 0 in triple exponential smoothing

How does the method cope with scenarios in which either c(t) is 0 or s(t) is 0? The problem occurs in the next time iteration:

• if c(t) is 0 then s(t+1) = ... x(t) / 0 ...
• if s(t) is 0 then c(t+1) = ... x(t) / 0 ...
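I don't think the article addresses this. One pragmatic workaround (my own sketch, not from any cited source) is to clamp the denominator before the level update (which divides by a seasonal index c) and the seasonal update (which divides by the level s); the usual advice, though, is that the multiplicative form simply requires strictly positive data:

```python
def safe_div(num, den, eps=1e-12):
    """Guard the ratios x_t / c_{t-L} and x_t / s_t that appear in the
    multiplicative Holt-Winters updates: if the denominator is (near)
    zero, clamp its magnitude to eps instead of dividing by zero. This
    is a pragmatic workaround, not part of the method as published."""
    if abs(den) < eps:
        den = eps if den >= 0 else -eps
    return num / den

print(safe_div(5.0, 0.0))  # large but finite, instead of a ZeroDivisionError
```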