Damped trend exponential smoothing:
A modelling viewpoint
Abstract
In the past twenty years, damped trend exponential smoothing has performed well in numerous
empirical studies and is now well established as an accurate forecasting method. The original
motivation for this method was intuitively appealing, but said very little about why or when
it provided an optimal approach. The aim of this paper is to provide a theoretical rationale
for the damped trend method based on Brown’s original thinking about the form of underlying
models for exponential smoothing. We develop a random coefficient state space model for
which damped trend smoothing provides an optimal approach, and within which the damping
parameter can be interpreted directly as a measure of the persistence of the linear trend.
Key words: Time series, exponential smoothing, ARIMA models, state space models.
1 Introduction
In a series of three papers (Gardner and McKenzie, 1985, 1988, 1989), we developed new versions of the Holt-Winters methods of exponential smoothing that damp the trend as the forecast horizon increases. Since those papers appeared, damped trend exponential smoothing has performed well in numerous empirical studies, as discussed in Gardner (2006). In a review of evidence-based forecasting, Armstrong (2006) recommended the damped trend as a well-established forecasting method that should improve accuracy in practical applications. In a review of forecasting in operational research, Fildes et al. (2008) concluded that the damped trend can “reasonably claim to be a benchmark forecasting method for all others to beat.” Additional empirical evidence for the M3 competition data (Makridakis and Hibon, 2000) is given in Hyndman, Koehler, Ord and Snyder (HKOS) (2008), who found that use of the damped trend method alone compared favourably to model selection via information criteria.
Despite this record of empirical success, we still have no compelling rationale for the damped trend. Our original approach was pragmatic, based on the findings of the M-competition (Makridakis et al., 1982), which showed that the practice of projecting a straight-line trend indefinitely into the future was often too optimistic (or pessimistic). Thus we added an autoregressive damping parameter (φ) to modify the trend component in Holt’s linear trend method. The result is a method stationary in first differences, rather than second differences as in the Holt method. With a strong, consistent trend in the data, we hypothesized that φ would be fitted at a value near 1, and the forecasts would be very nearly the same as Holt’s; if the data are extremely noisy or if the trend is erratic, φ would be fitted at a value less than 1 to create a damped forecast function. This explanation may be intuitively appealing, but it says nothing about when trend damping is the optimal forecasting approach.
The aim of this paper is to provide a theoretical rationale for the damped trend based on Brown’s (1963) original thinking about the form of underlying models for exponential smoothing. His preference was for processes that are thought to be locally constant. Brown argued that although the parameters of the model may be constant within any local segment of time, they may change from one segment to the next, and the changes may be sudden or smooth. We present a new model for the damped trend method that accommodates both types of change. Interestingly, our interpretation of this model essentially reverses our original thinking on the use of damped trend forecasting in practice.
2 A Modelling Viewpoint
Our development is based on the class of single source of error (SSOE) state space models
(HKOS). We begin with the model for a linear trend with additive errors:
\[ y_t = \ell_{t-1} + b_{t-1} + \varepsilon_t \tag{1} \]
\[ \ell_t = \ell_{t-1} + b_{t-1} + (1 - \alpha)\varepsilon_t \tag{2} \]
\[ b_t = b_{t-1} + (1 - \beta)\varepsilon_t \tag{3} \]
where {y_t} is the observed series, {ℓ_t} is its level and {b_t} the gradient of its linear trend. This model has a single source of error {ε_t}, and hence the name. We note that what we have to say here still applies even if we consider models with multiple sources of error. Compared to the presentation in HKOS, we have written the coefficients of the innovations in the level (2) and gradient (3) revision equations in a slightly unusual way to simplify some of the results which follow. The model (1–3) has a reduced form as the ARIMA(0,2,2):
\[ (1 - B)^2 y_t = \varepsilon_t - (\alpha + \beta)\varepsilon_{t-1} + \alpha\varepsilon_{t-2} \tag{4} \]
The two models are equivalent but the state space expression is easier to interpret, especially
when the parameters take on extreme values. The usual minimum mean square error (MMSE)
forecasts of this model can be generated using the recursive formulae of Holt.
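The recursive formulae of Holt are not restated here; as an illustrative sketch (the function name and interface are ours, not from the source), the one-step recursions for model (1–3), in the paper's (1 − α), (1 − β) parametrization, might be written:

```python
def holt_forecast(y, alpha, beta, l0, b0, h=1):
    """Holt's linear-trend recursions for the SSOE model (1)-(3).
    Innovation weights follow the paper's (1 - alpha), (1 - beta) convention,
    not the more common alpha, beta smoothing weights."""
    l, b = l0, b0
    for obs in y:
        e = obs - (l + b)              # one-step-ahead innovation, eq. (1)
        l = l + b + (1 - alpha) * e    # level revision, eq. (2)
        b = b + (1 - beta) * e         # gradient revision, eq. (3)
    # MMSE forecasts project the final level and gradient as a straight line
    return [l + k * b for k in range(1, h + 1)]
```

On a perfectly linear series started from the exact states, every innovation is zero and the forecasts simply extend the line.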
To damp the trend component in (1–3), we incorporate an autoregressive damping parameter φ to create another SSOE model:
\[ y_t = \ell_{t-1} + \phi b_{t-1} + \varepsilon_t \tag{5} \]
\[ \ell_t = \ell_{t-1} + \phi b_{t-1} + (1 - \alpha)\varepsilon_t \tag{6} \]
\[ b_t = \phi b_{t-1} + (1 - \beta)\varepsilon_t \tag{7} \]
This model (5–7) has a reduced form as the ARIMA(1,1,2):
\[ (1 - \phi B)(1 - B)y_t = \varepsilon_t - (\alpha + \phi\beta)\varepsilon_{t-1} + \phi\alpha\varepsilon_{t-2} \tag{8} \]
Note that the gradient revision equation (7) is an AR(1) rather than the random walk form used in (3). Thus, revision equation (7) allows the gradient to change but in a stationary way, whereas in (3) such changes are nonstationary and the longer-term behaviour is quite different. In (5–7), we can interpret φ as a direct measure of the persistence of the linear trend. With φ close to 1, the linear trend is highly persistent, but φ moving away from 1 towards zero indicates weaker persistence. And, of course, φ = 0 would indicate the complete absence of any linear trend.
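A matching sketch for the damped recursions (5–7) (again our own illustration, not code from the source): the h-step forecast adds the final gradient multiplied by the damped sum φ + φ² + ... + φ^h, so the forecast function flattens toward ℓ_t + b_t φ/(1 − φ).

```python
def damped_trend_forecast(y, alpha, beta, phi, l0, b0, h=1):
    """Damped-trend recursions for the SSOE model (5)-(7)."""
    l, b = l0, b0
    for obs in y:
        e = obs - (l + phi * b)             # innovation, eq. (5)
        l = l + phi * b + (1 - alpha) * e   # level revision, eq. (6)
        b = phi * b + (1 - beta) * e        # gradient revision, eq. (7)
    forecasts, s = [], 0.0
    for k in range(1, h + 1):
        s += phi ** k                       # damped sum phi + ... + phi^k
        forecasts.append(l + s * b)
    return forecasts
```

With 0 < φ < 1, the increments between successive forecasts shrink by the factor φ, giving the familiar damped forecast function.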
Now we recall Brown’s idea of a locally constant model and apply it to the gradient of the linear trend. For the model in (1–3), this means that the usual random walk form of the gradient revision equation (3) holds for a while, but then the gradient changes to a new value, and that holds for a while, and then changes again, and so on. Thus, we have runs of the same linear trend model given by (1–3), but each run ends when the gradient revision equation (3) restarts with a new gradient. Such behaviour may be modelled by rewriting the gradient revision equation in the form
\[ b_t = A_t b_{t-1} + (1 - \beta)\varepsilon_t \tag{9} \]
where {A_t} is a sequence of independent, identically distributed binary random variates with P(A_t = 1) = φ and P(A_t = 0) = (1 − φ). At each time point we have the current linear trend model with probability φ, or an alternative linear trend model, starting with a new and unrelated gradient, with probability (1 − φ).
At first sight, this is a strange model, but it is easy to see what happens in particular cases. If we wish to model a strongly persistent trend then φ will be close to 1, and the sequence {A_t} will consist of long runs of 1s interrupted by occasional 0s. This yields long runs of a linear trend model with a similar gradient, one changing smoothly by means of equation (3), but which can change suddenly, with a small probability (1 − φ), to a completely different gradient. If φ is close to 0 there are long runs of 0s with occasional 1s, so the model displays only a very weak linear trend (if any), with a frequently changing gradient. With φ between 0 and 1 we get a mixture, resulting in different linear trend models operating over shorter time scales, i.e. low persistence of trend. In passing, we note that the mean length of such runs is given by φ/(1 − φ), which may also be thought of as a way to measure the persistence of trend. We also note that equation (9) is not the only possible form we could use here. For example, if we wish to generate a greater level of variation at the gradient change-point, i.e. when A_t = 0, we could replace (9) by
\[ b_t = A_t b_{t-1} + (1 - A_t)d_t + (1 - \beta)\varepsilon_t \tag{10} \]
where {d_t} is another, independent, white noise source, and we would obtain similar results. We will use equation (9) here because it is the simplest form.
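The claimed mean run length φ/(1 − φ) counts the 1s falling between successive 0s, so in a long simulated sequence it is estimated by the ratio of 1s to 0s. A quick check (our own sketch, not from the source):

```python
import random

def mean_trend_run(phi, n, seed=0):
    """Estimate the mean number of consecutive 1s between change-points
    (A_t = 0) in the binary persistence sequence {A_t}."""
    rng = random.Random(seed)
    a = [1 if rng.random() < phi else 0 for _ in range(n)]
    ones = sum(a)
    zeros = len(a) - ones
    return ones / zeros        # estimates phi / (1 - phi)
```

With φ = 0.9 the estimate should be close to 0.9/0.1 = 9, i.e. the same linear trend persists for about nine periods on average.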
The new state space model corresponding to the incorporation of the new gradient revision equation (9) is a random coefficient state space model:
\[ y_t = \ell_{t-1} + A_t b_{t-1} + \varepsilon_t \tag{11} \]
\[ \ell_t = \ell_{t-1} + A_t b_{t-1} + (1 - \alpha^*)\varepsilon_t \tag{12} \]
\[ b_t = A_t b_{t-1} + (1 - \beta^*)\varepsilon_t \tag{13} \]
whose reduced form is a random coefficient ARIMA(1,1,2):
\[ (1 - A_t B)(1 - B)y_t = \varepsilon_t - (\alpha^* + A_t\beta^*)\varepsilon_{t-1} + A_t\alpha^*\varepsilon_{t-2} \tag{14} \]
We use (α*, β*) here rather than (α, β) in order to emphasise that these coefficients will differ in our discussion of the two models (5–7) and (11–13), whereas the same value of φ will apply to both.
Although this random coefficient state space model may appear complex, it is simply a stochastic mixture of two well-known forms. Thus, for example, equation (14) may be rewritten as
\[ (1 - B)^2 y_t = \varepsilon_t - (\alpha^* + \beta^*)\varepsilon_{t-1} + \alpha^*\varepsilon_{t-2} \quad \text{with probability } \phi \tag{15} \]
\[ (1 - B)y_t = \varepsilon_t - \alpha^*\varepsilon_{t-1} \quad \text{with probability } (1 - \phi) \tag{16} \]
In this model, {y_t} is generated by the ARIMA(0,2,2) given by (15) or (4), the usual linear trend model, with probability φ; but then, with probability (1 − φ), the gradient changes completely, the generation process switching to the ARIMA(0,1,1) given by (16), the usual underlying model for simple exponential smoothing. The resulting process is a mixture of the two.
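A short simulation of (11–13) (our own sketch; α* and β* appear as `alpha_s` and `beta_s`) makes the mixture concrete. Two deterministic limits are useful checks: with φ = 1 and no noise the output is an exact straight line, while with φ = 0 the gradient never carries over.

```python
import random

def simulate_rc_model(n, phi, alpha_s, beta_s, sigma, l0=0.0, b0=1.0, seed=0):
    """Simulate the random coefficient state space model (11)-(13):
    each period the trend persists (A_t = 1, prob. phi) or restarts (A_t = 0)."""
    rng = random.Random(seed)
    l, b, y = l0, b0, []
    for _ in range(n):
        a = 1 if rng.random() < phi else 0   # binary persistence coefficient
        e = rng.gauss(0.0, sigma)            # single source of error
        y.append(l + a * b + e)              # observation, eq. (11)
        l = l + a * b + (1 - alpha_s) * e    # level revision, eq. (12)
        b = a * b + (1 - beta_s) * e         # gradient revision, eq. (13)
    return y
```

Intermediate values of φ produce the switching behaviour described above: stretches of one linear trend, punctuated by sudden restarts of the gradient.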
Now, in this model (11–14), it may be shown that the stationary process of first differences, {(1 − B)y_t}, has exactly the same autocorrelation function as a standard ARMA(1,2) with autoregressive parameter φ, i.e. ρ(k) = φ^{k−2} ρ(2) for k ≥ 2. It follows that {y_t} can be generated by a stochastic difference equation of the form:
\[ (1 - \phi B)(1 - B)y_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} \tag{17} \]
where {a_t} is a white noise process whose variance and the parameters θ_1 and θ_2 are complicated functions of the parameters φ, α*, β* and the variance of the innovation process {ε_t}. Thus, the MMSE forecasts of y_t defined by equation (17) are the MMSE forecasts of the random coefficient ARIMA(1,1,2) given by (14), and thus also of our random coefficient state space model (11–13). Moreover, the MMSE forecasts of (17) are clearly damped trend forecasts.
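That the MMSE forecasts of (17) are damped trend forecasts can be seen from the forecast recursion itself: once the horizon passes the two moving-average lags, successive forecast increments shrink by the factor φ. A sketch (parameter values and interface purely illustrative):

```python
def arima112_forecasts(phi, theta1, theta2, y_last, y_prev, a_last, a_prev, h):
    """MMSE forecasts of the ARIMA(1,1,2) in (17):
    (1 - phi B)(1 - B) y_t = a_t - theta1 a_{t-1} - theta2 a_{t-2}.
    Future innovations are replaced by their conditional mean, zero."""
    forecasts = []
    y1, y2 = y_last, y_prev    # y_t and y_{t-1}
    a1, a2 = a_last, a_prev    # a_t and a_{t-1}
    for _ in range(h):
        nxt = (1 + phi) * y1 - phi * y2 - theta1 * a1 - theta2 * a2
        forecasts.append(nxt)
        y2, y1 = y1, nxt
        a2, a1 = a1, 0.0       # innovations beyond the origin have mean zero
    return forecasts
```

From the third step onward the recursion is purely autoregressive, so each forecast increment is φ times the previous one: a damped trend.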
Hence, to summarize these relationships, the standard damped trend forecasts optimal for (5–7) are also optimal for a random coefficient state space model of the form of (11–13), with the same parameter value, φ, in both, but with different values of α and β in (11–13). The values of these corresponding parameters in (11–13), α* and β*, can be computed from the parameters of the damped trend model in (5–7), but our intention here is simply to note that the damped trend forecasts are also optimal for this broader, more general class of models. We also argue that such a random coefficient state space model is itself often a good approximation to the behaviour of practically occurring nonseasonal time series, and that this is one of the main reasons for the empirical success of the damped trend method.
3 Other Models/Methods
The same discussion and argument will apply in the cases of other similar models that contain
a linear trend component. In particular, we note here two important cases. The first is the additive seasonal model (of period n) which, in random coefficient form, is given by
\[ y_t = \ell_{t-1} + A_t b_{t-1} + S_{t-n} + \varepsilon_t \tag{18} \]
\[ \ell_t = \ell_{t-1} + A_t b_{t-1} + S_{t-n} + (1 - \alpha)\varepsilon_t \tag{19} \]
\[ b_t = A_t b_{t-1} + (1 - \beta)\varepsilon_t \tag{20} \]
\[ S_t = S_{t-n} + \gamma\varepsilon_t \tag{21} \]
If the random coefficient A_t is replaced by the constant value 1 or 0, we obtain models for which Holt-Winters-type linear trend with additive seasonality, or trend-free seasonality, forecasting methods respectively are optimal. If we replace A_t by φ, the damped trend version (e.g. Gardner and McKenzie, 1989) is optimal.
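For concreteness, the forecast function of the damped additive seasonal method combines the damped trend with a repetition of the last seasonal cycle; a sketch of that forecast function (the interface is ours, not from the source) is:

```python
def damped_seasonal_forecasts(l, b, seas, phi, h):
    """Damped-trend forecasts with additive seasonality: damped trend from the
    final level l and gradient b, plus the most recent seasonal states
    seas = [S_{t-n+1}, ..., S_t] recycled over the horizon."""
    n = len(seas)
    forecasts, s = [], 0.0
    for k in range(1, h + 1):
        s += phi ** k                          # damped sum phi + ... + phi^k
        forecasts.append(l + s * b + seas[(k - 1) % n])
    return forecasts
```

Setting φ = 1 recovers the Holt-Winters additive forecast function, and φ = 0 leaves a trend-free seasonal forecast, mirroring the two constant-A_t cases above.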
The second model we wish to extend is the linear trend version of the very important multiplicative-error models of HKOS. It is given by
\[ y_t = (\ell_{t-1} + b_{t-1})(1 + \varepsilon_t) \tag{22} \]
\[ \ell_t = (\ell_{t-1} + b_{t-1})(1 + (1 - \alpha)\varepsilon_t) \tag{23} \]
\[ b_t = b_{t-1} + (1 - \beta)(\ell_{t-1} + b_{t-1})\varepsilon_t \tag{24} \]
The importance of models of the form of (22–24) lies in the fact that although the driving innovation terms have variances that are now functions of the level, nevertheless exponential smoothing methods can be optimal. The random coefficient version of this is given by
\[ y_t = (\ell_{t-1} + A_t b_{t-1})(1 + \varepsilon_t) \tag{25} \]
\[ \ell_t = (\ell_{t-1} + A_t b_{t-1})(1 + (1 - \alpha)\varepsilon_t) \tag{26} \]
\[ b_t = A_t b_{t-1} + (1 - \beta)(\ell_{t-1} + A_t b_{t-1})\varepsilon_t \tag{27} \]
and, for completeness, we note that the reduced random coefficient ARIMA may be written in the mixture form we have used before, thus:
with probability φ:
\[ (1 - B)^2 y_t = \omega_t - (\alpha + \beta)\omega_{t-1} + \alpha\omega_{t-2}, \qquad \omega_t = (\ell_{t-1} + b_{t-1})\varepsilon_t \tag{28} \]
and, with probability (1 − φ):
\[ (1 - B)y_t = \omega_t - \alpha\omega_{t-1}, \qquad \omega_t = \ell_{t-1}\varepsilon_t \tag{29} \]
This form is essentially the same as (15) and (16) except that the innovation process is now dependent on the level.
4 Conclusions
We have developed a model, given by (11–13) or (14) or (15) and (16), for which damped trend smoothing provides an optimal approach and within which the damping parameter can be interpreted directly as a measure of the persistence of the linear trend. Developing these models has led us to reverse our earlier view that a damped trend is a good approximation to a linear trend at short lead times and is better for longer ones because the linearity must eventually break down. Now, our argument is that the underlying random coefficient linear trend model is more realistic, i.e. is more often closer to the true process that underlies our time series, and the linear trend model is simply a good approximation to it for short lead times. Technically, we are arguing that it makes more practical sense to model the uncertainty of the gradient process of our putative linear trend as a random coefficient autoregression (13) rather than a random walk (3), thus greatly widening the legitimacy of damped trend forecasting.
We see this model as a natural extension of Brown’s (1963) original work. Our aim is to capture the locally constant nature of the linear trend by means of its gradient, which may change smoothly or suddenly. The random walk form of the gradient revision equation allows smooth change very well, but is less successful with occasional, sudden change. Our random coefficient model accommodates both kinds of change.
Finally, we note that if we assume the random coefficient state space model (11–13) does indeed generate our observed time series, then damped trend forecasting may be optimal but the corresponding prediction intervals will be much wider than if we assume the standard damped trend model of equations (5–7). This is because of the extra variation introduced by the presence of the random binary coefficient, and may go some way to explaining the often conservative performance of prediction intervals in this area. This important topic will be explored elsewhere.
Acknowledgements:
We would like to thank Ralph Snyder and Rob Hyndman for their insightful comments on a
talk describing this random coeﬃcient model given at the ISF 2008 in Nice, France.
References
Armstrong, J.S. (2006). Findings from evidence-based forecasting: Methods for reducing forecast error, International Journal of Forecasting, 22, 583–598.

Brown, R.G. (1963). Smoothing, Forecasting and Prediction of Discrete Time Series, Prentice-Hall, Inc., Englewood Cliffs, NJ.

Fildes, R., Nikolopoulos, K., Crone, S., & Syntetos, A. (2008). Forecasting and operational research: a review, Journal of the Operational Research Society, 59, 123.

Gardner Jr., E.S. (2006). Exponential smoothing: The state of the art, Part II, International Journal of Forecasting, 22, 637–666.

Gardner Jr., E.S. & McKenzie, E. (1985). Forecasting trends in time series, Management Science, 31, 1237–1246.

Gardner Jr., E.S. & McKenzie, E. (1988). Model identification in exponential smoothing, Journal of the Operational Research Society, 39, 863–867.

Gardner Jr., E.S. & McKenzie, E. (1989). Seasonal exponential smoothing with damped trends, Management Science, 35, 372–376.

Hyndman, R., Koehler, A., Ord, J.K., & Snyder, R.D. (2008). Forecasting with exponential smoothing: The state space approach, Springer-Verlag, Berlin.

Makridakis, S., Andersen, A., Carbone, R., Fildes, R., Hibon, M., Lewandowski, R., Newton, J., Parzen, E., & Winkler, R. (1982). The accuracy of extrapolation (time series) methods: Results of a forecasting competition, Journal of Forecasting, 1, 111–153.

Makridakis, S. & Hibon, M. (2000). The M3-Competition: results, conclusions and implications, International Journal of Forecasting, 16, 451–476.