Wednesday, July 9, 2014

the basics of time series

Time Series Analysis: The Basics
 


WHAT IS A TIME SERIES?

A time series is a collection of observations of well-defined data items obtained through repeated measurements over time. For example, measuring the value of retail sales each month of the year would comprise a time series. This is because sales revenue is well defined, and consistently measured at equally spaced intervals. Data collected irregularly or only once are not time series.

An observed time series can be decomposed into three components: the trend (long term direction), the seasonal (systematic, calendar related movements) and the irregular (unsystematic, short term fluctuations).


WHAT ARE STOCK AND FLOW SERIES?

Time series can be classified into two different types: stock and flow.

A stock series is a measure of certain attributes at a point in time and can be thought of as “stocktakes”. For example, the Monthly Labour Force Survey is a stock measure because it takes stock of whether a person was employed in the reference week.

Flow series are series which are a measure of activity over a given period. For example, surveys of Retail Trade activity. Manufacturing is also a flow measure because a certain amount is produced each day, and then these amounts are summed to give a total value for production for a given reporting period.

The main difference between a stock and a flow series is that flow series can contain effects related to the calendar (trading day effects). Both types of series can still be seasonally adjusted using the same seasonal adjustment process.


WHAT ARE SEASONAL EFFECTS?

A seasonal effect is a systematic and calendar related effect. Some examples include the sharp escalation in most Retail series which occurs around December in response to the Christmas period, or an increase in water consumption in summer due to warmer weather. Other seasonal effects include trading day effects (the number of working or trading days in a given month differs from year to year which will impact upon the level of activity in that month) and moving holidays (the timing of holidays such as Easter varies, so the effects of the holiday will be experienced in different periods each year).

WHAT IS SEASONAL ADJUSTMENT AND WHY DO WE NEED IT?

Seasonal adjustment is the process of estimating and then removing from a time series influences that are systematic and calendar related. Observed data needs to be seasonally adjusted as seasonal effects can conceal both the true underlying movement in the series, as well as certain non-seasonal characteristics which may be of interest to analysts.


WHY CAN'T WE JUST COMPARE ORIGINAL DATA FROM THE SAME PERIOD IN EACH YEAR?

A comparison of original data from the same period in each year does not completely remove all seasonal effects. Certain holidays such as Easter and Chinese New Year fall in different periods in each year, so they will distort observations. Year-to-year comparisons will also be biased by any changes in seasonal patterns that occur over time. For example, consider a comparison between two consecutive March months, i.e., the level of the original series observed in March 2000 and in March 2001. This comparison ignores the moving holiday effect of Easter. Easter occurs in April in most years, but if it falls in March the level of activity can vary greatly for that month in some series, distorting the original estimates; a comparison of these two months will then not reflect the underlying pattern of the data. The comparison also ignores trading day effects: if the two Marches have a different composition of trading days, the original estimates may show different levels of activity even though the underlying level of activity is unchanged. In a similar way, any changes to seasonal patterns are ignored. Finally, the original estimates also contain the influence of the irregular component; if the magnitude of the irregular component of a series is strong compared with the magnitude of the trend component, the underlying direction of the series can be obscured.

However, the major disadvantage of comparing year-to-year original data is the lack of precision and the delay in identifying turning points in a series. Turning points occur when the direction of the underlying level of the series changes, for example when a consistently decreasing series begins to rise steadily. If we compare year-apart data in the original series, we may miss turning points occurring during the year. For example, if March 2001 has a higher original estimate than March 2000, comparing these year-apart values might lead us to conclude that the level of activity increased during the year, when in fact the series might have increased up to September 2000 and then started to decrease steadily.

WHEN IS SEASONAL ADJUSTMENT INAPPROPRIATE?

When a time series is dominated by the trend or irregular components, it is nearly impossible to identify and remove what little seasonality is present. Hence seasonally adjusting a non-seasonal series is impractical and will often introduce an artificial seasonal element.

WHAT IS SEASONALITY?

The seasonal component consists of effects that are reasonably stable with respect to timing, direction and magnitude. It arises from systematic, calendar related influences such as:
        • Natural Conditions
            weather fluctuations that are representative of the season
            (uncharacteristic weather patterns such as snow in summer would be considered irregular influences)
        • Business and Administrative procedures
            start and end of the school term
        • Social and Cultural behaviour
            Christmas

It also includes calendar related systematic effects that are not stable in their annual timing or are caused by variations in the calendar from year to year, such as:
        • Trading Day Effects
            the number of occurrences of each day of the week in a given month will differ from year to year
            - There were 4 weekends in March in 2000, but 5 weekends in March of 2002
        • Moving Holiday Effects
            holidays which occur each year, but whose exact timing shifts
            - Easter, Chinese New Year

HOW DO WE IDENTIFY SEASONALITY?

Seasonality in a time series can be identified by regularly spaced peaks and troughs which have a consistent direction and approximately the same magnitude every year, relative to the trend. The following diagram depicts a strongly seasonal series. There is an obvious large seasonal increase in December retail sales in New South Wales due to Christmas shopping. In this example, the magnitude of the seasonal component increases over time, as does the trend.
Figure 1: Monthly Retail Sales in New South Wales (NSW) Retail Department Stores

WHAT IS AN IRREGULAR?

The irregular component (sometimes also known as the residual) is what remains after the seasonal and trend components of a time series have been estimated and removed. It results from short term fluctuations in the series which are neither systematic nor predictable. In a highly irregular series, these fluctuations can dominate movements, which will mask the trend and seasonality. The following graph is of a highly irregular time series:
Figure 2: Monthly Value of Building Approvals, Australian Capital Territory (ACT)
WHAT IS THE TREND?

The ABS trend is defined as the 'long term' movement in a time series without calendar related and irregular effects, and is a reflection of the underlying level. It is the result of influences such as population growth, price inflation and general economic changes. The following graph depicts a series in which there is an obvious upward trend over time:
Figure 3: Quarterly Gross Domestic Product


WHAT ARE THE UNDERLYING MODELS USED TO DECOMPOSE THE OBSERVED TIME SERIES?

Decomposition models are typically additive or multiplicative, but can also take other forms such as pseudo-additive.

Additive Decomposition

In some time series, the amplitude of both the seasonal and irregular variations do not change as the level of the trend rises or falls. In such cases, an additive model is appropriate.

In the additive model, the observed time series (Ot) is considered to be the sum of three independent components: the seasonal St, the trend Tt and the irregular It.

That is,

Ot = Tt + St + It

Each of the three components has the same units as the original series. The seasonally adjusted series is obtained by estimating and removing the seasonal effects from the original time series. Denoting the estimated seasonal component by Ŝt, the seasonally adjusted estimates can be expressed by:

Seasonally adjusted series = Original series − Estimated seasonal component

In symbols, SAt = Ot − Ŝt.
The following figure depicts a typical additive series: the underlying level of the series fluctuates but the magnitude of the seasonal spikes remains approximately stable.
Figure 4: General Government and Other Current Transfers to Other Sectors

Multiplicative Decomposition

In many time series, the amplitude of both the seasonal and irregular variations increases as the level of the trend rises. In this situation, a multiplicative model is usually appropriate.

In the multiplicative model, the original time series is expressed as the product of trend, seasonal and irregular components.

Ot = Tt × St × It

The seasonally adjusted data then becomes:

Seasonally adjusted series = Original series / Estimated seasonal component

or, in symbols, SAt = Ot / Ŝt.
Under this model, the trend has the same units as the original series, but the seasonal and irregular components are unitless factors, distributed around 1.

Most of the series analysed by the ABS show characteristics of a multiplicative model. As the underlying level of the series changes, the magnitude of the seasonal fluctuations varies as well.
Figure 5: Monthly NSW ANZ Job Advertisements
Pseudo-Additive Decomposition

The multiplicative model cannot be used when the original time series contains very small or zero values. This is because it is not possible to divide a number by zero. In these cases, a pseudo additive model combining the elements of both the additive and multiplicative models is used. This model assumes that seasonal and irregular variations are both dependent on the level of the trend but independent of each other.

The original data can be expressed in the following form:

Ot = Tt + Tt × (St − 1) + Tt × (It − 1) = Tt × (St + It − 1)

The pseudo-additive model continues the convention of the multiplicative model of having both the seasonal factor St and the irregular factor It centred around one. Therefore we need to subtract one from St and It to ensure that the terms Tt × (St − 1) and Tt × (It − 1) are centred around zero. These terms can be interpreted as the additive seasonal and additive irregular components respectively, and because they are centred around zero the original data Ot will be centred around the trend values Tt.

The seasonally adjusted estimate is defined to be:

SAt = Ot − T̂t × (Ŝt − 1)

where T̂t and Ŝt are the trend and seasonal component estimates. In the pseudo-additive model, the trend has the same units as the original series, but the seasonal and irregular components are unitless factors, distributed around 1.

An example of a series that requires a pseudo-additive decomposition model is shown below. This model is used because cereal crops are only produced during certain months, with crop production being virtually zero for one quarter each year.
Figure 6: Quarterly Gross Value for the Production of Cereal Crops


Example: Shiskin Decomposition

The Shiskin decomposition gives graphs of the original series, seasonally adjusted series, trend series, residual (irregular) factors and the between month (seasonal) and within month (trading day) factors that are combined to form the combined adjustment factors. The residual (irregular) factors are found by dividing the seasonally adjusted series by the trend series. Figure 7 shows a Shiskin decomposition for the Australian Retail series.

Figure 7: Shiskin decomposition for Australian Total Retail Turnover, May 1990 to May 2000


HOW DO I KNOW WHICH DECOMPOSITION MODEL TO USE?

To choose an appropriate decomposition model, the time series analyst will examine a graph of the original series and try a range of models, selecting the one which yields the most stable seasonal component. If the magnitude of the seasonal component is relatively constant regardless of changes in the trend, an additive model is suitable. If it varies with changes in the trend, a multiplicative model is the most likely candidate. However, if the series contains values close or equal to zero and the magnitude of the seasonal component appears to be dependent upon the trend level, then the pseudo-additive model is most appropriate.
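The choice can also be explored numerically. The sketch below is a minimal illustration only (it is not the seasonal adjustment method used by the ABS): it decomposes a synthetic monthly series with a centred 12-term moving average under either an additive or a multiplicative model. The function name, the series and all parameter values are illustrative.

```python
# Minimal sketch: classical decomposition of a monthly series with a centred
# 12-term moving average, under an additive or a multiplicative model.
import numpy as np

def decompose(x, period=12, model="additive"):
    x = np.asarray(x, dtype=float)
    # Centred moving average: weights 1/(2p), 1/p, ..., 1/p, 1/(2p).
    w = np.r_[0.5, np.ones(period - 1), 0.5] / period
    trend = np.convolve(x, w, mode="same")
    trend[: period // 2] = trend[-(period // 2):] = np.nan   # edges undefined

    detrended = x - trend if model == "additive" else x / trend
    # Average the detrended values month by month to get seasonal indices.
    seas = np.array([np.nanmean(detrended[m::period]) for m in range(period)])
    seas = seas - seas.mean() if model == "additive" else seas / seas.mean()
    seasonal = np.tile(seas, len(x) // period + 1)[: len(x)]

    adjusted = x - seasonal if model == "additive" else x / seasonal
    return trend, seasonal, adjusted

rng = np.random.default_rng(0)
t = np.arange(120)
x = 100 + 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 120)
trend, seasonal, adjusted = decompose(x, model="additive")
```

Fitting both models to the same series and comparing how stable the resulting seasonal component is from year to year mirrors the selection rule described above.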

WHAT IS A SEASONAL AND IRREGULAR (SI) CHART?

Once the trend component is estimated, it can be removed from the original data, leaving behind the combined seasonal and irregular components, or SIs. A seasonal and irregular (SI) chart graphically presents the SIs for particular months or quarters in the series span.

The following graph is an SI chart for a monthly series, using a multiplicative decomposition model.
Figure 8: Seasonal and Irregular (SI) Chart - Value of Building Approvals, ACT

The points represent the SIs obtained from the time series, while the solid line shows the seasonal component. The seasonal component is calculated by smoothing the SIs to remove irregular influences.

SI charts are useful in determining whether short-term movements are caused by seasonal or irregular influences. In the graph above, the SIs can be seen to fluctuate erratically, which indicates the time series under analysis is dominated by its irregular component.

SI charts are also used to identify seasonal breaks, moving holiday patterns and extreme values in a time series.


time series

TIME SERIES
Contents
Syllabus
Books
Keywords
1 Models for time series
1.1 Time series data
1.2 Trend, seasonality, cycles and residuals
1.3 Stationary processes
1.4 Autoregressive processes
1.5 Moving average processes
1.6 White noise
1.7 The turning point test
2 Models of stationary processes
2.1 Purely indeterministic processes
2.2 ARMA processes
2.3 ARIMA processes
2.4 Estimation of the autocovariance function
2.5 Identifying a MA(q) process
2.6 Identifying an AR(p) process
2.7 Distributions of the ACF and PACF
3 Spectral methods
3.1 The discrete Fourier transform
3.2 The spectral density
3.3 Analysing the effects of smoothing
4 Estimation of the spectrum
4.1 The periodogram
4.2 Distribution of spectral estimates
4.3 The fast Fourier transform
5 Linear filters
5.1 The Filter Theorem
5.2 Application to autoregressive processes
5.3 Application to moving average processes
5.4 The general linear process
5.5 Filters and ARMA processes
5.6 Calculating autocovariances in ARMA models
6 Estimation of trend and seasonality
6.1 Moving averages
6.2 Centred moving averages
6.3 The Slutzky-Yule effect
6.4 Exponential smoothing
6.5 Calculation of seasonal indices
7 Fitting ARIMA models
7.1 The Box-Jenkins procedure
7.2 Identification
7.3 Estimation
7.4 Verification
7.5 Tests for white noise
7.6 Forecasting with ARMA models
8 State space models
8.1 Models with unobserved states
8.2 The Kalman filter
8.3 Prediction
8.4 Parameter estimation revisited
Syllabus
Time series analysis refers to problems in which observations are collected at regular
time intervals and there are correlations among successive observations. Applications
cover virtually all areas of Statistics but some of the most important include economic
and financial time series, and many areas of environmental or ecological data.
In this course, I shall cover some of the most important methods for dealing with
these problems. In the case of time series, these include the basic definitions of
autocorrelations etc., then time-domain model fitting including autoregressive and
moving average processes, spectral methods, and some discussion of the effect of time
series correlations on other kinds of statistical inference, such as the estimation of
means and regression coefficients.
Books
1. P.J. Brockwell and R.A. Davis, Time Series: Theory and Methods, Springer
Series in Statistics (1986).
2. C. Chatfield, The Analysis of Time Series: Theory and Practice, Chapman and
Hall (1975). Good general introduction, especially for those completely new to
time series.
3. P.J. Diggle, Time Series: A Biostatistical Introduction, Oxford University Press
(1990).
4. M. Kendall, Time Series, Charles Griffin (1976).
Keywords
ACF, AR(p), ARIMA(p,d,q), ARMA(p,q), autocorrelation function, autocovariance function, autoregressive moving average process, autoregressive process, Box-Jenkins, classical decomposition, estimation, filter generating function, Gaussian process, identifiability, identification, integrated autoregressive moving average process, invertible process, MA(q), moving average process, nondeterministic, nonnegative definite sequence, PACF, periodogram, sample partial autocorrelation coefficient, second order stationary, spectral density function, spectral distribution function, strictly stationary, strongly stationary, turning point test, verification, weakly stationary, white noise, Yule-Walker equations
1 Models for time series
1.1 Time series data
A time series is a set of statistics, usually collected at regular intervals. Time series
data occur naturally in many application areas.
• economics - e.g., monthly data for unemployment, hospital admissions, etc.
• finance - e.g., daily exchange rate, a share price, etc.
• environmental - e.g., daily rainfall, air quality readings.
• medicine - e.g., ECG brain wave activity every 2−8 secs.
The methods of time series analysis pre-date those for general stochastic processes
and Markov Chains. The aims of time series analysis are to describe and summarise
time series data, fit low-dimensional models, and make forecasts.
We write our real-valued series of observations as . . . ,X−2,X−1,X0,X1,X2, . . ., a
doubly infinite sequence of real-valued random variables indexed by Z.
1.2 Trend, seasonality, cycles and residuals
One simple method of describing a series is that of classical decomposition. The
notion is that the series can be decomposed into four elements:
Trend (Tt) — long term movements in the mean;
Seasonal effects (It) — cyclical fluctuations related to the calendar;
Cycles (Ct) — other cyclical fluctuations (such as business cycles);
Residuals (Et) — other random or systematic fluctuations.
The idea is to create separate models for these four elements and then combine
them, either additively
Xt = Tt + It + Ct + Et
or multiplicatively
Xt = Tt · It · Ct · Et .
1.3 Stationary processes
1. A sequence {Xt, t ∈ Z} is strongly stationary or strictly stationary if (Xt1, . . . , Xtk) and (Xt1+h, . . . , Xtk+h) have the same joint distribution for all sets of time points t1, . . . , tk and all integers h.

2. A sequence is weakly stationary, or second order stationary, if
(a) E(Xt) = μ, and
(b) cov(Xt, Xt+k) = γk,
where μ is constant and γk is independent of t.

3. The sequence {γk, k ∈ Z} is called the autocovariance function.

4. We also define ρk = γk/γ0 = corr(Xt, Xt+k) and call {ρk, k ∈ Z} the autocorrelation function (ACF).
Remarks.
1. A strictly stationary process is weakly stationary.
2. If the process is Gaussian, that is (Xt1, . . . ,Xtk) is multivariate normal, for all
t1, . . . , tk, then weak stationarity implies strong stationarity.
3. γ0 = var(Xt) > 0, assuming Xt is genuinely random.
4. By symmetry, γk = γ−k, for all k.
1.4 Autoregressive processes
The autoregressive process of order p is denoted AR(p), and defined by

Xt = Σ_{r=1}^p φr Xt−r + ǫt    (1.1)

where φ1, . . . , φp are fixed constants and {ǫt} is a sequence of independent (or uncorrelated) random variables with mean 0 and variance σ^2.

The AR(1) process is defined by

Xt = φ1 Xt−1 + ǫt .    (1.2)

To find its autocovariance function we make successive substitutions, to get

Xt = ǫt + φ1(ǫt−1 + φ1(ǫt−2 + · · · )) = ǫt + φ1 ǫt−1 + φ1^2 ǫt−2 + · · ·

The fact that {Xt} is second order stationary follows from the observation that E(Xt) = 0 and that the autocovariance function can be calculated as follows:

γ0 = E[(ǫt + φ1 ǫt−1 + φ1^2 ǫt−2 + · · · )^2] = (1 + φ1^2 + φ1^4 + · · · ) σ^2 = σ^2 / (1 − φ1^2)

γk = E[(Σ_{r=0}^∞ φ1^r ǫt−r)(Σ_{s=0}^∞ φ1^s ǫt+k−s)] = σ^2 φ1^k / (1 − φ1^2) .

There is an easier way to obtain these results. Multiply equation (1.2) by Xt−k and take the expected value, to give

E(Xt Xt−k) = φ1 E(Xt−1 Xt−k) + E(ǫt Xt−k) .

Thus γk = φ1 γk−1, k = 1, 2, . . .

Similarly, squaring (1.2) and taking the expected value gives

E(Xt^2) = φ1^2 E(Xt−1^2) + 2φ1 E(Xt−1 ǫt) + E(ǫt^2) = φ1^2 E(Xt−1^2) + 0 + σ^2

and so γ0 = σ^2 / (1 − φ1^2).

More generally, the AR(p) process is defined as

Xt = φ1 Xt−1 + φ2 Xt−2 + · · · + φp Xt−p + ǫt .    (1.3)

Again, the autocorrelation function can be found by multiplying (1.3) by Xt−k, taking the expected value and dividing by γ0, thus producing the Yule-Walker equations

ρk = φ1 ρk−1 + φ2 ρk−2 + · · · + φp ρk−p,    k = 1, 2, . . .

These are linear recurrence relations, with general solution of the form

ρk = C1 ω1^{|k|} + · · · + Cp ωp^{|k|} ,

where ω1, . . . , ωp are the roots of

ω^p − φ1 ω^{p−1} − φ2 ω^{p−2} − · · · − φp = 0

and C1, . . . , Cp are determined by ρ0 = 1 and the equations for k = 1, . . . , p − 1. It is natural to require γk → 0 as k → ∞, in which case the roots must lie inside the unit circle, that is, |ωi| < 1. Thus there is a restriction on the values of φ1, . . . , φp that can be chosen.
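As a quick check of these formulas, the following sketch (parameter values are illustrative) simulates an AR(1) and compares sample autocovariances with γk = σ^2 φ1^k / (1 − φ1^2).

```python
# Sketch: simulate an AR(1) with phi = 0.6 and compare sample autocovariances
# with the theoretical gamma_k = sigma^2 * phi^k / (1 - phi^2).
import numpy as np

rng = np.random.default_rng(1)
phi, sigma, T = 0.6, 1.0, 100_000
eps = rng.normal(0, sigma, T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

def acov(x, k):
    xc = x - x.mean()
    return (xc[k:] * xc[:len(x) - k]).sum() / len(x)   # divide by T, as in the notes

for k in range(4):
    theory = sigma**2 * phi**k / (1 - phi**2)
    print(k, round(acov(x, k), 3), round(theory, 3))
```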
1.5 Moving average processes
The moving average process of order q is denoted MA(q) and defined by

Xt = Σ_{s=0}^q θs ǫt−s    (1.4)

where θ1, . . . , θq are fixed constants, θ0 = 1, and {ǫt} is a sequence of independent (or uncorrelated) random variables with mean 0 and variance σ^2.

It is clear from the definition that this is second order stationary and that

γk = 0 for |k| > q,    γk = σ^2 Σ_{s=0}^{q−|k|} θs θs+|k| for |k| ≤ q .

We remark that two moving average processes can have the same autocorrelation function. For example,

Xt = ǫt + θǫt−1    and    Xt = ǫt + (1/θ)ǫt−1

both have ρ1 = θ/(1 + θ^2), ρk = 0, |k| > 1. However, the first gives

ǫt = Xt − θǫt−1 = Xt − θ(Xt−1 − θǫt−2) = Xt − θXt−1 + θ^2 Xt−2 − · · ·

This is only valid for |θ| < 1, a so-called invertible process. No two invertible processes have the same autocorrelation function.
1.6 White noise
The sequence {ǫt}, consisting of independent (or uncorrelated) random variables with
mean 0 and variance σ2 is called white noise (for reasons that will become clear
later.) It is a second order stationary series with γ0 = σ^2 and γk = 0, k ≠ 0.
1.7 The turning point test
We may wish to test whether a series can be considered to be white noise, or whether
a more complicated model is required. In later chapters we shall consider various
ways to do this, for example, we might estimate the autocovariance function, say
{ˆγk}, and observe whether or not ˆγk is near zero for all k > 0.
However, a very simple diagnostic is the turning point test, which examines a
series {Xt} to test whether it is purely random. The idea is that if {Xt} is purely
random then three successive values are equally likely to occur in any of the six
possible orders.
In four cases there is a turning point in the middle. Thus in a series of n points
we might expect (2/3)(n − 2) turning points.
In fact, it can be shown that for large n, the number of turning points is distributed as approximately N(2n/3, 8n/45). We reject (at the 5% level) the hypothesis that the series is unsystematic if the number of turning points lies outside the range 2n/3 ± 1.96√(8n/45).
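A minimal sketch of this test (the function name and the test data are illustrative):

```python
# Sketch of the turning point test: count points X_t that are strict local
# maxima or minima among (X_{t-1}, X_t, X_{t+1}), and compare with the
# approximate N(2n/3, 8n/45) distribution under pure randomness.
import numpy as np

def turning_point_test(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    mid, left, right = x[1:-1], x[:-2], x[2:]
    turns = np.sum(((mid > left) & (mid > right)) | ((mid < left) & (mid < right)))
    mean, var = 2 * n / 3, 8 * n / 45
    z = (turns - mean) / np.sqrt(var)
    return turns, z                      # reject at the 5% level if |z| > 1.96

rng = np.random.default_rng(2)
print(turning_point_test(rng.normal(size=1000)))              # white noise: |z| small
print(turning_point_test(np.cumsum(rng.normal(size=1000))))   # random walk: markedly fewer turns
```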
2 Models of stationary processes
2.1 Purely indeterministic processes
Suppose {Xt} is a second order stationary process, with mean 0. Its autocovariance
function is
γk = E(XtXt+k) = cov(Xt,Xt+k), k ∈ Z.
1. As {Xt} is stationary, γk does not depend on t.
2. A process is said to be purely-indeterministic if the regression of Xt on
Xt−q,Xt−q−1, . . . has explanatory power tending to 0 as q → ∞. That is, the
residual variance tends to var(Xt).
An important theorem due to Wold (1938) states that every purely-
indeterministic second order stationary process {Xt} can be written in the form
Xt = μ + θ0Zt + θ1Zt−1 + θ2Zt−2 + · · ·
where {Zt} is a sequence of uncorrelated random variables.
3. A Gaussian process is one for which Xt1, . . . ,Xtn has a joint normal distri-
bution for all t1, . . . , tn. No two distinct Gaussian processes have the same
autocovariance function.
2.2 ARMA processes
The autoregressive moving average process, ARMA(p, q), is defined by
Xt − Σ_{r=1}^p φr Xt−r = Σ_{s=0}^q θs ǫt−s
where again {ǫt} is white noise. This process is stationary for appropriate φ, θ.
Example 2.1
Consider the state space model
Xt = φXt−1 + ǫt ,
Yt = Xt + ηt .
Suppose {Xt} is unobserved, {Yt} is observed and {ǫt} and {ηt} are independent
white noise sequences. Note that {Xt} is AR(1). We can write
ξt = Yt − φYt−1
= (Xt + ηt) − φ(Xt−1 + ηt−1)
= (Xt − φXt−1) + (ηt − φηt−1)
= ǫt + ηt − φηt−1
Now ξt is stationary and cov(ξt, ξt+k) = 0, k ≥ 2. As such, ξt can be modelled as a
MA(1) process and {Yt} as ARMA(1, 1).
2.3 ARIMA processes
If the original process {Yt} is not stationary, we can look at the first order difference
process
Xt = ∇Yt = Yt − Yt−1
or the second order differences

Xt = ∇^2 Yt = ∇(∇Y)t = Yt − 2Yt−1 + Yt−2

and so on. If we ever find that the differenced process is a stationary process we can look for an ARMA model of that.
The process {Yt} is said to be an autoregressive integrated moving average
process, ARIMA(p, d, q), if Xt = ∇dYt is an ARMA(p, q) process.
AR, MA, ARMA and ARIMA processes can be used to model many time series.
A key tool in identifying a model is an estimate of the autocovariance function.
2.4 Estimation of the autocovariance function
Suppose we have data (X1, . . . , XT) from a stationary time series. We can estimate
• the mean by ¯X = (1/T) Σ_{t=1}^T Xt,
• the autocovariance by ck = ˆγk = (1/T) Σ_{t=k+1}^T (Xt − ¯X)(Xt−k − ¯X), and
• the autocorrelation by rk = ˆρk = ˆγk/ˆγ0.
The plot of rk against k is known as the correlogram. If it is known that μ is 0 there is no need to correct for the mean and γk can be estimated by

ˆγk = (1/T) Σ_{t=k+1}^T Xt Xt−k .
Notice that in defining ˆγk we divide by T rather than by (T − k). When T is
large relative to k it does not much matter which divisor we use. However, for
mathematical simplicity and other reasons there are advantages in dividing by T.
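A short sketch of these estimators, dividing by T as above (function and variable names are illustrative):

```python
# Sketch: sample mean, autocovariances and autocorrelations, dividing by T.
import numpy as np

def sample_acf(x, max_lag):
    x = np.asarray(x, dtype=float)
    T = len(x)
    xc = x - x.mean()
    gamma = np.array([(xc[k:] * xc[:T - k]).sum() / T for k in range(max_lag + 1)])
    return gamma, gamma / gamma[0]       # (gamma_hat_k, r_k)

rng = np.random.default_rng(3)
x = rng.normal(size=500)                 # white noise
gamma, r = sample_acf(x, 10)
print(np.round(r, 3))                    # r_k for k >= 1 should be near 0
```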
Suppose the stationary process {Xt} has autocovariance function {γk}. Then
var(Σ_{t=1}^T at Xt) = Σ_{t=1}^T Σ_{s=1}^T at as cov(Xt, Xs) = Σ_{t=1}^T Σ_{s=1}^T at as γ|t−s| ≥ 0.
A sequence {γk} for which this holds for every T ≥ 1 and set of constants (a1, . . . , aT )
is called a nonnegative definite sequence. The following theorem states that {γk}
is a valid autocovariance function if and only if it is nonnegative definite.
Theorem 2.2 (Bochner) The following are equivalent.
1. There exists a stationary sequence with autocovariance function {γk}.
2. {γk} is nonnegative definite.
3. The spectral density function,

f(ω) = (1/π) Σ_{k=−∞}^∞ γk e^{ikω} = (1/π) γ0 + (2/π) Σ_{k=1}^∞ γk cos(ωk) ,
is positive if it exists.
Dividing by T rather than by (T − k) in the definition of ˆγk
• ensures that {ˆγk} is nonnegative definite (and thus that it could be the autoco-
variance function of a stationary process), and
• can reduce the L2-error of rk.
2.5 Identifying a MA(q) process
In a later lecture we consider the problem of identifying an ARMA or ARIMA model
for a given time series. A key tool in doing this is the correlogram.
The MA(q) process Xt has ρk = 0 for all k, |k| > q. So a diagnostic for MA(q) is
that |rk| drops to near zero beyond some threshold.
2.6 Identifying an AR(p) process
The AR(p) process has ρk decaying exponentially. This can be difficult to recognise
in the correlogram. Suppose we have a process Xt which we believe is AR(k) with
Xt = Σ_{j=1}^k φj,k Xt−j + ǫt

with ǫt independent of X1, . . . , Xt−1.

Given the data X1, . . . , XT, the least squares estimates of (φ1,k, . . . , φk,k) are obtained by minimizing

(1/T) Σ_{t=k+1}^T (Xt − Σ_{j=1}^k φj,k Xt−j)^2 .

This is approximately equivalent to solving equations similar to the Yule-Walker equations,

ˆγj = Σ_{ℓ=1}^k ˆφℓ,k ˆγ|j−ℓ| ,    j = 1, . . . , k .
These can be solved by the Levinson-Durbin recursion:
Step 0. σ^2_0 := ˆγ0,  ˆφ1,1 := ˆγ1/ˆγ0,  k := 0
Step 1. Repeat until ˆφk,k is near 0:
    k := k + 1
    ˆφk,k := (ˆγk − Σ_{j=1}^{k−1} ˆφj,k−1 ˆγk−j) / σ^2_{k−1}
    ˆφj,k := ˆφj,k−1 − ˆφk,k ˆφk−j,k−1 ,  for j = 1, . . . , k − 1
    σ^2_k := σ^2_{k−1}(1 − ˆφk,k^2)
We test whether the order k fit is an improvement over the order k−1 fit by looking
to see if ˆφk,k is far from zero.
The statistic ˆφk,k is called the kth sample partial autocorrelation coefficient
(PACF). If the process Xt is genuinely AR(p) then the population PACF, φk,k, is
exactly zero for all k > p. Thus a diagnostic for AR(p) is that the sample PACFs
are close to zero for k > p.
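A direct transcription of this recursion into Python, assuming the sample autocovariances are already available (names and test values are illustrative):

```python
# Sketch of the Levinson-Durbin recursion described above: given autocovariances
# gamma[0..p], return the partial autocorrelations phi_{k,k} and the residual
# variances sigma^2_k for k = 1..p.
import numpy as np

def levinson_durbin(gamma, p):
    gamma = np.asarray(gamma, dtype=float)
    phi = np.zeros((p + 1, p + 1))            # phi[j, k] holds phi_{j,k}
    sigma2 = np.zeros(p + 1)
    sigma2[0] = gamma[0]
    phi[1, 1] = gamma[1] / gamma[0]
    sigma2[1] = sigma2[0] * (1 - phi[1, 1] ** 2)
    for k in range(2, p + 1):
        acc = sum(phi[j, k - 1] * gamma[k - j] for j in range(1, k))
        phi[k, k] = (gamma[k] - acc) / sigma2[k - 1]
        for j in range(1, k):
            phi[j, k] = phi[j, k - 1] - phi[k, k] * phi[k - j, k - 1]
        sigma2[k] = sigma2[k - 1] * (1 - phi[k, k] ** 2)
    pacf = np.array([phi[k, k] for k in range(1, p + 1)])
    return pacf, sigma2[1:]

# Exact autocovariances of an AR(1) with phi_1 = 0.6, sigma^2 = 1.
gamma = np.array([0.6 ** k for k in range(6)]) / (1 - 0.36)
pacf, sigma2 = levinson_durbin(gamma, 5)
print(np.round(pacf, 3))                      # approximately [0.6, 0, 0, 0, 0]
```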
2.7 Distributions of the ACF and PACF
Both the sample ACF and PACF are approximately normally distributed about their population values, and have standard deviation of about 1/√T, where T is the length of the series. A rule of thumb is that ρk is negligible (and similarly φk,k) if rk (similarly ˆφk,k) lies between ±2/√T. (2 is an approximation to 1.96. Recall that if Z1, . . . , Zn ∼ N(μ, 1), a test of size 0.05 of the hypothesis H0 : μ = 0 against H1 : μ ≠ 0 rejects H0 if and only if ¯Z lies outside ±1.96/√n.)

Care is needed in applying this rule of thumb. It is important to realize that the sample autocorrelations, r1, r2, . . . (and sample partial autocorrelations, ˆφ1,1, ˆφ2,2, . . .) are not independently distributed. The probability that any one rk should lie outside ±2/√T depends on the values of the other rk.

A 'portmanteau' test of white noise (due to Box & Pierce and Ljung & Box) can be based on the fact that approximately

Q′m = T(T + 2) Σ_{k=1}^m (T − k)^{−1} rk^2 ∼ χ^2_m .
The sensitivity of the test to departure from white noise depends on the choice of
m. If the true model is ARMA(p, q) then greatest power is obtained (rejection of the
white noise hypothesis is most probable) when m is about p + q.
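A sketch of the portmanteau statistic (scipy is assumed to be available for the chi-squared tail probability; names and values are illustrative):

```python
# Sketch of the Ljung-Box statistic Q'_m defined above, with an approximate
# chi-squared p-value.
import numpy as np
from scipy.stats import chi2

def ljung_box(x, m):
    x = np.asarray(x, dtype=float)
    T = len(x)
    xc = x - x.mean()
    gamma0 = (xc * xc).sum() / T
    r = np.array([(xc[k:] * xc[:T - k]).sum() / T for k in range(1, m + 1)]) / gamma0
    Q = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, m + 1)))
    return Q, chi2.sf(Q, df=m)

rng = np.random.default_rng(4)
print(ljung_box(rng.normal(size=500), m=10))   # usually a large p-value: consistent with white noise
```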
3 Spectral methods
3.1 The discrete Fourier transform
If h(t) is defined for integers t, the discrete Fourier transform of h is

H(ω) = Σ_{t=−∞}^∞ h(t) e^{−iωt} ,    −π ≤ ω ≤ π .

The inverse transform is

h(t) = (1/2π) ∫_{−π}^{π} e^{iωt} H(ω) dω .

If h(t) is real-valued, and an even function such that h(t) = h(−t), then

H(ω) = h(0) + 2 Σ_{t=1}^∞ h(t) cos(ωt)

and

h(t) = (1/π) ∫_0^π cos(ωt) H(ω) dω .
3.2 The spectral density
The Wiener-Khintchine theorem states that for any real-valued stationary process there exists a spectral distribution function, F(·), which is nondecreasing and right continuous on [0, π] such that F(0) = 0, F(π) = γ0 and

γk = ∫_0^π cos(ωk) dF(ω) .

The integral is a Lebesgue-Stieltjes integral and is defined even if F has discontinuities. Informally, F(ω2) − F(ω1) is the contribution to the variance of the series made by frequencies in the range (ω1, ω2).

F(·) can have jump discontinuities, but always can be decomposed as

F(ω) = F1(ω) + F2(ω)

where F1(·) is a nondecreasing continuous function and F2(·) is a nondecreasing step function. This is a decomposition of the series into a purely indeterministic component and a deterministic component.

Suppose the process is purely indeterministic (which happens if and only if Σ_k |γk| < ∞). In this case F(·) is a nondecreasing continuous function, and differentiable at all points (except possibly on a set of measure zero). Its derivative f(ω) = F′(ω) exists, and is called the spectral density function. Apart from a multiplication by 1/π it is simply the discrete Fourier transform of the autocovariance function and is given by

f(ω) = (1/π) Σ_{k=−∞}^∞ γk e^{−ikω} = (1/π) γ0 + (2/π) Σ_{k=1}^∞ γk cos(ωk) ,

with inverse

γk = ∫_0^π cos(ωk) f(ω) dω .

Note. Some authors define the spectral distribution function on [−π, π]; the use of negative frequencies makes the interpretation of the spectral distribution less intuitive and leads to a difference of a factor of 2 in the definition of the spectral density. Notice, however, that if f is defined as above and extended to negative frequencies, f(−ω) = f(ω), then we can write

γk = ∫_{−π}^{π} (1/2) e^{iωk} f(ω) dω .
Example 3.1
(a) Suppose {Xt} is i.i.d., γ0 = var(Xt) = σ^2 > 0 and γk = 0, k ≥ 1. Then f(ω) = σ^2/π. The fact that the spectral density is flat, so that all frequencies are equally present, accounts for our calling this sequence white noise.

(b) As an example of a process which is not purely indeterministic, consider Xt = cos(ω0 t + U), where ω0 is a value in [0, π] and U ∼ U[−π, π]. The process has zero mean, since

E(Xt) = (1/2π) ∫_{−π}^{π} cos(ω0 t + u) du = 0

and autocovariance

γk = E(Xt Xt+k)
   = (1/2π) ∫_{−π}^{π} cos(ω0 t + u) cos(ω0 t + ω0 k + u) du
   = (1/2π) ∫_{−π}^{π} (1/2)[cos(ω0 k) + cos(2ω0 t + ω0 k + 2u)] du
   = (1/2π) (1/2)[2π cos(ω0 k) + 0]
   = (1/2) cos(ω0 k) .

Hence Xt is second order stationary and we have

γk = (1/2) cos(ω0 k),    F(ω) = (1/2) I[ω ≥ ω0]    and    f(ω) = (1/2) δ_{ω0}(ω) .
Note that F is a nondecreasing step function.
More generally, the spectral density

f(ω) = Σ_{j=1}^n (1/2) aj δ_{ωj}(ω)

corresponds to the process Xt = Σ_{j=1}^n aj cos(ωj t + Uj), where ωj ∈ [0, π] and U1, . . . , Un are i.i.d. U[−π, π].

(c) The MA(1) process, Xt = θ1 ǫt−1 + ǫt, where {ǫt} is white noise. Recall γ0 = (1 + θ1^2)σ^2, γ1 = θ1 σ^2, and γk = 0, k > 1. Thus

f(ω) = σ^2 (1 + 2θ1 cos ω + θ1^2) / π .
(d) The AR(1) process, Xt = φ1 Xt−1 + ǫt, where {ǫt} is white noise. Recall

var(Xt) = φ1^2 var(Xt−1) + σ^2  ⟹  γ0 = φ1^2 γ0 + σ^2  ⟹  γ0 = σ^2 / (1 − φ1^2) ,

where we need |φ1| < 1 for Xt stationary. Also,

γk = cov(Xt, Xt−k) = cov(φ1 Xt−1 + ǫt, Xt−k) = φ1 γk−1 .

So γk = φ1^{|k|} γ0, k ∈ Z. Thus

f(ω) = γ0/π + (2/π) Σ_{k=1}^∞ φ1^k γ0 cos(kω)
     = (γ0/π) {1 + Σ_{k=1}^∞ φ1^k (e^{iωk} + e^{−iωk})}
     = (γ0/π) {1 + φ1 e^{iω}/(1 − φ1 e^{iω}) + φ1 e^{−iω}/(1 − φ1 e^{−iω})}
     = (γ0/π) (1 − φ1^2)/(1 − 2φ1 cos ω + φ1^2)
     = σ^2 / (π(1 − 2φ1 cos ω + φ1^2)) .
Note that φ > 0 has power at low frequency, whereas φ < 0 has power at high
frequency.
[Figure: spectral densities f(ω) on (0, π) for AR(1) processes with φ1 = 1/2 and φ1 = −1/2.]

The plots above are the spectral densities for AR(1) processes in which {ǫt} is Gaussian white noise, with σ^2/π = 1. Samples of 200 data points from each process are shown below.

[Figure: sample paths of 200 points for φ1 = 1/2 and φ1 = −1/2.]
3.3 Analysing the effects of smoothing
Let {as} be a sequence of real numbers. A linear filter of {Xt} is

Yt = Σ_{s=−∞}^∞ as Xt−s .

In Chapter 5 we show that the spectral density of {Yt} is given by

fY(ω) = |a(ω)|^2 fX(ω) ,

where a(ω) is the transfer function

a(ω) = Σ_{s=−∞}^∞ as e^{iωs} .

This result can be used to explore the effect of smoothing a series.

Example 3.2
Suppose the AR(1) series above, with φ1 = −0.5, is smoothed by a moving average on three points, so that the smoothed series is

Yt = (1/3)[Xt+1 + Xt + Xt−1] .

Then |a(ω)|^2 = |(1/3)e^{−iω} + 1/3 + (1/3)e^{iω}|^2 = (1/9)(1 + 2 cos ω)^2.
Notice that γX(0) = 4π/3, γY (0) = 2π/9, so {Yt} has 1/6 the variance of {Xt}.
Moreover, all components of frequency ω = 2π/3 (i.e., period 3) are eliminated in
the smoothed series.
[Figure: |a(ω)|^2 for the three-point moving average, and fY(ω) = |a(ω)|^2 fX(ω) for the smoothed AR(1) series.]
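Example 3.2 can also be checked by simulation. The sketch below (illustrative seed and sample size) smooths a simulated AR(1) series with the three-point moving average and confirms the variance ratio of 6 and the elimination of period-3 components.

```python
# Sketch: verify Example 3.2 numerically for phi_1 = -0.5, sigma^2 = pi.
import numpy as np

rng = np.random.default_rng(5)
T, phi, sigma = 200_000, -0.5, np.sqrt(np.pi)
eps = rng.normal(0, sigma, T)
x = np.zeros(T)
for t in range(1, T):
    x[t] = phi * x[t - 1] + eps[t]

y = (x[2:] + x[1:-1] + x[:-2]) / 3.0          # Y_t = (X_{t+1} + X_t + X_{t-1}) / 3
print(x.var() / y.var())                      # close to 6

t = np.arange(30)
c = np.cos(2 * np.pi * t / 3)                 # period-3 component
print(np.round((c[2:] + c[1:-1] + c[:-2]) / 3.0, 10))   # essentially zero
```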
4 Estimation of the spectrum
4.1 The periodogram
Suppose we have T = 2m + 1 observations of a time series, y1, . . . , yT. Define the Fourier frequencies, ωj = 2πj/T, j = 1, . . . , m, and consider the regression model

yt = α0 + Σ_{j=1}^m αj cos(ωj t) + Σ_{j=1}^m βj sin(ωj t) ,

which can be written as a general linear model, Y = Xθ + ǫ, where Y = (y1, . . . , yT)^⊤, θ = (α0, α1, β1, . . . , αm, βm)^⊤, ǫ = (ǫ1, . . . , ǫT)^⊤, and X is the T × (2m + 1) matrix whose t-th row is

(1, c1t, s1t, . . . , cmt, smt) ,    where cjt = cos(ωj t), sjt = sin(ωj t) .

The least squares estimates in this model are given by

ˆθ = (X^⊤X)^{−1} X^⊤Y .

Note that

Σ_{t=1}^T e^{iωj t} = e^{iωj}(1 − e^{iωj T}) / (1 − e^{iωj}) = 0
  ⟹  Σ_{t=1}^T cjt + i Σ_{t=1}^T sjt = 0  ⟹  Σ_{t=1}^T cjt = Σ_{t=1}^T sjt = 0

and

Σ_{t=1}^T cjt sjt = (1/2) Σ_{t=1}^T sin(2ωj t) = 0 ,
Σ_{t=1}^T cjt^2 = (1/2) Σ_{t=1}^T {1 + cos(2ωj t)} = T/2 ,
Σ_{t=1}^T sjt^2 = (1/2) Σ_{t=1}^T {1 − cos(2ωj t)} = T/2 ,
Σ_{t=1}^T cjt skt = Σ_{t=1}^T cjt ckt = Σ_{t=1}^T sjt skt = 0 ,    j ≠ k .
Using these, we have

ˆθ = (ˆα0, ˆα1, . . . , ˆβm)^⊤ = diag(T, T/2, . . . , T/2)^{−1} (Σt yt, Σt c1t yt, . . . , Σt smt yt)^⊤
   = (¯y, (2/T) Σt c1t yt, . . . , (2/T) Σt smt yt)^⊤

and the regression sum of squares is

ˆY^⊤ˆY = Y^⊤X(X^⊤X)^{−1}X^⊤Y = T ¯y^2 + Σ_{j=1}^m (2/T) [ (Σ_{t=1}^T cjt yt)^2 + (Σ_{t=1}^T sjt yt)^2 ] .

Since we are fitting T unknown parameters to T data points, the model fits with no residual error, i.e., ˆY = Y. Hence

Σ_{t=1}^T (yt − ¯y)^2 = Σ_{j=1}^m (2/T) [ (Σ_{t=1}^T cjt yt)^2 + (Σ_{t=1}^T sjt yt)^2 ] .
This motivates definition of the periodogram as

I(ω) = (1/πT) [ (Σ_{t=1}^T yt cos(ωt))^2 + (Σ_{t=1}^T yt sin(ωt))^2 ] .

A factor of (1/2π) has been introduced into this definition so that the sample variance, ˆγ0 = (1/T) Σ_{t=1}^T (yt − ¯y)^2, equates to the sum of the areas of m rectangles, whose heights are I(ω1), . . . , I(ωm), whose widths are 2π/T, and whose bases are centred at ω1, . . . , ωm. I.e., ˆγ0 = (2π/T) Σ_{j=1}^m I(ωj). These rectangles approximate the area under the curve I(ω), 0 ≤ ω ≤ π.

[Figure: the periodogram I(ω) on (0, π), showing the rectangle of width 2π/T and height I(ω5) centred at ω5.]
Using the fact that Σ_{t=1}^T cjt = Σ_{t=1}^T sjt = 0, we can write

πT I(ωj) = (Σ_{t=1}^T yt cos(ωj t))^2 + (Σ_{t=1}^T yt sin(ωj t))^2
         = (Σ_{t=1}^T (yt − ¯y) cos(ωj t))^2 + (Σ_{t=1}^T (yt − ¯y) sin(ωj t))^2
         = | Σ_{t=1}^T (yt − ¯y) e^{iωj t} |^2
         = Σ_{t=1}^T (yt − ¯y) e^{iωj t} Σ_{s=1}^T (ys − ¯y) e^{−iωj s}
         = Σ_{t=1}^T (yt − ¯y)^2 + 2 Σ_{k=1}^{T−1} Σ_{t=k+1}^T (yt − ¯y)(yt−k − ¯y) cos(ωj k) .

Hence

I(ωj) = (1/π) ˆγ0 + (2/π) Σ_{k=1}^{T−1} ˆγk cos(ωj k) .
I(ω) is therefore a sample version of the spectral density f(ω).
4.2 Distribution of spectral estimates
If the process is stationary and the spectral density exists then I(ω) is an almost
unbiased estimator of f(ω), but it is a rather poor estimator without some smoothing.
Suppose {yt} is Gaussian white noise, i.e., y1, . . . , yT are i.i.d. N(0, σ^2). Then for any Fourier frequency ω = 2πj/T,

I(ω) = (1/πT) [ A(ω)^2 + B(ω)^2 ] ,    (4.1)

where

A(ω) = Σ_{t=1}^T yt cos(ωt) ,    B(ω) = Σ_{t=1}^T yt sin(ωt) .    (4.2)

Clearly A(ω) and B(ω) have zero means, and

var[A(ω)] = σ^2 Σ_{t=1}^T cos^2(ωt) = Tσ^2/2 ,
var[B(ω)] = σ^2 Σ_{t=1}^T sin^2(ωt) = Tσ^2/2 ,
cov[A(ω), B(ω)] = E[ Σ_{t=1}^T Σ_{s=1}^T yt ys cos(ωt) sin(ωs) ] = σ^2 Σ_{t=1}^T cos(ωt) sin(ωt) = 0 .

Hence A(ω)√(2/(Tσ^2)) and B(ω)√(2/(Tσ^2)) are independently distributed as N(0, 1), and 2[A(ω)^2 + B(ω)^2]/(Tσ^2) is distributed as χ^2_2. This gives I(ω) ∼ (σ^2/π) χ^2_2 / 2. Thus we see that I(ω) is an unbiased estimator of the spectrum, f(ω) = σ^2/π, but it is not consistent, since var[I(ω)] = σ^4/π^2 does not tend to 0 as T → ∞. This is perhaps surprising, but is explained by the fact that as T increases we are attempting to estimate I(ω) for an increasing number of Fourier frequencies, with the consequence that the precision of each estimate does not change.
By a similar argument, we can show that for any two Fourier frequencies, ωj and
ωk the estimates I(ωj) and I(ωk) are statistically independent. These conclusions
hold more generally.
Theorem 4.1 Let {Yt} be a stationary Gaussian process with spectrum f(ω). Let I(·) be the periodogram based on samples Y1, . . . , YT, and let ωj = 2πj/T, j < T/2, be a Fourier frequency. Then in the limit as T → ∞,
(a) I(ωj) ∼ f(ωj) χ^2_2 / 2 ;
(b) I(ωj) and I(ωk) are independent for j ≠ k.

Assuming that the underlying spectrum is smooth, f(ω) is nearly constant over a small range of ω. This motivates use of an estimator for the spectrum of

ˆf(ωj) = (1/(2p + 1)) Σ_{ℓ=−p}^p I(ωj+ℓ) .

Then ˆf(ωj) ∼ f(ωj) χ^2_{2(2p+1)} / [2(2p + 1)], which has variance f(ω)^2/(2p + 1). The idea is to let p → ∞ as T → ∞.
4.3 The fast Fourier transform
I(ωj) can be calculated from (4.1)–(4.2), or from

I(ωj) = (1/πT) | Σ_{t=1}^T yt e^{iωj t} |^2 .
Either way, this requires of order T multiplications. Hence to calculate the complete
periodogram, i.e., I(ω1), . . . , I(ωm), requires of order T2 multiplications. Computa-
tion effort can be reduced significantly by use of the fast Fourier transform, which
computes I(ω1), . . . , I(ωm) using only order T log2 T multiplications.
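A sketch of the periodogram computed with the FFT (NumPy's sign convention for the transform is assumed; T odd and all values illustrative); it also checks the identity ˆγ0 = (2π/T) Σj I(ωj) from Section 4.1.

```python
# Sketch: periodogram I(omega_j) = |sum_t y_t e^{i omega_j t}|^2 / (pi T) at the
# Fourier frequencies, via the FFT.
import numpy as np

rng = np.random.default_rng(6)
T = 501                                        # T = 2m + 1
y = rng.normal(size=T)
m = (T - 1) // 2

Y = np.fft.fft(y)                              # Y[j] = sum_t y_t exp(-2*pi*i*j*t/T)
I = np.abs(Y[1 : m + 1]) ** 2 / (np.pi * T)    # j = 1..m, i.e. omega_j = 2*pi*j/T
print((2 * np.pi / T) * I.sum())               # approximately the sample variance
print(y.var())                                 # gamma_hat_0 = (1/T) sum (y_t - ybar)^2
```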
5 Linear filters
5.1 The Filter Theorem
A linear filter of one random sequence {Xt} into another sequence {Yt} is

Yt = Σ_{s=−∞}^∞ as Xt−s .    (5.1)

Theorem 5.1 (the filter theorem) Suppose Xt is a stationary time series with spectral density fX(ω). Let {at} be a sequence of real numbers such that Σ_{t=−∞}^∞ |at| < ∞. Then the process Yt = Σ_{s=−∞}^∞ as Xt−s is a stationary time series with spectral density function

fY(ω) = |A(e^{iω})|^2 fX(ω) = |a(ω)|^2 fX(ω) ,

where A(z) is the filter generating function

A(z) = Σ_{s=−∞}^∞ as z^s ,    |z| ≤ 1 ,

and a(ω) = A(e^{iω}) is the transfer function of the linear filter.

Proof.

cov(Yt, Yt+k) = Σ_{r∈Z} Σ_{s∈Z} ar as cov(Xt−r, Xt+k−s)
             = Σ_{r,s∈Z} ar as γk+r−s
             = Σ_{r,s∈Z} ar as ∫_{−π}^{π} (1/2) e^{iω(k+r−s)} fX(ω) dω
             = ∫_{−π}^{π} A(e^{iω}) A(e^{−iω}) (1/2) e^{iωk} fX(ω) dω
             = ∫_{−π}^{π} (1/2) e^{iωk} |A(e^{iω})|^2 fX(ω) dω
             = ∫_{−π}^{π} (1/2) e^{iωk} fY(ω) dω .
Thus fY (ω) is the spectral density for Y and Y is stationary.
5.2 Application to autoregressive processes
Let us use the notation B for the backshift operator

B^0 = I,  (B^0 X)t = Xt,  (BX)t = Xt−1,  (B^2 X)t = Xt−2, . . .

Then the AR(p) process can be written as

(I − Σ_{r=1}^p φr B^r) X = ǫ

or φ(B)X = ǫ, where φ is the function

φ(z) = 1 − Σ_{r=1}^p φr z^r .

By the filter theorem, fǫ(ω) = |φ(e^{iω})|^2 fX(ω), so since fǫ(ω) = σ^2/π,

fX(ω) = σ^2 / (π |φ(e^{iω})|^2) .    (5.2)

As fX(ω) = (1/π) Σ_{k=−∞}^∞ γk e^{−iωk}, we can calculate the autocovariances by expanding fX(ω) as a power series in e^{iω}. For this to work, the zeros of φ(z) must lie outside the unit circle in C. This is the stationarity condition for the AR(p) process.
Example 5.2
For the AR(1) process, Xt − φ1 Xt−1 = ǫt, we have φ(z) = 1 − φ1 z, with its zero at z = 1/φ1. The stationarity condition is |φ1| < 1. Using (5.2) we find

fX(ω) = σ^2 / (π |1 − φ1 e^{iω}|^2) = σ^2 / (π(1 − 2φ1 cos ω + φ1^2)) ,

which is what we found by another method in Example 3.1(d). To find the autocovariances we can write, taking z = e^{iω},

1/|φ(z)|^2 = 1/(φ(z) φ(1/z)) = 1/((1 − φ1 z)(1 − φ1/z))
           = Σ_{r=0}^∞ φ1^r z^r Σ_{s=0}^∞ φ1^s z^{−s}
           = Σ_{k=−∞}^∞ z^k φ1^{|k|} (1 + φ1^2 + φ1^4 + · · · )
           = Σ_{k=−∞}^∞ z^k φ1^{|k|} / (1 − φ1^2)

⟹  fX(ω) = (1/π) Σ_{k=−∞}^∞ [σ^2 φ1^{|k|} / (1 − φ1^2)] e^{iωk}

and so γk = σ^2 φ1^{|k|} / (1 − φ1^2), as we saw before.

In general, it is often easier to calculate the spectral density function first, using filters, and then deduce the autocovariance function from it.
5.3 Application to moving average processes
The MA(q) process Xt = ǫt + Σ_{s=1}^q θs ǫt−s can be written as

X = θ(B)ǫ

where θ(z) = Σ_{s=0}^q θs z^s. By the filter theorem, fX(ω) = |θ(e^{iω})|^2 (σ^2/π).
Example 5.3
For the MA(1), Xt = ǫt + θ1 ǫt−1, θ(z) = 1 + θ1 z and

fX(ω) = σ^2 (1 + 2θ1 cos ω + θ1^2) / π .

As above, we can obtain the autocovariance function by expressing fX(ω) as a power series in e^{iω}. We have

fX(ω) = (σ^2/π) (θ1 e^{−iω} + (1 + θ1^2) + θ1 e^{iω}) = (σ^2/π) θ(e^{iω}) θ(e^{−iω}) .

So γ0 = σ^2 (1 + θ1^2), γ1 = θ1 σ^2, and γk = 0, |k| > 1.

As we remarked in Section 1.5, the autocovariance function of a MA(1) process with parameters (σ^2, θ1) is identical to one with parameters (θ1^2 σ^2, θ1^{−1}). That is,

γ*0 = θ1^2 σ^2 (1 + 1/θ1^2) = σ^2 (1 + θ1^2) = γ0
ρ*1 = θ1^{−1} / (1 + θ1^{−2}) = θ1 / (1 + θ1^2) = ρ1 .

In general, the MA(q) process can be written as X = θ(B)ǫ, where

θ(z) = Σ_{k=0}^q θk z^k = Π_{k=1}^q (ωk − z) .

So the autocovariance generating function is

g(z) = Σ_{k=−q}^q γk z^k = σ^2 θ(z) θ(z^{−1}) = σ^2 Π_{k=1}^q (ωk − z)(ωk − z^{−1}) .    (5.3)

Note that (ωk − z)(ωk − z^{−1}) = ωk^2 (ωk^{−1} − z)(ωk^{−1} − z^{−1}). So g(z) is unchanged in (5.3) if (for any k such that ωk is real) we replace ωk by ωk^{−1} and multiply σ^2 by ωk^2. Thus (if all roots of θ(z) = 0 are real) there can be 2^q different MA(q) processes with the same autocovariance function. For identifiability, we assume that all the roots of θ(z) lie outside the unit circle in C. This is equivalent to the invertibility condition, that ǫt can be written as a convergent power series in {Xt, Xt−1, . . .}.
5.4 The general linear process
A special case of (5.1) is the general linear process,

Yt = Σ_{s=0}^∞ as Xt−s ,

where {Xt} is white noise. This has

cov(Yt, Yt+k) = σ^2 Σ_{s=0}^∞ as as+k ≤ σ^2 Σ_{s=0}^∞ as^2 ,

where the inequality is an equality when k = 0. Thus {Yt} is stationary if and only if Σ_{s=0}^∞ as^2 < ∞. In practice the general linear model is useful when the as are expressible in terms of a finite number of parameters which can be estimated. A rich class of such models are the ARMA models.
5.5 Filters and ARMA processes
The ARMA(p, q) model can be written as φ(B)X = θ(B)ǫ. Thus

|φ(e^{iω})|^2 fX(ω) = |θ(e^{iω})|^2 σ^2/π  ⟹  fX(ω) = |θ(e^{iω}) / φ(e^{iω})|^2 σ^2/π .

This is subject to the conditions that
• the zeros of φ lie outside the unit circle in C, for stationarity;
• the zeros of θ lie outside the unit circle in C, for identifiability;
• φ(z) and θ(z) have no common roots.
If there were a common root, say 1/α, so that (I − αB)φ1(B)X = (I − αB)θ1(B)ǫ, then we could multiply both sides by Σ_{n=0}^∞ α^n B^n and deduce φ1(B)X = θ1(B)ǫ, and thus that a more economical ARMA(p − 1, q − 1) model suffices.
5.6 Calculating autocovariances in ARMA models
As above, the filter theorem can assist in calculating the autocovariances of a model.
These can be compared with autocovariances estimated from the data. For example,
an ARMA(1, 2) has
φ(z) = 1 − φz,  θ(z) = 1 + θ1 z + θ2 z^2,  where |φ| < 1.

Then X = C(B)ǫ, where

C(z) = θ(z)/φ(z) = (1 + θ1 z + θ2 z^2) Σ_{n=0}^∞ φ^n z^n = Σ_{n=0}^∞ cn z^n ,

with c0 = 1, c1 = φ + θ1, and

cn = φ^n + φ^{n−1} θ1 + φ^{n−2} θ2 = φ^{n−2} (φ^2 + φθ1 + θ2) ,    n ≥ 2.

So Xt = Σ_{n=0}^∞ cn ǫt−n and we can compute covariances as

γk = cov(Xt, Xt+k) = Σ_{n,m=0}^∞ cn cm cov(ǫt−n, ǫt+k−m) = Σ_{n=0}^∞ cn cn+k σ^2 .

For example, γk = φ γk−1, k ≥ 3. As a test of whether the model is ARMA(1, 2) we might look to see if the sample autocovariances decay geometrically for k ≥ 2.
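A sketch of this calculation with illustrative parameter values, truncating the power series for C(z) at a large order:

```python
# Sketch: autocovariances of an ARMA(1,2) via the expansion X = C(B) epsilon.
import numpy as np

phi, theta1, theta2, sigma2 = 0.5, 0.4, 0.2, 1.0
N = 200                                        # truncation order for the c_n
c = np.zeros(N)
c[0], c[1] = 1.0, phi + theta1
for n in range(2, N):
    c[n] = phi ** (n - 2) * (phi ** 2 + phi * theta1 + theta2)

gamma = np.array([sigma2 * np.sum(c[: N - k] * c[k:]) for k in range(6)])
print(np.round(gamma, 4))
print(np.round(gamma[3:] / gamma[2:5], 4))     # ratios close to phi for k >= 3
```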
6 Estimation of trend and seasonality
6.1 Moving averages
Consider a decomposition into trend, seasonal, cyclic and residual components.
Xt = Tt + It + Ct + Et .
Thus far we have been concerned with modelling {Et}. We have also seen that the
periodogram can be useful for recognising the presence of {Ct}.
We can estimate the trend using a symmetric moving average,

ˆTt = Σ_{s=−k}^k as Xt+s ,

where as = a−s. In this case the transfer function is real-valued.

The choice of moving averages requires care. For example, we might try to estimate the trend with

ˆTt = (1/3)(Xt−1 + Xt + Xt+1) .

But suppose Xt = Tt + ǫt, where the trend is the quadratic Tt = a + bt + ct^2. Then

ˆTt = Tt + (2/3)c + (1/3)(ǫt−1 + ǫt + ǫt+1) ,

so E ˆTt = E Xt + (2/3)c and thus ˆT is a biased estimator of the trend.

This problem is avoided if we estimate the trend by fitting a polynomial of sufficient degree, e.g., to find the cubic that best fits seven successive points we minimize

Σ_{t=−3}^3 (Xt − b0 − b1 t − b2 t^2 − b3 t^3)^2 .

So

Σ Xt = 7 ˆb0 + 28 ˆb2 ,
Σ t Xt = 28 ˆb1 + 196 ˆb3 ,
Σ t^2 Xt = 28 ˆb0 + 196 ˆb2 ,
Σ t^3 Xt = 196 ˆb1 + 1588 ˆb3 .

Then

ˆb0 = (1/21)(7 Σ Xt − Σ t^2 Xt) = (1/21)(−2X−3 + 3X−2 + 6X−1 + 7X0 + 6X1 + 3X2 − 2X3) .

We estimate the trend at time 0 by ˆT0 = ˆb0, and similarly,

ˆTt = (1/21)(−2Xt−3 + 3Xt−2 + 6Xt−1 + 7Xt + 6Xt+1 + 3Xt+2 − 2Xt+3) .
A notation for this moving average is (1/21)[−2, 3, 6, 7, 6, 3, −2]. Note that the weights sum to 1. In general, we can fit a polynomial of degree q to 2q + 1 points by applying a symmetric moving average. (We fit to an odd number of points so that the midpoint of the fitted range coincides with a point in time at which data is measured.)

A value for q can be identified using the variate difference method: if {Xt} is indeed a polynomial of degree q, plus residual error {ǫt}, then the trend in ∇^r Xt is a polynomial of degree q − r and

∇^q Xt = constant + ∇^q ǫt = constant + ǫt − (q choose 1) ǫt−1 + (q choose 2) ǫt−2 − · · · + (−1)^q ǫt−q .

The variance of ∇^q Xt is therefore

var(∇^q ǫt) = [1 + (q choose 1)^2 + (q choose 2)^2 + · · · + 1] σ^2 = (2q choose q) σ^2 ,

where the simplification in the final line comes from looking at the coefficient of z^q in expansions of both sides of

(1 + z)^q (1 + z)^q = (1 + z)^{2q} .

Define Vr = var(∇^r Xt) / (2r choose r). The fact that the plot of Vr against r should flatten out at r ≥ q can be used to identify q.
6.2 Centred moving averages
If there is a seasonal component then a centred moving average is useful. Suppose data is measured quarterly; then applying twice the moving average (1/4)[1, 1, 1, 1] is equivalent to applying once the moving average (1/8)[1, 2, 2, 2, 1]. Notice that this so-called centred average of fours weights each quarter equally. Thus if Xt = It + ǫt, where It has period 4 and I1 + I2 + I3 + I4 = 0, then ˆTt has no seasonal component. Similarly, if data were monthly we would use a centred average of 12s, that is, (1/24)[1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1].
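A small sketch showing that the centred average of fours removes an exact quarterly pattern (the indices and level are illustrative):

```python
# Sketch: the centred average of fours, (1/8)[1,2,2,2,1], removes an exact
# quarterly seasonal pattern with I_1 + I_2 + I_3 + I_4 = 0.
import numpy as np

w = np.array([1, 2, 2, 2, 1]) / 8.0
I = np.array([3.0, -1.0, -2.0, 0.0])           # seasonal indices summing to 0
x = 10.0 + np.tile(I, 10)                      # level 10 plus period-4 seasonality
trend = np.convolve(x, w, mode="valid")
print(np.round(trend, 10))                     # constant 10: seasonality removed
```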
6.3 The Slutzky-Yule effect
To remove both trend and seasonal components we might successively apply a number
of moving averages, one or more to remove trend and another to remove seasonal
effects. This is the procedure followed by some standard forecasting packages.
However, there is a danger that application of successive moving averages can
introduce spurious effects. The Slutzky-Yule effect is concerned with the fact that
a moving average repeatedly applied to a purely random series can introduce artificial
cycles. Slutzky (1927) showed that some trade cycles of the nineteenth century were
no more than artifacts of moving averages that had been used to smooth the data.
To illustrate this idea, suppose the moving average (1/6)[−1, 2, 4, 2, −1] is applied k times to a white noise series. This moving average has transfer function a(ω) = (1/6)(4 + 4 cos ω − 2 cos 2ω), which is maximal at ω = π/3. The smoothed series has a spectral density, say fk(ω), proportional to a(ω)^{2k}, and hence for ω ≠ π/3, fk(ω)/fk(π/3) → 0 as k → ∞. Thus in the limit the smoothed series is a periodic wave with period 6.
6.4 Exponential smoothing
Single exponential smoothing
Suppose the mean level of a series drifts slowly over time. A naive one-step-ahead forecast is Xt(1) = Xt. However, we might let all past observations play a part in the forecast, but give greater weights to those that are more recent. Choose weights that decrease exponentially and let

Xt(1) = [(1 − ω)/(1 − ω^t)] (Xt + ωXt−1 + ω^2 Xt−2 + · · · + ω^{t−1} X1) ,

where 0 < ω < 1. Define St as the right hand side of the above as t → ∞, i.e.,

St = (1 − ω) Σ_{s=0}^∞ ω^s Xt−s .

St can serve as a one-step-ahead forecast, Xt(1). St is known as simple exponential smoothing. Let α = 1 − ω. Simple algebra gives

St = αXt + (1 − α)St−1 ,
Xt(1) = Xt−1(1) + α[Xt − Xt−1(1)] .

This shows that the one-step-ahead forecast at time t is the one-step-ahead forecast at time t − 1, modified by α times the forecasting error incurred at time t − 1.

To get things started we might set S0 equal to the average of the first few data points. We can play around with α, choosing it to minimize the mean square forecasting error. In practice, α in the range 0.25–0.5 usually works well.
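A minimal sketch of simple exponential smoothing in this recursive form (starting value, data and the grid of α values are illustrative):

```python
# Sketch: simple exponential smoothing S_t = alpha*X_t + (1 - alpha)*S_{t-1},
# used as a one-step-ahead forecast; alpha chosen by mean square forecast error.
import numpy as np

def exp_smooth(x, alpha, s0=None):
    x = np.asarray(x, dtype=float)
    s = np.empty(len(x))
    s[0] = x[:5].mean() if s0 is None else s0   # start from the first few points
    for t in range(1, len(x)):
        s[t] = alpha * x[t] + (1 - alpha) * s[t - 1]
    return s                                    # s[t] forecasts x[t + 1]

rng = np.random.default_rng(7)
level = np.cumsum(rng.normal(0, 0.1, 300))      # slowly drifting mean level
x = level + rng.normal(0, 1, 300)
for alpha in (0.1, 0.3, 0.5):
    err = x[1:] - exp_smooth(x, alpha)[:-1]
    print(alpha, round(np.mean(err ** 2), 3))
```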
Double exponential smoothing
Suppose the series is approximately linear, but with a slowly varying trend. If it were true that Xt = b0 + b1 t + ǫt, then

St = (1 − ω) Σ_{s=0}^∞ ω^s (b0 + b1(t − s) + ǫt−s)
   = b0 + b1 t − b1(1 − ω) Σ_{s=0}^∞ ω^s s + (1 − ω) Σ_{s=0}^∞ ω^s ǫt−s ,

and hence

E St = b0 + b1 t − b1 ω/(1 − ω) = E Xt+1 − b1/(1 − ω) .

Thus the forecast has a bias of −b1/(1 − ω). To eliminate this bias let S¹t = St be the first smoothing, and S²t = αS¹t + (1 − α)S²t−1 be the simple exponential smoothing of S¹t. Then

E S²t = E S¹t − b1 ω/(1 − ω) = E Xt − 2b1 ω/(1 − ω) ,
E(2S¹t − S²t) = b0 + b1 t ,    E(S¹t − S²t) = b1(1 − α)/α .

This suggests the estimates ˆb0 + ˆb1 t = 2S¹t − S²t and ˆb1 = α(S¹t − S²t)/(1 − α). The forecasting equation is then

Xt(s) = ˆb0 + ˆb1(t + s) = (2S¹t − S²t) + sα(S¹t − S²t)/(1 − α) .

As with single exponential smoothing we can experiment with choices of α, and find S¹0 and S²0 by fitting a regression line, Xt = ˆβ0 + ˆβ1 t, to the first few points of the series and solving

S¹0 = ˆβ0 − (1 − α)ˆβ1/α ,    S²0 = ˆβ0 − 2(1 − α)ˆβ1/α .
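A sketch of the resulting forecast rule, using crude starting values rather than the regression-based start described above (data and parameter values are illustrative):

```python
# Sketch: double exponential smoothing and the forecast
# X_t(s) = (2*S1_t - S2_t) + s*alpha*(S1_t - S2_t)/(1 - alpha).
import numpy as np

def double_exp_forecast(x, alpha, s):
    x = np.asarray(x, dtype=float)
    s1 = s2 = x[0]                              # crude start; a short regression fit is better
    for xt in x:
        s1 = alpha * xt + (1 - alpha) * s1
        s2 = alpha * s1 + (1 - alpha) * s2
    b1 = alpha * (s1 - s2) / (1 - alpha)
    return (2 * s1 - s2) + s * b1               # forecast s steps ahead

rng = np.random.default_rng(8)
t = np.arange(200)
x = 5.0 + 0.3 * t + rng.normal(0, 1, 200)       # roughly linear trend
print(round(double_exp_forecast(x, alpha=0.3, s=10), 2), 5.0 + 0.3 * (199 + 10))
```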
6.5 Calculation of seasonal indices
Suppose data is quarterly and we want to fit an additive model. Let ˆI1 be the average of X1, X5, X9, . . ., let ˆI2 be the average of X2, X6, X10, . . ., and so on for ˆI3 and ˆI4. The cumulative seasonal effects over the course of a year should cancel, so that if Xt = a + It, then Xt + Xt+1 + Xt+2 + Xt+3 = 4a. To ensure this we take our final estimates of the seasonal indices as

I*t = ˆIt − (1/4)(ˆI1 + · · · + ˆI4) .

If the model is multiplicative and Xt = aIt, we again wish to see the cumulative effects over a year cancel, so that Xt + Xt+1 + Xt+2 + Xt+3 = 4a. This means that we should take

I*t = ˆIt − (1/4)(ˆI1 + · · · + ˆI4) + 1 ,

adjusting so the mean of I*1, I*2, I*3, I*4 is 1.

When both trend and seasonality are to be extracted a two-stage procedure is recommended:
(a) Make a first estimate of the trend, say ˆT¹t. Subtract this from {Xt} and calculate first estimates of the seasonal indices, say I¹t, from Xt − ˆT¹t. The first estimate of the deseasonalized series is Y¹t = Xt − I¹t.
(b) Make a second estimate of the trend by smoothing Y¹t, say ˆT²t. Subtract this from {Xt} and calculate second estimates of the seasonal indices, say I²t, from Xt − ˆT²t. The second estimate of the deseasonalized series is Y²t = Xt − I²t.
7 Fitting ARIMA models
7.1 The Box-Jenkins procedure
A general ARIMA(p, d, q) model is φ(B)∇(B)^d X = θ(B)ǫ, where ∇(B) = I − B.
The Box-Jenkins procedure is concerned with fitting an ARIMA model to data.
It has three parts: identification, estimation, and verification.
7.2 Identification
The data may require pre-processing to make it stationary. To achieve stationarity
we may do any of the following.
• Look at it.
• Re-scale it (for instance, by a logarithmic or exponential transform.)
• Remove deterministic components.
• Difference it. That is, take ∇(B)^d X until stationary. In practice d = 1, 2 should
suffice.
We recognise stationarity by the observation that the autocorrelations decay to
zero exponentially fast.
Once the series is stationary, we can try to fit an ARMA(p, q) model. We consider
the correlogram r_k = \hat{\gamma}_k/\hat{\gamma}_0 and the partial autocorrelations \hat{\phi}_{k,k}. We have already
made the following observations.
• An MA(q) process has negligible ACF after the qth term.
• An AR(p) process has negligible PACF after the pth term.
As we have noted, very approximately, both the sample ACF and PACF have standard
deviation of around 1/\sqrt{T}, where T is the length of the series. A rule of thumb
is that ACF and PACF values are negligible when they lie between \pm 2/\sqrt{T}. An
ARMA(p, q) process has kth order sample ACF and PACF decaying geometrically
for k > max(p, q).
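For illustration, the sample ACF and PACF and the ±2/√T rule of thumb can be computed directly. In this sketch the PACF is obtained by solving the Yule-Walker system at each order, rather than by the faster Levinson-Durbin recursion; function names are illustrative.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_k = gamma_hat_k / gamma_hat_0 for k = 1, ..., max_lag."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    gamma0 = np.dot(x, x) / len(x)
    return np.array([np.dot(x[:-k], x[k:]) / len(x) / gamma0 for k in range(1, max_lag + 1)])

def sample_pacf(x, max_lag):
    """Sample partial autocorrelations phi_hat_{k,k}, via the Yule-Walker equations at each order k."""
    r = np.concatenate(([1.0], sample_acf(x, max_lag)))
    pacf = []
    for k in range(1, max_lag + 1):
        R = np.array([[r[abs(i - j)] for j in range(k)] for i in range(k)])
        pacf.append(np.linalg.solve(R, r[1:k + 1])[-1])
    return np.array(pacf)

x = np.random.default_rng(1).normal(size=200)     # placeholder series; use your own data
bound = 2 / np.sqrt(len(x))                       # rule-of-thumb significance band
print("lags with |ACF|  > 2/sqrt(T):", np.nonzero(np.abs(sample_acf(x, 20)) > bound)[0] + 1)
print("lags with |PACF| > 2/sqrt(T):", np.nonzero(np.abs(sample_pacf(x, 20)) > bound)[0] + 1)
```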
7.3 Estimation
AR processes
To fit a pure AR(p), i.e., X_t = \sum_{r=1}^{p} \phi_r X_{t-r} + \epsilon_t, we can use the Yule-Walker
equations \gamma_k = \sum_{r=1}^{p} \phi_r \gamma_{|k-r|}. We fit \phi by solving \hat{\gamma}_k = \sum_{r=1}^{p} \phi_r \hat{\gamma}_{|k-r|}, k = 1, ..., p.
These can be solved by a Levinson-Durbin recursion (similar to that used to solve
for partial autocorrelations in Section 2.6). This recursion also gives the estimated
residual variance \hat{\sigma}^2_p, and helps in the choice of p through the approximate log likelihood
-2 \log L \simeq T \log(\hat{\sigma}^2_p).
Another popular way to choose p is by minimizing Akaike's AIC (an information
criterion), defined as AIC = -2 \log L + 2k, where k is the number of parameters
estimated (in the above case p). As motivation, suppose that in a general modelling
context we attempt to fit a model with parameterised likelihood function f(X | \theta),
\theta \in \Theta, and this includes the true model for some \theta_0 \in \Theta. Let X = (X_1, ..., X_n) be a
vector of n independent samples and let \hat{\theta}(X) be the maximum likelihood estimator
of \theta. Suppose Y is a further independent sample. Then

-2n E_Y E_X \log f(Y | \hat{\theta}(X)) = -2 E_X \log f(X | \hat{\theta}(X)) + 2k + O(1/\sqrt{n}),

where k = |\Theta|. The left hand side is 2n times the conditional entropy of Y given
\hat{\theta}(X), i.e., the average number of bits required to specify Y given \hat{\theta}(X). The right
hand side is approximately the AIC, and this is to be minimized over a set of models,
say (f_1, \Theta_1), ..., (f_m, \Theta_m).
ARMA processes
Generally, we use the maximum likelihood estimators, or least squares numerical
approximations to the MLEs. The essential idea is prediction error decomposition.
We can factorize the joint density of (X_1, ..., X_T) as

f(X_1, ..., X_T) = f(X_1) \prod_{t=2}^{T} f(X_t | X_1, ..., X_{t-1}).

Suppose the conditional distribution of X_t given (X_1, ..., X_{t-1}) is normal with mean
\hat{X}_t and variance P_{t-1}, and suppose also that X_1 is normal N(\hat{X}_1, P_0). Here \hat{X}_t and
P_{t-1} are functions of the unknown parameters \phi_1, ..., \phi_p, \theta_1, ..., \theta_q and the data.
The log likelihood is

-2 \log L = -2 \log f = \sum_{t=1}^{T} [ \log(2\pi) + \log P_{t-1} + (X_t - \hat{X}_t)^2 / P_{t-1} ].
We can minimize this with respect to φ1, . . . , φp, θ1, . . . , θq to fit ARMA(p, q).
Additionally, the second derivative matrix of −log L (at the MLE) is the observed
information matrix, whose inverse is an approximation to the variance-covariance
matrix of the estimators.
In practice, when fitting ARMA(p, q), the log likelihood (-2 \log L) is modified to sum
only over the range {m + 1, ..., T}, where m is small.
Example 7.1
For AR(p), take m = p, so \hat{X}_t = \sum_{r=1}^{p} \phi_r X_{t-r} for t \ge m + 1, and P_{t-1} = \sigma^2_\epsilon.
Note. When using this approximation to compare models with different numbers
of parameters we should always use the same m.
Again we might choose p and q by minimizing the AIC, -2 \log L + 2k, where
k = p + q is the total number of parameters in the model.
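Putting Example 7.1 and the AIC together, order selection for a pure AR model might be sketched as follows. Least-squares fits stand in for the MLEs, the sum runs over t = m + 1, ..., T with the same m = p_max for every candidate order, and the function names are illustrative.

```python
import numpy as np

def neg2_loglik_ar(x, phi, m):
    """-2 log L for an AR model, summed over t = m+1, ..., T, with P_{t-1} = sigma2
    taken as the mean squared one-step prediction error (as in Example 7.1)."""
    x, p = np.asarray(x, dtype=float), len(phi)
    preds = np.array([np.dot(phi, x[t - p:t][::-1]) for t in range(m, len(x))])
    sigma2 = np.mean((x[m:] - preds) ** 2)
    return (len(x) - m) * (np.log(2 * np.pi) + np.log(sigma2) + 1.0)

def choose_ar_order(x, p_max):
    """Return (p, AIC, phi_hat) minimizing AIC = -2 log L + 2p over p = 1, ..., p_max."""
    x, best = np.asarray(x, dtype=float), None
    for p in range(1, p_max + 1):
        # least-squares regression of X_t on X_{t-1}, ..., X_{t-p}, t = p_max+1, ..., T
        X = np.column_stack([x[p_max - r: len(x) - r] for r in range(1, p + 1)])
        phi = np.linalg.lstsq(X, x[p_max:], rcond=None)[0]
        aic = neg2_loglik_ar(x, phi, m=p_max) + 2 * p
        if best is None or aic < best[1]:
            best = (p, aic, phi)
    return best
```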
7.4 Verification
The third stage in the Box-Jenkins algorithm is to check whether the model fits the
data. There are several tools we may use.
• Overfitting. Add extra parameters to the model and use a likelihood ratio test or
t-test to check that they are not significant.
• Residuals analysis. Calculate the residuals from the model and plot them, together
with their autocorrelation functions (ACFs), partial autocorrelation functions (PACFs),
spectral density estimates, etc., and confirm that these are consistent with white noise.
7.5 Tests for white noise
Tests for white noise include the following.
(a) The turning point test (explained in Lecture 1) compares the number of peaks
and troughs to the number that would be expected for a white noise series.
(b) The Box–Pierce test is based on the statistic

Q_m = T \sum_{k=1}^{m} r_k^2,

where r_k is the kth sample autocorrelation coefficient of the residual series, and
p + q < m \ll T. It is called a 'portmanteau test' because it is based on an
all-inclusive statistic. If the model is correct then Q_m ~ \chi^2_{m-p-q} approximately.
In fact, r_k has variance (T - k)/(T(T + 2)), and a somewhat more powerful test
uses the Ljung-Box statistic quoted in Section 2.7,

Q'_m = T(T + 2) \sum_{k=1}^{m} (T - k)^{-1} r_k^2,

where again Q'_m ~ \chi^2_{m-p-q} approximately.
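A sketch of the Ljung-Box statistic as written above; the returned value is compared with the upper percentage point of a chi-squared distribution on m − p − q degrees of freedom (from tables, or, for example, scipy.stats.chi2.ppf). The function name is illustrative.

```python
import numpy as np

def ljung_box(residuals, m, n_params=0):
    """Q'_m = T(T+2) * sum_{k=1}^m r_k^2 / (T-k), with m - n_params degrees of freedom,
    where n_params = p + q for an ARMA(p, q) fit."""
    e = np.asarray(residuals, dtype=float) - np.mean(residuals)
    T = len(e)
    r = np.array([np.dot(e[:-k], e[k:]) / np.dot(e, e) for k in range(1, m + 1)])
    q_stat = T * (T + 2) * np.sum(r ** 2 / (T - np.arange(1, m + 1)))
    return q_stat, m - n_params
```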
(c) Another test for white noise can be constructed from the periodogram. Recall
that I(\omega_j) ~ (\sigma^2/\pi)\chi^2_2/2 and that I(\omega_1), ..., I(\omega_m) are mutually independent.
Define C_j = \sum_{k=1}^{j} I(\omega_k) and U_j = C_j/C_m. Recall that \chi^2_2 is the same as the
exponential distribution and that if Y_1, ..., Y_m are i.i.d. exponential random variables,
then (Y_1 + ... + Y_j)/(Y_1 + ... + Y_m), j = 1, ..., m - 1, have the distribution of
an ordered sample of m - 1 uniform random variables drawn from [0, 1]. Hence,
under the hypothesis that {X_t} is Gaussian white noise, U_j, j = 1, ..., m - 1,
have the distribution of an ordered sample of m - 1 uniform random variables
on [0, 1]. The standard test for this is the Kolmogorov-Smirnov test, which uses
as a test statistic D, defined as the maximum difference between the theoretical
distribution function for U[0, 1], F(u) = u, and the empirical distribution function
\hat{F}(u) = #{U_j \le u}/(m - 1). Percentage points for D can be found in tables.
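A sketch of the cumulative periodogram test follows. It assumes the periodogram ordinates are computed as I(\omega_j) = |\sum_t X_t e^{-i\omega_j t}|^2/(\pi T) at the Fourier frequencies \omega_j = 2\pi j/T, which is consistent with the scaling I(\omega_j) ~ (\sigma^2/\pi)\chi^2_2/2 above; the function name is illustrative.

```python
import numpy as np

def cumulative_periodogram_D(x):
    """Kolmogorov-Smirnov statistic D comparing U_1, ..., U_{m-1} with the uniform
    distribution function F(u) = u, as in the white-noise test above."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    T = len(x)
    m = (T - 1) // 2                                        # Fourier frequencies j = 1, ..., m
    I = np.abs(np.fft.fft(x)[1:m + 1]) ** 2 / (np.pi * T)   # periodogram ordinates I(omega_j)
    U = np.cumsum(I)[:-1] / np.sum(I)                       # U_j = C_j / C_m, already ordered
    n = m - 1
    j = np.arange(1, n + 1)
    return max(np.max(j / n - U), np.max(U - (j - 1) / n))  # max |F_hat(u) - u|
```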
7.6 Forecasting with ARMA models
Recall that \phi(B)X = \theta(B)\epsilon, so the power series coefficients of C(z) = \theta(z)/\phi(z) =
\sum_{r=0}^{\infty} c_r z^r give an expression for X_t as X_t = \sum_{r=0}^{\infty} c_r \epsilon_{t-r}.
But also, \epsilon = D(B)X, where D(z) = \phi(z)/\theta(z) = \sum_{r=0}^{\infty} d_r z^r (as long as the
zeros of \theta lie strictly outside the unit circle), and thus \epsilon_t = \sum_{r=0}^{\infty} d_r X_{t-r}.
The advantage of the representation above is that given (..., X_{t-1}, X_t) we can
calculate values for (..., \epsilon_{t-1}, \epsilon_t) and so can forecast X_{t+1}.
In general, if we want to forecast X_{T+k} from (..., X_{T-1}, X_T) we use

\hat{X}_{T,k} = \sum_{r=k}^{\infty} c_r \epsilon_{T+k-r} = \sum_{r=0}^{\infty} c_{k+r} \epsilon_{T-r},

which has the least mean squared error over all linear combinations of (..., \epsilon_{T-1}, \epsilon_T).
In fact,

E[(\hat{X}_{T,k} - X_{T+k})^2] = \sigma^2_\epsilon \sum_{r=0}^{k-1} c_r^2.
In practice, there is an alternative recursive approach. Define

\hat{X}_{T,k} = X_{T+k} for -(T - 1) \le k \le 0, and
\hat{X}_{T,k} = the optimal predictor of X_{T+k} given X_1, ..., X_T, for k \ge 1.

We have the recursive relation

\hat{X}_{T,k} = \sum_{r=1}^{p} \phi_r \hat{X}_{T,k-r} + \hat{\epsilon}_{T+k} + \sum_{s=1}^{q} \theta_s \hat{\epsilon}_{T+k-s}.

For k = -(T - 1), -(T - 2), ..., 0 this gives estimates of \hat{\epsilon}_t for t = 1, ..., T.
For k > 0, this gives a forecast \hat{X}_{T,k} for X_{T+k}. We take \hat{\epsilon}_t = 0 for t > T.
But the recursion needs to be started off: we need values for (X_t, t \le 0) and (\epsilon_t, t \le 0).
There are two standard approaches.
1. Conditional approach: take X_t = \epsilon_t = 0 for t \le 0.
2. Backcasting: we forecast the series in the reverse direction to determine estimators
of X_0, X_{-1}, ... and \epsilon_0, \epsilon_{-1}, ....
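A sketch of the recursive scheme under the conditional approach (option 1), with future residuals set to zero; the function name is illustrative and the coefficients are assumed already estimated.

```python
import numpy as np

def arma_forecast(x, phi, theta, steps):
    """Forecast X_{T+1}, ..., X_{T+steps} for an ARMA(p, q) with known coefficients,
    taking X_t = eps_t = 0 for t <= 0 and eps_hat_t = 0 for t > T."""
    x = np.asarray(x, dtype=float)
    phi, theta = np.asarray(phi, dtype=float), np.asarray(theta, dtype=float)
    p, q, pad = len(phi), len(theta), max(len(phi), len(theta))
    xs = np.concatenate([np.zeros(pad), x])                 # zeros stand in for t <= 0
    eps = np.zeros_like(xs)
    # k <= 0: recover the residuals eps_hat_t for t = 1, ..., T
    for t in range(pad, len(xs)):
        pred = np.dot(phi, xs[t - p:t][::-1]) + np.dot(theta, eps[t - q:t][::-1])
        eps[t] = xs[t] - pred
    # k >= 1: forecast recursively, appending each forecast and a zero residual
    out = []
    for _ in range(steps):
        pred = np.dot(phi, xs[len(xs) - p:][::-1]) + np.dot(theta, eps[len(eps) - q:][::-1])
        out.append(pred)
        xs, eps = np.append(xs, pred), np.append(eps, 0.0)
    return np.array(out)

# Example: an ARMA(1, 1) with phi_1 = 0.8, theta_1 = 0.3
print(arma_forecast([0.5, 1.2, 0.7, 1.5, 0.9], phi=[0.8], theta=[0.3], steps=3))
```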

time series

Time series

From Wikipedia, the free encyclopedia
Figure: a time series of random data plus trend, with best-fit line and different applied filters.
A time series is a sequence of data points, measured typically at successive points in time spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones Industrial Average and the annual flow volume of the Nile River at Aswan. Time series are very frequently plotted via line charts. Time series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, and communications engineering.
Time series analysis comprises methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data. Time series forecasting is the use of a model to predict future values based on previously observed values. While regression analysis is often employed in such a way as to test theories that the current values of one or more independent time series affect the current value of another time series, this type of analysis of time series is not called "time series analysis", which focuses on comparing values of a single time series or multiple dependent time series at different points in time.[1]
Time series data have a natural temporal ordering. This makes time series analysis distinct from other common data analysis problems, in which there is no natural ordering of the observations (e.g. explaining people's wages by reference to their respective education levels, where the individuals' data could be entered in any order). Time series analysis is also distinct from spatial data analysis where the observations typically relate to geographical locations (e.g. accounting for house prices by the location as well as the intrinsic characteristics of the houses). A stochastic model for a time series will generally reflect the fact that observations close together in time will be more closely related than observations further apart. In addition, time series models will often make use of the natural one-way ordering of time so that values for a given period will be expressed as deriving in some way from past values, rather than from future values (see time reversibility.)
Time series analysis can be applied to real-valued, continuous data, discrete numeric data, or discrete symbolic data (i.e. sequences of characters, such as letters and words in the English language.[2]).

Methods for time series analyses

Methods for time series analyses may be divided into two classes: frequency-domain methods and time-domain methods. The former include spectral analysis and, more recently, wavelet analysis; the latter include auto-correlation and cross-correlation analysis. In the time domain, correlation analyses can be made in a filter-like manner using scaled correlation, thereby mitigating the need to operate in the frequency domain.
Additionally, time series analysis techniques may be divided into parametric and non-parametric methods. The parametric approaches assume that the underlying stationary stochastic process has a certain structure which can be described using a small number of parameters (for example, using an autoregressive or moving average model). In these approaches, the task is to estimate the parameters of the model that describes the stochastic process. By contrast, non-parametric approaches explicitly estimate the covariance or the spectrum of the process without assuming that the process has any particular structure.
Methods of time series analysis may also be divided into linear and non-linear, and univariate and multivariate.

Analysis

There are several types of motivation and data analysis available for time series which are appropriate for different purposes.

Motivation

In the context of statistics, econometrics, quantitative finance, seismology, meteorology, and geophysics the primary goal of time series analysis is forecasting. In the context of signal processing, control engineering and communication engineering it is used for signal detection and estimation, while in the context of data mining, pattern recognition and machine learning time series analysis can be used for clustering, classification, query by content, anomaly detection as well as forecasting.

Exploratory analysis

Figure: Tuberculosis incidence, United States, 1953-2009.
The clearest way to examine a regular time series manually is with a line chart such as the one shown for tuberculosis in the United States, made with a spreadsheet program. The number of cases was standardized to a rate per 100,000 and the percent change per year in this rate was calculated. The nearly steadily dropping line shows that the TB incidence was decreasing in most years, but the percent change in this rate varied by as much as +/- 10%, with 'surges' in 1975 and around the early 1990s. The use of both vertical axes allows the comparison of two time series in one graphic. Other techniques include autocorrelation analysis to examine serial dependence, spectral analysis to examine cyclic behaviour, and separating the series into components representing trend, seasonality and residual variation.

Prediction and forecasting

  • Fully formed statistical models for stochastic simulation purposes, so as to generate alternative versions of the time series, representing what might happen over non-specific time-periods in the future
  • Simple or fully formed statistical models to describe the likely outcome of the time series in the immediate future, given knowledge of the most recent outcomes (forecasting).
  • Forecasting on time series is usually done using automated statistical software packages and programming languages, such as R, S, SAS, SPSS, Minitab, Pandas (Python) and many others.

Classification

  • Assigning a time series pattern to a specific category, for example identifying a word based on a series of hand movements in sign language
See main article: Statistical classification

Regression analysis (method of prediction)

  • Estimating the future value of a signal based on its previous behavior, e.g. predicting the price of AAPL stock based on its previous price movements for that hour, day or month, or predicting the position of the Apollo 11 spacecraft at a certain future moment based on its current trajectory (i.e. a time series of its previous locations).[5]
  • Regression analysis is usually based on statistical interpretation of time series properties in the time domain, pioneered by the statisticians George Box and Gwilym Jenkins in the 1950s: see Box–Jenkins

Signal estimation

Segmentation

  • Splitting a time-series into a sequence of segments. It is often the case that a time-series can be represented as a sequence of individual segments, each with its own characteristic properties. For example, the audio signal from a conference call can be partitioned into pieces corresponding to the times during which each person was speaking. In time-series segmentation, the goal is to identify the segment boundary points in the time-series, and to characterize the dynamical properties associated with each segment. One can approach this problem using change-point detection, or by modeling the time-series as a more sophisticated system, such as a Markov jump linear system.
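As a minimal illustration of change-point detection for a single shift in mean (one simple option among the approaches mentioned above), the split point can be chosen to minimize the total within-segment sum of squares; the function name is illustrative.

```python
import numpy as np

def best_split(x):
    """Return the index k that minimizes the within-segment sum of squares
    when the series is split into x[:k] and x[k:] (a single change point in the mean)."""
    x = np.asarray(x, dtype=float)
    best_k, best_cost = None, np.inf
    for k in range(2, len(x) - 1):
        left, right = x[:k], x[k:]
        cost = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k

# Example: a level shift at t = 60
rng = np.random.default_rng(3)
series = np.concatenate([rng.normal(0, 1, 60), rng.normal(3, 1, 40)])
print(best_split(series))    # close to 60
```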

Models

Models for time series data can have many forms and represent different stochastic processes. When modeling variations in the level of a process, three broad classes of practical importance are the autoregressive (AR) models, the integrated (I) models, and the moving average (MA) models. These three classes depend linearly on previous data points.[6] Combinations of these ideas produce autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models. The autoregressive fractionally integrated moving average (ARFIMA) model generalizes the former three. Extensions of these classes to deal with vector-valued data are available under the heading of multivariate time-series models and sometimes the preceding acronyms are extended by including an initial "V" for "vector", as in VAR for vector autoregression. An additional set of extensions of these models is available for use where the observed time-series is driven by some "forcing" time-series (which may not have a causal effect on the observed series): the distinction from the multivariate case is that the forcing series may be deterministic or under the experimenter's control. For these models, the acronyms are extended with a final "X" for "exogenous".
Non-linear dependence of the level of a series on previous data points is of interest, partly because of the possibility of producing a chaotic time series. However, more importantly, empirical investigations can indicate the advantage of using predictions derived from non-linear models, over those from linear models, as for example in nonlinear autoregressive exogenous models.
Among other types of non-linear time series models, there are models to represent the changes of variance over time (heteroskedasticity). These models represent autoregressive conditional heteroskedasticity (ARCH) and the collection comprises a wide variety of representation (GARCH, TARCH, EGARCH, FIGARCH, CGARCH, etc.). Here changes in variability are related to, or predicted by, recent past values of the observed series. This is in contrast to other possible representations of locally varying variability, where the variability might be modelled as being driven by a separate time-varying process, as in a doubly stochastic model.
In recent work on model-free analyses, wavelet transform based methods (for example locally stationary wavelets and wavelet decomposed neural networks) have gained favor. Multiscale (often referred to as multiresolution) techniques decompose a given time series, attempting to illustrate time dependence at multiple scales. See also Markov switching multifractal (MSMF) techniques for modeling volatility evolution.
A Hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. An HMM can be considered as the simplest dynamic Bayesian network. HMM models are widely used in speech recognition, for translating a time series of spoken words into text.

Notation

A number of different notations are in use for time-series analysis. A common notation specifying a time series X that is indexed by the natural numbers is written
X = {X1, X2, ...}.
Another common notation is
Y = {Yt : t ∈ T},
where T is the index set.

Conditions

There are two sets of conditions under which much of the theory is built: that the process is stationary, and that the process is ergodic.
However, ideas of stationarity must be expanded to consider two important ideas: strict stationarity and second-order stationarity. Both models and applications can be developed under each of these conditions, although the models in the latter case might be considered as only partly specified.
In addition, time-series analysis can be applied where the series are seasonally stationary or non-stationary. Situations where the amplitudes of frequency components change with time can be dealt with in time-frequency analysis which makes use of a time–frequency representation of a time-series or signal.[7]

Models

Main article: Autoregressive model
The general representation of an autoregressive model, well known as AR(p), is
Y_t = \alpha_0 + \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \cdots + \alpha_p Y_{t-p} + \varepsilon_t,
where the term εt is the source of randomness and is called white noise. It is assumed to have the following characteristics:
  • E[\varepsilon_t] = 0,
  • E[\varepsilon_t^2] = \sigma^2,
  • E[\varepsilon_t \varepsilon_s] = 0 for all t \ne s.
With these assumptions, the process is specified up to second-order moments and, subject to conditions on the coefficients, may be second-order stationary.
If the noise also has a normal distribution, it is called normal or Gaussian white noise. In this case, the AR process may be strictly stationary, again subject to conditions on the coefficients.
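As a small illustration of this model, the AR(p) recursion with Gaussian white noise can be simulated directly; the function name, parameter values and burn-in length below are arbitrary choices.

```python
import numpy as np

def simulate_ar(alphas, alpha0, sigma, n, burn_in=200, seed=0):
    """Simulate Y_t = alpha_0 + alpha_1 Y_{t-1} + ... + alpha_p Y_{t-p} + eps_t,
    with eps_t ~ N(0, sigma^2) Gaussian white noise."""
    rng = np.random.default_rng(seed)
    p = len(alphas)
    y = np.zeros(n + burn_in)
    eps = rng.normal(0.0, sigma, size=n + burn_in)
    for t in range(p, n + burn_in):
        y[t] = alpha0 + np.dot(alphas, y[t - p:t][::-1]) + eps[t]
    return y[burn_in:]       # discard the burn-in so the zero start-up values have little effect

y = simulate_ar(alphas=[0.5, 0.2], alpha0=1.0, sigma=1.0, n=300)
print(y.mean())              # for this stationary AR(2), roughly alpha0 / (1 - 0.5 - 0.2)
```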
Tools for investigating time-series data include consideration of the autocorrelation function and the spectral density function, as well as cross-correlation and cross-spectral analysis when two series are involved.

Measures

Time series metrics or features that can be used for time series classification or regression analysis:[9]
  • Bivariate non-linear measures

Visualization

Time series can be visualized with two categories of chart: overlapping charts and separated charts. Overlapping charts display all time series on the same layout, while separated charts present them on different layouts (but aligned for comparison purposes).[13]

Overlapping Charts

  • Braided Graphs
  • Line Charts
  • Slope Graphs
  • GapChart

Separated Charts

  • Horizon Graphs
  • Reduced Line Charts (small multiples)
  • Silhouette Graph
  • Circular Silhouette Graph

References

  1. Imdadullah. "Time Series Analysis". Basic Statistics and Data Analysis. itfeature.com. Retrieved 2 January 2014.
  2. Lin, Jessica; Keogh, Eamonn; Lonardi, Stefano; Chiu, Bill (2003). "A symbolic representation of time series, with implications for streaming algorithms". Proceedings of the 8th ACM SIGMOD workshop on Research issues in data mining and knowledge discovery. New York: ACM Press. doi:10.1145/882082.882086.
  3. Bloomfield, P. (1976). Fourier analysis of time series: An introduction. New York: Wiley. ISBN 0471082562.
  4. Shumway, R. H. (1988). Applied statistical time series analysis. Englewood Cliffs, NJ: Prentice Hall. ISBN 0130415006.
  5. Lawson, Charles L.; Hanson, Richard J. (1995). Solving Least Squares Problems. Philadelphia: Society for Industrial and Applied Mathematics. ISBN 0898713560.
  6. Gershenfeld, N. (1999). The Nature of Mathematical Modeling. New York: Cambridge University Press. pp. 205–208. ISBN 0521570956.
  7. Boashash, B. (ed.) (2003). Time-Frequency Signal Analysis and Processing: A Comprehensive Reference. Oxford: Elsevier Science. ISBN 0-08-044335-4.
  8. Nikolić, D.; Muresan, R. C.; Feng, W.; Singer, W. (2012). "Scaled correlation analysis: a better way to compute a cross-correlogram". European Journal of Neuroscience 35 (5): 742–762. doi:10.1111/j.1460-9568.2011.07987.x.
  9. Mormann, Florian; Andrzejak, Ralph G.; Elger, Christian E.; Lehnertz, Klaus (2007). "Seizure prediction: the long and winding road". Brain 130 (2): 314–333. doi:10.1093/brain/awl241.
  10. Land, Bruce; Elias, Damian. "Measuring the ‘Complexity’ of a time series".
  11. Malakooti, Behnam (2013). Operations and Production Systems with Multiple Objectives. John Wiley & Sons. ISBN 978-1-118-58537-5.
  12. Ropella, G. E. P.; Nag, D. A.; Hunt, C. A. (2003). "Similarity measures for automated comparison of in silico and in vitro experimental results". Engineering in Medicine and Biology Society 3: 2933–2936. doi:10.1109/IEMBS.2003.1280532.
  13. Tominski, Christian; Aigner, Wolfgang. "The TimeViz Browser: A Visual Survey of Visualization Techniques for Time-Oriented Data". Retrieved 1 June 2014.
