Time Series Analysis

Article Info

Contributed by
4 authors

Last updated on
2022-02-15 11:51:35



Time series data is an ordered sequence of observations of well-defined data items at regular time intervals. Examples include daily exchange rates, bank interest rates, monthly sales, heights of ocean tides, or humidity. Time Series Analysis (TSA) finds hidden patterns and obtains useful insights from time series data. TSA is useful in predicting future values or detecting anomalies across a variety of application areas.

Historically, TSA was divided into time domain versus frequency domain approaches. The time domain approach used the autocorrelation function whereas the frequency domain approach used the Fourier transform of the autocorrelation function. Likewise, there are also Bayesian and non-Bayesian approaches. Today these differences are of less importance. Analysts use whatever suits the problem.

While most methods of TSA are from classical statistics, since the 1990s artificial neural networks have been used. However, these can excel only when sufficient data is available.

Discussion

What are the main objectives of time series analysis?
TSA has the following objectives:
- Describe: Describe the important features of the time series data. The first step is to plot the data to look for the possible presence of trends, seasonal variations, outliers and turning points.
- Model: Investigate and find out the generating process of the time series.
- Predict: Forecast future values of an observed time series. Applications are in predicting stock prices or product sales.
What are some applications of time series analysis?
Time series analysis for anomaly detection. Source: Krishnan 2019.
TSA is used in numerous practical fields such as business, economics, finance, science, and engineering. Some typical use cases are Economic Forecasting, Sales Forecasting, Budgetary Analysis, Stock Market Analysis, Yield Projections, Process and Quality Control, Inventory Studies, Workload Projections, Utility Studies, and Census Analysis.
In TSA, we collect and study past observations of a time series. We then develop an appropriate model that describes the inherent structure of the series. This model is then used to generate future values for the series, that is, to make forecasts. Time series analysis can be described as the act of predicting the future by understanding the past.
Forecasting is a common need in business and economics. Besides forecasting, TSA is also useful to see how a single event affects the time series. TSA can also help towards quality control by pointing out data points that are deviating too much from the norm. Control and monitoring applications of TSA are more common in science and industry.
What are the main components of time series data?
Components of time series data. Source: Zhao 2011.
Many factors cause variations in time series data. The effects of these factors are studied through the following four major components:
- Trends: A trend exists when there is a long-term increase or decrease in the data. It doesn't have to be linear. Sometimes we will refer to a trend as "changing direction" when it goes from an increasing trend to a decreasing trend.
- Seasonal: A seasonal pattern exists when a series is influenced by seasonal factors (quarterly, monthly, half-yearly). Seasonality is always of a fixed and known period.
- Cyclic Variation: A cyclic pattern exists when data exhibits rises and falls that are not of fixed period. The duration of these cycles is more than a year. For example, stock prices cycle between periods of high and low values but there's no set amount of time between those fluctuations.
- Irregular: The variation of observations in a time series which is unusual or unexpected. It's also termed as a Random Variation and is usually unpredictable. Floods, fires, revolutions, epidemics, and strikes are some examples.
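As an illustration of how trend and seasonal components can be separated, here is a minimal sketch of a classical additive decomposition. It assumes an even seasonal period and a series that is the sum of a trend and a fixed seasonal pattern; the function names `centered_moving_average` and `seasonal_indices` and the synthetic data are made up for this example.

```python
def centered_moving_average(series, period):
    """Estimate the trend with a 2 x period centered moving average.

    Assumes an even period. The two end points of each window get half
    weight so that every season is counted exactly once.
    """
    half = period // 2
    trend = [None] * len(series)
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        trend[t] = total / period
    return trend


def seasonal_indices(series, trend, period):
    """Average the detrended values for each position within the period."""
    buckets = [[] for _ in range(period)]
    for t, (value, level) in enumerate(zip(series, trend)):
        if level is not None:
            buckets[t % period].append(value - level)
    return [sum(bucket) / len(bucket) for bucket in buckets]


# Synthetic quarterly data: linear trend 2*t plus a seasonal pattern that sums to zero.
pattern = [3.0, -1.0, -2.0, 0.0]
series = [2.0 * t + pattern[t % 4] for t in range(24)]

trend = centered_moving_average(series, 4)
seasonal = seasonal_indices(series, trend, 4)
print(trend[2:6])
print(seasonal)
```

Whatever remains after subtracting the estimated trend and seasonal values would be treated as the irregular component. Real data would also need a decision on additive versus multiplicative structure, which this sketch does not address.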
What is a stationary series and how important is it?
Stationary vs non-stationary series. Source: Mitrani 2020.
Given a series of data points, if the mean and variance of all the data points remain constant with time, then we call it a stationary series. If these vary with time, we call it a non-stationary series.
Most prices (such as stock prices or the price of Bitcoin) are not stationary. They drift either upward or downward. Non-stationary data are hard to model or forecast directly because their statistical properties keep changing. Results obtained from non-stationary time series may be spurious in that they may indicate a relationship between two variables where one doesn't exist. To obtain consistent, reliable results, non-stationary data needs to be transformed into stationary data.
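A rough way to see the difference is to compare the mean and variance of the first and second halves of a series. The sketch below does this for synthetic white noise and for the same noise with a linear trend added. It's an informal check under these assumptions, not a formal unit root test, and the helper name `split_stats` is made up for this example.

```python
import random
import statistics


def split_stats(series):
    """Return (mean, variance) for the first and second halves of a series."""
    half = len(series) // 2
    first, second = series[:half], series[half:]
    return (
        (statistics.mean(first), statistics.pvariance(first)),
        (statistics.mean(second), statistics.pvariance(second)),
    )


rng = random.Random(1)  # fixed seed so the run is repeatable
noise = [rng.gauss(0, 1) for _ in range(1000)]        # stationary by construction
trending = [0.5 * t + e for t, e in enumerate(noise)]  # mean grows with time

print("noise:   ", split_stats(noise))
print("trending:", split_stats(trending))
```

For the noise series the two halves should have similar means and variances, while for the trending series the mean of the second half is far larger than that of the first.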
Given a non-stationary series, how can I make it stationary?
Differencing time series. Source: Shmueli 2016.
The two most common ways to make a non-stationary time series curve stationary are:
- Differencing: In order to make a series stationary, we take a difference between the data points. Suppose the original time series is \(X_1, X_2, X_3, \ldots X_n\). Series with difference of degree 1 becomes \(X_2-X_1, X_3-X_2, X_4-X_3, \ldots, X_n-X_{n-1}\). If this transformation is done only once to a series, we say that the data has been first differenced. This process essentially eliminates the trend if the series is growing at a fairly constant rate. If it's growing at an increasing/decreasing rate, we can apply the same procedure and difference the data again. The data would then be second differenced.
- Transformation: If differencing alone doesn't make the series stationary, we can try transforming the variables. Log transform is probably the most commonly used transformation for a diverging time series. However, it's normally suggested to use transformation only when differencing is not working.
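The two techniques above can be sketched in a few lines of Python. The helper `difference` and the three toy series are assumptions made for illustration: a linear series needs one round of differencing, a quadratic series needs two, and an exponentially growing series becomes linear after a log transform.

```python
import math


def difference(series, order=1):
    """Return the series differenced `order` times."""
    result = list(series)
    for _ in range(order):
        result = [b - a for a, b in zip(result, result[1:])]
    return result


linear = [3, 5, 7, 9, 11]               # grows at a constant rate
quadratic = [1, 4, 9, 16, 25]           # grows at an increasing rate
exponential = [2.0, 4.0, 8.0, 16.0, 32.0]

print(difference(linear))               # first differenced: [2, 2, 2, 2]
print(difference(quadratic, order=2))   # second differenced: [2, 2, 2]

# Log transform first, then difference once: each value is close to log(2).
log_series = [math.log(v) for v in exponential]
print(difference(log_series))
```

Each round of differencing shortens the series by one observation, which is worth keeping in mind when aligning the result with the original time index.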
What are the different models used in Time Series Analysis?
Some commonly used models for TSA are:
- Auto-Regressive (AR): A regression model, such as linear regression, models an output value based on a linear combination of input values: \(y = \beta_0 + \beta_1x + \epsilon\). In TSA, input variables are observations from previous time steps, called lag variables. For p=2, where p is the order of the AR model, AR(p) is \( x_t = \beta_0 + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \epsilon_t\), where \(\epsilon_t\) is the error at time t.
- Moving Average (MA): This uses past forecast errors in a regression-like model. For q=2, MA(q) is \(x_t = \theta_0 + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2}\)
- Auto-Regressive Moving Average (ARMA): This combines both AR and MA models, with the two constants merged into a single \(\beta_0\). ARMA(p,q) is \(\begin{aligned}x_t = &\beta_0 + \beta_1 x_{t-1} + \beta_2 x_{t-2} + \ldots + \beta_p x_{t-p} + \\ &\epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \ldots + \theta_q \epsilon_{t-q} \end{aligned}\)
- Auto-Regressive Integrated Moving Average (ARIMA): The above models assume stationary data. ARIMA(p,d,q) adds the conversion of non-stationary data to stationary: I (integrated) refers to the use of differencing, p is the lag order of the AR part, d is the degree of differencing, and q is the order of the MA part (the number of lagged forecast errors).
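To make the AR and MA recursions concrete, here is a minimal simulation sketch. It assumes the usual convention that the current error \(\epsilon_t\) enters with coefficient 1 and that errors are Gaussian. The function name `simulate_arma` and its parameters are made up for this example and don't come from any library.

```python
import random


def simulate_arma(beta0, betas, thetas, n, sigma=1.0, seed=0):
    """Generate n values of an ARMA(p, q) process.

    betas  -- AR coefficients [beta_1, ..., beta_p]
    thetas -- MA coefficients [theta_1, ..., theta_q]
    Lags that reach before the start of the series are simply skipped.
    """
    rng = random.Random(seed)
    p, q = len(betas), len(thetas)
    x, eps = [], []
    for t in range(n):
        e = rng.gauss(0.0, sigma)  # current error term
        ar = sum(betas[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(thetas[j] * eps[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        x.append(beta0 + ar + e + ma)
        eps.append(e)
    return x


ar2 = simulate_arma(0.0, [0.5, 0.3], [], n=200)        # AR(2)
ma2 = simulate_arma(0.0, [], [0.4, 0.2], n=200)        # MA(2)
arma = simulate_arma(1.0, [0.5], [0.4], n=200)         # ARMA(1,1)
print(ar2[:3], ma2[:3], arma[:3])
```

Setting `sigma=0.0` removes the noise, in which case an AR(1) series with \(\beta_0 = 1\) and \(\beta_1 = 0.5\) settles at \(\beta_0 / (1 - \beta_1) = 2\), a quick sanity check on the recursion.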
What are autocorrelations in the context of time series analysis?
Time series plot of a sine wave and its correlogram. Source: Holmes et al. 2020, fig. 4.13.
Autocorrelations are numerical values that indicate how a data series is related to itself over time. They measure how strongly data values separated by a specified number of periods (called the lag) are correlated with each other. The Auto-Correlation Function (ACF) defines the autocorrelation for a specific lag.
Autocorrelations range from -1 to +1. A value close to +1 indicates a high positive correlation while a value close to -1 implies a high negative correlation. These measures are most often evaluated through graphical plots called correlograms. A correlogram plots the autocorrelation values against lag. Such a plot helps us choose the order parameters for an ARIMA model.
In addition to suggesting the order of differencing, ACF plots can help in determining the order of MA(q) models. Partial Auto-Correlation Function (PACF) correlates a variable with its lags, conditioned on the values in between. PACF plots are useful when determining the order of AR(p) models.
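The sample ACF can be computed directly from its definition. The sketch below uses one common estimator (lagged products of deviations from the mean, divided by the total sum of squared deviations) and applies it to a sine wave with a period of 12 samples. The function name `acf` and the test signal are assumptions for illustration.

```python
import math


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


# Sine wave with a period of 12 samples, 10 full cycles.
wave = [math.sin(2 * math.pi * t / 12) for t in range(120)]
r = acf(wave, 12)

for lag in (0, 6, 12):
    print(lag, round(r[lag], 2))
```

The autocorrelation is 1 at lag 0, strongly negative at half the period (lag 6), and strongly positive again at the full period (lag 12). Plotting `r` against lag gives the correlogram.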
How do I build a time series model?
Three-stage Box-Jenkins methodology. Source: San-Juan et al. 2012, fig. 4.
ARMA and ARIMA are standard statistical models for time series forecasting and analysis. Along with developing these models, Box and Jenkins also suggested a process for identifying, estimating, and checking them. This process is now referred to as the Box-Jenkins (BJ) Method. It's an iterative approach that consists of the following three steps:
- Identification: Involves determining the order (p, d, q) of the model in order to capture the salient dynamic features of the data. This mainly relies on graphical procedures such as the time series plot, ACF, PACF, etc.
- Estimation: The estimation procedure involves using the model with p, d and q orders to fit the actual time series and minimize the loss or error term.
- Diagnostic checking: Evaluate the fitted model in the context of the available data and check for areas where the model may be improved.
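The three steps can be walked through on synthetic data. The sketch below is heavily simplified: the orders (p=1, d=1, q=0) are assumed to be known from the identification step, estimation is ordinary least squares instead of the maximum likelihood fit a statistics package would use, and the diagnostic is just the lag-1 autocorrelation of the residuals. All function names are made up for this example.

```python
import random


def difference(series):
    """First difference of a series."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(series):
    """Least-squares estimate of (beta0, beta1) in x_t = beta0 + beta1 * x_{t-1} + e_t."""
    prev, curr = series[:-1], series[1:]
    n = len(curr)
    mean_prev, mean_curr = sum(prev) / n, sum(curr) / n
    cov = sum((a - mean_prev) * (b - mean_curr) for a, b in zip(prev, curr))
    var = sum((a - mean_prev) ** 2 for a in prev)
    beta1 = cov / var
    return mean_curr - beta1 * mean_prev, beta1


def lag1_autocorr(values):
    """Lag-1 sample autocorrelation."""
    mean = sum(values) / len(values)
    dev = [v - mean for v in values]
    return sum(dev[t] * dev[t - 1] for t in range(1, len(dev))) / sum(d * d for d in dev)


# Synthetic ARIMA(1,1,0) data: the increments follow an AR(1) with beta1 = 0.6.
rng = random.Random(42)
increments = [0.0]
for _ in range(1999):
    increments.append(0.6 * increments[-1] + rng.gauss(0, 1))
levels, total = [], 0.0
for inc in increments:
    total += inc
    levels.append(total)

# 1. Identification (assumed): one round of differencing, AR order 1.
stationary = difference(levels)
# 2. Estimation: fit the AR(1) coefficients.
beta0, beta1 = fit_ar1(stationary)
# 3. Diagnostic checking: residuals should show little remaining autocorrelation.
residuals = [x - (beta0 + beta1 * xp) for xp, x in zip(stationary, stationary[1:])]
print("beta1 estimate:", round(beta1, 3))
print("residual lag-1 autocorrelation:", round(lag1_autocorr(residuals), 3))
```

If the residual autocorrelation were clearly non-zero, the method would loop back to identification and try a different order.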
How do we handle random variations in data?
Exponential smoothing explained with an example. Source: Emmanuel 2015.
Whenever we collect data over some period of time, there's some form of random variation. Smoothing is a technique to reduce the effect of such variation and thereby bring out trends and cyclic components. There are two distinct groups of smoothing methods:
Averaging Methods
- Moving Average: we forecast the next value by averaging 'p' previous values.
- Weighted Average: we assign weights to each of the previous observations and then take the average. The sum of all the weights should be equal to 1.
Exponential Smoothing Methods: These assign exponentially decreasing weights as observations get older. In other words, recent observations are given relatively more weight in forecasting than older observations. There are several varieties of this method:
- Simple exponential smoothing for series with no trend and seasonality: the basic formula for simple exponential smoothing is \(S_{t+1} = \alpha y_t + (1-\alpha)S_t\), with \(0 < \alpha \le 1\) and \(t > 0\)
- Double exponential smoothing for series with a trend and no seasonality.
- Triple exponential smoothing for series with both trend and seasonality.
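The averaging methods and simple exponential smoothing can be sketched as follows. The function names are made up for this example, and initializing the smoothed series with the first observation is an assumption, since the initial value isn't specified here.

```python
def moving_average_forecast(series, p):
    """Forecast the next value as the mean of the last p observations."""
    return sum(series[-p:]) / p


def weighted_average_forecast(series, weights):
    """Forecast with weights that sum to 1; weights[0] applies to the newest value."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    newest_first = series[::-1][:len(weights)]
    return sum(w * v for w, v in zip(weights, newest_first))


def simple_exp_smoothing(series, alpha, initial=None):
    """Apply S_{t+1} = alpha * y_t + (1 - alpha) * S_t.

    Returns [S_1, ..., S_{n+1}]; the last entry is the one-step-ahead forecast.
    S_1 defaults to the first observation (an assumption of this sketch).
    """
    s = series[0] if initial is None else initial
    smoothed = [s]
    for y in series:
        s = alpha * y + (1 - alpha) * s
        smoothed.append(s)
    return smoothed


data = [10, 12, 11, 13]
print(moving_average_forecast(data, p=3))                 # 12.0
print(weighted_average_forecast(data, [0.5, 0.3, 0.2]))   # about 12.2
print(simple_exp_smoothing(data, alpha=0.5))              # [10, 10.0, 11.0, 11.0, 12.0]
```

A larger `alpha` makes the smoothed series react faster to recent observations, while a smaller one gives a smoother curve that lags behind changes.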

Milestones

1662

Births and deaths for the year 1605-1606. Source: Morris et al. 1759.

John Graunt publishes a book titled Natural and Political Observations … Made upon the Bills of Mortality. The book contains the number of births and deaths recorded weekly for many years starting from early 17th century. It also includes the probability that a person dies by a certain age. Such tables of life expectancy later become known as actuarial tables. This is one of the earliest examples of time series style of thinking applied to medicine.

1861

Robert FitzRoy coins the term "weather forecast". Such forecasts start appearing in The Times from August 1861. Atmospheric data collected from many parts of England are relayed by telegraph to London, where FitzRoy analyzes the data (along with past data) to make forecasts. His forecasts forewarn sailors of impending storms and directly contribute to reducing shipwrecks.

1887

Augustus D. Waller, a doctor by profession, records what is possibly the first electrocardiogram (ECG). As practical ECG machines arrive in the early 20th century, TSA is applied to estimate the risk of cardiac arrests. In the 1920s, electroencephalogram (EEG) is introduced to measure brain activity. This gives doctors more opportunities to apply TSA.

1927

Yule applies harmonic analysis and regression to determine the periodicity of sunspots. He separates periodicity from superposed fluctuations and disturbances. Yule's work starts the use of statistics in TSA. In general, application of autoregressive models is due to Yule and Walker in the 1920s and 1930s.

1960

Muth establishes a statistical foundation for Simple Exponential Smoothing (SES) by showing that it's optimal for a random walk plus noise. Further advances to exponential smoothing happen in 1985: Gardner gives a comprehensive review of the topic; Snyder links SES to innovation state space model, where innovation refers to the forecast error.

1969

Bates and Granger show that by combining forecasts from two independent models, we can achieve a lower mean squared error. They also propose how to derive the weights in which the two original forecasts are to be combined. The same year, David Reid publishes his PhD thesis that's probably the first non-trivial study of time series forecast accuracy.
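For intuition, the sketch below uses the standard minimum-variance weights for two unbiased forecasts with uncorrelated errors, where each forecast is weighted by the other's error variance. It's an illustration of the idea under that assumption, not a reproduction of the paper's procedure, and the function names are made up.

```python
def combination_weight(var1, var2):
    """Weight on forecast 1 that minimizes the combined error variance,
    assuming the two forecast errors are uncorrelated."""
    return var2 / (var1 + var2)


def combined_variance(var1, var2):
    """Error variance of w * f1 + (1 - w) * f2 under the same assumption."""
    w = combination_weight(var1, var2)
    return w * w * var1 + (1 - w) ** 2 * var2


# Forecast 1 has error variance 4, forecast 2 has error variance 1.
w = combination_weight(4.0, 1.0)
print(w)                              # 0.2, so the noisier forecast gets less weight
print(combined_variance(4.0, 1.0))    # about 0.8, below the better model's variance of 1
```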

1970

Box and Jenkins publish a book titled Time Series Analysis: Forecasting and Control. This work popularizes the ARIMA model with an iterative modelling procedure. Once a suitable model is built, forecasts are conditional expectations of the model under the mean squared error (MSE) criterion. In time, this model is called the Box-Jenkins Model.

1978

Through the 1970s, many statisticians continue to believe that there's a single model waiting to be discovered that can best fit any given time series data. However, empirical evidence shows that an ensemble of models gives better results. These debates cause George Box to famously remark,

All models are wrong but some are useful

1979

Makridakis and Hibon use 111 time series to compare the performance of many forecasting methods. Their results claim that a combination of simpler methods can outperform a sophisticated method. This causes a stir within the research community. To prove the point, Makridakis and Hibon organize a competition, called the M-Competition, starting from 1982: 1001 series (1982), 29 series (1993), 3003 series (2000), 100,000 series (2018), and 42,840 series (2020).

1980

Although Kalman filtering was invented in the 1960s, it's only in the 1980s that statisticians use state-space parameterization and Kalman filtering for TSA. The recursive form of the filter enables efficient forecasting. An ARIMA model can be put into a state-space model. Similarly, a state-space model suggests an ARIMA model.

1982

Robert Engle develops the Autoregressive Conditional Heteroskedasticity (ARCH) model to account for time-varying volatility observed in economic time series data. In 1986, his student Tim Bollerslev develops the Generalized ARCH (GARCH) model. In general, the variance of the error term depends on past error terms and their variance. ARCH and GARCH are non-linear generalizations of the Box-Jenkins model.

1987

Engle and Granger propose cointegration as a technique for multivariate TSA. Cointegration is a linear combination of marginally unit-root nonstationary series that yields a stationary series. This becomes a popular method in econometrics because it captures long-term relationships between variables. An earlier method of multivariate TSA is the Vector Autoregressive (VAR) model.

1998

Zhang et al. publish a survey of neural networks applied to forecasting. They note an early work by Lapedes and Farber (1987) who proposed multi-layer feedforward networks. However, the use of ANNs for forecasting happens mostly in the 1990s. In general, feedforward or recurrent networks are preferred. At most two hidden layers are used. The number of input nodes corresponds to the number of lagged observations needed to discover patterns in data. The number of output nodes corresponds to the forecasting horizon.

2019

Sánchez-Sánchez et al. highlight many issues in using neural networks for TSA. There's no clarity on how to select the number of input or hidden neurons. There's no guidance on how best to partition the data into training and validation sets. It's not clear if data needs to be preprocessed or if seasonal/trend components have to be removed before data goes into the model. In 2018, Hyndman commented that neural networks perform poorly due to insufficient data. This is likely to change as data becomes more easily available.