Article

The Effect of Data Transformation on Singular Spectrum Analysis for Forecasting

by Hossein Hassani 1,*,†, Mohammad Reza Yeganegi 2,†, Atikur Khan 3,† and Emmanuel Sirimal Silva 4,†

1 Department of Business & Management, Webster Vienna Private University, 1020 Vienna, Austria
2 Department of Accounting, Islamic Azad University, Central Tehran Branch, Tehran 477893855, Iran
3 Qantares, 97 Broadway, Nedlands 6009 (Perth), Western Australia, Australia
4 Centre for Fashion Business and Innovation Research, Fashion Business School, London College of Fashion, University of the Arts London, London W1G 0BJ, UK
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Signals 2020, 1(1), 4-25; https://doi.org/10.3390/signals1010002
Submission received: 14 January 2020 / Revised: 12 April 2020 / Accepted: 17 April 2020 / Published: 7 May 2020

Abstract: Data transformations are an important tool for improving the accuracy of forecasts from time series models. Historically, the impact of transformations has been evaluated on the forecasting performance of different parametric and nonparametric forecasting models. However, researchers have overlooked the evaluation of this factor in relation to the nonparametric forecasting model of Singular Spectrum Analysis (SSA). In this paper, we focus entirely on the impact of data transformations, in the form of standardisation and logarithmic transformations, on the forecasting performance of SSA when applied to 100 datasets with different characteristics. Our findings indicate that data transformations have a significant impact on SSA forecasts at particular sampling frequencies.

1. Introduction

Amidst the emergence of Big Data and Data Mining techniques, forecasting continues to remain an important tool for planning and resource allocation in all industries. Accordingly, researchers, academics, and forecasters alike invest time and resources into methods for improving the accuracy of forecasts from both parametric and nonparametric forecasting models. One approach to improving the accuracy of forecasts is via data transformations prior to fitting time series models. For example, it is noted in [1] that data transformations can simplify the forecasting task, whilst evidence from other research indicates that, in economic analysis, taking logarithms can provide forecast improvements if it results in stabilising the variance of a series [2]. However, studies also indicate that data transformations will not always improve forecasts [3] and that they could complicate time series analysis models [4,5].
In fact, a key challenge for forecasting under data transformation is to transform the data back to its original scale, a process which could result in a forecasting bias [6,7]. Historically, most studies have focused on the impact of data transformations on parametric models such as Regression and Autoregressive Integrated Moving Average (ARIMA) models [8,9]. More recently, authors have resorted to evaluating the impact of data transformations on several other forecasting models [10,11], further highlighting the relevance and importance of the topic. Our interest is focused on the evaluation of the impact of data transformations on a time series analysis and forecasting technique called Singular Spectrum Analysis (SSA).
In brief, the SSA technique is a popular denoising, forecasting, and missing value prediction technique with both univariate and multivariate capabilities [12,13]. Recently, its diverse applications have focused on forecasting solutions for varied industries and fields, from tourism [14,15] and economics [16,17,18] to fashion [19], climate [20,21], and several other sectors [22,23,24]. Despite its wide and varied applications, researchers have yet to explore the effect of data transformations on the forecasting performance of this nonparametric forecasting technique. Previously, in [25], the authors evaluated the forecasting performance of the two basic SSA algorithms under different data structures. However, their work did not extend to evaluating the impact of data transformations to provide empirical evidence for future research. Accordingly, through this paper, we aim to contribute to the existing research gap by studying the effects of different data transformation options on the forecasting behaviour of SSA.
Logarithmic transformation is the most commonly used transformation in time series analysis. It has been used to convert multiplicative time series structures to additive structures, or to reduce the skewness and volatility of a time series and increase its stability [2,26]. The autocorrelation structure of a time series may change under different transformations, which may in turn affect the model; indeed, different transformations may result in different specifications for ARIMA models [6,27]. Like ARIMA models, SSA too can be greatly influenced by transformations. For instance, if a data transformation makes the noise uncorrelated or reduces the complexity of the time series, it can improve SSA performance [21,26]. As data standardisation and logarithmic transformations are the easiest in terms of interpretability and back-transformation to the original scale, we explore the effect of these data transformations on the forecasting performance of SSA.
The remainder of this paper is organised as follows. In Section 2, we provide a detailed exposition of SSA and its recurrent and vector forecasting algorithms. In Section 3, we present data transformation techniques and their effect on forecasting accuracy. Procedures for examining the effect of transformation based on different characteristics of time series are presented in Section 4. In Section 5, we analyse different datasets of varied characteristics and present our results for an evidence-based exploration of the effect of data transformations on SSA forecasts. Finally, we present our concluding remarks in Section 6.

2. SSA Forecasting

There are two different algorithms for forecasting with SSA, namely recurrent forecasting and vector forecasting [12,28]. Those interested in a comparison of the performance of both algorithms are referred to [25]. Both of these forecasting algorithms require that one follows two common steps of SSA, the decomposition and reconstruction of a time series [12,28]. In what follows, we provide a brief description of forecasting processes in SSA.

2.1. Decomposition and Reconstruction of Time Series

In SSA, we embed the time series $\{x_1, x_2, \ldots, x_N\}$ into a high-dimensional space by constructing a Hankel-structured trajectory matrix of the form:

$$\mathbf{X} = \begin{pmatrix} x_1 & x_2 & x_3 & \cdots & x_n \\ x_2 & x_3 & x_4 & \cdots & x_{n+1} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ x_m & x_{m+1} & x_{m+2} & \cdots & x_N \end{pmatrix} = [\mathbf{x}_1 \; \cdots \; \mathbf{x}_i \; \cdots \; \mathbf{x}_n], \quad (1)$$

where $m$ is the window length, the $m$-lagged vector $\mathbf{x}_i = (x_i, x_{i+1}, \ldots, x_{i+m-1})^\top$ is the $i$th column of the trajectory matrix $\mathbf{X}$, $n = N - m + 1$, and $m \le n$.
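As a concrete illustration of the embedding step, the following base-R sketch builds the trajectory matrix in Equation (1); the helper name trajectory_matrix and the noisy sine example are our own illustrative choices, not the authors' code.

```r
# Embed a series x of length N into its m x n Hankel trajectory matrix.
trajectory_matrix <- function(x, m) {
  n <- length(x) - m + 1
  # Column i is the m-lagged vector (x_i, x_{i+1}, ..., x_{i+m-1}).
  sapply(seq_len(n), function(i) x[i:(i + m - 1)])
}

x <- sin(2 * pi * (1:100) / 12) + rnorm(100, sd = 0.1)  # toy seasonal series
X <- trajectory_matrix(x, m = 24)
dim(X)  # 24 x 77, since n = N - m + 1 = 77
```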
The singular value decomposition (SVD) of the trajectory matrix $\mathbf{X}$ can be expressed as

$$\mathbf{X} = \mathbf{S}_k + \mathbf{E}_k = \sum_{j=1}^{k} \sqrt{\lambda_j}\, \mathbf{u}_j \mathbf{v}_j^\top + \sum_{j=k+1}^{m} \sqrt{\lambda_j}\, \mathbf{u}_j \mathbf{v}_j^\top, \quad (2)$$

where $\mathbf{u}_j$ is the $j$th eigenvector of $\mathbf{X}\mathbf{X}^\top$ corresponding to the eigenvalue $\lambda_j$, and $\mathbf{v}_j = \mathbf{X}^\top \mathbf{u}_j / \sqrt{\lambda_j}$.
If $k$ is the number of signal components, $\mathbf{S}_k = \sum_{j=1}^{k} \sqrt{\lambda_j}\, \mathbf{u}_j \mathbf{v}_j^\top$ represents the matrix of the signal, and $\mathbf{E}_k = \sum_{j=k+1}^{m} \sqrt{\lambda_j}\, \mathbf{u}_j \mathbf{v}_j^\top$ is the matrix of noise. We apply the diagonal averaging procedure to $\mathbf{S}_k$ to reconstruct the signal series $\tilde{x}_t$ such that the observed series can be expressed as

$$x_t = \tilde{x}_t + \tilde{e}_t, \quad (3)$$

where $\tilde{x}_t$ is the less noisy, filtered series. A detailed explanation of the decomposition in Equation (3) can be found in [28,29].
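The decomposition and reconstruction steps can be sketched in a few lines of base R: svd() supplies the eigentriples of Equation (2), and averaging over anti-diagonals implements the diagonal averaging behind Equation (3). The helper below reuses trajectory_matrix() from the previous sketch and is illustrative rather than the authors' implementation.

```r
# A rough base-R sketch of Equations (2)-(3); svd() returns the singular
# values d_j = sqrt(lambda_j) and the vectors u_j and v_j directly.
ssa_reconstruct <- function(x, m, k) {
  X   <- trajectory_matrix(x, m)
  dec <- svd(X)
  S   <- dec$u[, 1:k, drop = FALSE] %*%
         diag(dec$d[1:k], nrow = k) %*%
         t(dec$v[, 1:k, drop = FALSE])   # rank-k signal matrix S_k
  # Diagonal averaging: average S_k over each anti-diagonal i + j = t + 1.
  sapply(seq_along(x), function(t) mean(S[row(S) + col(S) == t + 1]))
}

x_tilde <- ssa_reconstruct(x, m = 24, k = 2)  # filtered series for the toy example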
To construct the trajectory matrix X in Equation (1) and to conduct the SVD in Equation (2), we have to select the Window Length m and the number of signal components k. Since our aim is not to demonstrate the selection of SSA choices (m and k), we opt not to reproduce the selection procedures for SSA choices, as these are already covered in depth in [12,28]. As our interest is in examining the effect of transformation on the forecasting performance of SSA, we select m and k such that the Root Mean Squared Error (RMSE) in forecasting is minimised.

2.2. Recurrent Forecasting

Recurrent forecasting in SSA is also known as R-forecasting, and the findings in [25] indicate that R-forecasting is best when dealing with large samples. Let $\mathbf{u}_j^{\nabla} = (u_{1j}, \ldots, u_{(m-1)j})^\top$ denote the vector of the first $m-1$ elements of the $j$th eigenvector $\mathbf{u}_j$, and let $u_{mj}$ denote the last element of $\mathbf{u}_j$. The coefficients of the linear recurrent equation can then be estimated as

$$\mathbf{a} = (a_{m-1}, \ldots, a_1)^\top = \frac{1}{1 - \sum_{j=1}^{k} u_{mj}^2} \sum_{j=1}^{k} u_{mj}\, \mathbf{u}_j^{\nabla}. \quad (4)$$
With the parameters in Equation (4), a linear recurrent equation of the form

$$\tilde{x}_t = \sum_{i=1}^{m-1} a_{m-i}\, \tilde{x}_{t-m+i} \quad (5)$$

is used to obtain a one-step-ahead recursive forecast [29]. This linear recurrent formula in Equation (5) is used to forecast the signal at time $t+1$ given the signal at times $t, t-1, \ldots, t-m+2$ [28] (Section 2.1, Equations (1)–(6)), and the recursive forecast of $x_{N+j}$ is

$$\hat{x}_{N+j} = \begin{cases} \displaystyle\sum_{i=1}^{j-1} a_i\, \hat{x}_{N+j-i} + \sum_{i=1}^{m-j} a_{m-i}\, \tilde{x}_{N+j-m+i} & \text{for } j \le m-1; \\[1ex] \displaystyle\sum_{i=1}^{m-1} a_i\, \hat{x}_{N+j-i} & \text{for } j > m-1. \end{cases} \quad (6)$$
We apply the recursive forecasting method in Equation (6) to obtain a one-step-ahead forecast.
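A minimal base-R sketch of R-forecasting under this notation follows; it reuses the helpers from the previous sketches and is an illustration of the algorithm, not the authors' implementation (which relies on the Rssa package, Section 5).

```r
# R-forecasting: estimate the LRR coefficients of Equation (4) from the first k
# eigenvectors of X X', then iterate Equation (6) over the reconstructed signal.
r_forecast <- function(x, m, k, h) {
  X     <- trajectory_matrix(x, m)
  U     <- svd(X)$u[, 1:k, drop = FALSE]          # eigenvectors u_j of X X'
  pi_k  <- U[m, ]                                 # last elements u_{mj}
  Uhead <- U[-m, , drop = FALSE]                  # first m - 1 elements of each u_j
  a <- drop(Uhead %*% pi_k) / (1 - sum(pi_k^2))   # Equation (4): (a_{m-1}, ..., a_1)
  z <- c(ssa_reconstruct(x, m, k), numeric(h))    # reconstructed signal, then forecasts
  N <- length(x)
  # Equation (6): because forecasts are appended to z, earlier forecasted values
  # automatically feed into later ones, covering both cases of the equation.
  for (j in seq_len(h))
    z[N + j] <- sum(a * z[(N + j - m + 1):(N + j - 1)])
  z[N + seq_len(h)]
}

r_forecast(x, m = 24, k = 2, h = 12)  # twelve-step-ahead forecast of the toy series
```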

2.3. Vector Forecasting

In contrast, the SSA vector forecasting algorithm has proven to be more robust than the R-forecasting algorithm in most cases [25]. Let us define $\mathbf{U}_k^{\nabla} = [\mathbf{u}_1^{\nabla} \; \cdots \; \mathbf{u}_k^{\nabla}]$ as the $(m-1) \times k$ matrix consisting of the first $m-1$ elements of the $k$ eigenvectors. The vector forecasting algorithm computes $m$-lagged vectors $\hat{\mathbf{z}}_i$ and constructs a trajectory matrix $\mathbf{Z} = [\hat{\mathbf{z}}_1 \; \cdots \; \hat{\mathbf{z}}_n \; \hat{\mathbf{z}}_{n+1} \; \cdots \; \hat{\mathbf{z}}_{n+h}]$ such that

$$\hat{\mathbf{z}}_i = \begin{cases} \mathbf{s}_i & \text{for } i = 1, 2, \ldots, n; \\[1ex] \begin{pmatrix} \left( \mathbf{U}_k^{\nabla} \mathbf{U}_k^{\nabla\top} + \bigl( 1 - \sum_{j=1}^{k} u_{mj}^2 \bigr)\, \mathbf{a}\mathbf{a}^\top \right) \hat{\mathbf{z}}_{i-1}^{\,\triangle} \\[0.5ex] \mathbf{a}^\top \hat{\mathbf{z}}_{i-1}^{\,\triangle} \end{pmatrix} & \text{for } i = n+1, \ldots, n+h, \end{cases} \quad (7)$$

where $\mathbf{s}_i$ is the $i$th column of the reconstructed signal matrix $\mathbf{S}_k = \sum_{j=1}^{k} \sqrt{\lambda_j}\, \mathbf{u}_j \mathbf{v}_j^\top$, and $\hat{\mathbf{z}}_{i}^{\,\triangle}$ is the vector of the last $m-1$ elements of $\hat{\mathbf{z}}_{i}$.
After a diagonal averaging of the matrix $\mathbf{Z} = [\hat{\mathbf{z}}_1 \; \cdots \; \hat{\mathbf{z}}_n \; \hat{\mathbf{z}}_{n+1} \; \cdots \; \hat{\mathbf{z}}_{n+h}]$ constructed by employing Equation (7), we obtain a time series $\{\hat{z}_1, \ldots, \hat{z}_N, \hat{z}_{N+1}, \ldots, \hat{z}_{N+h}\}$, as has also been explained in [28] (Section 2.3). Thus, $\hat{x}_{N+j} = \hat{z}_{N+j}$ produces the forecast corresponding to $x_{N+j}$ for $j = 1, \ldots, h$.
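In practice, both forecasting algorithms are implemented in the Rssa package used later in this paper; a minimal sketch follows, where the window length L = 24 and the grouping 1:6 are illustrative choices rather than the paper's selections.

```r
library(Rssa)
s    <- ssa(AirPassengers, L = 24)                     # decomposition (embedding + SVD)
rec  <- reconstruct(s, groups = list(signal = 1:6))    # diagonal averaging
fc_r <- rforecast(s, groups = list(1:6), len = 12)     # recurrent (R-) forecast
fc_v <- vforecast(s, groups = list(1:6), len = 12)     # vector (V-) forecast
```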

3. Transformation of Time Series

Data transformation is useful when the variation increases or decreases with the level of the series [1]. Whilst logarithmic transformation and standardisation are the most commonly used data transformation techniques in time series analysis, it is noteworthy that there are other transformations from the family of power transformations, such as square root and cube root transformations. However, their interpretability is not as simple and familiar as that of standardisation and logarithmic transformation.

3.1. Standardisation

Standardisation of the time series $\{x_t\}$ is formulated as

$$y_t = \frac{x_t - \bar{x}}{\sigma_x}, \quad (8)$$
where $\bar{x}$ and $\sigma_x$ are the mean and standard deviation of the series $\{x_t\}$, respectively. Standardisation is a common preprocessing step, used mostly in machine learning to reduce training time and error. In time series forecasting, standardisation has proven advantages when machine learning algorithms (e.g., neural networks and deep neural networks) are used [30]. In terms of SSA, the theoretical literature does not investigate the effect of standardisation on SSA forecasts in detail. However, Golyandina and Zhigljavsky [26] addressed the effect of centering the time series as a preprocessing step: in theory, if the time series can be expressed as an oscillation around a linear trend, centering will increase SSA's accuracy [26].
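A sketch of standardisation (Equation (8)) and the back-transformation of forecasts to the original scale is given below; the series, the forecasting step, and the object y_hat are placeholders of our own.

```r
x   <- as.numeric(AirPassengers)
mu  <- mean(x); sdx <- sd(x)
y   <- (x - mu) / sdx         # Equation (8): standardised series passed to SSA
# ... forecast on the standardised scale (e.g., rforecast on y) to obtain y_hat ...
y_hat <- tail(y, 1)           # placeholder forecast, for illustration only
x_hat <- y_hat * sdx + mu     # back-transformation to the original scale
```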

3.2. Logarithmic Transformation

In this paper, the following logarithmic transformation is applied to the time series $\{x_t\}$:

$$y_t = \log(C + x_t), \quad (9)$$
where $C$ is a constant, large enough to guarantee that the term inside the logarithm is positive. As mentioned before, the log-transform is a common preprocessing step for handling variance instability or right skewness. Furthermore, one may use the log-transform to convert a time series with a multiplicative structure into an additive one. Given that SSA can be applied to time series with both additive and multiplicative structures, it does not necessarily require log-transform preprocessing [26]. However, Golyandina and Zhigljavsky [26] showed that the log-transform could affect SSA's forecasting accuracy: it will improve if the rank of the transformed series is smaller than that of the original one.
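Analogously, a sketch of the transformation in Equation (9), with one possible (assumed) choice of C, and the corresponding back-transformation:

```r
x <- as.numeric(AirPassengers)
C <- if (min(x) > 0) 0 else 1 - min(x)  # one way to ensure C + x_t > 0 (an assumed choice)
y <- log(C + x)                         # Equation (9)
# ... forecast on the log scale to obtain y_hat ...
y_hat <- tail(y, 1)                     # placeholder forecast, for illustration only
x_hat <- exp(y_hat) - C                 # naive back-transformation; see the bias noted in [6,7]
```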

4. Comparison between Transformations

Time series with different characteristics will behave differently after transformation. For instance, forecasting accuracy for time series with positive skewness, non-stationarity, and non-normality may improve with logarithmic transformation. Furthermore, for time series with large observations or large variance, standardisation can improve the forecasting accuracy. Sampling frequency is another potential factor affecting forecasting accuracy. Time series with a high sampling frequency (e.g., hourly or daily) usually have an oscillation frequency close to the noise frequency and consequently show unstable and noisy behaviour. On the other hand, time series with lower sampling frequencies are smoother. These characteristics may affect forecasting accuracy as well. As such, to investigate the practical effect of data transformation on SSA forecasting, we should consider “Sampling Frequency,” “Skewness,” “Normality,” and “Stationarity” as control factors.
To observe the effectiveness of data transformation prior to the application of SSA, we may compare the forecasting performance of SSA under different transformations and control factors: firstly, by comparing the Root Mean Squared Forecast Error (RMSFE), and secondly, by employing a nonparametric test to examine the treatment effect (data transformation).

4.1. Root Mean Squared Forecast Error (RMSFE)

The most commonly adopted approach for comparing the predictive accuracy of forecasts is to compute and compare the RMSFE from out-of-sample forecasts. The RMSFE can be defined as

$$RMSE_h = \sqrt{\frac{1}{h} \sum_{t=N+1}^{N+h} \left( x_t - \hat{x}_t \right)^2}, \quad (10)$$

where $h$ is the forecast horizon, $N$ is the number of observations, $x_t$ is the observed value of the time series, and $\hat{x}_t$ is the forecasted value.
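Equation (10) translates directly into R; the vector names actual and forecast below are our own, holding the h out-of-sample observations and their forecasts.

```r
rmsfe <- function(actual, forecast) sqrt(mean((actual - forecast)^2))
rmsfe(c(112, 118, 132), c(110, 121, 130))  # illustrative comparison for h = 3
```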
The application of data transformation prior to forecasting with SSA may significantly affect the forecasting outcome, and the effect may vary with the properties of a time series. Thus, we need to examine the effect of data transformation on RMSFE alongside the differing properties of time series. Comparisons between the RMSFE of the original and transformed time series can be used to learn about the forecasting performance of a model for a given time series. However, comparing the RMSFE for a pool of time series with different characteristics is not straightforward. We compute $RMSE_h$ for $h = 1, 3, 6, 12$ ($h = 1$ for a short-term forecast, $h = 12$ for a long-term forecast, and $h = 3, 6$ for medium-term forecast horizons) for each time series in the pool and examine the effect of transformation by using statistical tests.

4.2. Nonparametric Repeated Measure Factorial Test

Treatment effects in the presence of factors can be examined by employing the nonparametric repeated measure factorial test [31,32] for a pool of time series of different characteristics. Thus, the effect of data transformation (treatment) can be examined by using this test under different characteristics of a time series.
Let us assume that we have $K$ time series in the pool, with series codes $A_k$, $k = 1, \ldots, K$, and that $RMSE_h$ for $h = 1, 3, 6, 12$ is computed for each series. If the interest lies in exploring the effect of transformation with respect to the skewness property of time series, we perform the test for the treatment effect (transformation) within the categories defined by the skewness of these series. There are three levels of the factor Skewness, namely Skew Negative, Skew Positive, and Skew Symmetric. Similarly, we have two levels for the factor Normality (Yes = normal; No = not normal) and two levels for the factor Stationarity (Yes = stationary; No = nonstationary). To test the effect of transformation (No Transformation, Standardisation, and Logarithmic Transformation), we follow the procedures described below.
First, we learn some basic characteristics of a time series such as normality, stationarity, skewness, and frequency. For example, the frequency of a time series can be learnt by examining the time of measurement: hourly, daily, weekly, monthly, or annually. We also classify time series into different categories via a series of statistical tests such as the Jarque-Bera test for normality [33], the KPSS test for stationarity [34], and the D’Agostino test for skewness [35].
Second, the nonparametric repeated measure factorial test [31,32] is used to test the effect of transformation on RMSFE across different categories, where the categories are defined based on Frequency, Normality, Skewness, and Stationarity.
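A sketch of both stages in R follows; the column names (rmsfe, transform, freq, series), the tiny illustrative data frame d, and the threshold choices are assumptions about the layout rather than the authors' code, with the tests drawn from the tseries, moments, and nparLD packages.

```r
library(tseries)   # jarque.bera.test(), kpss.test()
library(moments)   # agostino.test()
library(nparLD)

# Stage 1: classify a single series (significance thresholds are illustrative).
classify <- function(x) list(
  normal     = jarque.bera.test(x)$p.value > 0.01,  # Jarque-Bera [33]
  stationary = kpss.test(x)$p.value > 0.05,         # KPSS [34]; null is stationarity
  skewness_p = agostino.test(x)$p.value             # D'Agostino skewness test [35]
)

# Stage 2: nonparametric repeated measure factorial test [31,32] on a
# long-format frame with one row per (series, transformation) combination.
set.seed(1)
d <- data.frame(
  series    = rep(paste0("A", 1:6), each = 3),
  freq      = rep(c("M", "Q"), each = 9),           # whole-plot factor
  transform = rep(c("NT", "Std", "Log"), times = 6),# repeated-measures treatment
  rmsfe     = runif(18)
)
fit <- nparLD(rmsfe ~ freq * transform, data = d, subject = "series")
summary(fit)  # Wald-type statistics for Freq, Tr, and the Tr x Freq interaction
```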

5. Data Analysis

To test the effect of data transformation on SSA forecasting accuracy, we used the same set of time series, with differing characteristics, employed by Ghodsi et al. [25]. The dataset contains 100 real time series with different sampling frequencies and different stationarity, normality, and skewness characteristics, representing various fields and categories, obtained via the Data Market (http://datamarket.com). Table 1 presents the number of time series with each feature. It is evident that the real data include series recorded at varying frequencies (annual, quarterly, monthly, weekly, daily, and hourly) alongside varying distributions (normally distributed, skewed, stationary, and non-stationary). Interestingly, the majority of the series are non-stationary over time, which resonates with expectations within real-life scenarios.
The name and description of each time series and their codes assigned to improve presentation are presented in Table A1. Table A2 presents descriptive statistics for all time series to enable the reader to obtain a rich understanding of the nature of the real data. This also includes skewness statistics, and results from the normality (Shapiro-Wilk) and stationarity (Augmented Dickey-Fuller) tests. As visible in Table A1, the data comes from different fields such as energy, finance, health, tourism, housing market, crime, agriculture, economics, chemistry, ecology, and production.
Figure 1 shows the time series for a selection of 9/100 series used in this study. This enables the reader to obtain a further understanding of the different structures underlying the data considered in the analysis. For example, A007 is representative of an asymmetric non-stationary time series for the labour market in a U.S. county. This monthly series shows seasonality with an increasing non-linear trend. In contrast, A022 is related to a meteorological variable that is asymmetric, yet stationary and highly seasonal in nature. An example of a time series that is both asymmetric and non-stationary is A038, which represents the production of silver. Here, structural breaks are visible throughout. A055 is an annual time series, which is stationary and asymmetric, and relates to the production of coloured fox fur. An example of a quarterly time series representing the energy sector is shown via A061. This time series is non-stationary and asymmetric with a non-linear trend and an increasing seasonality over time. Another example focuses on the airline industry (A075) and is also asymmetric and non-stationary in nature. It appears to showcase a linear and increasing trend along with seasonality. A skewed and non-stationary sales series is shown via A081, with the trend indicating increasing seasonality with major drops in the time series between each season. A time series for house sales (A082) can be found to be normally distributed and non-stationary over time. It also shows a slightly curved non-linear trend and a sine wave that is disrupted by noise. Finally, the labour market is drawn on again via A094, but this is an example of a time series affected by several structural breaks leading to a non-stationary, asymmetric series, which also has seasonal periods and a clear non-linear trend.
R packages “Rssa” [36,37,38] and “nparLD” [39] are employed to implement SSA forecasting and the nonparametric repeated measure factorial test, respectively. We apply SSA to three versions of a dataset: a dataset without any transformation, a standardised dataset, and a log-transformed dataset. For each of the three datasets, we obtain RMSFE from out-of-sample forecasting at forecast horizons h = 1 , 3 , 6 , 12 . It is noteworthy that our aim in this paper is to examine the effect of transformation in SSA forecasting. Thus, we consider the best forecast based on the RMSFE of the last 12 data points regardless of whether the forecast is from the recurrent or vector-based approach.
We also know that the window length m, the number of components k, and the forecasting methods (recurrent and vector) affect the forecasting outcome. Thus, we adopt a computationally intensive approach by considering combinations of m and k, and methods that provide the minimum RMSFE for the out-of-sample forecast for the last 12 data points. The RMSFEs obtained from the computationally intensive approach are given in Table A3, Table A4, Table A5 and Table A6.
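This search can be sketched with Rssa as below; the grids for m and k are illustrative assumptions, since the paper does not report the exact grid.

```r
# Grid search over window length m, component count k, and forecasting method,
# keeping the combination minimising the RMSFE of the last 12 data points.
library(Rssa)
best_ssa <- function(x, h = 12) {
  train <- head(x, length(x) - h); test <- tail(x, h)
  best <- list(rmsfe = Inf)
  for (m in seq(12, floor(length(train) / 2), by = 6)) {
    s <- ssa(train, L = m)
    for (k in 1:min(10, m - 1)) {
      for (method in c("recurrent", "vector")) {
        fc <- if (method == "recurrent") rforecast(s, groups = list(1:k), len = h)
              else                       vforecast(s, groups = list(1:k), len = h)
        err <- sqrt(mean((test - as.numeric(fc))^2))
        if (err < best$rmsfe) best <- list(m = m, k = k, method = method, rmsfe = err)
      }
    }
  }
  best
}

best_ssa(as.numeric(AirPassengers))  # returns the minimising (m, k, method)
```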
Given that the best forecasting results are achieved by utilising a computationally intensive approach, we seek to identify the factors that can affect the RMSFE. To address this, for each series with RMSFE reported in Table A3, Table A4, Table A5 and Table A6, we determine the characteristics of the time series by employing the statistical tests described in Section 4.2. At this stage, we are ready with the inputs for the nonparametric repeated measure factorial test, which we use to test the treatment effect (data transformation) under different characteristics of these time series. Results obtained from the Wald-type tests are provided in Table 2.
Based on the Wald-type test results in Table 2, we may conclude that, at the α = 0.05 significance level,
  • normality does not affect SSA forecasting performance;
  • stationarity affects SSA forecasting performance in long-term forecasting (h = 12) but not at shorter horizons;
  • skewness and sampling (observation) frequency affect SSA forecasting performance;
  • transformation does not affect SSA forecasting performance on its own, but the interaction between sampling frequency and transformation is significant, which means that SSA performance is affected by transformation at some sampling frequencies.
The above findings are important in practice for several reasons. First, it is well known that most real-world time series do not meet the assumption of normality. As the effect of normality and its interactions with transformation are not significant, our findings indicate that, when faced with normally distributed data, data transformations have no impact on the forecasting accuracy of SSA; equally, data transformations do not improve forecast accuracy for non-normal data either. Secondly, we find that stationarity affects the long-term forecasting accuracy of SSA; however, when generating short-term forecasts, the forecasting accuracy of SSA is not affected by stationarity. Thirdly, as most real time series are skewed and increasingly found at varying frequencies (especially following the emergence of Big Data), these findings show that forecasters should explore the use of SSA carefully, as its forecasts are sensitive to the skewness and sampling frequency of the data. In general, transformations are not required when forecasting with SSA, as there is no evidence of transformations impacting overall SSA forecasting performance; however, there could be a significant impact at certain sampling frequencies. This indicates that, when modelling data with different frequencies, the sensitivity of SSA forecasts to such frequencies could potentially be controlled by transforming the input data.
Since the interaction between sampling frequency and transformation is significant, we explore the relative effect of frequencies on RMSFE. Figure 2 shows the effect plot of treatment (transformation) for different forecast horizons h = 1 , 3 , 6 , 12 .
To explore the relative effects of sampling frequency for different forecast horizons, we plot the relative effect of frequencies in Figure 3 and Figure 4.
Sampling frequencies under investigation are hourly (F H), daily (F D), weekly (F W), monthly (F M), quarterly (F Q), and annual (F A). When the relative effect plots in Figure 3 and Figure 4 are compared with the effect plots in Figure 2, we can evaluate how the hourly (F H), weekly (F W), quarterly (F Q), and annual (F A) sampling frequencies affect the forecasting performance of SSA. Moreover, the change in the shape of the transformation’s relative effects (e.g., see the difference between the shapes of the “F Q” and “F H” lines in Figure 3 and Figure 4) suggests an interaction between transformation and sampling frequency.
We analyse the results by forecast horizon. It can be seen in Figure 3 that, in very short-term forecasting ( h = 1 ), standardisation produces a comparatively large RMSFE at quarterly frequencies, while the log transformation reports a slightly larger RMSFE at daily, quarterly, hourly, and annual frequencies. This indicates that users should certainly avoid transforming data with quarterly frequencies when forecasting at h = 1 step ahead with SSA. At the short-term forecasting horizon ( h = 3 ) (see Figure 3), the smallest RMSFE belongs to standardisation at monthly frequencies, while standardisation has the largest RMSFE at quarterly frequencies. For the mid- and long-term forecasting horizons ( h = 6 and 12), which are visible in Figure 4, the following can be seen. At h = 6 steps ahead, standardisation produces the lowest RMSFE at monthly sampling frequencies, whilst it has the largest RMSFE for quarterly and weekly time series data. The log transformation produces higher RMSFEs at daily, hourly, and annual frequencies. Accordingly, the only instance in which standardisation could produce better forecasts with SSA at this horizon is when faced with monthly data. At h = 12 steps ahead, standardisation leads to better forecasts at daily frequencies, whilst log transformations can provide better forecasts with SSA at weekly frequencies.
Finally, these findings indicate that standardisation should only be used to transform data when forecasting with SSA at h = 12 steps ahead at the daily frequency, at h = 3 or h = 6 steps ahead when dealing with a monthly frequency, and at h = 1 step ahead when forecasting data with monthly or weekly frequencies. At the same time, standardisation should not be employed when forecasting quarterly data at any horizon, as it worsens the forecasting accuracy by comparatively large margins. Interestingly, log transformations are only suggested when forecasting weekly data at h = 6 or h = 12 steps ahead. In the majority of instances, SSA is able to provide superior forecasts without the need for data transformations across time series of varied frequencies.

6. Concluding Remarks

This paper focused on evaluating the impact of data transformations on the forecasting performance of SSA, a nonparametric filtering and forecasting technique. Following a concise introduction, the paper introduces the SSA forecasting approaches followed by the transformation techniques considered here. Despite SSA's popularity (and in contrast to other methods such as ARIMA and neural networks), there has been no empirical attempt to quantify the impact of data transformations on its forecasting capabilities. Accordingly, we consider the impact of standardisation and logarithmic transformations on the forecasting performance of both vector and recurrent forecasting in SSA. In order to ensure robustness within the analysis, we not only compare the forecasts using the RMSFE but also rely on a nonparametric repeated measure factorial test.
The forecast evaluation is based on 100 time series with varying characteristics in terms of frequency, skewness, normality, and stationarity. Following the application of SSA to three versions of the same dataset, i.e., the original data, standardised data, and log-transformed data, we generate out-of-sample forecasts at horizons of 1, 3, 6, and 12 steps ahead. Our findings indicate that, in general, data transformations do not affect SSA forecasts. However, the interaction between sampling frequency and transformation is found to be significant, indicating that data transformations matter at certain sampling frequencies.
According to the results of this study, for time series with a higher sampling frequency (i.e., daily or hourly data), standardisation can improve SSA forecasting accuracy in the very long term, and at daily frequencies only. On the other hand, for time series with low sampling frequencies (i.e., quarterly and annual), neither logarithmic transformation nor standardisation is suitable at any horizon. At the remaining sampling frequencies (weekly and monthly), data transformation with standardisation can improve forecasts at all horizons except h = 12 when faced with monthly data, and at h = 1 step ahead when faced with weekly data. The results also show an improvement in forecasting accuracy for weekly data with logarithmic transformations at h = 6 and h = 12 steps ahead. These findings provide additional guidance to forecasters, researchers, and practitioners alike in terms of improving the accuracy of forecasts when modelling data with SSA.
Future research should consider the relative gains of the suggested data transformations at different sampling frequencies in relation to other benchmark forecasting models, as well as theories explaining the mechanism of these effects in detail. Moreover, the development of automated SSA forecasting algorithms could be informed by the findings of this paper to ensure that data transformations are conducted prior to forecasting at selected sampling frequencies.

Author Contributions

Conceptualisation, H.H.; methodology, H.H. and M.R.Y.; software, A.K. and M.R.Y.; validation, A.K. and M.R.Y.; formal analysis, M.R.Y. and E.S.S.; investigation, H.H.; data curation, E.S.S.; writing—original draft preparation, all authors contributed equally; writing—review and editing, all authors contributed equally; visualisation, M.R.Y. and A.K.; supervision, H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare that there is no conflict of interest.

Appendix A

Table A1. List of 100 real time series.
Code  Name of Time Series
A001  US Economic Statistics: Capacity Utilization.
A002  Births by months 1853–2012.
A003  Electricity: electricity net generation: total (all sectors).
A004  Energy prices: average retail prices of electricity.
A005  Coloured fox fur returns, Hopedale, Labrador, 1834–1925.
A006  Alcohol demand (log spirits consumption per head), UK, 1870–1938.
A007  Monthly Sutter county workforce, Jan. 1946–Dec. 1966 priesema (1979).
A008  Exchange rates—monthly data: Japanese yen.
A009  Exchange rates—monthly data: Pound sterling.
A010  Exchange rates—monthly data: Romanian leu.
A011  HICP (2005 = 100)—monthly data (annual rate of change): European Union (27 countries).
A012  HICP (2005 = 100)—monthly data (annual rate of change): UK.
A013  HICP (2005 = 100)—monthly data (annual rate of change): US.
A014  New Homes Sold in the United States.
A015  Goods, Value of Exports for United States.
A016  Goods, Value of Imports for United States.
A017  Market capitalisation—monthly data: UK.
A018  Market capitalisation—monthly data: US.
A019  Average monthly temperatures across the world (1701–2011): Bournemouth.
A020  Average monthly temperatures across the world (1701–2011): Eskdalemuir.
A021  Average monthly temperatures across the world (1701–2011): Lerwick.
A022  Average monthly temperatures across the world (1701–2011): Valley.
A023  Average monthly temperatures across the world (1701–2011): Death Valley.
A024  US Economic Statistics: Personal Savings Rate.
A025  Economic Policy Uncertainty Index for United States (Monthly Data).
A026  Coal Production, Total for Germany.
A027  Coke, Beehive Production (by Statistical Area).
A028  Monthly champagne sales (in 1000’s) (p. 273: Montgomery: Fore. and T.S.).
A029  Domestic Auto Production.
A030  Index of Cotton Textile Production for France.
A031  Index of Production of Chemical Products (by Statistical Area).
A032  Index of Production of Leather Products (by Statistical Area).
A033  Index of Production of Metal Products (by Statistical Area).
A034  Index of Production of Mineral Fuels (by Statistical Area).
A035  Industrial Production Index.
A036  Knit Underwear Production (by Statistical Area).
A037  Lubricants Production for United States.
A038  Silver Production for United States.
A039  Slab Zinc Production (by Statistical Area).
A040  Annual domestic sales and advertising of Lydia E. Pinkham Medicine, 1907 to 1960.
A041  Chemical concentration readings.
A042  Monthly Boston armed robberies January 1966–October 1975 Deutsch and Alt (1977).
A043  Monthly Minneapolis public drunkenness intakes Jan.’66–Jul’78.
A044  Motor vehicles engines and parts/CPI, Canada, 1976–1991.
A045  Methane input into gas furnace: cu. ft/min. Sampling interval 9 s.
A046  Monthly civilian population of Australia: thousand persons. February 1978–April 1991.
A047  Daily total female births in California, 1959.
A048  Annual immigration into the United States: thousands. 1820–1962.
A049  Monthly New York City births: unknown scale. January 1946–December 1959.
A050  Estimated quarterly resident population of Australia: thousand persons.
A051  Annual Swedish population rates (1000’s) 1750–1849 Thomas (1940).
A052  Industry sales for printing and writing paper (in Thousands of French francs).
A053  Coloured fox fur production, Hebron, Labrador, 1834–1925.
A054  Coloured fox fur production, Nain, Labrador, 1834–1925.
A055  Coloured fox fur production, oak, Labrador, 1834–1925.
A056  Monthly average daily calls to directory assistance Jan.’62–Dec’76.
A057  Monthly av. residential electricity usage Iowa city 1971–1979.
A058  Monthly av. residential gas usage Iowa (cubic feet) * 100 ’71–’79.
A059  Monthly precipitation (in mm), January 1983–April 1994. London, United Kingdom.
A060  Monthly water usage (ml/day), London Ontario, 1966–1988.
A061  Quarterly production of Gas in Australia: million megajoules. Includes natural gas from July 1989. March 1956–September 1994.
A062  Residential water consumption, Jan 1983–April 1994. London, United Kingdom.
A063  The total generation of electricity by the U.S. electric industry (monthly data for the period Jan. 1985–Oct. 1996).
A064  Total number of water consumers, January 1983–April 1994. London, United Kingdom.
A065  Monthly milk production: pounds per cow. January 62–December 75.
A066  Monthly milk production: pounds per cow. January 62–December 75, adjusted for month length.
A067  Monthly total number of pigs slaughtered in Victoria. January 1980–August 1995.
A068  Monthly demand repair parts large/heavy equip. Iowa 1972–1979.
A069  Number of deaths and serious injuries in UK road accidents each month. January 1969–December 1984.
A070  Passenger miles (Mil) flown domestic U.K. Jul. ’62–May ’72.
A071  Monthly hotel occupied room av. ’63–’76 B.L.Bowerman et al.
A072  Weekday bus ridership, Iowa city, Iowa (monthly averages).
A073  Portland Oregon average monthly bus ridership (/100).
A074  U.S. airlines: monthly aircraft miles flown (Millions) 1963–1970.
A075  International airline passengers: monthly totals in thousands. January 49–December 60.
A076  Sales: souvenir shop at a beach resort town in Queensland, Australia. January 1987–December 1993.
A077  Der Stern: Weekly sales of wholesalers A, ’71–’72.
A078  Der Stern: Weekly sales of wholesalers B, ’71–’72.
A079  Der Stern: Weekly sales of wholesalers ’71–’72.
A080  Monthly sales of U.S. houses (thousands) 1965–1975.
A081  CFE specialty writing papers monthly sales.
A082  Monthly sales of new one-family houses sold in USA since 1973.
A083  Wisconsin employment time series, food and kindred products, January 1961–October 1975.
A084  Monthly gasoline demand Ontario gallon millions 1960–1975.
A085  Wisconsin employment time series, fabricated metals, January 1961–October 1975.
A086  Monthly employees wholes./retail Wisconsin ’61–’75 R.B.Miller.
A087  US monthly sales of chemical related products. January 1971–December 1991.
A088  US monthly sales of coal related products. January 1971–December 1991.
A089  US monthly sales of petrol related products. January 1971–December 1991.
A090  US monthly sales of vehicle related products. January 1971–December 1991.
A091  Civilian labour force in Australia each month: thousands of persons. February 1978–August 1995.
A092  Numbers on Unemployment Benefits in Australia: monthly January 1956–July 1992.
A093  Monthly Canadian total unemployment figures (thousands) 1956–1975.
A094  Monthly number of unemployed persons in Australia: thousands. February 1978–April 1991.
A095  Monthly U.S. female (20 years and over) unemployment figures 1948–1981.
A096  Monthly U.S. female (16–19 years) unemployment figures (thousands) 1948–1981.
A097  Monthly unemployment figures in West Germany 1948–1980.
A098  Monthly U.S. male (20 years and over) unemployment figures 1948–1981.
A099  Wisconsin employment time series, transportation equipment, January 1961–October 1975.
A100  Monthly U.S. male (16–19 years) unemployment figures (thousands) 1948–1981.
Table A2. Descriptives for the 100 time series (two series per row).
Code  F  N  Mean  Med.  SD  CV  Skew.  SW(p)  ADF  |  Code  F  N  Mean  Med.  SD  CV  Skew.  SW(p)  ADF
A001M539808056−0.55<0.01−0.60 A002M192027124988330.16<0.01−1.82
A003M4842.59 × 10 5 2.61 × 10 5 6.88 × 10 5 270.15<0.01−0.90 A004M31077228−0.24<0.010.56
A005D9247.6331.0047.3399.362.27<0.01−3.16A006Q2071.951.980.2512.78−0.58<0.010.46
A007M25229782741111137.320.79<0.01−0.80 A008M16012812819150.34<0.01−0.59
A009M1600.720.690.10130.66<0.010.53 A010M1603.413.610.8324−0.92<0.011.58
A011M2014.72.65.01062.24<0.01−2.66A012M1992.11.91.0490.92<0.01−0.79
A013M1762.52.41.666−0.52<0.01−2.27 A014M606555320350.79<0.01−1.41
A015M6723.391.893.481031.09<0.012.46 A016M6725.182.895.781111.13<0.011.91
A017M24913013024190.35<0.010.24 A018M2491121142522−0.010.01*0.06
A019M60510.19.64.5440.05<0.01−4.77A020M6057.36.94.3590.04<0.01−6.07
A021M6057.26.83.3460.13<0.01−4.93A022M60510.39.93.8370.04<0.01−4.19
A023M60524241040−0.02<0.01−7.15A024M6366.97.42.638−0.29<0.01−1.18
A025M34310810033300.99<0.01−1.23 A026M27711.711.92.320−0.160.06 *−0.40
A027M1710.210.130.19881.26<0.01−1.81 A028M9648014084264054.991.55<0.01−1.66
A029M24839138511630−0.030.08 *−1.22 A030M13989921213−0.82<0.01−0.28
A031M12113413827200.05<0.011.51 A032M153113114109−0.290.45 *−0.52
A033M1151171181715−0.290.03 *−0.46 A034M1151101111110−0.530.02 *0.30
A035M1137403431780.56<0.015.14 A036M1651.081.100.2018.37−1.15<0.01−0.59
A037M4793.042.831.0233.600.46<0.010.61 A038M2839.3910.022.2724.15−0.80<0.01−1.01
A039M45254521936−0.15<0.010.08 A040Q1081382120668449.550.83<0.01−0.80
A041H19717.0617.000.392.340.150.21 *0.09 A042M118196.3166.0128.065.20.45<0.010.41
A043M151391.1267.0237.4960.720.43<0.01−1.17 A044M18813441425479.135.6−0.41<0.01−1.28
A045H296−0.050.001.07−1887−0.050.55 *−7.66A046M1591189011830882.937.420.12<0.015.71
A047D36541.9842.007.3417.500.44<0.01−1.07 A048A1432.5 × 10 5 2.2 × 10 5 2.1 × 10 5 83.191.06<0.01−2.63
A049M16825.0524.952.319.25−0.020.02 *0.07 A050Q89152741518413588.890.19<0.019.72
A051A1006.697.505.8887.87−2.45<0.01−3.06A052M12071373317424.39−1.09<0.01−0.78
A053A9181.5846.00102.07125.112.80<0.01−3.44A054A91101.8077.0092.1490.511.43<0.01−3.38
A055A9159.4539.0060.42101.631.56<0.01−3.99A056M180492.50521.50189.5438.48−0.17<0.01−0.65
A057M106489.73465.0093.3419.060.92<0.01−1.21 A058M106124.7194.5084.1567.480.52<0.01−3.88
A059M13685.6680.2537.5443.830.91<0.01−1.88 A060M276118.61115.6326.3922.240.86<0.01−0.47
A061Q15561728479765390787.330.44<0.010.06 A062M1365.72 × 10 7 5.53 × 10 7 1.2 × 10 7 21.511.13<0.01−0.84
A063M142231.09226.7324.3710.550.520.01−0.39 A064M1363138831251323210.300.250.22 *−0.16
A065M156754.71761.00102.2013.540.010.04 *0.04 A066M156746.49749.1598.5913.210.080.04 *−0.38
A067M18890640916611392615.36−0.380.01 *−0.38 A068M9415401532474.3530.790.380.05 *0.54
A069M19216701631289.6117.340.53<0.01−0.74 A070M11991.0986.2032.8036.010.34<0.01−1.93
A071M168722.30709.50142.6619.750.72<0.01−0.52 A072W13659135500178430.170.67<0.01−0.68
A073M11411201158270.8924.17−0.37<0.010.76 A074M961038510401220221.210.330.18 *−0.13
A075M144280.30265.50119.9742.800.57<0.01−0.35 A076M84143158771157481103.37<0.01−0.29
A077W1041190911640123110.340.60<0.01−0.16 A078W104746367360047376.350.64<0.01−0.59
A079W1041020101271.787.030.600.01 *−0.41 A080M13245.3644.0010.3822.880.170.15 *−0.81
A081M14717451730479.5227.47−0.39<0.01−1.15 A082M27552.2953.0011.9422.830.180.13 *−1.30
A083M17858.7955.806.6811.360.93<0.01−0.92 A084M1921.62 × 10 5 1.57 × 10 5 4166125.710.32<0.010.25
A085M17840.9741.505.1112.47−0.07<0.011.45 A086M178307.56308.3546.7615.200.17<0.011.51
A087M25213.7014.086.1344.730.16<0.011.13 A088M25265.6768.2014.2521.70−0.53<0.01−0.53
A089M25210.7610.925.1147.50−0.19<0.01−0.05 A090M25211.7411.055.1143.540.38<0.01−0.88
A091M2117661762181910.700.03<0.013.27 A092M4392.21 × 10 5 5.67 × 10 4 2.35 × 10 5 106.320.77<0.011.61
A093M240413.28396.50152.8436.980.36<0.01−1.60 A094M21167876528604.628.910.56<0.012.69
A095M40813731132686.0549.960.91<0.010.60 A096M408422.38342.00252.8659.870.65<0.01−1.95
A097M3967.14 × 10 5 5.57 × 10 5 5.64 × 10 5 78.970.79<0.01−2.51 A098M4081937182579441.040.64<0.01−1.15
A099M17840.6040.504.9512.19−0.65<0.01−0.10 A100M408520.28425.50261.2250.210.64<0.01−1.65
Note: * indicates data is normally distributed based on a Shapiro-Wilk test at p = 0.01. † indicates a nonstationary time series based on the augmented Dickey-Fuller test at p = 0.01. A indicates annual, M indicates monthly, Q indicates quarterly, W indicates weekly, D indicates daily and H indicates hourly. N indicates series length.
Table A3. Out-of-sample forecasting RMSFE.
Series’ Code  |  h = 1: NT, Std, Log  |  h = 3: NT, Std, Log
A0011.2830.5421.1441.8841.1571.715
A00236.27535.01928.84436.99135.90030.741
A00312,521.68813,643.06713,616.73716,041.25016,584.22817,449.138
A0040.2500.1500.1390.7920.3540.333
A00561.62561.54860.47653.90653.26858.074
A0060.0680.0630.0670.1000.1070.099
A007338.358511.055288.753511.033560.970331.925
A0087.1295.6677.50519.20016.09617.845
A0090.0420.0400.0420.0510.0510.051
A0100.1220.1070.1550.2680.3060.417
A0110.3380.2290.2860.8310.4070.560
A0120.9840.9631.0491.3741.4101.386
A0131.3451.1011.3953.1412.9717.484
A0148.0966.8296.4109.5159.8109.638
A0157.24 × 10 9 6.45 × 10 9 6.31 × 10 9 1.1 × 10 10 8.45 × 10 9 7.08 × 10 9
A0161.28 × 10 10 1.46 × 10 10 1.56 × 10 10 1.76 × 10 10 1.74 × 10 10 1.81 × 10 10
A01712.4239.066Inf19.78215.435Inf
A0187.9508.09310.20515.13212.98316.137
A0191.4291.4251.3751.5311.5101.469
A0201.3191.3891.6691.3631.4821.429
A0211.0701.0761.0511.1291.1471.122
A0221.1331.2091.1521.2801.2701.275
A0236.0975.9365.3096.5516.6745.980
A0240.9590.7710.9541.0670.9711.096
A02522.68926.92456.52926.05643.19649.542
A0261.1741.2122.4901.6861.7873.475
A0270.0500.1000.0640.1140.5090.226
A0284137.5764218.1294038.1434474.7564199.9674183.622
A02959.12444.47452.39062.49069.34978.321
A03015.20731.17516.75524.38851.21832.464
A0318.7835.6628.63380.1188.46418.103
A0329.77910.3159.97212.43113.09312.748
A0335.8205.4325.7919.7298.52710.148
A0343.0612.7853.3205.7965.2866.157
A0350.9651.4555.9731.5362.1556.234
A0360.1510.1750.1860.1690.2790.249
A0370.2930.3100.3080.4170.3950.368
A0381.9231.2433.4622.4271.3702.474
A0394.8533.5085.1077.4946.0999.125
A040489.909614.577717.710815.463785.927929.787
A0410.3290.3220.3280.3900.4080.389
A04268.45982.18267.108132.417212.367118.468
A04333.08133.06633.75041.99640.18943.350
A044420.634389.750545.116538.590552.070726.264
A0450.5220.5220.8860.9990.9981.297
A04615.5521.9061.16918.7215.2753.773
A0478.2068.22211.1168.6798.64010.166
A0483.15 × 10 5 1.66 × 10 5 1.79 × 10 7 3.82 × 10 5 1.95 × 10 5 595,729.790
A0491.1891.2481.1991.2771.3771.285
A05018.038128.25417.56237.219295.98035.731
NT = No Transformation, Std = Standardisation, and Log = Logarithmic.
Table A4. Out-of-sample forecasting RMSFE (Continuation).
Series’ Code  |  h = 1: NT, Std, Log  |  h = 3: NT, Std, Log
A0513.9833.9764.0035.6945.6125.605
A052272.279276.113574.713268.784271.246445.832
A05335.55939.68036.96326.79532.50031.927
A054124.51989.800125.412110.60688.796107.684
A05543.12137.09044.80834.71537.30240.039
A056266.33399.5021.43E+12287.931214.5569.42 × 10 88
A057125.60084.462126.023131.25392.122129.780
A05838.47435.38471.104119.96499.107139.656
A05944.95041.24045.69645.07940.22445.094
A0607.5988.0857.8458.2489.0908.709
A0616819.1167597.05223,730.34810,097.87711,645.53516,058.889
A0628.44 × 10 6 7.04 × 10 6 1.37 × 10 7 1.42 × 10 7 8.94 × 10 6 1.76 × 10 7
A06321.82921.83113.58326.60026.65510.258
A0644393.0383077.3104376.0775016.4372925.2114980.827
A06528.98211.40527.43030.71716.66230.903
A06612.03310.13115.85419.19616.70328.192
A06711,923.55411,039.52210,617.13217,077.20813,448.76213,328.422
A068362.752357.340369.231462.893433.739473.690
A069160.579203.037208.287203.002208.562230.166
A07014.48313.74114.15229.63526.20629.278
A07126.79327.21723.64727.38133.93025.245
A0721379.2001382.3481472.3251565.4641624.6871401.969
A07369.32769.14168.699122.183114.652115.324
A0743294.8832015.2253445.8293741.5242288.0093749.168
A07548.90159.57458.50741.848117.86064.366
A07625,153.66729,044.83119,684.33935,607.57958,525.28221,322.355
A077394.752387.456395.114873.390813.589836.260
A078701.7411275.259790.6501805.6094921.6741802.354
A07935.70934.06435.66145.10843.55945.010
A0808.9477.1839.72513.50511.50519.930
A081498.376530.862473.551380.003447.889438.681
A0829.2337.2925.20411.2629.3426.710
A0831.2911.1371.2251.6211.4771.518
A08421,495.1859111.16211,832.14332,355.0279641.01611,414.744
A0850.8830.8620.6412.0541.6401.273
A0863.7252.8743.6135.0164.5004.665
A0871.0351.2730.7681.4081.9581.148
A0887.1097.6726.2585.3857.0105.581
A0890.8621.1701.0252.2482.2822.331
A0902.1642.4282.0812.7552.6092.373
A091240.568124.286129.0861376.708148.271160.964
A0923.35 × 10 31 31,233.89116,483.6272.38× 10 32 71,880.79840,209.373
A09363.1195.79× 10 25 54.632300.8931.35× 10 26 76.301
A09444,254.67066,245.62166,414.58876,182.03486,009.42291,714.035
A095136.663139.571144.039287.480311.372265.696
A09658.55880.57867.88965.71579.42970.496
A0971.42× 10 5 144,364.409143,654.990192,501.733182,442.168192,581.617
A098441.676476.749173.231691.051595.127372.177
A0993.1993.1684.4783.2363.0755.052
A10079.93190.46779.684132.074118.099109.238
NT = No Transformation, Std = Standardisation, and Log = Logarithmic.
Table A5. Out-of-sample forecasting RMSFE (Continuation).
Series’ Code  |  h = 6: NT, Std, Log  |  h = 12: NT, Std, Log
A0013.0832.3262.9195.5934.3555.503
A00237.59336.76933.31839.84738.22137.346
A00316,770.67217,357.86316,657.42015,925.41418,493.30316,868.789
A0040.7090.4550.4460.7150.6390.585
A00563.20861.15763.06561.79260.27461.740
A0060.1400.1440.1380.2090.1940.204
A007642.282522.970388.967613.790550.802482.934
A00836.75722.65732.05431.67825.32531.028
A0090.0630.0640.0630.0910.0920.091
A0100.3810.4890.5150.4920.9082.268
A0110.9640.8170.6890.9291.5920.977
A0121.8561.9941.7822.5362.9472.197
A0134.5613.983142.1093.9013.6242.37 × 10 7
A01410.3979.91710.10613.58012.91513.602
A0151.92 × 10 10 1.12 × 10 10 8.94 × 10 9 2.86 × 10 10 1.65 × 10 10 1.14 × 10 10
A0162.44 × 10 10 2.09 × 10 10 2.15 × 10 10 4.10 × 10 10 2.70 × 10 10 2.80 × 10 10
A01730.28623.902Inf46.36828.383Inf
A01821.45019.14620.34234.72121.98828.244
A0191.5551.4361.4471.5171.4761.511
A0201.3301.3911.4351.3871.4401.557
A0211.1381.1341.0921.1261.1341.166
A0221.2651.2391.2871.3211.2651.273
A0236.8616.8136.2787.8707.7507.283
A0241.1981.2931.2831.3961.9431.555
A02529.94744.07778.26633.72657.839467.347
A0262.5153.0764.6512.8474.4755.937
A0270.15213.9160.4860.18012187.7880.889
A0284436.7274208.1363995.6652687.6453283.8762860.657
A02970.063104.764108.98180.046153.812222.842
A03040.923103.10282.01050.1631302.044200.370
A0311557.63112.7519.3382.16E+2516.890348,877.932
A03215.13613.36414.78120.47111.58619.519
A03316.61911.81114.619338.296212.22131,730.543
A03410.1009.15111.13627.06616.20324.326
A0352.5543.2836.6234.4155.5137.378
A0360.1900.1790.1990.2590.2410.237
A0370.5420.4940.4670.7060.7950.771
A0382.0771.5882.5044.1532.1123.248
A0399.9587.75015.53812.3309.61527.556
A0401185.420967.9181187.4961781.2421087.9551476.007
A0410.4370.4910.4370.5370.6300.536
A042282.364652.016211.1251844.9724.31 × 10 6 488.603
A04368.25065.16382.580114.347100.176263.026
A044467.834637.165587.869511.228585.946626.670
A0451.4221.4191.6611.3341.3291.570
A04623.72211.9229.53635.32828.66919.088
A0478.8838.55710.1919.1158.8499.983
A0486.35 × 10 5 2.25 × 10 5 Inf2.73 × 10 6 2.71 × 10 5 Inf
A0491.3531.3201.3551.3261.4241.338
A05059.765528.42856.831103.999935.57699.881
NT = No Transformation, Std = Standardisation, and Log = Logarithmic.
Table A6. Out-of-sample forecasting RMSFE (Continuation).
Series’ Code  |  h = 6: NT, Std, Log  |  h = 12: NT, Std, Log
A0516.6456.6466.6898.25939.6678.384
A052327.886333.472349.271519.432455.743539,109.574
A05355.65671.07056.37377.76088.06880.306
A054135.441107.388114.467121.277114.368111.057
A05547.05247.07549.96244.94749.41144.323
A056318.035442.579Inf369.9351397.180Inf
A057111.70097.869110.43076.521123.16378.379
A05893.67973.906161.79876.61772.19833.833
A05947.99943.07749.70647.20038.38250.650
A0609.0659.9298.9159.77511.3119.225
A06121,401.30821,029.66434,763.97847,497.76942,578.59243,718.789
A0621.53 × 10 7 9.22 × 10 6 1.43 × 10 7 1.04 × 10 7 9.77 × 10 6 1.33 × 10 8
A06328.56128.55810.21825.35525.5989.908
A0644121.2172945.8533866.33221881.0523065.2846688.179
A06531.50724.46427.82231.16139.85930.524
A06633.90724.50126.52385.87039.73825.640
A06724,790.11114,696.43716,013.91240,325.31212,620.24018,179.972
A068490.894450.505499.947327.430426.795335.514
A069233.149233.000217.660261.487235.576212.738
A07021.05517.47436.29918.09215.44516.239
A07130.03330.92228.06323.33537.39028.972
A072991.9181186.0831013.9851022.7951148.5041004.258
A073191.317173.546170.028371.023236.600288.816
A0744012.0152191.1423290.4468470.5872279.0843402.155
A07540.891115.55133.46744.112228.58543.708
A07669,298.2302.26 × 10 5 24,927.1582.63 × 10 5 3.64 × 10 6 7571.182
A0771714.2261532.8121561.1123608.9452515.0703097.836
A0784173.555654,416.0593581.8741.25 × 10 4 1.01 × 10 9 7095.955
A07958.26052.79358.02397.73097.05696.553
A08016.26813.18914.74712.15813.09615.346
A081450.450494.004450.436523.863609.279614.195
A08210.66510.6207.96110.3627.75710.242
A0831.8711.6981.9587.3861.9673.098
A08474,374.86111,949.86415,030.1894.54 × 10 5 15,064.14835,040.170
A0852.9722.3752.4435.3943.8674.246
A0866.0895.9035.6417.3249.1077.144
A0871.5172.5521.5212.5223.0602.358
A0885.6166.7725.8064.9167.0635.706
A0893.8822.9423.0455.5973.7094.223
A0902.8663.3982.6592.8303.7632.913
A09128,312.543254.885326.9471.39 × 10 7 369.785724.020
A0927.46 × 10 32 1.41 × 10 5 73816.3949.10 × 10 32 3.81 × 10 5 1.36 × 10 5
A0937814.4121.30 × 10 26 95.8427.95 × 10 6 2.84 × 10 25 128.056
A0941.02 × 10 5 9.46 × 10 4 1.10 × 10 5 1.41 × 10 5 1.30 × 10 5 1.75 × 10 5
A095406.105441.419404.087503.204588.870604.329
A09678.44890.12675.130100.969104.19983.458
A0972.06 × 10 5 1.92 × 10 5 2.06 × 10 5 2.44 × 10 5 2.43 × 10 5 2.42 × 10 5
A098858.043625.258751.2711077.612849.1841205.582
A0993.7613.3374.9144.3703.2535.528
A100140.073132.262141.576188.195173.609194.600
NT = No Transformation, Std = Standardisation, and Log = Logarithmic.

References

  1. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2014.
  2. Lütkepohl, H.; Xu, F. The role of the log transformation in forecasting economic variables. Empir. Econ. 2012, 42, 619–638.
  3. Bowden, G.J.; Dandy, G.C.; Maier, H.R. Data transformation for neural network models in water resources applications. J. Hydroinform. 2003, 5, 245–258.
  4. Kling, J.L.; Bessler, D.A. A comparison of multivariate forecasting procedures for economic time series. Int. J. Forecast. 1985, 1, 5–24.
  5. Chatfield, C.; Faraway, J. Time series forecasting with neural networks: A comparative study using the airline data. J. R. Stat. Soc. Ser. C 1998, 47, 231–250.
  6. Granger, C.; Newbold, P. Forecasting Economic Time Series, 2nd ed.; Academic Press: Cambridge, MA, USA, 1986.
  7. Chatfield, C.; Prothero, D. Box-Jenkins seasonal forecasting: Problems in a case study. J. R. Stat. Soc. Ser. A 1973, 136, 295–336.
  8. Haida, T.; Muto, S. Regression based peak load forecasting using a transformation technique. IEEE Trans. Power Syst. 1994, 9, 1788–1794.
  9. Nelson, H.L., Jr.; Granger, C.W.J. Experience with using the Box-Cox transformation when forecasting economic time series. J. Econom. 1979, 10, 57–69.
  10. Chen, S.; Wang, J.; Zhang, H. A hybrid PSO-SVM model based on clustering algorithm for short-term atmospheric pollutant concentration forecasting. Technol. Forecast. Soc. Chang. 2019, 146, 41–54.
  11. Brave, S.A.; Butters, R.A.; Justiniano, A. Forecasting economic activity with mixed frequency BVARs. Int. J. Forecast. 2019, 35, 1692–1707.
  12. Sanei, S.; Hassani, H. Singular Spectrum Analysis of Biomedical Signals; CRC Press: Boca Raton, FL, USA, 2015.
  13. Golyandina, N.; Osipov, E. The ‘Caterpillar’-SSA method for analysis of time series with missing values. J. Stat. Plan. Inference 2007, 137, 2642–2653.
  14. Silva, E.S.; Hassani, H.; Heravi, S.; Huang, X. Forecasting tourism demand with denoised neural networks. Ann. Tour. Res. 2019, 74, 134–154.
  15. Silva, E.S.; Ghodsi, Z.; Ghodsi, M.; Heravi, S.; Hassani, H. Cross country relations in European tourist arrivals. Ann. Tour. Res. 2017, 63, 151–168.
  16. Silva, E.S.; Hassani, H. On the use of singular spectrum analysis for forecasting U.S. trade before, during and after the 2008 recession. Int. Econ. 2015, 141, 34–49.
  17. Silva, E.S.; Hassani, H.; Heravi, S. Modeling European industrial production with multivariate singular spectrum analysis: A cross-industry analysis. J. Forecast. 2018, 37, 371–384.
  18. Hassani, H.; Silva, E.S. Forecasting UK consumer price inflation using inflation forecasts. Res. Econ. 2018, 72, 367–378.
  19. Silva, E.S.; Hassani, H.; Gee, L. Googling Fashion: Forecasting fashion consumer behaviour using Google trends. Soc. Sci. 2019, 8, 111.
  20. Hassani, H.; Silva, E.S.; Gupta, R.; Das, S. Predicting global temperature anomaly: A definitive investigation using an ensemble of twelve competing forecasting models. Phys. Stat. Mech. Appl. 2018, 509, 121–139.
  21. Ghil, M.; Allen, R.M.; Dettinger, M.D.; Ide, K.; Kondrashov, D.; Mann, M.E.; Robertson, A.W.; Saunders, A.; Tian, Y.; Varadi, F.; et al. Advanced spectral methods for climatic time series. Rev. Geophys. 2002, 40, 3.1–3.41.
  22. Xu, S.; Hu, H.; Ji, L.; Wang, P. Embedding Dimension Selection for Adaptive Singular Spectrum Analysis of EEG Signal. Sensors 2018, 18, 697.
  23. Mao, X.; Shang, P. Multivariate singular spectrum analysis for traffic time series. Phys. Stat. Mech. Appl. 2019, 526, 121063.
  24. Golyandina, N.; Korobeynikov, A.; Zhigljavsky, A. Singular Spectrum Analysis with R; Use R!; Springer: Berlin/Heidelberg, Germany, 2018.
  25. Ghodsi, M.; Hassani, H.; Rahmani, D.; Silva, E.S. Vector and recurrent singular spectrum analysis: Which is better at forecasting? J. Appl. Stat. 2018, 45, 1872–1899.
  26. Golyandina, N.; Zhigljavsky, A. Singular Spectrum Analysis for Time Series; Springer: Berlin/Heidelberg, Germany, 2013.
  27. Guerrero, V.M. Time series analysis supported by power transformations. J. Forecast. 1993, 12, 37–48.
  28. Golyandina, N.; Nekrutkin, V.; Zhigljavsky, A. Analysis of Time Series Structure: SSA and Related Techniques; CRC Press: Boca Raton, FL, USA, 2001.
  29. Khan, M.A.R.; Poskitt, D. Forecasting stochastic processes using singular spectrum analysis: Aspects of the theory and application. Int. J. Forecast. 2017, 33, 199–213.
  30. Ioffe, S.; Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML’15), Lille, France, 6–11 July 2015; Volume 37, pp. 448–456.
  31. Akritas, M.G.; Arnold, S.F. Fully nonparametric hypotheses for factorial designs I: Multivariate repeated measures designs. J. Am. Stat. Assoc. 1994, 89, 336–343.
  32. Brunner, E.; Domhof, S.; Langer, F. Nonparametric Analysis of Longitudinal Data in Factorial Experiments; John Wiley: New York, NY, USA, 2002.
  33. Jarque, C.M.; Bera, A.K. Efficient tests for normality, homoscedasticity and serial independence of regression residuals. Econ. Lett. 1980, 6, 255–259.
  34. Kwiatkowski, D.; Phillips, P.C.; Schmidt, P.; Shin, Y. Testing the null hypothesis of stationarity against the alternative of a unit root: How sure are we that economic time series have a unit root? J. Econom. 1992, 54, 159–178.
  35. D’Agostino, R.B. Transformation to normality of the null distribution of g1. Biometrika 1970, 57, 679–681.
  36. Korobeynikov, A. Computation- and space-efficient implementation of SSA. Stat. Interface 2010, 3, 257–368.
  37. Golyandina, N.; Korobeynikov, A. Basic singular spectrum analysis and forecasting with R. Comput. Stat. Data Anal. 2014, 71, 934–954.
  38. Golyandina, N.; Korobeynikov, A.; Shlemov, A.; Usevich, K. Multivariate and 2D extensions of singular spectrum analysis with the Rssa package. J. Stat. Softw. 2015, 67, 1–78.
  39. Noguchi, K.; Gel, Y.R.; Brunner, E.; Konietschke, F. nparLD: An R software package for the nonparametric analysis of longitudinal data in factorial experiments. J. Stat. Softw. 2012, 50, 1–23.
Figure 1. A selection of nine real time series.
Figure 2. Effect plot: RMSFE ∼ Tr.
Figure 3. Effect plot: RMSFE ∼ Tr + Freq + Tr × Freq for forecast horizons h = 1 and h = 3.
Figure 4. Effect plot: RMSFE ∼ Tr + Freq + Tr × Freq for forecast horizons h = 6 and h = 12.
Table 1. Number of time series with each feature.

Factor               Levels (number of series)
Sampling Frequency   Annual (5), Monthly (83), Quarterly (4), Weekly (4), Daily (2), Hourly (2)
Skewness             Positive Skew (61), Negative Skew (21), Symmetric (18)
Normality            Normal (18), Non-normal (82)
Stationarity         Stationary (14), Non-Stationary (86)
Table 2. Wald-type test results.

Model                                 Factor              p-Value
                                                          h = 1    h = 3    h = 6    h = 12
RMSFE ∼ Tr + Skew + Tr × Skew         Skew                0.0037   0.0043   0.0056   0.0131
                                      Tr                  0.0718   0.1447   0.4186   0.2098
                                      Tr × Skew           0.4177   0.5106   0.2120   0.1482
RMSFE ∼ Tr + Stationarity             Stationarity        0.0997   0.053    0.0501   0.0248
  + Tr × Stationarity                 Tr                  0.2351   0.3754   0.7607   0.5276
                                      Tr × Stationarity   0.5160   0.6808   0.7678   0.3792
RMSFE ∼ Tr + Normality                Normality           0.5052   0.5320   0.4954   0.5820
  + Tr × Normality                    Tr                  0.0747   0.1152   0.5849   0.4892
                                      Tr × Normality      0.2492   0.3576   0.4042   0.4549
RMSFE ∼ Tr + Freq + Tr × Freq         Freq                0.0000   0.0000   0.0000   0.0000
                                      Tr                  0.0841   0.1194   0.1355   0.1143
                                      Tr × Freq           0.0000   0.0000   0.0000   0.0000
RMSFE ∼ Tr                            Tr                  0.4271   0.6740   0.9535   0.4860
Here, Freq, Skew, and Tr represent frequency, skewness, and transformation, respectively. Bold values show the significant effects at the α = 0.05 significance level.
