Time Series Forecasting and Classification Models Based on Recurrent with Attention Mechanism and Generative Adversarial Networks
Abstract
1. Introduction
1.1. Time Series Forecasting and Classification
1.2. Related Research
1.3. Pros and Cons of the Models
- An LSTM with an autoencoder and an attention mechanism is proposed and applied to time series forecast modeling. A Gaussian sliding window is proposed for weight initialization in the attention-based LSTM (a sketch of this initialization follows the list);
- The performance of TCN-based neural networks for time series forecasting and classification is evaluated; the proposed models outperform classification methods such as 1NN-DTW, BOSS, and WEASEL (a sketch of the TCN building block follows the list);
- A GAN with an LSTM as the generator and an MLP as the discriminator is proposed for the time series forecasting task, and its performance is evaluated against statistical ARIMA models (a sketch of the adversarial training step follows the list).
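To make the first contribution concrete, here is a minimal PyTorch sketch of an attention-equipped LSTM forecaster whose attention logits are initialized from a Gaussian sliding window. The paper's exact formulation is not reproduced in this outline, so the class name `GaussianAttentionLSTM`, the use of one learnable logit per timestep, and the placement of the Gaussian bump on the most recent timesteps are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianAttentionLSTM(nn.Module):
    """LSTM forecaster with attention over the input window; the attention
    logits are *initialized* from a Gaussian sliding window (assumed reading:
    recent timesteps start with the largest weight)."""

    def __init__(self, n_features: int, hidden: int, window: int, sigma: float = None):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.att_logits = nn.Parameter(torch.empty(window))  # one logit per timestep
        self.out = nn.Linear(hidden, 1)
        sigma = sigma if sigma is not None else window / 4.0
        t = torch.arange(window, dtype=torch.float32)
        with torch.no_grad():
            # Gaussian bump centred on the most recent timestep (t = window - 1).
            self.att_logits.copy_(-((t - (window - 1)) ** 2) / (2.0 * sigma ** 2))

    def forward(self, x):                       # x: (batch, window, n_features)
        h, _ = self.lstm(x)                     # h: (batch, window, hidden)
        w = F.softmax(self.att_logits, dim=0)   # attention weights over timesteps
        ctx = torch.einsum("t,bth->bh", w, h)   # Gaussian-initialized weighted sum
        return self.out(ctx)                    # one-step-ahead forecast
```

For a univariate series with a 30-step window, `GaussianAttentionLSTM(n_features=1, hidden=64, window=30)` would be trained with an MSE loss against the next value.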
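The TCN models follow the generic architecture of Bai et al. (arXiv:1803.01271, listed in the references): stacked residual blocks of dilated causal convolutions. The sketch below shows one such block; the channel width, kernel size, and dilation schedule are illustrative assumptions rather than the paper's reported configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TCNBlock(nn.Module):
    """One residual block of two dilated causal convolutions."""

    def __init__(self, c_in: int, c_out: int, kernel: int = 3, dilation: int = 1):
        super().__init__()
        self.pad = (kernel - 1) * dilation      # left padding keeps the conv causal
        self.conv1 = nn.Conv1d(c_in, c_out, kernel, dilation=dilation)
        self.conv2 = nn.Conv1d(c_out, c_out, kernel, dilation=dilation)
        self.downsample = nn.Conv1d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def _causal(self, conv, y):
        return F.relu(conv(F.pad(y, (self.pad, 0))))  # pad only on the left (the past)

    def forward(self, x):                       # x: (batch, channels, time)
        out = self._causal(self.conv2, self._causal(self.conv1, x))
        return F.relu(out + self.downsample(x))

# Dilations 1, 2, 4, 8 grow the receptive field exponentially; a final 1x1
# convolution or linear head produces forecasts or class logits.
tcn = nn.Sequential(*[TCNBlock(1 if d == 1 else 32, 32, dilation=d) for d in (1, 2, 4, 8)])
```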
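For the GAN contribution, the following sketch shows one adversarial training step with an LSTM generator that conditions on a history window and an MLP discriminator that scores the window together with its next value, assuming the standard non-saturating BCE losses; the paper's actual losses, network sizes, and hyperparameters are not given in this outline.

```python
import torch
import torch.nn as nn

class LSTMGenerator(nn.Module):
    """Generator: conditions on a history window and emits the next value."""
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)
    def forward(self, history):                 # history: (batch, window, 1)
        h, _ = self.lstm(history)
        return self.head(h[:, -1])              # (batch, 1) one-step forecast

def make_discriminator(window: int) -> nn.Module:
    """Discriminator: an MLP that scores a window plus its next value."""
    return nn.Sequential(
        nn.Linear(window + 1, 64), nn.LeakyReLU(0.2),
        nn.Linear(64, 1), nn.Sigmoid())

def train_step(G, D, opt_g, opt_d, history, target):
    """One adversarial step: update D on real vs. generated continuations,
    then update G to fool the refreshed D."""
    bce, batch = nn.BCELoss(), history.size(0)
    real = torch.cat([history.squeeze(-1), target], dim=1)
    fake = torch.cat([history.squeeze(-1), G(history)], dim=1)

    opt_d.zero_grad()
    loss_d = (bce(D(real), torch.ones(batch, 1))
              + bce(D(fake.detach()), torch.zeros(batch, 1)))
    loss_d.backward(); opt_d.step()

    opt_g.zero_grad()
    loss_g = bce(D(fake), torch.ones(batch, 1))  # G wants D to output "real"
    loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```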
2. Methodology
2.1. Basics for the Proposed Models
2.2. Datasets
2.3. Long Short-Term Memory with Autoencoder and Attention
2.4. Temporal Convolutional Network Architectures for Forecasting and Classification
2.5. Comparisons with Statistical and Machine Learning Methods
2.6. Generative Adversarial Architecture for Forecasting
3. Results and Discussion
3.1. Long Short-Term Memory with Autoencoder
3.2. Attention-Based Long Short-Term Memory
3.3. Temporal Convolutional Network
3.4. Generative Adversarial Network
4. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
Abbreviations
MLP | Multilayer Perceptron
DNN | Deep Neural Network
TCN | Temporal Convolutional Network
LSTM | Long Short-Term Memory
GAN | Generative Adversarial Network
RNN | Recurrent Neural Network
NLP | Natural Language Processing
ARIMA | Autoregressive Integrated Moving Average
ETS | Exponential Smoothing
CNN | Convolutional Neural Network
ECG | Electrocardiogram
VPN | Virtual Private Network
FCN | Fully Connected Network
SSE | Sum of Squared Errors
RMSE | Root Mean Squared Error
RF | Random Forest
GB | Gradient Boosting
ET | Extra Trees
BA | Bagging
BS | Brier Score
References
- Kleinbaum, D.G.; Dietz, K.; Gail, M.; Klein, M. Logistic Regression; Springer: Berlin/Heidelberg, Germany, 2002.
- Rish, I. An empirical study of the naive Bayes classifier. In Proceedings of the IJCAI 2001 Workshop on Empirical Methods in Artificial Intelligence, Seattle, WA, USA, 4–6 August 2001; Volume 3, pp. 41–46.
- Hearst, M.A.; Dumais, S.T.; Osuna, E.; Platt, J.; Scholkopf, B. Support vector machines. IEEE Intell. Syst. Their Appl. 1998, 13, 18–28.
- Peterson, L.E. K-nearest neighbor. Scholarpedia 2009, 4, 1883.
- Adam, S.P.; Alexandropoulos, S.A.N.; Pardalos, P.M.; Vrahatis, M.N. No free lunch theorem: A review. In Approximation and Optimization; Springer: Berlin/Heidelberg, Germany, 2019; pp. 57–82.
- Armstrong, J.S.; Green, K.C.; Graefe, A. Golden rule of forecasting: Be conservative. J. Bus. Res. 2015, 68, 1717–1731.
- Ismail Fawaz, H.; Forestier, G.; Weber, J.; Idoumghar, L.; Muller, P.A. Deep learning for time series classification: A review. Data Min. Knowl. Discov. 2019, 33, 917–963.
- Wang, Z.; Yan, W.; Oates, T. Time series classification from scratch with deep neural networks: A strong baseline. In Proceedings of the International Joint Conference on Neural Networks, Anchorage, AK, USA, 14–19 May 2017; pp. 1578–1585.
- Fawaz, H.I.; Lucas, B.; Forestier, G.; Pelletier, C.; Schmidt, D.F.; Weber, J.; Webb, G.I.; Idoumghar, L.; Muller, P.A.; Petitjean, F. InceptionTime: Finding AlexNet for Time Series Classification. Data Min. Knowl. Discov. 2020, 34, 1936–1962.
- Time Series Classification Repository. Available online: http://timeseriesclassification.com/index.php (accessed on 15 December 2020).
- Zhang, G. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003, 50, 159–175.
- Koehler, J.; Kuenzer, C. Forecasting Spatio-Temporal Dynamics on the Land Surface Using Earth Observation Data—A Review. Remote Sens. 2020, 12, 3513.
- Ghaderpour, E.; Vujadinovic, T. The Potential of the Least-Squares Spectral and Cross-Wavelet Analyses for Near-Real-Time Disturbance Detection within Unequally Spaced Satellite Image Time Series. Remote Sens. 2020, 12, 2446.
- Cho, K.; Courville, A.; Bengio, Y. Describing Multimedia Content Using Attention-Based Encoder-Decoder Networks. IEEE Trans. Multimed. 2015, 17, 1875–1886.
- Bahdanau, D.; Cho, K.H.; Bengio, Y. Neural machine translation by jointly learning to align and translate. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015; pp. 1–15.
- Cheng, J.; Dong, L.; Lapata, M. Long short-term memory-networks for machine reading. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP 2016), Austin, TX, USA, 1–5 November 2016; pp. 551–561.
- Graves, A.; Mohamed, A.R.; Hinton, G. Speech recognition with deep recurrent neural networks. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013; pp. 6645–6649.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5999–6009.
- Ju, Y.; Sun, G.; Chen, Q.; Zhang, M.; Zhu, H.; Rehman, M.U. A Model Combining Convolutional Neural Network and LightGBM Algorithm for Ultra-Short-Term Wind Power Forecasting. IEEE Access 2019, 7, 28309–28318.
- Berardi, V.L.; Zhang, G.P. An empirical investigation of bias and variance in time series forecasting: Modeling considerations and error evaluation. IEEE Trans. Neural Netw. 2003, 14, 668–679.
- Fulcher, B.D.; Jones, N.S. Highly Comparative Feature-Based Time-Series Classification. IEEE Trans. Knowl. Data Eng. 2014, 26, 3026–3037.
- Mei, J.; Liu, M.; Wang, Y.; Gao, H. Learning a Mahalanobis Distance-Based Dynamic Time Warping Measure for Multivariate Time Series Classification. IEEE Trans. Cybern. 2016, 46, 1363–1374.
- Amin, S.U.; Alsulaiman, M.; Muhammad, G.; Bencherif, M.A.; Hossain, M.S. Multilevel Weighted Feature Fusion Using Convolutional Neural Networks for EEG Motor Imagery Classification. IEEE Access 2019, 7, 18940–18950.
- Mori, U.; Mendiburu, A.; Lozano, J.A. Similarity Measure Selection for Clustering Time Series Databases. IEEE Trans. Knowl. Data Eng. 2016, 28, 181–195.
- Liu, C.; Hsaio, W.; Tu, Y. Time Series Classification With Multivariate Convolutional Neural Network. IEEE Trans. Ind. Electron. 2019, 66, 4788–4797.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 3, 2672–2680.
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 214–223.
- Yu, L.; Zhang, W.; Wang, J.; Yu, Y. SeqGAN: Sequence generative adversarial nets with policy gradient. In Proceedings of the 31st AAAI Conference on Artificial Intelligence (AAAI 2017), San Francisco, CA, USA, 4–10 February 2017; pp. 2852–2858.
- Gong, X.; Chang, S.; Jiang, Y.; Wang, Z. AutoGAN: Neural architecture search for generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019; pp. 3224–3234.
- Zhang, H.; Xu, T.; Li, H.; Zhang, S.; Wang, X.; Huang, X.; Metaxas, D.N. StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 1947–1962.
- Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved training of Wasserstein GANs. Adv. Neural Inf. Process. Syst. 2017, 30, 5768–5778.
- Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training GANs. Adv. Neural Inf. Process. Syst. 2016, 29, 2234–2242.
- Che, T.; Li, Y.; Jacob, A.P.; Bengio, Y.; Li, W. Mode regularized generative adversarial networks. arXiv 2016, arXiv:1612.02136.
- Mescheder, L.; Geiger, A.; Nowozin, S. Which training methods for GANs do actually converge? In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, 10–15 July 2018; pp. 5589–5626.
- Arora, S.; Ge, R.; Liang, Y.; Ma, T.; Zhang, Y. Generalization and equilibrium in generative adversarial nets (GANs). In Proceedings of the 34th International Conference on Machine Learning (ICML 2017), Sydney, Australia, 6–11 August 2017; pp. 322–349.
- Che, T.; Li, Y.; Zhang, R.; Hjelm, R.D.; Li, W.; Song, Y.; Bengio, Y. Maximum-likelihood augmented discrete generative adversarial networks. arXiv 2017, arXiv:1702.07983.
- Kusner, M.J.; Hernández-Lobato, J.M. GANs for sequences of discrete elements with the Gumbel-softmax distribution. arXiv 2016, arXiv:1611.04051.
- Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv 2018, arXiv:1803.01271.
- Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780.
- Kaggle Web Traffic Competition. Available online: https://www.kaggle.com/c/web-traffic-time-series-forecasting (accessed on 15 December 2020).
- VPN-nonVPN Dataset from the Canadian Institute for Cybersecurity. Available online: https://www.unb.ca/cic/datasets/vpn.html (accessed on 15 December 2020).
- Le, Q.V.; Jaitly, N.; Hinton, G.E. A simple way to initialize recurrent networks of rectified linear units. arXiv 2015, arXiv:1504.00941.
- Zhang, S.; Wu, Y.; Che, T.; Lin, Z.; Memisevic, R.; Salakhutdinov, R.R.; Bengio, Y. Architectural complexity measures of recurrent neural networks. In Proceedings of the Advances in Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 1822–1830.
- Press, O.; Smith, N.A.; Levy, O. Improving Transformer Models by Reordering their Sublayers. arXiv 2019, arXiv:1911.03864.
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Dehghani, M.; Gouws, S.; Vinyals, O.; Uszkoreit, J.; Kaiser, Ł. Universal transformers. arXiv 2018, arXiv:1807.03819.
- Draguns, A.; Ozoliņš, E.; Šostaks, A.; Apinis, M.; Freivalds, K. Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences. arXiv 2020, arXiv:2004.04662.
Optimizer | LSTM | LSTM-auto | LSTM-Att | TCN |
---|---|---|---|---|
Adadelta | 57.98/49.23 (10 m 34 s) | 51.57/39.34 (12 m 30 s) | 46.77/40.12 (1 m 50 s) | 49.34/40.15 (41 s) |
Adagrad | 59.21/51.34 (11 m 02 s) | 45.74/40.83 (12 m 29 s) | 46.77/41.29 (1 m 48 s) | 46.24/39.94 (42 s) |
Adam | 51.20/46.91 (10 m 29 s) | 45.21/40.60 (11 m 53 s) | 45.12/40.40 (1 m 29 s) | 45.82/39.61 (31 s) |
RMSprop | 55.23/51.29 (11 m 38 s) | 45.81/40.38 (13 m 42 s) | 46.34/40.52 (1 m 39 s) | 45.85/40.27 (45 s) |
SGD | 56.54/52.38 (11 m 44 s) | 51.79/39.47 (13 m 36 s) | 50.32/40.54 (1 m 52 s) | 52.92/41.76 (44 s) |
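A hypothetical harness for reproducing a comparison like the table above: the same model is rebuilt with fresh weights for each optimizer, trained, and its error and wall-clock time recorded. `build_model` and `train_and_evaluate` are placeholder helpers running on random data, not functions from the paper; only the `torch.optim` classes are real.

```python
import time
import torch
import torch.nn as nn

def build_model() -> nn.Module:
    # Placeholder model; the paper's LSTM/TCN architectures would go here.
    return nn.Sequential(nn.Linear(30, 64), nn.ReLU(), nn.Linear(64, 1))

def train_and_evaluate(model: nn.Module, opt: torch.optim.Optimizer):
    # Placeholder loop on random data; swap in the real Kaggle web-traffic
    # loaders to reproduce the table.
    x, y = torch.randn(256, 30), torch.randn(256, 1)
    loss_fn = nn.MSELoss()
    start = time.time()
    for _ in range(100):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    rmse = loss_fn(model(x), y).sqrt().item()
    return rmse, time.time() - start

optimizers = {
    "Adadelta": lambda p: torch.optim.Adadelta(p),
    "Adagrad":  lambda p: torch.optim.Adagrad(p),
    "Adam":     lambda p: torch.optim.Adam(p, lr=1e-3),
    "RMSprop":  lambda p: torch.optim.RMSprop(p, lr=1e-3),
    "SGD":      lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9),
}
for name, make_opt in optimizers.items():
    model = build_model()  # fresh weights for every optimizer
    rmse, seconds = train_and_evaluate(model, make_opt(model.parameters()))
    print(f"{name}: RMSE={rmse:.2f} ({seconds:.1f} s)")
```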
Sequence Modeling | Model Size | LSTM | LSTM-auto | LSTM-Att | TCN |
---|---|---|---|---|---|
Adding problem (loss) | 70 K | 0.175 | | | |
MNIST (accuracy) | 70 K | 87.2 | 89.7 | 95.2 | 98.7 |
Music MIDI data (loss) | 500 K | 0.0822 | 0.0755 | 0.0671 | 0.0635 |
Copying memory (loss) | 16 K | 0.0301 | 0.0204 | 0.0198 | 0.0182 |
Kaggle web traffic (RMSE) | 10 K | 49.83 | 48.81 | 46.92 | 47.12 |
ECG classification (accuracy) | 10 K | 95.8 | 98.6 | 98.2 | 99.5 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Zhou, K.; Wang, W.; Hu, T.; Deng, K. Time Series Forecasting and Classification Models Based on Recurrent with Attention Mechanism and Generative Adversarial Networks. Sensors 2020, 20, 7211. https://doi.org/10.3390/s20247211