Improving Predictive Accuracy in the Context of Dynamic Modelling of Non-Stationary Time Series with Outliers
Abstract
1. Introduction
2. Methodologies
Outlier Detection and Treatment Procedures
1. Linear interpolation (LI)
   - Outlier detection: Observations are considered outliers if they are less than $Q_1 - 1.5\,\mathrm{IQR}$ or greater than $Q_3 + 1.5\,\mathrm{IQR}$, where $Q_1$ and $Q_3$ denote the first and third quartiles, respectively, and $\mathrm{IQR} = Q_3 - Q_1$ (interquartile range) is the difference between the third and first quartiles ($1.5\,\mathrm{IQR}$ rule).
   - Outlier treatment: Any outliers identified are replaced by values obtained by linear interpolation between the neighbouring observations [13].
2. Iterative method based on the robust Kalman filter (RKF)
   - Outlier detection: Outliers are detected by applying the $1.5\,\mathrm{IQR}$ rule to the standardized residuals obtained after fitting a state-space model to the data.
   - Outlier treatment: An alternative to the state estimator, inspired by the work of [14] and subsequently [15], is proposed. In this approach, the state prediction is replaced by
3. Iterative method based on the Kalman filter for time series with missing values (naKF)
   - Outlier detection: Outliers are detected by applying the $1.5\,\mathrm{IQR}$ rule to the standardized residuals obtained after fitting a state-space model to the data.
   - Outlier treatment: Outlying observations are treated as missing values, and the state estimator and its mean square error are replaced by and , respectively. The missing observations are replaced by and the state-space model is fitted $j$ times to the corrected time series until , , or for some value $j$.
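The LI procedure can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the quartiles use NumPy's default linear percentile method (other quartile definitions give slightly different fences), and an outlier at either end of the series, having only one neighbour, is replaced by the nearest retained value because that is how `np.interp` extrapolates.

```python
import numpy as np

def iqr_outliers(y, k=1.5):
    """Flag values below Q1 - k*IQR or above Q3 + k*IQR (k = 1.5 gives the 1.5*IQR rule)."""
    y = np.asarray(y, dtype=float)
    q1, q3 = np.percentile(y, [25, 75])  # NumPy's default (linear) quartiles
    iqr = q3 - q1
    return (y < q1 - k * iqr) | (y > q3 + k * iqr)

def treat_by_linear_interpolation(y, k=1.5):
    """Replace flagged outliers by linear interpolation between the nearest
    non-outlying neighbours; end points fall back to the nearest retained value."""
    y = np.asarray(y, dtype=float)
    mask = iqr_outliers(y, k)
    y_treated = y.copy()
    if mask.any() and not mask.all():
        t = np.arange(len(y))
        y_treated[mask] = np.interp(t[mask], t[~mask], y[~mask])
    return y_treated, mask

y = [1.0, 1.2, 0.9, 1.1, 10.0, 1.0, 1.3]
y_treated, mask = treat_by_linear_interpolation(y)
print(mask.nonzero()[0])        # [4]
print(round(y_treated[4], 4))   # 1.05, the midpoint of 1.1 and 1.0
```

Here the fences are $1.0 - 1.5 \times 0.25 = 0.625$ and $1.25 + 1.5 \times 0.25 = 1.625$, so only the value 10.0 is flagged.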
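The missing-value idea behind naKF can be illustrated with a local level (random walk plus noise) model, a simple non-stationary state-space model. This is a hypothetical sketch resting on assumptions that are not taken from the text: the model and its variances are fixed and known rather than re-estimated at each pass, treated values are set to the one-step-ahead predictions, and the loop stops when no new outliers are flagged or after `max_iter` passes, because the original convergence criteria are not recoverable here. When an observation is missing, the update step is skipped, so the filtered state and its mean square error equal the predicted ones, which is the standard Kalman filter treatment of missing data.

```python
import numpy as np

def local_level_filter(y, sigma2_eps, sigma2_eta, m0=0.0, p0=1e6):
    """Kalman filter for an (assumed) local level model:
         y_t  = mu_t + eps_t,      eps_t ~ N(0, sigma2_eps)
         mu_t = mu_{t-1} + eta_t,  eta_t ~ N(0, sigma2_eta)
    NaN entries are missing: the update is skipped, so the filtered state and
    its MSE equal the predicted ones. Returns one-step-ahead predictions of y_t
    and standardized innovations (NaN where y_t is missing)."""
    y = np.asarray(y, dtype=float)
    m, p = m0, p0                      # large p0: vague initial state
    pred = np.empty(len(y))
    std_resid = np.full(len(y), np.nan)
    for t, obs in enumerate(y):
        m_pred, p_pred = m, p + sigma2_eta            # prediction step
        pred[t] = m_pred
        if np.isnan(obs):
            m, p = m_pred, p_pred                     # missing value: no update
            continue
        f = p_pred + sigma2_eps                       # innovation variance
        v = obs - m_pred                              # innovation
        std_resid[t] = v / np.sqrt(f)
        gain = p_pred / f                             # Kalman gain
        m, p = m_pred + gain * v, (1.0 - gain) * p_pred  # update step
    return pred, std_resid

def iqr_flags(x, k=1.5):
    """1.5*IQR rule applied to the non-missing entries of x."""
    x = np.asarray(x, dtype=float)
    valid = ~np.isnan(x)
    q1, q3 = np.percentile(x[valid], [25, 75])
    iqr = q3 - q1
    flags = np.zeros(len(x), dtype=bool)
    flags[valid] = (x[valid] < q1 - k * iqr) | (x[valid] > q3 + k * iqr)
    return flags

def nakf_treatment(y, sigma2_eps, sigma2_eta, max_iter=10):
    """Hypothetical naKF-style loop: detect outliers on standardized residuals,
    treat them as missing, refilter, and replace them by one-step predictions."""
    y_work = np.asarray(y, dtype=float).copy()
    flagged = np.zeros(len(y_work), dtype=bool)
    for _ in range(max_iter):
        _, z = local_level_filter(y_work, sigma2_eps, sigma2_eta)
        new = iqr_flags(z) & ~flagged
        if not new.any():              # assumed stopping rule: no new outliers
            break
        flagged |= new
        y_missing = y_work.copy()
        y_missing[flagged] = np.nan    # outliers become missing values
        pred, _ = local_level_filter(y_missing, sigma2_eps, sigma2_eta)
        y_work[flagged] = pred[flagged]
    return y_work, flagged

# Usage on a synthetic series with one injected additive outlier at t = 20
t = np.arange(40)
y = 5.0 + 0.3 * np.sin(t)
y[20] = 15.0
y_treated, flagged = nakf_treatment(y, sigma2_eps=0.1, sigma2_eta=0.01)
print(np.flatnonzero(flagged), y_treated[20])
```

With a full state-space model the same structure applies; only the filter recursions and the parameter estimation step change.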
3. Results
3.1. Simulation Results
3.2. Illustrative Examples
4. Discussion
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Triantafyllopoulos, K. The State Space Model in Finance. In Bayesian Inference of State Space Models; Springer Texts in Statistics; Springer: Cham, Switzerland, 2021.
2. Auger-Méthé, M.; Newman, K.; Cole, D.; Empacher, F.; Gryba, R.; King, A.A.; Leos-Barajas, V.; Flemming, J.M.; Nielsen, A.; Petris, G.; et al. A guide to state–space modeling of ecological time series. Ecol. Monogr. 2021, 91, 1–38.
3. Wu, H.; Matteson, D.; Wells, M. Interpretable Latent Variables in Deep State Space Models. arXiv 2022, arXiv:2203.02057.
4. Matsuura, K. Time Series Data Analysis with State Space Model. In Bayesian Statistical Modeling with Stan, R, and Python; Springer: Singapore, 2022.
5. Monteiro, M.; Costa, M. Change Point Detection by State Space Modeling of Long-Term Air Temperature Series in Europe. Stats 2023, 6, 7.
6. Pereira, F.C.; Gonçalves, A.M.; Costa, M. Short-term forecast improvement of maximum temperature by state-space model approach: The study case of the TO CHAIR project. Stoch. Environ. Res. Risk Assess. 2023, 37, 219–231.
7. Shumway, R.H.; Stoffer, D.S. Time Series Analysis and Its Applications: With R Examples; Springer: New York, NY, USA, 2017.
8. Kalman, R. A New Approach to Linear Filtering and Prediction Problems. ASME J. Basic Eng. 1960, 82, 35–45.
9. Harvey, A. Forecasting, Structural Time Series Models and the Kalman Filter; Cambridge University Press: Cambridge, UK, 1990.
10. Teunissen, P.J.G.; Khodabandeh, A.; Psychas, D. A generalized Kalman filter with its precision in recursive form when the stochastic model is misspecified. J. Geod. 2021, 95, 108.
11. Huang, Y.; Zhang, Y.; Zhao, Y.; Shi, P.; Chambers, J.A. A Novel Outlier-Robust Kalman Filtering Framework Based on Statistical Similarity Measure. IEEE Trans. Autom. Control 2021, 66, 2677–2692.
12. Auger-Méthé, M.; Field, C.; Albertsen, C.M.; Derocher, A.E.; Lewis, M.A.; Jonsen, I.D.; Flemming, J.M. State-space models’ dirty little secrets: Even simple linear Gaussian models can have estimation problems. Sci. Rep. 2016, 6, 26677.
13. Hyndman, R.J.; Athanasopoulos, G. Forecasting: Principles and Practice, 2nd ed.; OTexts: Melbourne, Australia, 2018.
14. Cipra, T.; Romera, R. Kalman filter with outliers and missing observations. Test 1997, 6, 379–395.
15. Crevits, R.; Croux, C. Robust estimation of linear state space models. Commun. Stat.-Simul. Comput. 2019, 48, 1694–1705.
16. Durbin, J.; Koopman, S.J. Time Series Analysis by State Space Methods, 2nd ed.; Oxford Statistical Science Series; Oxford University Press: Oxford, UK, 2013.
| Parameters | | | RMSE | | | MAE | | | Outlier detection | Mean rate 1 | Mean rate 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.10 | 0.05 | NC | 0.0416 | 0.0276 | 0.4271 | 0.0335 | 0.0217 | 0.3399 | - | - | - |
| | | C | 0.0621 | 0.2614 | 0.5243 | 0.0475 | 0.2214 | 0.4033 | - | - | - |
| | | LI | 0.0584 | 0.1772 | 0.4910 | 0.0438 | 0.1286 | 0.3781 | Time series | 84% | 42% |
| | | RKF | 0.0665 | 0.0910 | 0.4910 | 0.0456 | 0.0718 | 0.3781 | Standardized residuals | 74% | 88% |
| | | naKF | 0.0536 | 0.0556 | 0.4667 | 0.0393 | 0.0337 | 0.3607 | Standardized residuals | 74% | 88% |
| 1.00 | 0.10 | NC | 0.3114 | 0.1453 | 1.0734 | 0.2488 | 0.1088 | 0.8539 | - | - | - |
| | | C | 0.4638 | 0.6275 | 1.2216 | 0.3644 | 0.4951 | 0.9507 | - | - | - |
| | | LI | 0.4255 | 0.5723 | 1.2127 | 0.3432 | 0.4499 | 0.9421 | Time series | 45% | 8% |
| | | RKF | 0.4216 | 0.4347 | 1.2048 | 0.3384 | 0.3422 | 0.9387 | Standardized residuals | 61% | 42% |
| | | naKF | 0.4285 | 0.3821 | 1.2210 | 0.3422 | 0.2706 | 0.9383 | Standardized residuals | 61% | 42% |
| 0.10 | 1.00 | NC | 0.0840 | 0.2456 | 1.1675 | 0.0618 | 0.1977 | 0.9326 | - | - | - |
| | | C | 14.5332 | 468.2479 | 1.4690 | 1.3638 | 77.8606 | 1.1298 | - | - | - |
| | | LI | 0.1025 | 0.3266 | 1.1653 | 0.0719 | 0.2373 | 0.9250 | Time series | 91% | 99% |
| | | RKF | 0.3768 | 0.5958 | 1.2860 | 0.1245 | 0.3587 | 0.9876 | Standardized residuals | 78% | 98% |
| | | naKF | 0.4510 | 0.3155 | 1.2844 | 0.1582 | 0.2525 | 0.9620 | Standardized residuals | 78% | 98% |
| 0.05 | 0.10 | NC | 0.0275 | 0.0329 | 0.4413 | 0.0212 | 0.0260 | 0.3517 | - | - | - |
| | | C | 0.0564 | 0.4242 | 0.5416 | 0.0333 | 0.3516 | 0.4180 | - | - | - |
| | | LI | 0.0343 | 0.1501 | 0.4663 | 0.0237 | 0.0830 | 0.3652 | Time series | 91% | 83% |
| | | RKF | 0.0586 | 0.0710 | 0.4914 | 0.0327 | 0.0557 | 0.3798 | Standardized residuals | 75% | 97% |
| | | naKF | 0.0476 | 0.0391 | 0.4714 | 0.0279 | 0.0294 | 0.3635 | Standardized residuals | 75% | 97% |
| Parameters | | | RMSE | | | MAE | | | Outlier detection | Mean rate 1 | Mean rate 2 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.10 | 0.05 | NC | 0.0138 | 0.0086 | 0.4315 | 0.0109 | 0.0068 | 0.3443 | - | - | - |
| | | C | 0.0170 | 0.2228 | 0.5303 | 0.0137 | 0.2187 | 0.4115 | - | - | - |
| | | LI | 0.0184 | 0.2156 | 0.5561 | 0.0147 | 0.2112 | 0.4193 | Time series | 52% | 4% |
| | | RKF | 0.0189 | 0.0696 | 0.4913 | 0.0146 | 0.0684 | 0.3822 | Standardized residuals | 77% | 91% |
| | | naKF | 0.0181 | 0.0133 | 0.4656 | 0.0137 | 0.0103 | 0.3613 | Standardized residuals | 77% | 91% |
| 1.00 | 0.10 | NC | 0.1156 | 0.0524 | 1.0891 | 0.0934 | 0.0419 | 0.8685 | - | - | - |
| | | C | 0.1376 | 0.4955 | 1.2374 | 0.1112 | 0.4775 | 0.9679 | - | - | - |
| | | LI | 0.1454 | 0.4962 | 1.3117 | 0.1165 | 0.4788 | 0.9915 | Time series | 19% | 1% |
| | | RKF | 0.1366 | 0.3261 | 1.2226 | 0.1102 | 0.3114 | 0.9550 | Standardized residuals | 65% | 41% |
| | | naKF | 0.1643 | 0.2065 | 1.2561 | 0.1272 | 0.1803 | 0.9634 | Standardized residuals | 65% | 41% |
| 0.10 | 1.00 | NC | 0.0235 | 0.0771 | 1.1685 | 0.0188 | 0.0610 | 0.9324 | - | - | - |
| | | C | 0.0351 | 4.7013 | 1.4334 | 0.0275 | 4.6231 | 1.1320 | - | - | - |
| | | LI | 0.0341 | 2.0559 | 1.2754 | 0.0242 | 1.3978 | 1.0019 | Time series | 94% | 68% |
| | | RKF | 0.0299 | 0.2428 | 1.2191 | 0.0227 | 0.2255 | 0.9664 | Standardized residuals | 89% | 100% |
| | | naKF | 0.0423 | 0.1168 | 1.1950 | 0.0254 | 0.0976 | 0.9436 | Standardized residuals | 89% | 100% |
| 0.05 | 0.10 | NC | 0.0086 | 0.0100 | 0.4466 | 0.0068 | 0.0079 | 0.3561 | - | - | - |
| | | C | 0.0125 | 0.4614 | 0.5647 | 0.0098 | 0.4517 | 0.4417 | - | - | - |
| | | LI | 0.0119 | 0.3605 | 0.5628 | 0.0094 | 0.3348 | 0.4290 | Time series | 81% | 23% |
| | | RKF | 0.0116 | 0.0722 | 0.4893 | 0.0088 | 0.0702 | 0.3854 | Standardized residuals | 84% | 99% |
| | | naKF | 0.0104 | 0.0129 | 0.4617 | 0.0077 | 0.0106 | 0.3644 | Standardized residuals | 84% | 99% |
| Series | Treatment | Estimate | (SE) | Estimate | (SE) | |
|---|---|---|---|---|---|---|
| TS1 | Non-treated | 2.7103 | (0.6932) | 4.8341 | (0.5760) | −192.8515 |
| | LI | 2.6438 | (0.6735) | 4.6330 | (0.5578) | −190.1958 |
| | RKF | 2.9174 | (0.6983) | 4.0890 | (0.5653) | −185.7237 |
| | naKF | 3.0671 | (0.7237) | 3.8387 | (0.5844) | −183.6041 |
| TS2 | Non-treated | 1.6446 | (0.8822) | 9.3662 | (0.9774) | −170.7793 |
| | LI | 1.2913 | (0.7006) | 7.8502 | (0.8092) | −161.2743 |
| | RKF | 1.1704 | (0.6859) | 7.7522 | (0.7959) | −160.3136 |
| | naKF | 1.0999 | (0.6905) | 7.7692 | (0.7967) | −158.2455 |
| TS3 | Non-treated | 0.0623 | (0.0058) | 0.1054 | (0.0046) | 1096.8770 |
| | LI | 0.0597 | (0.0057) | 0.0971 | (0.0045) | 1149.8800 |
| | RKF | 0.0614 | (0.0055) | 0.1000 | (0.0044) | 1129.9350 |
| | naKF | 0.0601 | (0.0055) | 0.1020 | (0.0044) | 1124.1500 |
| | | Non-treated RMSE | Non-treated MAE | LI RMSE | LI MAE | RKF RMSE | RKF MAE | naKF RMSE | naKF MAE |
|---|---|---|---|---|---|---|---|---|---|
| TS1 | vs. | 7.0245 | 6.0496 | 7.0087 | 6.0353 | 6.8205 | 5.8609 | 6.7342 | 5.7788 |
| | Percentage reduction | - | - | 0.22% | 0.24% | 2.90% | 3.12% | 4.13% | 4.48% |
| TS2 | vs. | 11.4091 | 8.1459 | 11.3624 | 8.1456 | 11.2833 | 8.1455 | 11.2249 | 8.1455 |
| | Percentage reduction | - | - | 0.41% | 0.004% | 1.10% | 0.01% | 1.61% | 0.01% |
| TS3 | vs. | 0.3759 | 0.3231 | 0.3757 | 0.3229 | 0.3756 | 0.3228 | 0.3742 | 0.3213 |
| | Percentage reduction | - | - | 0.05% | 0.06% | 0.08% | 0.09% | 0.45% | 0.56% |
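The percentage reductions in the forecast-accuracy table are consistent with a reduction taken relative to the non-treated model, that is, $100\,(E_{\text{NT}} - E_{m})/E_{\text{NT}}$ for an error measure $E$ (RMSE or MAE) and treatment $m$; this formula is inferred from the tabulated values rather than stated in the text. A small Python check on the TS1 figures, where the helper name is hypothetical:

```python
def percentage_reduction(baseline, treated):
    """Relative reduction (in %) of an error measure with respect to the non-treated baseline."""
    return 100.0 * (baseline - treated) / baseline

# TS1 forecast errors copied from the table
ts1_rmse = {"Non-treated": 7.0245, "LI": 7.0087, "RKF": 6.8205, "naKF": 6.7342}
ts1_mae = {"Non-treated": 6.0496, "LI": 6.0353, "RKF": 5.8609, "naKF": 5.7788}

for method in ("LI", "RKF", "naKF"):
    r = percentage_reduction(ts1_rmse["Non-treated"], ts1_rmse[method])
    m = percentage_reduction(ts1_mae["Non-treated"], ts1_mae[method])
    print(f"{method}: RMSE {r:.2f}%, MAE {m:.2f}%")
# LI: RMSE 0.22%, MAE 0.24%
# RKF: RMSE 2.90%, MAE 3.12%
# naKF: RMSE 4.13%, MAE 4.48%
```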
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Pereira, F.C.; Gonçalves, A.M.; Costa, M. Improving Predictive Accuracy in the Context of Dynamic Modelling of Non-Stationary Time Series with Outliers. Eng. Proc. 2023, 39, 36. https://doi.org/10.3390/engproc2023039036