Proposed Long Short-Term Memory Model Utilizing Multiple Strands for Enhanced Forecasting and Classification of Sensory Measurements
Abstract
1. Introduction
2. Materials and Methods
2.1. Proposed Stranded LSTM Model
- For classification: 70–20–10% (training, validation, and evaluation).
- For forecasting: 70–10–10–10% (training, validation, evaluation, and testing).
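As a rough illustration of these splits, the following is a minimal sketch; the function name, the random example data, and the use of a chronological (non-shuffled) split are illustrative assumptions rather than the authors' code:

```python
import numpy as np

def split_dataset(X, y, fractions):
    """Split arrays into consecutive chunks given fractions that sum to 1.
    A chronological split is used here, as is common for time-series data;
    classification data could be shuffled beforehand."""
    n = len(X)
    bounds = np.cumsum([int(f * n) for f in fractions[:-1]])
    return list(zip(np.split(X, bounds), np.split(y, bounds)))

# Hypothetical data: 1000 windows, 24 timesteps, 4 sensor attributes
X = np.random.rand(1000, 24, 4)
y = np.random.rand(1000)

# Classification: 70-20-10% (training, validation, evaluation)
train, val, evaluation = split_dataset(X, y, [0.7, 0.2, 0.1])
# Forecasting: 70-10-10-10% (training, validation, evaluation, testing)
train_f, val_f, eval_f, test_f = split_dataset(X, y, [0.7, 0.1, 0.1, 0.1])
```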
2.2. Metrics Used
- Low jitter cases: This scenario is characterized by minimal fluctuations in loss, indicating that the model is learning or forecasting smoothly, without erratic variations.
- Moderate jitter cases: In this case, some fluctuations in loss are present, but they do not substantially affect the training process. This level of jitter is typical for larger datasets with shuffled data batches. However, moderate jitter during the evaluation phase may signify either the presence of noise in the dataset or early signs of a model that is not converging adequately.
- High jitter cases: This scenario suggests a highly unstable model. If the validation jitter is low while the evaluation jitter is high, this strongly indicates overfitting; should the difference between these values exceed 0.1, it points to an underfitting issue. Evaluation jitter values exceeding 0.07 in most measurement-forecasting contexts indicate poor model convergence and instability. A sketch of one possible jitter computation follows this list.
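The paper defines jitter formally in Equations (3) and (4); as a hedged stand-in, the sketch below measures loss jitter as the mean absolute change between consecutive EWMA-smoothed epoch losses. EWMA smoothing is plausible here (the abbreviation appears in this paper), but the precise formula, the `alpha` value, and the banding thresholds in the example are assumptions:

```python
import numpy as np

def loss_jitter(losses, alpha=0.3):
    """Approximate loss jitter: mean absolute difference between consecutive
    EWMA-smoothed loss values over a training or evaluation run."""
    losses = np.asarray(losses, dtype=float)
    smoothed = np.empty_like(losses)
    smoothed[0] = losses[0]
    for i in range(1, len(losses)):  # exponentially weighted moving average
        smoothed[i] = alpha * losses[i] + (1 - alpha) * smoothed[i - 1]
    return float(np.mean(np.abs(np.diff(smoothed))))

val_losses = [0.30, 0.24, 0.21, 0.20, 0.19]   # hypothetical epoch losses
j = loss_jitter(val_losses)
# Hypothetical banding, loosely consistent with the cases above:
band = "low" if j < 0.01 else "moderate" if j < 0.07 else "high"
print(f"jitter={j:.4f} ({band})")
```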
2.3. Stranded LSTM Selection Process
- Least loss selection strategy: This is a classless strategy; as the name implies, the selected LSTM strand i is the one with the minimum loss value across all model strands, as expressed by Equation (9). A minimal sketch of this arg-min selection appears after this list.
- Weighted least loss selection strategy: This strategy takes into account the representative strands of all four classes, as calculated using Equation (8), together with their corresponding loss-jitter values. That is, after calculating vector weight indices, where each vector corresponds to a per-class selected output strand, weighted inferences using the selected class vectors as multiplicative terms are aggregated to provide the final inference result of a classification or forecasting scenario (a sketch of this aggregation appears after this list).

  Let $s_c$ be the index of the selected strand in each class $c$ of the set $C = \{1, 2, 3, 4\}$, and let $J_{s_c}$ denote its corresponding loss-jitter value. The total sum of the selected strands' jitter-loss values is $S = \sum_{c \in C} J_{s_c}$. The normalized weight $w_c$ of each class element in $c$ is calculated using Equation (10).

  Let $T$ be the matrix representing the $n$ time-forecast values of $m$ attributes from a selected class strand. We denote the elements of $T$ as $t_{i,j}$, where $i$ represents the time index and $j$ the attribute index. Let $W$ be a weight matrix with elements $w_{c,s}$, where $c$ represents the class index and $s$ the selected strand index. In this strategy, each element of $T$ is multiplied by the corresponding element of $W$, yielding four separate matrices $R_c$, $c \in \{1, 2, 3, 4\}$. This can be represented in matrix notation using Equation (11), with the element-wise product denoted by the ⊙ symbol.

  This selection strategy involves choosing at least one strand from each class, that is, LSTM models with different cell depths. Incorporating strands from multiple depth classes helps mitigate the vanishing gradient issue, if it arises, and allows for a more comprehensive view of the best learners within each class. However, it may also introduce inaccuracies, since less precise strands contribute to the final output. Unlike the least loss selection strategy, this approach does not require frequent re-evaluation periods; consequently, it can perform better on large datasets, providing more accurate inferences than the least-element selection process.
- Fuzzy least loss selection strategy: In this fuzzy selection process, instead of directly using loss and jitter-loss values to calculate weights, fuzzy sets are defined over them. First, for each class $c$, the total jitter loss $J_c$ is computed based on Equations (3) and (4). Then, fuzzy sets are mapped onto each class's loss $L_c$ and total jitter loss $J_c$; three fuzzy sets (high, medium, and low) are used for loss and jitter loss accordingly. Next, the minimum and maximum loss and total jitter values are calculated over all classes and strands, and for each class, the mean class strand loss $\bar{L}_c$ and mean total jitter loss $\bar{J}_c$ are calculated accordingly. The membership functions that define the degree of belonging to the low, medium, and high loss sets are given by the set of Equations (13); the medium membership function has a triangular shape peaking at $\bar{L}_c$, ensuring a smooth transition between fuzzy sets. The same applies to total jitter according to Equations (14), where the medium membership function is also a triangular function centered at $\bar{J}_c$.

  The fuzzy weight probabilistic metric for each class $c$ is defined according to Equation (15). Equation (15) ensures that model classes with lower loss and total jitter values receive higher weight values (closer to 1), while model classes with higher loss and total jitter values obtain lower values (closer to 0). Finally, four forecast matrices $T_c$ are formed, one strand representative per class, each selected as the per-class strand that maintains the maximum min–max-normalized weight value and inferred over the forecasted timesteps and $m$ attributes. The final forecast values are calculated according to Equation (16). A sketch of the fuzzy weighting appears after this list.

  Using a fuzzy strategy provides smoother handling of uncertainty than hard threshold values and a more robust weight calculation based on fuzzy logic. It applies smooth fuzzy transitions between loss and jitter, handled by fuzzy probabilities, rather than the deterministic weighted-inference aggregation of Equation (11) used by the weighted least loss selection strategy.
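The least loss strategy reduces to a single arg-min over all strand losses. A minimal sketch (variable names are illustrative, not the authors' code):

```python
import numpy as np

# losses[i] = evaluation loss of strand i, pooled across all depth classes
losses = np.array([0.12, 0.09, 0.15, 0.08, 0.11])
best_strand = int(np.argmin(losses))   # Equation (9): classless arg-min selection
print(f"selected strand: {best_strand}, loss={losses[best_strand]:.3f}")
```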
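For the weighted strategy, the sketch below normalizes the per-class jitter-loss values into weights (one plausible reading of Equation (10); the inverse-normalization step, where lower jitter-loss yields a higher weight, is an assumption) and aggregates the four per-class forecast matrices with element-wise products, in the spirit of Equation (11):

```python
import numpy as np

# Jitter-loss of the selected strand for each class c = 1..4 (hypothetical values)
jitter_loss = np.array([0.05, 0.08, 0.12, 0.20])
inv = 1.0 / jitter_loss
w = inv / inv.sum()   # assumed normalization: lower jitter-loss, higher weight

# T_c: n forecast timesteps x m attributes, one matrix per class strand
n, m = 6, 3
T = [np.random.rand(n, m) for _ in range(4)]   # hypothetical per-class forecasts
R = [w[c] * T[c] for c in range(4)]            # element-wise weighted matrices
final_forecast = np.sum(R, axis=0)             # aggregated weighted inferences
```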
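The fuzzy strategy can be sketched with standard triangular membership functions peaking at the class mean, as described for Equations (13) and (14). The combination rule standing in for Equation (15) below (rewarding low-loss and low-jitter memberships, partially crediting the medium sets) and all numeric values are assumptions:

```python
import numpy as np

def tri_memberships(x, lo, mid, hi):
    """Degrees of membership in the low/medium/high fuzzy sets over [lo, hi],
    with the medium set triangular and peaking at mid."""
    low = float(np.clip((mid - x) / (mid - lo), 0.0, 1.0)) if mid > lo else float(x <= lo)
    high = float(np.clip((x - mid) / (hi - mid), 0.0, 1.0)) if hi > mid else float(x >= hi)
    med = max(0.0, 1.0 - abs(x - mid) / max(mid - lo, hi - mid, 1e-12))
    return low, med, high

# Hypothetical per-class mean strand losses and total jitter losses (c = 1..4)
loss = np.array([0.05, 0.07, 0.10, 0.16])
jit = np.array([0.01, 0.02, 0.05, 0.09])

fwpm = []
for L, J in zip(loss, jit):
    lo_L, med_L, hi_L = tri_memberships(L, loss.min(), loss.mean(), loss.max())
    lo_J, med_J, hi_J = tri_memberships(J, jit.min(), jit.mean(), jit.max())
    # Assumed combination rule in place of Equation (15)
    fwpm.append(0.5 * (lo_L + lo_J) + 0.25 * (med_L + med_J))

w = np.array(fwpm)
w = (w - w.min()) / (w.max() - w.min() + 1e-12)   # min-max normalized class weights
print({c: round(float(wc), 3) for c, wc in enumerate(w, start=1)})
```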
3. Experimental Scenarios and Results
3.1. Classification Scenario
3.2. Forecasting Scenario
4. Discussion of the Results
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
ANN | Artificial Neural Networks or NNs |
DL | Deep Learning |
DTs | Decision Trees |
EWMA | Exponentially Weighted Moving Average |
GRU | Gated Recurrent Unit model |
LSTM | Long Short-Term Memory model |
MAE | Mean Absolute Error |
MAPE | Mean Absolute Percentage Error |
MSE | Mean Squared Error |
ML | Machine Learning |
NN | Neural Network model |
RMSE | Root Mean Squared Error |
RNN | Recurrent Neural Network model |
SVM | Support Vector Machines |
Class | Depths (Cells) | Timescale Coverage | Computation Intensity | Usage | Range Factor | Coverage
---|---|---|---|---|---|---
1 | 8, 16, 24, 32 | Ultra short | Low (edge) | IoT sensors, audio signals | 4.0 | 4× wide timespan
– | Inter-class gap of 16 cells | | | | |
2 | 48, 64, 80, 96 | Short term | Medium (edge) | Video, stock data | 2.0 | 2× narrower than base
– | Inter-class gap of 32 cells | | | | |
3 | 128, 160, 192, 224 | Medium term | High | ECG | 1.75 | 25% narrower than class 2
– | Inter-class gap of 64 cells | | | | |
4 | 288, 352, 416, 480 | Long term | Extreme | Genomics, climate | 1.67 | 8% narrower than class 3
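A hedged Keras sketch of how strands drawn from these depth classes could be instantiated. The single-LSTM-layer layout, the dense head, the optimizer, and the input dimensions are assumptions; the paper's exact strand architecture may differ:

```python
import tensorflow as tf

STRAND_CLASSES = {
    1: [8, 16, 24, 32],        # ultra-short timescales, low (edge) compute
    2: [48, 64, 80, 96],       # short term
    3: [128, 160, 192, 224],   # medium term
    4: [288, 352, 416, 480],   # long term
}

def build_strand(depth, timesteps, n_features, n_outputs):
    """One LSTM strand: a single LSTM layer of `depth` cells plus a dense head."""
    inp = tf.keras.Input(shape=(timesteps, n_features))
    x = tf.keras.layers.LSTM(depth)(inp)
    out = tf.keras.layers.Dense(n_outputs)(x)
    model = tf.keras.Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

# Instantiate every strand of every class (hypothetical input dimensions)
strands = {d: build_strand(d, timesteps=200, n_features=4, n_outputs=1)
           for depths in STRAND_CLASSES.values() for d in depths}
```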
Condition | MSE Loss | Total Jitter | WPM Value | Meaning
---|---|---|---|---
Best Case (Ideal) | Close to 0 | Close to 0 | Close to 1 | Excellent classification or forecasting
Worst Case (Loss and Jitter) | Close to 1 | Close to 1 | Close to 0 | Bad classifier or predictor
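The table's limiting behavior can be reproduced by, for example, a simple product form. This is only an assumption standing in for the paper's own WPM equation:

```python
def wpm(mse_loss, total_jitter):
    """Hypothetical weight probabilistic metric: approaches 1 when loss and
    jitter are near 0, and 0 when both are near 1, matching the table above."""
    return (1.0 - mse_loss) * (1.0 - total_jitter)

assert abs(wpm(0.0, 0.0) - 1.0) < 1e-9   # best case (ideal)
assert abs(wpm(1.0, 1.0) - 0.0) < 1e-9   # worst case (loss and jitter)
```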
Class | State | Range of Temperature |
---|---|---|
0 | Normal | |
1 | Low-Risk | |
2 | Caution | |
3 | Critical | |
4 | Danger |
Model | Timestep (dth) | Selected Strand C1 | C2 | C3 | C4 | Loss | Accuracy
---|---|---|---|---|---|---|---
LSTM | 200 | - | - | - | - | |
LSTM | 200 | - | - | - | - | |
Stranded LSTM (least loss selection) | 200 | 32 | 32 | 32 | 32 | 0.082 | 0.961
Stranded LSTM (weighted least loss selection) | 200 | 32 | 48 | 128 | 288 | 0.075 | 0.967
Stranded LSTM (fuzzy least loss selection) | 200 | 32 | 48 | 128 | 288 | 0.053 | 0.974
Model | Timestep (dth) | Selected Strand C1 (wc) | C2 (wc) | C3 (wc) | C4 (wc) | Loss
---|---|---|---|---|---|---
Stranded LSTM (least loss selection) | 240 | 480 (-) | 480 (-) | 480 (-) | 480 (-) | 0.075
Stranded LSTM (weighted least loss selection) | 240 | 32 (0.04) | 96 (0.047) | 192 (0.24) | 416 (0.673) | 0.071
Stranded LSTM (fuzzy least loss selection) | 240 | 32 (0.02) | 96 (0.032) | 224 (0.15) | 480 (0.798) | 0.0697