1. Introduction
The advent of Industry 4.0 [1] was a revolutionary event [2]. The core and innovative element was the use of data to transition from traditional programmed and control-based processes and systems to intelligent processes and systems that can predict the behaviour of various stakeholders in the industry value chain (e.g., customers, operators, machines) and proactively adjust their operations at different levels [3]. It aims to maximise production efficiency by achieving self-awareness, self-prediction and self-maintenance [4,5].
A key point of Industry 4.0 is the interaction between production and maintenance planning [6]. The term maintenance planning “identifies the set of technical, administrative, and managerial activities carried out during the life cycle of an item, with the aim of maintaining or restoring its functionality” [7]. In this context, early warning systems (EWSs) have emerged as valuable tools to anticipate unexpected disruptions and support maintenance strategies [8]. Indeed, EWSs are designed to provide timely alerts and notifications about potential risks, hazards, or critical events [9,10,11].
These systems employ real-time data from a variety of sources, including sensors, surveillance cameras, and environmental monitors, to identify discrepancies from standard conditions [12]. They apply statistical analysis, machine learning, and pattern recognition algorithms to identify unusual patterns or anomalies that might indicate the occurrence of an undesired event [13,14]. EWSs are typically designed with predefined thresholds or rules that trigger alerts when specific conditions are met [15].
The benefits of EWSs are considerable [16,17,18,19]. While they are widely used in disaster management and environmental monitoring [11,20,21], their versatility has enabled their adoption across multiple domains, including manufacturing, healthcare, finance, and business management [11]. In the healthcare sector, for instance, EWSs support clinical assessment to detect acute deterioration at an early stage, in order to prevent or reduce adverse events such as unexpected cardiopulmonary arrests, intensive care unit admissions and deaths [22]. In the manufacturing context, EWSs are used to detect early signs of possible interruptions in the production cycle, caused either by machine downtime or by production errors; through targeted actions on product quality control, they help optimise both quality and productivity [23]. In the financial sector, EWSs are instrumental in anticipating the onset of financial crises in individual countries [24]. Another research line focuses on identifying early warning signals for decision-makers. These signals can be derived from statistical data, artificial intelligence (AI), or cognitive behavioural techniques, and the EWSs in question are referred to as “managerial early warning systems” (MEWSs) [25]. Although MEWSs can automate the detection of early warning signals, managers still determine how to respond to them; the purpose of these studies is not automation, but rather helping managers develop strategies [25,26,27].
Across all contexts, EWSs are today referred to as intelligent because they combine the technologies inherent to warning systems, such as sensors, with artificial intelligence in a synergistic approach; although there is no strict definition of an intelligent early warning system (IEWS) in the literature, there are numerous examples of implementation [28,29,30,31,32]. The IEWS represents a significant advancement in condition-based monitoring systems: it uses big data technology and intelligent algorithms to continuously monitor equipment under various conditions, predict future signal changes, detect faults in the creep process before they occur, diagnose equipment faults, and calculate the equipment’s remaining useful life [33]. The integration of EWSs and AI allows companies to move from reactive to predictive maintenance, anticipating problems before they occur and optimising the planning of maintenance activities.
Despite their growing adoption, EWSs still face critical limitations. In particular, in the industrial context, one of the main issues lies in the event collection phase: systems must handle large volumes of heterogeneous data streams, often affected by noise and false positives, which compromise the effectiveness of subsequent analyses. Moreover, correlating events from disparate sources raises complex challenges related to semantic integration and the timely detection of rare or weak signals, all while ensuring security, scalability, and adaptability across different production environments. Lastly, activities such as anomaly forecasting, signal prioritisation, and visualisation demand advanced proactive solutions, which remain largely underexplored in the current literature [34].
These limitations highlight the need for more robust and flexible approaches capable of improving predictive accuracy, reducing response times, and enhancing the ability to distinguish relevant events from background noise. In this context, the research question that is addressed in this study is as follows: How can an intelligent early warning system, based on deep learning techniques, improve predictive accuracy and real-time event correlation from heterogeneous sources in industrial contexts?
To answer this question, the paper proposes an innovative approach for designing and developing an intelligent early warning system. Specifically, it presents the implementation and validation of a predictive model based on Long Short-Term Memory (LSTM) neural networks, trained and tested on real-world industrial data. The system is designed to accurately predict the most frequent error classes, thereby enhancing equipment reliability and improving the planning of maintenance activities.
The activities described in this study are part of the SCREAM research project, which aims to develop a platform capable of providing companies with strategic insights into their production processes. The goal is to address both product development dynamics and production-related elements, such as infrastructure and machinery, that are critical for optimising every stage of the supply chain. To achieve this, data from industrial machinery were analysed, with a specific focus on performance metrics and malfunction events. The analysis focuses on the prediction of high-frequency error classes to improve overall system reliability and enable more effective preventive maintenance strategies.
The rest of the paper is organised as follows. Section 2 presents the theoretical field on which this study is based. Section 3 explains the research method adopted. Section 4 contextualises the case study and presents the results achieved. Section 5 discusses the findings, including the limitations of the research, its implications, and potential future research. Finally, Section 6 draws the conclusions, summarising the approach.
3. Research Design
The design of this research was informed by both theoretical considerations and practical constraints derived from the industrial context under investigation. The primary objective was to develop an early warning system capable of predicting critical machine faults based on time-series data collected from industrial sensors. The application domain concerned a legacy industrial printing machine, specifically a paper embosser, owned by Sofidel, a leading producer of tissue paper products based in Tuscany (IT). This machine is equipped with a wired industrial sensor array installed by the manufacturer, whose Application Programming Interface (API) provides access to the floating-point sensor data. The machine lacks a system to alert or advise the operator of potential issues before they occur.
Based on the literature and the preliminary data analysis (see Section 2 and Section 4), Long Short-Term Memory (LSTM) networks were identified as the most appropriate deep learning architecture.
LSTM networks, when combined with attention mechanisms, provide significant advantages in modelling time-series data, particularly when using a sliding-window approach. Time-series data often exhibit irregular patterns, non-stationarity, and long-range dependencies, which require models capable of effectively capturing both short- and long-term relationships [55]. LSTMs are specifically designed to address the limitations of standard recurrent neural networks (RNNs), notably the vanishing gradient problem, by incorporating gating mechanisms that allow information to persist across long sequences. This capability is crucial for time-series data, where dependencies may span multiple time steps. Furthermore, LSTMs efficiently handle sequences of variable length, making them well suited to applications involving sliding-window frames. The sliding-window approach segments the time series into overlapping sub-sequences, allowing the model to learn meaningful temporal patterns and generalise better to unseen data [56].
The integration of attention mechanisms with LSTM networks enhances their ability to selectively focus on the most informative parts of the input sequence. Attention layers assign different weights to each time step, ensuring that the model prioritises essential information while downplaying less relevant data points. This is particularly beneficial in scenarios where the significance of past observations varies over time.
Although convolutional neural networks (CNNs) have been successfully applied to time-series analysis, they primarily rely on local receptive fields and weight-sharing mechanisms, which make them more suitable for tasks involving spatial dependencies rather than long-term temporal relationships. CNN-based approaches may struggle to model long-term dependencies effectively without additional mechanisms, such as dilated convolutions or temporal pooling layers.
Transformer models, on the other hand, have gained prominence for their ability to capture long-range dependencies without the need for recurrence. However, transformers typically require large amounts of training data to generalise well, and their computational complexity scales quadratically with sequence length due to self-attention operations. Given the constraints of many real-world time-series datasets (where data availability may be limited), LSTM networks with attention provide a more efficient trade-off between performance and computational feasibility [57].
A case study was conducted to describe the activities and results. This method is appropriate for exploring problems and their solutions in real organisational settings [58]. The development of the case study was a collaborative effort between industrial engineers and university researchers, and it was based on the following research phases:
Company selection: The selected company, Sofidel, was deemed a suitable sample [58] due to its involvement in the SCREAM research project, which is funded by the Italian Ministry of Industrial Development.
Dataset Elaboration: The collected dataset was inspected and prepared for the analysis phase.
Exploratory Data Analysis: The process of examining the sensor data to gain insights into system behaviour.
Deep Learning Implementation: A neural network is applied as the machine learning approach.
Performance Measurement: Performance results are examined in this phase to collect final feedback on the case study implementation.
4. The Case Study Phases and Results
4.1. Company’s Context Description
The objective of this study is to design and test an algorithm that can work for an EWS that predicts the occurrence of a problem or an error as soon as possible. A particular part of a printing machine from “Sofidel” has been selected as the test scenario of the case study. It is focused on a specific facet of the printing machine, colloquially referred to as the “embosser”. This integral component comprises an assembly of rolls and wheels, collectively responsible for exerting pressure and force on one or more paper substrates. The culmination of these actions yields intricate patterns on the paper. The embosser plays a critical role in the mass production of various paper products and is frequently prone to intricate paper jams. Anticipating those or related errors in advance would be valuable, enabling domain experts to proactively address and prevent potential disruptions, thereby saving time and money.
The machine is equipped with integrated sensors at various levels that enable domain experts to understand whether its mechanical parts are failing. The sensor data revolve around measurements of applied forces, pressure distributions, and rotational velocity of the rolls. Their acquisition hinges on the implementation of Internet of Things (IoT) sensors placed on the left, right, and middle parts, in multiple areas, to collect the mentioned physical information with the lowest energy consumption. This cascade of data forms the core of the analytical case study, helping to provide a comprehensive understanding and effective management of the embossing process. Once acquired, data are subsequently conveyed through the Message Queuing Telemetry Transport (MQTT) protocol, known for its efficiency and reliability in data transmission, to a localised gateway acting as an edge server. In the gateway, data are collected for advanced visualisation and monitoring, giving domain experts important insights into the production process. In addition, data are stored in a company-internal data storage system to enable engineers to monitor any malfunctions. In parallel, the collected data are marshalled and sent through the Internet as JavaScript Object Notation (JSON) packages to a cloud server for further analysis and elaboration.
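As an illustration of this transmission step, the following sketch publishes one sensor reading as a JSON package over MQTT using the paho-mqtt client; the broker address, topic, and payload fields are hypothetical placeholders, not the plant’s actual configuration.

```python
import json
import time

import paho.mqtt.client as mqtt

client = mqtt.Client()  # paho-mqtt 1.x-style constructor
client.connect("gateway.local", 1883)  # hypothetical gateway address

reading = {
    "timestamp": time.time(),
    "sensor_id": "embosser_left_pressure",  # hypothetical sensor name
    "value": 4.82,                          # floating-point measurement
}

# Publish the reading as a JSON package, as done towards the cloud server.
client.publish("plant/embosser/telemetry", json.dumps(reading), qos=1)
client.disconnect()
```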
4.2. The Dataset Elaboration
Data are collected on the machine’s edge devices and sent to the gateway according to Algorithm 1, which can be summarised as follows: ship the data every time a monitored value changes, or every 360 s (6 min) regardless of whether values have changed. This approach creates a dataset of ⟨timestamp, feature values, class⟩ tuples. The sensors on the embosser provide a total of 34 numerical features, representing each monitored part, and a class that defines the type of operation at a given timestamp.
The class is essentially of two types:
Error, along with a “type” that represents the different nature of the error;
Normal functioning, which defines that the machine is operating correctly.
The initial analysis of the dataset is based on the conversion and processing of the cloud-stored data into a set of rows corresponding to time–feature pairs. To ensure better data handling, features and classes have been processed around a single timestamp with a pivoting procedure. There are some caveats when using this approach: as rows are grouped and rotated into a matrix, certain features may become null due to incomplete data at the specific timestamp.
Algorithm 1 Data collection.
Require: F, the set of feature values acquired over time
1: procedure SendData
2:   t_last ← now
3:   while True do
4:     if now − t_last ≥ 360 s then ▹ Every 6 min
5:       bulk send F
6:       t_last ← now
7:     else
8:       for f ∈ F do
9:         if f has changed then
10:          send f
11:        end if
12:      end for
13:      update stored feature values
14:    end if
15:  end while
16: end procedure
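For readability, the following Python sketch mirrors the logic of Algorithm 1; the function names (read_features, send, bulk_send) are hypothetical stand-ins for the edge firmware’s actual primitives.

```python
import time

BULK_PERIOD_S = 360  # ship everything every 6 min regardless of changes

def collect_and_send(read_features, send, bulk_send):
    """Mirror of Algorithm 1: send a feature when it changes,
    and bulk-send all features every BULK_PERIOD_S seconds."""
    last_bulk = time.monotonic()
    previous = read_features()
    while True:
        current = read_features()
        if time.monotonic() - last_bulk >= BULK_PERIOD_S:
            bulk_send(current)            # every 6 min, changed or not
            last_bulk = time.monotonic()
        else:
            for name, value in current.items():
                if value != previous.get(name):
                    send(name, value)     # event-driven update
        previous = current                # update stored feature values
        time.sleep(0.1)                   # polling interval (illustrative)
```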
To address this situation, the semantics of Algorithm 1 are exploited: since a feature is transmitted only when its value changes, null values can be handled through forward and backward filling. This process involves the replication of the closest non-null value for each feature, both forward and backward, until it changes. By doing so, the original structure of the dataset is maintained and, if “not updated”, the value of a feature remains constant over time.
Table 2 presents an example of how null values have been handled through the pivot procedure. For simplicity, only a general approach is presented, considering timestamps t_1, …, t_n, where t_1 < t_2 < … < t_n. The forward-fill procedure replicates, for each feature, its value from the most remote non-null timestamp until it changes (refer to Feature 1 and Feature 2 in Table 3). Similarly, backward fill is used to fill values from the most recent non-null timestamp towards the beginning of the dataset (see “Feature n” in Table 4). Particular attention was paid to the class variable, as it required a different treatment compared to the other features. The forward-fill procedure does not change from what has just been described; however, backward fill cannot be applied safely when it encounters an error code, as the missing values would be replicated back towards the beginning of the dataset. For this reason, the remaining backward-null values were filled with the class corresponding to “normal behaviour”.
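A minimal pandas sketch of the pivot-and-fill procedure described above; column names and sample values are illustrative, not the project’s actual schema.

```python
import pandas as pd

# Raw stream: one row per (timestamp, feature, value) event,
# as produced by the data collection of Algorithm 1.
raw = pd.DataFrame({
    "timestamp": [1, 1, 2, 3, 3],
    "feature":   ["f1", "class", "f2", "f1", "class"],
    "value":     [0.5, 0, 1.2, 0.7, 2509],
})

# Pivot so that each row is a single timestamp and each column a feature.
wide = raw.pivot(index="timestamp", columns="feature", values="value")

# Forward fill: a feature that was "not updated" keeps its previous value.
# Backward fill only covers the leading gaps at the start of the series.
features = wide.drop(columns="class").ffill().bfill()

# The class column is forward-filled like the others, but leading gaps are
# filled with the "normal behaviour" code (0) rather than backward-filled,
# so that an error code is never propagated towards the beginning.
labels = wide["class"].ffill().fillna(0)

dataset = features.join(labels)
```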
4.3. Exploratory Data Analysis
This step of the study involved the analysis of the data from the sensors; the dataset ranged from 1 January 2023 to 1 June 2023. Due to the nature of data collection and conversion, no constant interval Δt [s] was identified that consistently separates the timestamp of a single dataset row from the next; a simple statistical analysis of Δt confirmed that the interval varies considerably from row to row. Since a 1:1 correspondence between the timestamps of the dataset rows and elapsed seconds could not be established, time will be expressed in terms of time units (TU) instead of seconds. The TU represents the unit step between consecutive rows, independent of the actual time elapsed. Furthermore, no further investigation was conducted on the trend of the timestamp, as it will be used only as an index to better represent the data.
Exploratory Data Analysis (EDA) began with a simple plot illustrating the trend of all features over time. This initial step provided a clearer understanding of their behaviour, revealing that 5 of the 34 features maintained a constant trend throughout the time frame. As a result, these features were removed from the dataset and excluded from further analysis. Subsequently, to gain deeper insight into the domain of each feature, a random forest analysis was performed using the Mean Decrease Impurity (MDI) metric, as shown in Figure 1. The data exhibit significant variability, primarily due to the machine’s inherent nature, which frequently transitions between different states, along with occasional outliers observed during maintenance operations.
Further investigations were conducted by analysing the correlation matrix between features and classes, revealing that an additional 14 variables exhibited identical correlation patterns. Discussions with domain experts confirmed that these values came from closely positioned sensors, allowing a further reduction in the number of features by retaining only one representative variable per group. The final set of selected features, along with their correlation values, is presented in Figure 2, which highlights a refined selection of 21 of the original 34 features (excluding the target class).
Additional studies on the domain of each feature did not show evidence of seasonal or residual components. Furthermore, the autocorrelation with the target class, as well as the partial autocorrelation of each feature, did not show strong relationships.
By generating both histograms and box plots to represent the operational value range, the inherent data distribution of each feature was revealed. This analysis facilitated the identification of outliers, highlighting the need for further data preprocessing.
To standardise the columns corresponding to each feature, the “quantile transform” technique was applied. This approach remaps the original probability distribution while mitigating the influence of outliers, ensuring that no rows are excluded and preserving the continuity of the time series.
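A sketch of this standardisation step using scikit-learn’s QuantileTransformer; the data and parameter values are illustrative, and in practice the transformer would be fitted on the training portion only.

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# Toy stand-in for the (rows x 21 features) dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 21))
X[::100] += 50  # inject outliers similar to maintenance spikes

# Rank-based remapping of each column's distribution: outliers are
# compressed into the tails, and no rows are dropped, so the
# continuity of the time series is preserved.
qt = QuantileTransformer(output_distribution="uniform", n_quantiles=500)
X_scaled = qt.fit_transform(X)
```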
The analysis of the target class, namely “First Alarm” (“Primo Allarme” in Italian), required specific attention, since multiple types of “First Alarm” occur during the time frame considered. They were analysed by plotting the general trend of the occurrences over time and studying each error’s frequency. This is illustrated in Figure 3, where the x axis represents time and the y axis represents the “Error” type.
In this way, a total of 54 distinct classes were counted: “ErrorCode 0” identifies normal machine behaviour, while the others represent either general warnings or actual machine malfunctions. The number of occurrences of each class during the time frame was then counted and is reported in Table 5. It is important to note that most of them represent only warnings or general information about the machine’s production process.
The analysis focused specifically on critical errors associated with paper breakage because, when this happens, the recovery procedure takes time and effort, reducing the production rate until the problem is solved. The company’s domain experts suggested that the most important errors to identify in advance were those related to paper tension and “paper breaking protection”: codes 2509, 2556, 2557, 2558, and 2559.
Therefore, it was necessary to isolate these five target classes from the others. An analysis was performed to identify potential time–cause–consequence relationships among errors; however, no hidden patterns were detected. Furthermore, discussions with domain experts did not provide any information on how to merge a certain “unwanted” class into any of the five mentioned above. Further attempts were made to create and analyse a meta-dataset with meta-classes, but the high variability of the time-series errors prevented further exploration of this approach.
Given these complexities, the principle of Occam’s razor was applied, favouring simplicity in dataset management. This approach involved reclassifying all errors, except those previously identified, as “0” to represent normal system behaviour. The result of this choice is an unbalanced time-based dataset, where most of the entries represent “normal functioning” of the machine and a small fraction of it includes multiple errors given by the five distinct classes. This problem will be discussed in
Section 4.5.
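As a minimal illustration of this reclassification in pandas, assuming the column layout from the earlier sketch; the codes shown besides the five targets are toy values.

```python
import pandas as pd

# The five critical classes retained; every other code collapses to "0"
# (normal behaviour). The column name "class" is an assumption.
TARGET_ERRORS = {2509, 2556, 2557, 2558, 2559}

labels = pd.Series([0, 1024, 2509, 3, 2557, 0])  # toy example codes
labels = labels.where(labels.isin(TARGET_ERRORS), other=0)
print(labels.tolist())  # [0, 0, 2509, 0, 2557, 0]
```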
The proposed approach ensures that the results can be reproduced at any time by adopting a pipeline-based research methodology. To achieve this, Kedro was used to manage the various stages of the entire process, from data preparation to neural network training and testing (represented in Figure 4 as a Directed Acyclic Graph, or DAG). The data were processed with Pandas and subsequently visualised with Seaborn 0.12.1. Lastly, the AI models discussed in Section 4.4 were implemented using PyTorch 2.1.0, and the entire process was deployed on a proprietary cloud service named “Alida”.
Kedro is an open-source Python 3 framework hosted by the Linux Foundation that allows data science pipelines to be built; it was created at QuantumBlack to reduce technical debt in data science experiments, easing the transition from experimentation to production [59]. Its flexibility and integration with modern Integrated Development Environments (IDEs) and cloud hosts enable scientists and engineers to easily analyse data and train AI models with a focus on reproducibility, reporting, and deployment on remote hosts. Another important feature is the possibility of visualising and debugging pipelines using integrated visualisation tools.
Figure 4 represents the entire training pipeline that we built using Kedro. From top to bottom, it shows the data flow from the loading of the representative parameters (marked with a purple line) to the creation of the result images and tables proposed in this work. For the sake of simplicity and clarity, we have omitted the portion of the pipeline related to data preprocessing and cleaning, as it involves standard procedures with no novel contributions. Additionally, to avoid redundancy, we omitted the pipeline replication for all five models.
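For illustration, a condensed Kedro pipeline in the spirit of Figure 4 might look as follows; the node functions and dataset names are hypothetical stand-ins for the project’s actual catalog entries.

```python
from kedro.pipeline import Pipeline, node

# Stub node functions standing in for the real preprocessing/training steps.
def make_windows(clean_data, params):
    ...

def train_model(windows, params):
    ...

def evaluate_model(model, windows):
    ...

def create_pipeline() -> Pipeline:
    # Each node maps named catalog datasets to outputs; this explicit
    # wiring is what makes runs reproducible and lets Kedro render the
    # DAG shown in Figure 4.
    return Pipeline([
        node(make_windows, ["clean_data", "params"], "windows"),
        node(train_model, ["windows", "params"], "model"),
        node(evaluate_model, ["model", "windows"], "metrics"),
    ])
```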
4.4. The Deep Learning Approach Implementation
The previous section discussed how the data are distributed over time, which features are important to consider, and how many errors need to be identified to prevent the machine from malfunctioning. This scenario suggests a multi-classification challenge: predicting the occurrence of an error as far ahead of time as possible.
To correctly classify the incoming data, a deep learning approach was chosen, as suggested in several recent studies [60,61]. Among the many neural networks available in the literature, the LSTM was selected as it is particularly well suited to handling the irregular and time-dependent data generated by industrial sensors. Additionally, to mitigate the vanishing gradient problem, an attention mechanism was used to enhance the model’s ability to focus on the most informative parts of the data [62].
Therefore, the generation of the functional EWS is based on the exploration of various strategies, including the following:
Creating a single neural network able to multi-classify the five errors;
Creating multiple neural networks, each specialised in a single error.
To justify the adopted approach, it was necessary to further investigate the distribution of the five error classes. The number of times a signal changed from “0” to any of the error classes was counted; the resulting count is reported in the row “Total occurrences” of Table 6.
It can be observed that the total number of error events changes drastically from one class to another.
The first strategy involved the use of a single neural network to multi-classify the errors in a naturally unbalanced dataset with an unbalanced number of error classes. Additionally, this would result in a large model, both in size and in computational complexity. Although the entire system is not meant to operate under real-time requirements, this approach limits the extensibility of the system, leading to the need for a single node with high computational capabilities.
Adopting the second strategy requires extracting chunks of data related to each specific error class from the dataset and feeding the models individually. The dataset is then passed to each individual neural network model, and a finalisation step highlights the most probable error class (e.g., comparing all network outputs and picking the one with the highest elicitation). Under these hypotheses, the second approach was adopted, as the research team considered it more extensible and robust, while being aware that some neural networks might perform better than others due to the different numbers of examples provided.
To extract a discrete number of TU rows with which each neural network would be trained and tested, a time window around each error occurrence was considered. Specifically, all the features up to 300 [s] (5 min) before the error were taken, along with the first five TU elements following the occurrence. This extraction window was agreed with the domain experts as a result of the intrinsic machine behaviour during the paper blocking process (e.g., the slowing of the embosser rolls). Furthermore, since the whole dataset exhibits an imbalance between error and non-error classes, retaining only a portion around the error class helps to reduce overfitting. Each time frame segment encompassed a variable number of TUs, including the machine’s “normal functioning” and relevant feature values describing the error class. The average TU number was calculated for each error-specific dataset and the results are reported in Table 6. Henceforth, each dataset was treated independently, divided into a portion for training and a portion for testing; an additional fraction of the training dataset was reserved for validation during the training process. Before each training session, the data chunks were shuffled, and training lasted for a maximum of 500 epochs, with an early stop mechanism implemented to halt training if the validation error exhibited an increasing trend across epochs. A fixed Learning Rate (LR) was used with the Adam optimiser to update the weights, while the Mean Square Error was used as the loss function.
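A minimal sketch of this training procedure follows; the learning rate and early-stopping patience shown are placeholders (the exact values are not restated here), and `model`, `train_loader`, and `val_loader` are assumed to exist.

```python
import torch

def train(model, train_loader, val_loader, max_epochs=500, patience=10, lr=1e-3):
    """Adam + MSE training with early stopping on the validation trend."""
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    best_val, stale = float("inf"), 0
    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            optimiser.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimiser.step()
        model.eval()
        with torch.no_grad():
            val = sum(loss_fn(model(x), y).item() for x, y in val_loader)
        if val < best_val:
            best_val, stale = val, 0
        else:
            stale += 1           # validation error keeps increasing
            if stale >= patience:
                break            # early stop
    return model
```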
To find the most suitable network architecture, the Ray Tune library was used with PyTorch to define a test matrix with a grid search over the number of LSTM layers, the number of attention heads, and the dropout rate [63]. The total size of the network and the performance obtained by each prototype were compared. To manage the high computational complexity of the models and data, mini-batch training was applied. The results of the prototypes were contrasting, as increasing the complexity of the network did not bring any relevant improvement on the test set. Therefore, considering that the final networks would be deployed on edge machines with limited computational resources, we opted for the smallest architecture with the highest performance. The selected networks share five stacked LSTM layers with 100 hidden units to extract sequential dependencies. Furthermore, a single layer with 10 attention heads concentrates on the most important parts of the data. A fully connected layer then processes the attention output, followed by a linear layer and a ReLU activation. The architecture is finalised with another linear layer, and dropout is applied at the rate selected by the grid search.
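The following PyTorch sketch reflects this architecture; the input size, dropout value, output dimension, and the exact placement of the dropout layer are assumptions where the text leaves them open.

```python
import torch
from torch import nn

class LSTMAttentionClassifier(nn.Module):
    """Sketch: five stacked LSTM layers with 100 hidden units, one
    10-head self-attention layer, and a small fully connected head."""

    def __init__(self, n_features=21, hidden=100, heads=10, dropout=0.2):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=5, batch_first=True)
        self.attention = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.fc = nn.Linear(hidden, hidden)   # processes the attention output
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 1),             # per-class elicitation signal
        )

    def forward(self, x):                     # x: (batch, window, n_features)
        seq, _ = self.lstm(x)
        ctx, _ = self.attention(seq, seq, seq)  # self-attention over time
        return self.head(self.fc(ctx[:, -1]))  # use the last time step
```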
The general approach for LSTM applications suggests converting the dataset into sliding windows of size (n, p), where n represents the total number of TUs in the window and p denotes the TUs to predict. In discussions, the company’s domain experts suggested looking for the error a few minutes before the event occurrence and considering a few instants after its rising. Hence, the window sizes n and p were designed accordingly, to allow a small warning gap.
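As an illustration, the sliding-window segmentation can be sketched as follows; n and p stand for the values agreed with the domain experts.

```python
import numpy as np

def sliding_windows(X, y, n, p):
    """Segment a time series into overlapping (input, target) pairs:
    n past TUs of features are used to predict the class p TUs ahead."""
    inputs, targets = [], []
    for i in range(len(X) - n - p + 1):
        inputs.append(X[i : i + n])        # window of n consecutive TUs
        targets.append(y[i + n + p - 1])   # label p steps past the window
    return np.stack(inputs), np.array(targets)
```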
4.5. The Performance Measurement
Each neural network was independently trained 10 times, according to the parameters specified in Section 4.4; in this section, the best results among those runs are presented. For each model, the confusion matrix (along with accuracy), precision, recall, and the area under the receiver operating characteristic curve (AUROC) were calculated using the test set extracted prior to the training phase. Following the same approach, the retrieved results are reported in Table 7, Table 8, Table 9, Table 10 and Table 11. In particular, each row reports the specified metrics at the given forecast horizon.
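These per-horizon metrics can be computed with scikit-learn as sketched below; the array names are illustrative, standing in for one model’s test labels, thresholded predictions, and raw outputs at a single forecast horizon.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    """Return the metric set reported in Tables 7-11 for one horizon."""
    return {
        "confusion_matrix": confusion_matrix(y_true, y_pred),
        "accuracy": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred),
        "recall": recall_score(y_true, y_pred),
        "auroc": roc_auc_score(y_true, y_score),
    }
```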
Model 2556 consistently demonstrates superior accuracy performance across all forecast horizons, indicative of its robust predictive capabilities. Models 2509 and 2557 share similar accuracy trends, whereas Models 2558 and 2559 exhibit slightly diminished but still commendable accuracy levels.
Precision analysis reveals that Model 2556 achieves notable precision values, particularly at t + 2 and t + 3, underscoring its proficiency in accurately identifying positive instances. In contrast, Model 2557 displays variability in precision across forecast horizons. Models 2509, 2558, and 2559 manifest lower precision values, which implies a higher incidence of false positives.
In terms of recall, Model 2556 performs consistently well, attaining elevated recall values. Model 2559 shows sustained recall values, reflecting an effective equilibrium between false negatives and true positives. Models 2509, 2557, and 2558 display disparate recall outcomes across forecast horizons.
Regarding discriminatory power, Model 2556 maintains consistently high AUROC values, emphasising its ability to distinguish between classes. Model 2559 also demonstrates commendable performance in AUROC. Models 2509, 2557, and 2558 exhibit variability in AUROC values on different forecast horizons.
Additional considerations can be made by relating these results to the training data presented in Table 6. The superior performance of Model 2556 across various metrics aligns with its relatively large training dataset size (14 chunks with an average of 234 entries per chunk). The abundance of data may have contributed to the robust learning and generalisation capabilities of this model.
The similar accuracy patterns observed in Models 2509 and 2557 might be attributed to their comparable dataset sizes (56 and 41 chunks, respectively). However, nuanced differences in precision, recall, and AUROC suggest that other factors, such as the inherent characteristics of the errors or the model architectures, contribute to their distinct performance.
The slightly lower but reasonable accuracy of Models 2558 and 2559 corresponds to their smaller dataset sizes (12 and 37 chunks, respectively). Despite having fewer data entries, these models exhibit notable predictive capabilities, indicating effective learning from the available information.
Generally, the correlation between the size of the dataset and model performance is evident, and larger datasets often lead to better results. However, the influence of other factors, such as error characteristics and the quality of the data, cannot be overlooked.
The final step of the study involved merging the predictions of the neural networks to accurately identify a single error event. To achieve this, the entire original dataset was processed and streamed independently to each neural network. By applying the same normalisation functions used during training and maintaining the same feeding window, the predictions were plotted at different levels. Part of the results of this process is presented in Figure 5, which should be read from bottom to top and from left to right to understand the behaviour of the signals. Each row represents a different prediction time frame: the top one shows the prediction results of the neural network closest to the current event, while the bottom one depicts those related to the longest forecast. The total time frame taken into account spans about 300 TUs, and the prediction of each neural network is drawn in a different colour.
The squared green signal represents the actual error that occurred during that specific time frame, and its value is labelled in the upper right corner. In addition, a threshold, represented as a horizontal dotted line, was set to highlight signals that exceed this value throughout the time frame. The expected ideal behaviour is that the neural network corresponding to the error reported at the top right of the squared signal (in this specific case, 2509) should rise in advance and keep the same high value until the moment of the error occurrence. As the other neural networks are trained on different target classes, they should not show any activation. The bottom row shows that multiple neural networks seem to recognise an error event, including 2509 (coloured red). Moving up the rows, the neural network for error 2558 stops its elicitation, as does the neural network for error 2556. Approaching the top row, the neural network 2509 confirms its prediction and extends its accuracy, intercepting the actual occurrence of the error. After this, it correctly stops the recognition, returning to a “zero” state.
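A minimal sketch of this finalisation step is shown below; the threshold value and the model call interface are assumptions, not the deployed implementation.

```python
# Stream each window to every specialised network, threshold the
# elicitations, and keep the strongest signal as the merged prediction.
ERROR_CODES = [2509, 2556, 2557, 2558, 2559]

def merge_predictions(models, window, threshold):
    """models: dict mapping error codes to trained networks."""
    scores = {code: float(models[code](window)) for code in ERROR_CODES}
    active = {c: s for c, s in scores.items() if s >= threshold}
    if not active:
        return 0                        # "normal functioning"
    return max(active, key=active.get)  # highest elicitation wins
```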
5. Discussion
5.1. Theoretical Implications
This study contributes to EWS research [51,64,65], with a specific focus on the implementation of an Intelligent Early Warning System (IEWS) in the manufacturing sector, distinguishing it from previous works such as [31,32]. Although similar to [54], which employs a Bayesian Hypothesis Testing-based model for anomaly detection, this study shifts the focus to predicting the most frequent error classes using deep learning techniques and the power of the attention mechanism.
This shift introduces a new perspective in the theoretical discourse around EWSs, moving from anomaly detection to event anticipation, thereby aligning more closely with the proactive needs of predictive maintenance strategies.
The study also contributes methodologically by proposing a multi-model classification strategy, where each model is specialised in identifying a specific class of critical errors. This approach departs from the common use of monolithic classifiers and opens new avenues for research on modular and scalable architectures for industrial fault prediction. The integration of attention layers into LSTM networks further supports the theoretical discourse on temporal feature selection, allowing the model to identify which segments of the time series carry the most predictive value, a mechanism particularly relevant in scenarios where the importance of historical data fluctuates over time.
5.2. Practical Implications
The approach described in this work brings numerous advantages in real-world scenarios. The adoption of Kedro and the pipeline-based workflow allows the results to be replicated, and new ones extracted, using the very same code. In fact, each node composing the pipeline can be reused multiple times on different data while outputting the same data structure. This means that the system is scalable: new errors can be mapped and new models trained simply by adding a new entry to the data to extract.
Similarly, new entries can be quickly analysed, updating the initial statistical analysis and leading to more insightful details about the general behaviour of the data. With this additional knowledge, the models can be updated all at once, retrieving on-screen training/testing performance, and then deployed on edge machines.
Having an EWS active and running with these automatic “updates” provides the core value of this work:
Each model warns the operator in advance about possible paper jams, so that immediate recovery procedures can start;
Constantly updated models with improved discrimination capability can enlarge the forecast horizon;
The developed pipeline can be extended to different machine parts to monitor new zones;
System downtime is lowered and production improves as models are fed with new data.
Moreover, the adoption of Kedro, which allows the automatic deployment of the trained neural networks and their usage on real-time clients (such as those close to the machine), brings improvements at many levels.
Finally, the findings of this study demonstrate that legacy machines, despite not being originally designed with advanced monitoring capabilities, can be enhanced through the integration of sensors and AI models without requiring a complete replacement of existing infrastructure. This approach presents a cost-effective opportunity for manufacturing companies seeking to modernise their production systems with advanced predictive technologies, reducing technological upgrade costs while improving operational efficiency.
5.3. Limitations and Future Works
The current work presents some important limitations that will be addressed in future studies. First, the result metrics of all networks were plotted over time to give a general insight into how the entire system works.
From the statistical analysis in Figure 6, it appears evident that the models have an important improvement margin from a precision point of view. As is also clear from Figure 5, false positive events are frequent throughout the test frame. At the same time, the high accuracy levels suggest a tendency to overfit. Despite the shared network architecture, the results vary significantly across the models; it can be assumed that many factors contribute to this limitation.
Despite these challenges, tests on the entire dataset present highly variable results among both the models and the prediction times, generally corresponding to the AUROC values reported in the metric tables for each model. The main source of confusion is the presence of many false positives and false negatives, as reported in Figure 5, but this problem could be mitigated by improving both the quality and quantity of the data through more accurate class handling.
All the cited limitations point to several possible future scenarios to overcome them and improve the reported results. First, a more detailed and complete dataset should be arranged, along with a cleaner error configuration and description. The high number of classes/errors reported in Figure 3 is considered to have strongly polluted the behaviour of the models, as the features drastically change during the overall classification process. This implies not only setting the class values to “0”, but also setting the feature levels to a low excitation, to support the model classification process. Second, the current classification process does not consider some of the “drag” that the machine exhibits during the stopping process: a problem with the same error code might have a different stopping time and speed, which might call for a different type of classification and/or data extraction. Third, the continued collection of data to feed the neural networks will benefit overall performance; however, this is a time-consuming process, as errors rarely occur.
In conclusion, it is also acknowledged that LSTM networks have long been employed in similar scenarios. However, in this particular case, as reported in Figure 6, the system lacks robustness. Therefore, a possible improvement involves the implementation of hybrid models that combine different predictive methods to classify error classes, or the use of advanced techniques such as transfer learning.
6. Conclusions
EWSs represent essential tools for enabling domain experts to anticipate potential system failures. This study investigated the implementation of such a system in a real-world industrial context, using data collected from the embosser of a legacy industrial printing machine. The data required significant preprocessing due to the heterogeneity and asynchronous nature of the acquisition and communication layers.
After a review of the literature on EWS and fault prediction, and following an in-depth analysis of the sensor variables in collaboration with domain experts, a deep learning framework was designed to predict specific classes of paper-related errors. LSTM networks enhanced with attention mechanisms were selected as the most suitable architecture for this task. A multi-model classification approach was adopted, where five distinct neural networks were each trained to identify a specific critical error class.
The development process was managed using a modular pipeline built with Kedro, ensuring both reproducibility and scalability. This framework also facilitated seamless deployment of the trained models on cloud infrastructure, allowing automated retraining and updates as new data become available.
Model performance was assessed using standard metrics, including accuracy, precision, recall, and AUROC. The best-performing model achieved an AUROC of 0.93 and a recall above 90%, with the ability to predict fault events up to 10 time units in advance. While some degree of overfitting was observed, particularly in models trained on smaller subsets, this is attributed to the limited number of training instances for certain error classes.
To assess robustness in real-world conditions, the full dataset was partitioned and streamed sequentially to each model, simulating a real-time deployment scenario. The results confirmed the models’ capacity to anticipate faults within a proactive maintenance horizon. In conclusion, this study offers several practical insights for the design and deployment of intelligent EWSs in industrial environments: (1) the proposed data pipeline can be reused across different machines and operational contexts; (2) the AI models are modular and easily re-trainable on new error types without modifying the overall structure; (3) automatic retraining enables continuous improvement in predictive accuracy; and (4) the architecture is portable and adaptable, making it suitable for deployment on other legacy industrial systems. Researchers and practitioners in the fields of Big Data analytics, cloud-based predictive maintenance, and industrial AI applications can leverage these results to design and address future research and industrial activities.