Fusion of Multi-Layer Attention Mechanisms and CNN-LSTM for Fault Prediction in Marine Diesel Engines

Sun, Jiawen; Ren, Hongxiang; Duan, Yating; Yang, Xiao; Wang, Delong; Tang, Haina

doi:10.3390/jmse12060990

by Jiawen Sun ¹, Hongxiang Ren ¹,*, Yating Duan ¹, Xiao Yang ¹, Delong Wang ¹ and Haina Tang ²

¹ Navigation College, Dalian Maritime University, Dalian 116026, China
² School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2024, 12(6), 990; https://doi.org/10.3390/jmse12060990

Submission received: 19 May 2024 / Revised: 9 June 2024 / Accepted: 10 June 2024 / Published: 13 June 2024

(This article belongs to the Special Issue Recent Advances on Intelligent Maintenance and Health Management in Ocean Engineering)


Abstract:

Timely and effective maintenance is imperative to minimize operational disruptions and ensure the reliability of marine vessels. However, given the low early warning rates and poor adaptability under complex conditions of previous data-driven fault prediction methods, this paper presents a hybrid deep learning model based on multi-layer attention mechanisms for predicting faults in a marine diesel engine. Specifically, this hybrid model first introduces a Convolutional Neural Network (CNN) and self-attention to extract local features from multi-feature input sequences. Then, we utilize Long Short-Term Memory (LSTM) and multi-head attention to capture global correlations across time steps. Finally, the hybrid deep learning model is integrated with the Exponential Weighted Moving Average (EWMA) to monitor the operational status and predict potential faults in the marine diesel engine. We conducted extensive evaluations using real datasets under three operating conditions. The experimental results indicate that the proposed method outperforms the current state-of-the-art methods. Moreover, ablation studies and visualizations highlight the importance of fusing multi-layer attention, and the results under various operating conditions and application scenarios demonstrate that this method possesses predictive accuracy and broad applicability. Hence, this approach can provide decision support for condition monitoring and predictive maintenance of marine mechanical systems.

Keywords:

marine diesel engine; fault prediction; multi-layer attention; hybrid deep learning; engine operation

1. Introduction

Shipping, as a crucial carrier of global trade, transports the vast majority of goods worldwide in an economical and reliable manner. Over 80% of current global trade is conducted through the shipping industry [1]. As complex systems that navigate ‘independently’ at sea, the safety and reliability of ships are paramount. Timely detection of the development of potential faults and the implementation of maintenance measures before they evolve into serious faults are effective ways to enhance the safety and availability of ship systems [2].

As the core components of engine room power systems, the operational conditions of marine diesel engines directly influence the safety and economy of maritime navigation. Given the long operational hours and high energy demands of transport ships, diesel engines are expected to remain the preferred power equipment for intelligent commercial vessels for the foreseeable future [3]. Owing to the complex structures of marine diesel engines and their continuous operation in environments with high temperatures and pressures, they are prone to higher probabilities of malfunctions. Additionally, during operation, the gradual aging of components leads to a decline in performance. Faults in marine diesel engines primarily occur throughout the combustion process, encompassing all components involved in combustion, and there are a wide variety of types. These faults can be categorized into four main types based on their structures: (1) fuel system faults, including but not limited to fuel nozzle wear and fuel filter blockages, which can lead to decreased combustion efficiency and unstable operation [4,5]; (2) intake and exhaust system faults, such as intake and exhaust valve failures, as well as exhaust pipe blockages and leaks, which can impact engine performance and emissions [6,7]; (3) lubrication system faults, which cause increased wear of engine components, mainly due to filter blockages and oil leaks [8]; and (4) cooling system faults, which can result in engine overheating and are caused by issues like cooling water pump failures and poor coolant quality [9]. According to a research report by The Swedish Club, a maritime insurance company, the proportion of claims costs for ship machinery increased from 35% during 2010–2014 to 48% during 2015–2017, marking a 35% increase [10]. Specifically, claims related to engine failures accounted for 28% of all machinery claims and 34% of the costs. 
Thus, effective maintenance of marine diesel engines is crucial in reducing the risk of failure and minimizing downtime losses.

Preventive maintenance has long been the preferred strategy for ship operators [11]. It is carried out on a fixed, predetermined schedule, which makes it relatively straightforward to implement. Maintenance schedules are determined based on observed equipment conditions, manufacturer recommendations, and long-term experience, thereby enhancing vessel safety to some extent. However, the primary drawback of preventive maintenance is its inability to accurately predict the actual timing of failure and the optimal intervals for component replacement. This often leads to suboptimal utilization of materials and labor, as well as increased operational costs [12]. The growing complexity of shipboard systems is prompting a shift towards condition-based maintenance (CBM) strategies. At the heart of CBM is condition monitoring, that is, the ongoing collection of data about relevant equipment to assess its operational status and identify potential faults, so that maintenance can be performed based on the actual condition of the machinery [13]. According to Ahmad et al. [14], condition-based maintenance methods can extend the maintenance cycles of machinery by 50% and reduce operating costs by 25% to 45%. Moreover, transitioning from scheduled, rule-based maintenance to a condition-based predictive approach can yield more accurate and timely maintenance. This not only helps ensure the safety of ship systems but also reduces material waste and unnecessary inspections.

With the rapid development of ship intelligence and digitalization, optimizing maintenance strategies for marine vessels has become one of the most crucial ways to reduce operating costs and enhance vessel safety. Currently, in the field of marine diesel engine condition monitoring and fault prediction, methods are primarily categorized into three types: knowledge based, model based, and data driven. Knowledge-based methods integrate accumulated experience and extensive knowledge into fault detection, identifying the fault conditions of the target system by setting specific rules [15]. Jiang et al. [16] employed Dempster–Shafer evidence theory to model uncertainty in engine fault diagnosis and subsequently verified the effectiveness of this approach using established decision-making rules. Xu et al. [17] utilized belief rules to transform multi-class fault problems into several parallel binary classification problems. They developed an expert system for engine fault diagnosis that was applied to abnormal wear detection in a specific type of marine diesel engine; this system is capable of identifying concurrent fault modes. Moreover, Gharib et al. [18] developed an improved expert system by selecting the most effective diagnostic parameters for diesel engine subsystems. This system determines the real-time technical conditions of marine diesel engines. Knowledge-based methods are highly cost-effective due to their minimal requirements for extensive data collection and precise mathematical models, and they excel in reducing downtime and providing quick responses. However, these methods also have limitations, as they rely heavily on the accumulation of long-term experience and knowledge, which leads to poor scalability. Moreover, their practical application faces challenges due to the high integration of complex systems and the emergence of new faults.

Model-based methods are the most traditional approaches to fault prediction. They involve developing precise analytical mathematical models based on the internal mechanisms of the subject under study, thereby revealing its intrinsic characteristics. These methods are ideal for systems with well-defined mechanistic structures and extensive process information. By evaluating the pattern of change in the residual difference between the output value from a model and the observed value from a system, the operating state of the system is identified. Common model-based methods include the parameter estimation method, state estimation method, and equivalent space method [19]. Scappin et al. [20] developed a zero-dimensional model of a low-speed diesel engine using energy balance and the ideal gas law to assess engine performance over a complete crank cycle. Its deviation range was experimentally confirmed to be less than 5%. Similarly, Llamas et al. [21] developed a control-oriented mean value model for a large two-stroke engine with Exhaust Gas Recirculation (EGR). Dynamic validation across multiple scenarios confirmed that the transient prediction capability of this model was satisfactory. Moussa Nahim et al. [22] studied the influence of faults on the cooling and lubrication systems of marine diesel engines by developing a physical model. Moreover, given the limitations of obtaining fault data from engines, Pagán Rubio et al. [23] established a diesel engine fault simulator based on thermodynamic models. This simulator can emulate engine behavior under normal and failure conditions and can build a reliable fault database for diagnostic purposes. The model-based methods mentioned above have demonstrated notable successes in fault prediction research in various fields, particularly in scenarios where obtaining large amounts of data is challenging. However, the accuracy of fault identification using these methods relies heavily on the precision of the models. 
As the system complexity increases, establishing precise models becomes challenging; furthermore, complex models often require significant computational resources, posing challenges for real-time applications [24]. Additionally, as technology continually advances, the internal mechanisms of various systems are becoming more sophisticated, leading to uncertainties and issues that vary over time, which can also impact the reliability of fault prediction. Compared to condition monitoring and prediction, model-based methods hold significant potential for designing and controlling marine systems [25].

With the integration of advanced data acquisition technology in marine machinery [26], data-driven methods have been widely researched in areas such as fault prediction, anomaly identification, and remaining useful life prediction [27,28]. Due to their potential to extract representative features and perform data mining and their exclusive reliance on the historical data of a system, these methods are highly suitable for practical applications. Data-driven methods can be divided into the statistical methods of the past and more recent techniques such as machine learning and deep learning [29]. Traditional statistical methods include Autoregression [30], Kalman filtering [31], and Bayesian networks [32]. They perform well with relatively simple data but have certain limitations when processing complex datasets. As “black box” models, machine learning and deep learning typically do not account for the physical mechanisms of the subject under study. Instead, they aim to find the optimal relationship between the input and the output, leading to superior performance [33]. Machine learning (ML) methods correlate patterns and features discovered in data with specific learning tasks through suitable data transformations and extractions. Lazakis et al. [34] identified the key parameters of a main engine through fault tree analysis and failure mode response analysis. They employed Artificial Neural Networks (ANNs) to predict faults and fault locations in the diesel engine. Similarly, Raptodimos et al. [11] developed a Nonlinear Autoregressive with Exogenous Inputs (NARX) ANN model to forecast the future performance parameter values of marine main engines, providing a prediction model for CBM in shipping. Tan et al. [35,36] conducted a comparative study of several representative one-class classifiers for anomaly detection in marine mechanical systems. 
Subsequently, they compared several advanced multi-label classification algorithms and validated them with simulated data, which supported their application in the condition monitoring and fault diagnosis of marine systems. Nevertheless, these machine learning-based methods often necessitate prior knowledge and are typically only applicable within specific domains. Furthermore, the simplistic structures of shallow models often lead to inadequate feature extraction capabilities and challenges in effectively identifying incipient faults.

Compared to shallow ML models, deep learning methods demonstrate superior performance in prediction [37]. Recently, deep learning methods have achieved significant advancements in fault prediction within marine systems, which have been attributed to their efficient multi-layer nonlinear mapping and adaptive capabilities. Among these, the Convolutional Neural Network (CNN) has exhibited outstanding performance in temporal feature extraction. Shahid et al. [38] proposed a real-time diagnostic method employing a CNN to detect misfires and load changes during engine operations. To achieve wear fault diagnosis and wear mechanism identification in marine diesel engines’ main bearings, Zhou et al. [39] explored the fusion of recursive graphs with CNNs and confirmed the effectiveness of this method across various typical working conditions. Long Short-Term Memory (LSTM) is a type of recurrent neural network that incorporates a gate mechanism and a memory unit to handle long-term sequences. Owing to its ability to learn long-term dependencies, LSTM has been extensively applied to time series tasks. Han et al. [40] utilized LSTM networks to develop a fault prediction model and validated its performance using sampled data from a hybrid power laboratory. Using data collected from a marine cabin system, Zhou et al. [41] developed an LSTM-based exhaust temperature prediction model suitable for thermal load prediction, fault detection, and diagnosis of marine diesel engines. Furthermore, hybrid models, which combine the advantages of multiple models, exhibit superior generalization capabilities compared to single models. For instance, Hong et al. [29] combined CNN and RNN algorithms to construct a hybrid deep learning model for predicting the exhaust gas temperatures of gas turbines with dynamic and nonlinear characteristics. Similarly, Velasco-Gallego et al. [42] developed an ensemble model combining a 1D-CNN and LSTM to predict the remaining useful lives of turbochargers in diesel generators. Generally, the hybrid model based on the CNN and LSTM outperforms models using the CNN or LSTM alone. However, as sequence lengths increase, the performance of LSTM tends to decline because its ability to retain dependencies over very long sequences is limited [43].

Attention mechanisms are inspired by human vision. They allocate resources to relevant information while ignoring irrelevant information, which helps models improve on learning tasks [44]. Consequently, attention mechanisms can extract advanced features and capture long-term dependencies. Currently, some techniques that combine a CNN and LSTM have shown promising performance in fault prediction. However, previous studies have predominantly focused on stable operating conditions, and further research is needed to explore incorporating different levels of attention into fault prediction for marine engines. These observations motivated the current study. Because marine vessels operate under complex and variable conditions during actual navigation, prediction models must be highly adaptable and robust. To address this, we propose a novel hybrid deep learning framework, CNN-LSTM-MLA, which fuses CNN-LSTM with multi-layer attention mechanisms. This framework extracts local and temporal features from data and weights them according to their relevance. Furthermore, we integrate the EWMA control chart used in engineering analysis for ship system condition monitoring and fault prediction and apply the method to marine engines operating under different conditions. The primary objective of this paper is to use the hybrid model fused with multi-layer attention mechanisms to accurately predict the future performance parameter values of a marine diesel engine. On this basis, we explore methods for detecting engine anomalies under different operating conditions in order to obtain early warnings of potential faults and support the predictive maintenance of marine machinery. The contributions of this study are summarized as follows:

(1) By introducing multi-layer attention fusion into CNN-LSTM, the model can select optimal features and capture long-term dependencies, thereby improving prediction accuracy under different operating conditions.
(2) We conducted extensive experiments using engine data under various navigation conditions, and the results demonstrate that the proposed CNN-LSTM-MLA model exhibited satisfactory accuracy and adaptability in different operating conditions.
(3) The combination of hybrid deep learning models and the EWMA control chart effectively identified anomalies in the long-term signal sequences of the diesel engine, indicating that it is suitable for predicting progressive faults that evolve slowly.

The rest of this paper is organized as follows: In Section 2, we introduce the main parameters of marine diesel engines and present our method in detail according to the proposed framework. Then, Section 3 presents a case study of marine engines, including dataset and comparison methods, extensive experimental results, and fault prediction research. Finally, we conclude this study and comment on future research in Section 4.

2. Methodology

2.1. Marine Diesel Engine and Data

As a complex power conversion device, a marine diesel engine has strong correlations between subsystems and components. This paper focuses on a Wartsila 9L34DF dual-fuel medium-speed turbocharged diesel engine installed on a liquefied natural gas carrier. Serving as the prime mover for electric propulsion, this engine has a stable rotational speed of 750 rpm during operation. Detailed technical specifications are provided in Table 1.

This engine is similar to a conventional diesel engine, except that natural gas can be used as a fuel to meet emission standards when emission requirements are stringent. The basic layout is illustrated in Figure 1. During the operation of the diesel engine, a significant amount of state data representing machine performance is generated. We conducted a case study on the prediction of the exhaust gas temperature (EGT) because it is one of the most important thermal parameters in the engine room and reflects the combustion performance of the cylinders. It serves as an important indicator of the efficiency and health status of the engine and forms the basis of engine condition monitoring and fault prediction.

Based on the operating principles of the diesel engine, the EGT was influenced by many parameters. To clarify the data requirements for predicting the diesel engine EGT, we conducted data collection based on four dimensions: thermodynamics, lubricants, vibration, and operating parameters. The collected data came from the real-time operation data of the 9L34DF marine dual-fuel power generation diesel engine, and sensors installed in the ship’s engine room were used to complete the data acquisition. In addition, to capture more transient changes in the data and ensure the quality of the model training, we uniformly collected data at a frequency of 1 Hz. Figure 2 presents line graphs and descriptive statistics of some operational parameters of the marine diesel engine under the anchoring condition, including a total of 28,800 monitoring points, i.e., 8 h of operational data. The thermodynamic parameters include the exhaust gas temperature and air cooler outlet temperature, with mean values of 393.2 °C and 86.8 °C, respectively. The lubricant parameters comprise the turbocharger lubricating oil outlet temperature and lubricating oil inlet pressure, with mean values of 71.0 °C and 6.74 bar, respectively. The vibration parameters consist of the crankcase pressure and engine torsional vibration, with mean values of −0.03 bar and 27.9 mrad, respectively. The operating parameters include the relative engine load and turbocharger speed, with mean values of 13.3% and 8062 r/min, respectively.

The Pearson Product-Moment Correlation Coefficient (PPMCC) was employed in this study to analyze the correlations between variables and determine the final input parameters. The pre-selected operational parameters of the diesel engine included the engine exhaust gas temperature ($T_{e}$, °C), air cooler outlet temperature ($T_{c}$, °C), turbocharger lubricating oil outlet temperature ($T_{l}$, °C), lubricating oil inlet pressure ($P_{l}$, bar), crankcase pressure ($P_{c}$, bar), engine torsional vibration value ($V_{t}$, mrad), engine load ($L$, %), and turbocharger speed ($R$, rpm). The Pearson correlation coefficient is used to describe the correlation between two variables, and its calculation formula is as follows:

$$\rho = \frac{1}{n - 1} \sum_{i = 1}^{n} \left(\frac{X_{i} - \bar{X}}{\delta_{X}}\right) \left(\frac{Y_{i} - \bar{Y}}{\delta_{Y}}\right) \tag{1}$$

where $\bar{X}$ and $\delta_{X}$ are the mean and standard deviation of the sample ($X_{i}$). The Pearson coefficient ($\rho$) has a range of [−1, 1], where values closer to 1 or −1 indicate stronger correlations between the two features, while values closer to 0 imply weaker correlations.

Figure 3 shows the correlations between the aforementioned parameters, where each row/column denotes one parameter. Contour plots representing kernel density are shown in the lower left, frequency histograms are shown diagonally, and scatter plots are shown in the upper right. The computed results revealed that the correlation coefficients between the engine exhaust gas temperature ($T_{e}$) and the air cooler outlet temperature ($T_{c}$), engine load ($L$), and turbocharger speed ($R$) were all greater than 0.6, indicating strong correlations. Based on these findings, the four features mentioned above were selected as inputs for the predictive model to forecast the EGT of the diesel engine at the next time step.
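The correlation-based screening described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' code: the function names, the use of the absolute value of the coefficient, and the synthetic data are assumptions; only the formula in Equation (1) and the 0.6 threshold come from the text.

```python
import numpy as np


def pearson(x, y):
    """Sample Pearson coefficient following Equation (1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Sample standard deviations (ddof=1) match the 1/(n - 1) factor.
    zx = (x - x.mean()) / x.std(ddof=1)
    zy = (y - y.mean()) / y.std(ddof=1)
    return float(np.sum(zx * zy) / (n - 1))


def select_features(columns, target, threshold=0.6):
    """Names of candidate columns whose |rho| with the target exceeds threshold."""
    return [name for name, values in columns.items()
            if abs(pearson(values, target)) > threshold]


# Synthetic stand-in data (hypothetical, NOT the ship measurements).
rng = np.random.default_rng(0)
load = rng.uniform(10, 90, size=500)
egt = 300 + 1.5 * load + rng.normal(0, 5, size=500)
candidates = {"load": load, "noise": rng.normal(0, 1, size=500)}
selected = select_features(candidates, egt)
```

With real data, `candidates` would hold the monitored parameters and `egt` the measured exhaust gas temperature; the result can be cross-checked against `np.corrcoef`.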

2.2. Framework of the Proposed Method

The framework of the proposed method is illustrated in Figure 4, where the prediction model is a deep learning framework that integrates multi-layer attention mechanisms. This framework fully exploits the high anti-interference capability of the hybrid model. Attention mechanisms were introduced into both the CNN and LSTM models to enhance the adaptive adjustment of data weights in the network, increase the focus on key information, and prevent the problem of important information being overlooked due to long sequences of data. The steps of the proposed method are described below.

Step 1: Data collection. In this research, monitoring data from the engine when the LNG ship was in three states—constant-speed navigation, maneuvering navigation, and anchoring—were selected as the raw data. The details of data collection are provided in Section 2.1.

Step 2: Data preprocessing. This process involves removing outliers, filling in missing data, and denoising and normalizing the raw data. Subsequently, the processed data are partitioned into training and testing datasets. Details of the data preprocessing are provided in Section 3.1.

Step 3: Model construction. The proposed CNN-LSTM-MLA model is a hybrid model that fuses multi-layer attention mechanisms, which enables it to learn more representative features from the engine signal sequence. Specifically, the first step involves extracting local features from input sequences with varying characteristics using the CNN model and enhancing the feature extraction capability through self-attention mechanisms. Then, the output from the CNN is input into the LSTM, which employs multi-head attention to bolster its ability to capture long-term dependencies, thereby exploring the significance of various time steps. In this way, improvements in convergence speed, predictive capacity, and accuracy were achieved for the model.

Step 4: Model evaluation and visualization. The CNN-LSTM-MLA network is trained using the training dataset, and predictions are made and evaluated using the testing dataset. The predicted results are then visualized for analysis.

Step 5: Fault prediction. Based on the distribution of residuals between the model’s prediction value and the actual value, control limits for fault thresholds are constructed using EWMA control charts. Through this method, abnormalities during the operation of the engine can be discovered, and fault warnings can be obtained.

Figure 4. Main framework of the proposed method.
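Step 5 can be illustrated with a standard EWMA control chart applied to prediction residuals. The sketch below is a minimal example under stated assumptions, not the authors' implementation: the smoothing factor `lam`, the limit width `width`, and the function names are hypothetical choices, and the in-control mean and standard deviation are assumed to have been estimated from residuals recorded during healthy operation.

```python
import numpy as np


def ewma_chart(residuals, mu, sigma, lam=0.2, width=3.0):
    """Return the EWMA statistic with its lower and upper control limits.

    mu and sigma are the in-control mean and standard deviation of the
    residuals; lam and width are assumed values, not taken from the paper.
    """
    r = np.asarray(residuals, dtype=float)
    z = np.empty_like(r)
    prev = mu  # the statistic starts at the in-control mean
    for t, value in enumerate(r):
        prev = lam * value + (1.0 - lam) * prev
        z[t] = prev
    steps = np.arange(1, r.size + 1)
    # Standard time-varying EWMA limits; they widen towards a steady state.
    spread = np.sqrt(lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * steps)))
    half_width = width * sigma * spread
    return z, mu - half_width, mu + half_width


def first_alarm(residuals, mu, sigma, lam=0.2, width=3.0):
    """Index of the first sample outside the control limits, or None."""
    z, lower, upper = ewma_chart(residuals, mu, sigma, lam, width)
    outside = np.flatnonzero((z > upper) | (z < lower))
    return int(outside[0]) if outside.size else None


# Illustrative residuals: 50 in-control samples, then a sustained +2 offset.
residuals = np.concatenate([np.zeros(50), np.full(50, 2.0)])
alarm_index = first_alarm(residuals, mu=0.0, sigma=1.0)
```

Because the EWMA statistic accumulates small, persistent deviations, the alarm is raised a few samples after the offset begins rather than on the first shifted sample, which is the behaviour that makes this kind of chart attractive for slowly developing faults.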

2.3. Model Construction

The prediction model in this paper predicts future EGT changes in the marine diesel engine based on multivariate factors from the past. The multiple feature variables ($x$) and the corresponding EGTs ($y$) can be represented as follows:

$$x = \{x_{1}, \dots, x_{t}, \dots, x_{T}\} \tag{2}$$

$$x_{t} = \{x_{t1}, \dots, x_{tn}, \dots, x_{tN}\} \tag{3}$$

$$y = \{y_{1}, \dots, y_{t}, \dots, y_{T}\} \tag{4}$$

where $T$ and $N$ are the total length of the time steps and the number of features, respectively. $x_{t}$ is the feature set at time step $t$. The prediction problem of the engine EGT uses the time series of multivariate factors $\{x_{t}\}_{t = 1}^{T}$ and the EGT $\{y_{t}\}_{t = 1}^{T}$ as inputs and constructs a model ($F$) to predict $y$ at future time steps:

$$\{\hat{y}_{t}\}_{t = T + 1}^{T + \Delta} = F\left(\{x_{t}\}_{t = 1}^{T}, \{y_{t}\}_{t = 1}^{T}\right) \tag{5}$$
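To make the formulation in Equation (5) concrete, the following sketch builds supervised samples from a recorded series with a sliding window. It is an illustration under assumptions: the function name `make_windows`, the choice to stack the past EGT as an extra input column, and the toy arrays are not taken from the paper.

```python
import numpy as np


def make_windows(x, y, T, delta=1):
    """Build (past window, future target) pairs matching Equation (5).

    x: array of shape (steps, N) holding the feature vectors x_t
    y: array of shape (steps,) holding the EGT values y_t
    Returns inputs of shape (samples, T, N + 1) and targets of
    shape (samples, delta).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    samples = len(y) - T - delta + 1
    if samples <= 0:
        raise ValueError("series is too short for the requested T and delta")
    history = np.column_stack([x, y])  # past features and past EGT together
    inputs = np.stack([history[s:s + T] for s in range(samples)])
    targets = np.stack([y[s + T:s + T + delta] for s in range(samples)])
    return inputs, targets


# Toy series with 10 time steps and N = 2 features.
x = np.arange(20, dtype=float).reshape(10, 2)
y = np.arange(10, dtype=float) * 10
inputs, targets = make_windows(x, y, T=4, delta=2)
```

Each row of `targets` holds the values that follow the corresponding input window, so a model trained on these pairs plays the role of $F$.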

The basic structure of the proposed hybrid model is shown in Figure 5. It consists of four parts: a multi-feature input layer, an attention-based CNN fusion subnet, an attention-based LSTM fusion subnet, and an output layer. Among them, the role of the input layer is to receive multi-feature input data and normalize them. The CNN model initially extracts the input sequence features and evaluates the relative importance of each feature through self-attention. Then, the output of the CNN is fed into the LSTM, which employs a multi-head attention mechanism to enhance global features and the long-term dependency extraction capability. Finally, the output layer predicts the EGT of the engine.

The following is a detailed description of the internal structure of the CNN-LSTM-MLA model:

2.3.1. CNN Fusion Subnet Based on Self-Attention

The hidden layers of a CNN are usually composed of a convolutional layer, a pooling layer [45], and a fully connected layer. Among them, the convolutional layers include multiple convolution filters and are the core of the network. At time step $t$, each node of the convolutional layer extracts features from the input sequence $x_{t} = \{x_{t1}, \dots, x_{tn}, \dots, x_{tN}\}$ through convolutional operations and generates feature maps. The expression of convolution is

$$Z = \mathrm{ReLU}(x_{t} \otimes W_{z} + b_{z}) \tag{6}$$

where $Z$ is the output of the convolution layer, while $W_{z}$ and $b_{z}$ represent the weight and bias, respectively. $\mathrm{ReLU}$ is used as the activation function, and $\otimes$ denotes the convolution operation. Subsequently, the pooling layer compresses the feature map produced by the convolution layer to reduce the computational complexity of the network and extract the main features. The process is described as follows:

$$P = \max(Z) + b_{p} \tag{7}$$

where $P$ represents the output of the pooling layer, $\max$ refers to the maximum pooling method, and $b_{p}$ is the bias. After the convolution and pooling operations, the fully connected layer converts the feature map into a one-dimensional vector suitable for the attention mechanism. The expression of the fully connected layer is

$$c_{t} = \mathrm{Sigmoid}(P W_{c} + b_{c}) \tag{8}$$

where $c_{t} = \{c_{t1}, \dots, c_{tn}, \dots, c_{tN}\}$ represents the sequence feature results extracted by the CNN model; $W_{c}$ and $b_{c}$ are the weight and bias, respectively; and $\mathrm{Sigmoid}$ is used as the activation function.
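Equations (6)–(8) can be traced with a small NumPy forward pass. This is a sketch under assumptions rather than the network used in the paper: the number of filters, kernel size, pooling size, and random weights are hypothetical, and the convolution runs along the feature axis of a single $x_{t}$ because that is how the equations are written.

```python
import numpy as np


def conv1d_relu(x_t, W_z, b_z):
    """Equation (6): valid 1D convolution followed by ReLU.

    As in most deep learning libraries, the kernel is not flipped
    (cross-correlation form). W_z has shape (filters, kernel_size).
    """
    kernel_size = W_z.shape[1]
    windows = np.lib.stride_tricks.sliding_window_view(x_t, kernel_size)
    return np.maximum(windows @ W_z.T + b_z, 0.0).T  # (filters, positions)


def max_pool(Z, size, b_p=0.0):
    """Equation (7): non-overlapping max pooling plus the bias b_p."""
    usable = (Z.shape[1] // size) * size
    pooled = Z[:, :usable].reshape(Z.shape[0], -1, size).max(axis=2)
    return pooled + b_p


def dense_sigmoid(P, W_c, b_c):
    """Equation (8): flatten P, then a fully connected layer with sigmoid."""
    return 1.0 / (1.0 + np.exp(-(P.reshape(-1) @ W_c + b_c)))


# Toy forward pass with N = 4 input features and randomly initialised weights.
rng = np.random.default_rng(0)
x_t = np.array([0.2, 0.5, 0.1, 0.9])
W_z, b_z = rng.normal(size=(3, 2)), np.zeros(3)   # 3 filters, kernel size 2
Z = conv1d_relu(x_t, W_z, b_z)                    # shape (3, 3)
P = max_pool(Z, size=2)                           # shape (3, 1)
W_c, b_c = rng.normal(size=(P.size, 4)), np.zeros(4)
c_t = dense_sigmoid(P, W_c, b_c)                  # shape (4,)
```

In a trained model the weights would be learned, and a framework layer would normally replace these hand-written functions; the point here is only to show how the shapes flow from $x_{t}$ to $c_{t}$.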

To further enhance the feature extraction capabilities of the CNN model, an attention mechanism was integrated. The sequence features extracted by the CNN model are processed according to Equation (9):

$$u_{tn} = \tanh(W_{u} c_{tn} + b_{u}) \tag{9}$$

where $W_{u}$ and $b_{u}$, respectively, represent the weight and bias, while $\tanh$ is used as the activation function. In the actual training process, the extracted preliminary features did not play equal roles in the prediction task, so a self-attention mechanism was introduced to measure the importance of each feature. Features are probabilistically weighted and aggregated into feature vectors, which are then input into a layer that captures long-term relationships. This process can be expressed as follows:

$$m_{tn} = \frac{\exp\left((u_{tn})^{\top} w_{t}\right)}{\sum_{n} \exp\left((u_{tn})^{\top} w_{t}\right)} \tag{10}$$

$$a_{t}^{c} = \sum_{n} m_{tn} c_{tn} \tag{11}$$

where $m_{tn}$ is the normalized weight derived via the $\mathrm{softmax}$ function, $a_{t}^{c}$ is the output of the CNN model fused with the attention mechanism at time $t$, and $w_{t}$ is a parameter learned during the training process.
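The weighting in Equations (9)–(11) amounts to a softmax over per-feature scores followed by a weighted sum. The sketch below shows this for one time step; the array shapes, the row-vector convention (`c_t @ W_u` instead of $W_{u} c_{tn}$), and the random parameters are assumptions made for illustration.

```python
import numpy as np


def feature_attention(c_t, W_u, b_u, w_t):
    """Equations (9)-(11) for one time step.

    c_t: (N, d_c) features from the CNN, one row per c_tn
    W_u: (d_c, d_u), b_u: (d_u,), w_t: (d_u,) learnable parameters
    Returns the attended feature a_t^c of shape (d_c,) and the weights m_t.
    """
    u = np.tanh(c_t @ W_u + b_u)                # Eq. (9), shape (N, d_u)
    scores = u @ w_t                            # u_tn^T w_t for every n
    scores = scores - scores.max()              # numerical stability only
    m = np.exp(scores) / np.exp(scores).sum()   # Eq. (10), softmax over n
    a = m @ c_t                                 # Eq. (11), weighted sum
    return a, m


rng = np.random.default_rng(1)
c_t = rng.normal(size=(4, 1))                   # N = 4 scalar features
W_u, b_u = rng.normal(size=(1, 8)), np.zeros(8)
w_t = rng.normal(size=8)
a_t, m_t = feature_attention(c_t, W_u, b_u, w_t)
```

A useful sanity check is that setting `w_t` to zeros makes every score equal, so the weights become uniform and the output reduces to the plain mean of the features.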

2.3.2. LSTM Fusion Subnet Based on Multi-Head Attention

The CNN fusion subnet addressed the challenge of extracting key features during the prediction process, but it lacked the capability to learn and capture long-term dependencies in a time series. To address this, we introduced an LSTM network that received the output

a^{c} = \{a_{1}^{c}, \dots, a_{t}^{c}, \dots, a_{T}^{c}\}

from the CNN fusion subnet. The LSTM is a chain-structured model comprising four repeated layers and is ideal for handling time series problems. The LSTM unit structure consists of input gates (

i_{t}

), forget gates (

f_{t}

), and output gates (

o_{t}

). An input gate controls the addition of new information, a forget gate selectively erases data in the storage cell (

C_{t}

), and an output gate generates the current output upon acquiring a new state. The corresponding process can be expressed as follows:

i_{t} = s i g m o i d (W_{i} \cdot [h_{t - 1}, a_{t}^{c}] + b_{i})

(12)

f_{t} = s i g m o i d (W_{f} \cdot [h_{t - 1}, a_{t}^{c}] + b_{f})

(13)

{\tilde{C}}_{t} = t a n h (W_{\tilde{C}} \cdot [h_{t - 1}, a_{t}^{c}] + b_{\tilde{C}})

(14)

C_{t} = f_{t} \otimes C_{t - 1} + i_{t} \otimes {\tilde{C}}_{t}

(15)

o_{t} = s i g m o i d (W_{o} \cdot [h_{t - 1}, a_{t}^{c}] + b_{o})

(16)

h_{t} = o_{t} \otimes t a n h (C_{t})

(17)

where a_{t}^{c} is the input at time t, h_{t-1} is the hidden layer state at time t-1, and \tilde{C}_{t} is the temporary (candidate) state of the memory cell at time t. W_{f}, W_{i}, W_{\tilde{C}}, and W_{o} are the weight matrices of the forget gate, input gate, cell state, and output gate, respectively. b_{f}, b_{i}, b_{\tilde{C}}, and b_{o} are the corresponding bias terms. h_{t} is the output value, and \otimes denotes the Hadamard product.
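Equations (12) to (17) amount to a single-step update, sketched here in plain Python for illustration. The parameter layout (a dictionary of (W, b) pairs keyed by gate name) and the toy dimensions are assumptions made for this example; a practical model would use a deep learning framework instead.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, b, v):
    """Return W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + b_k for row, b_k in zip(W, b)]


def lstm_step(a_t, h_prev, c_prev, params):
    """One LSTM update following Eqs. (12)-(17).

    params maps the gate names 'i', 'f', 'c', 'o' to (W, b) pairs, where each
    W has shape hidden x (hidden + input) and acts on [h_{t-1}, a_t^c].
    """
    z = h_prev + a_t  # list concatenation [h_{t-1}, a_t^c]
    i_t = [sigmoid(v) for v in affine(*params["i"], z)]         # Eq. (12)
    f_t = [sigmoid(v) for v in affine(*params["f"], z)]         # Eq. (13)
    c_tilde = [math.tanh(v) for v in affine(*params["c"], z)]   # Eq. (14)
    c_t = [f * c + i * g
           for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]    # Eq. (15)
    o_t = [sigmoid(v) for v in affine(*params["o"], z)]         # Eq. (16)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]          # Eq. (17)
    return h_t, c_t


# Hypothetical toy setup: hidden size 2, input size 1, all-zero parameters.
hidden, n_in = 2, 1
zeros = ([[0.0] * (hidden + n_in) for _ in range(hidden)], [0.0] * hidden)
params = {gate: zeros for gate in "ifco"}
h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], params)
```

With all-zero parameters every gate evaluates to 0.5 and the candidate state to 0, so the cell simply halves its previous memory, which makes the roles of the forget and input gates in Equation (15) easy to check by hand.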

To address the issue of information loss due to excessively long sequences during the LSTM model training process, we introduced a multi-head attention mechanism as a supplement to LSTM. Multi-head attention allows the model to focus on different parts of the input sequence from various perspectives and can more effectively mine long-distance data features of correlated time series, which helps to improve the robustness and capacity of the model. Similar to Equation (9), we calculated the hidden representation H^{l} = \{H_{1}^{l}, \dots, H_{t}^{l}, \dots, H_{T}^{l}\} of h_{t} via the FC neural network and transferred it to the multi-head attention module. The expression for the above process is as follows:

Q_{i} = H_{t}^{l} \times W_{q i}

(18)

K_{i} = H_{t}^{l} \times W_{k i}

(19)

V_{i} = H_{t}^{l} \times W_{v i}

(20)

where Q_{i}, K_{i}, and V_{i} represent the query matrix, key matrix, and value matrix, respectively. W_{qi}, W_{ki}, and W_{vi} are matrices learned during the training process. The attention value for the i-th attention head is calculated as follows:

\mathrm{head}_{i} = \mathrm{Attention}(Q_{i}, K_{i}, V_{i})

(21)

\mathrm{Attention}(Q_{i}, K_{i}, V_{i}) = \mathrm{softmax}\left(\frac{Q_{i} K_{i}^{T}}{\sqrt{d_{k}}}\right) V_{i}

(22)

where d_{k} represents the feature dimension of the keys and is used for weight scaling. softmax is used as the activation function. The multi-head attention mechanism divides the time series into h subspaces, with each head performing self-attention calculations on its subspace to enhance its expressive power. After that, the outputs from each head are concatenated, followed by a linear transformation to obtain the final result. This process is expressed as follows:

a^{l} = \mathrm{Concat}(\mathrm{head}_{1}, \dots, \mathrm{head}_{i}, \dots, \mathrm{head}_{h}) W^{o}

(23)

where Concat represents the concatenation operation used to merge the information from the h attention heads; W^{o} denotes the weight of the linear transformation; and a^{l} is the final output result, which can learn more feature information from different subspaces.
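The computation in Equations (18) to (23) can be sketched in plain Python as follows. The sketch assumes that the hidden representations H_{t}^{l} are stacked row-wise into a T x d matrix H, so that each head attends across time steps; the helper names, toy matrices, and projection weights are hypothetical.

```python
import math


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def softmax_rows(M):
    """Apply a numerically stable softmax to every row of M."""
    out = []
    for row in M:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def attention(Q, K, V):
    """Scaled dot-product attention, Eq. (22)."""
    d_k = len(K[0])
    K_T = [list(col) for col in zip(*K)]
    scores = [[v / math.sqrt(d_k) for v in row] for row in matmul(Q, K_T)]
    return matmul(softmax_rows(scores), V)


def multi_head_attention(H, heads, W_o):
    """Eqs. (18)-(23); heads is a list of (W_q, W_k, W_v) triples."""
    outputs = [attention(matmul(H, W_q), matmul(H, W_k), matmul(H, W_v))
               for W_q, W_k, W_v in heads]
    # Concatenate the head outputs along the feature axis, then project.
    concat = [sum((head[t] for head in outputs), []) for t in range(len(H))]
    return matmul(concat, W_o)


# Hypothetical toy input: T = 3 time steps, d = 2 features, h = 2 heads.
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
swap = [[0.0, 1.0], [1.0, 0.0]]
heads = [(identity, identity, identity), (swap, swap, identity)]
W_o = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]  # (h * d_v) x d_out
a_l = multi_head_attention(H, heads, W_o)
```

Each row of the result mixes information from all time steps, weighted by query-key similarity, which is how the mechanism supplements the step-by-step memory of the LSTM.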

3. Experimental Results and Discussion

This section provides an overview of the dataset, the evaluation metrics, the experimental results, and marine diesel engine fault prediction. In the experiments, data from a marine diesel engine under three distinct operating conditions were used to evaluate the effectiveness of the model and to compare it with other methods. The experimental methods were implemented in Python 3.7.

3.1. Datasets

3.1.1. Experimental Data Sources

Table 2 provides a detailed description of the experimental dataset for the marine diesel engine under three distinct operating conditions: constant-speed navigation, maneuvering navigation, and anchoring. Specifically, under constant-speed navigation, the real-time speed remained stable at 10 knots; the data for maneuvering navigation were derived from the deceleration phase just before the ship dropped anchor; and during anchoring, the main engine operated under a lower load while the ship remained moored at the anchorage. It should be noted that the range of engine operating parameters mentioned here only represents the dataset used in this study. Data collection spanned from 26 October 2023 to 3 November 2023, primarily in the Beibu Gulf of the South China Sea.

3.1.2. Data Preprocessing

The complex operational environment of the marine diesel engine can compromise sensor reliability, potentially resulting in anomalies or missing fields in records. This can lead to issues such as data loss, excessive noise interference, and discrepancies with the actual values in the original ship data. The quality of data significantly influences EGT prediction results; therefore, preprocessing collected raw data to standardize them is essential.

Filling missing values: Due to equipment communication failures and external environmental influences, a small amount of data may be missing from an original dataset. Given the time dependency of the relevant features, directly removing these data points can lead to discontinuities. We employed the mean interpolation method to fill in the missing data, averaging the values from the moments before and after the missing point. If the missing data spanned a substantial period, we calculated the difference between the data points immediately before and after the missing period and applied linear interpolation to fill the gaps.
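The gap-filling rule described above can be sketched as follows. Missing samples are represented by None, and the function name and toy EGT values are hypothetical. Because the mean of the two neighbours is the special case of linear interpolation for a one-sample gap, a single routine covers both cases.

```python
def fill_missing(values):
    """Fill None gaps by linear interpolation between the nearest valid neighbours.

    For a single missing point this reduces to the mean of its two neighbours.
    Gaps at the very start or end have only one neighbour and are left as None.
    """
    filled = list(values)
    n = len(filled)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # index of the first valid sample after the gap (or n)
        if start == 0 or end == n:
            continue
        left, right = filled[start - 1], filled[end]
        span = end - (start - 1)
        for k in range(start, end):
            filled[k] = left + (right - left) * (k - (start - 1)) / span
    return filled


# Hypothetical EGT samples with a one-point gap and a three-point gap.
egt = [300.0, None, 304.0, None, None, None, 312.0]
print(fill_missing(egt))
# [300.0, 302.0, 304.0, 306.0, 308.0, 310.0, 312.0]
```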

Data denoising: Reducing data noise can reveal the intrinsic relationships among various parameters, thereby enhancing the generalization capability of a model. We utilized a soft–hard threshold compromise method to denoise the feature data, selecting a wavelet decomposition level of five and applying the sym8 wavelet to process the data to eliminate high-frequency noise and achieve a smoother sequence. This soft–hard threshold compromise method fully leveraged the strengths of both thresholds, increasing the depth of noise reduction while maximally preserving useful signals.
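The text does not give the thresholding formula, so the sketch below uses one commonly cited form of the soft-hard compromise: coefficients below the threshold are set to zero and the remainder are shrunk by a fraction alpha of the threshold. The function name, the parameter alpha, and its default value are assumptions. The wavelet decomposition itself (sym8, level five) would come from a wavelet library such as PyWavelets and is not reproduced here.

```python
import math


def compromise_threshold(w, lam, alpha=0.5):
    """Shrink one wavelet detail coefficient w using threshold lam.

    alpha = 0 reproduces hard thresholding, alpha = 1 reproduces soft
    thresholding, and values in between trade off the two.
    """
    if abs(w) < lam:
        return 0.0
    return math.copysign(abs(w) - alpha * lam, w)


# Hypothetical detail coefficients and threshold.
detail = [0.2, -0.4, 3.0, -2.5, 0.9]
denoised = [compromise_threshold(w, lam=1.0, alpha=0.5) for w in detail]
print(denoised)
# [0.0, 0.0, 2.5, -2.0, 0.0]
```

Small coefficients, which mostly carry noise, are removed, while large coefficients are shrunk less than under pure soft thresholding, so more of the useful signal amplitude is preserved.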

Data normalization: There were significant dimensional differences between different features, such as engine load values and turbocharger speeds. To mitigate the effects of dimensional disparities among the features and reduce the computational burden in subsequent model processing, we applied the Min–Max normalization method. This technique maps the data values of diverse features to a scale with a uniform magnitude. Its specific calculation formula is as follows:

X_{t}^{'} = \frac{X_{t} - X_{t_{\min}}}{X_{t_{\max}} - X_{t_{\min}}}

(24)

where X_{t}^{'} is the new value after normalization, X_{t} is the original value to be normalized, X_{t_{\max}} is the maximum value of the data sample, and X_{t_{\min}} is the minimum value of the data sample.
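A short illustration of Equation (24) follows. The two feature columns and their values are hypothetical, and the guard for a constant column is an added assumption, since the formula is undefined when the maximum equals the minimum.

```python
def min_max_normalize(column):
    """Scale a list of values to [0, 1] following Eq. (24)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: map a constant column to zeros to avoid division by zero.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical features with very different magnitudes.
engine_load = [20.0, 50.0, 80.0]            # percent
turbo_speed = [8000.0, 11000.0, 14000.0]    # r/min

print(min_max_normalize(engine_load))   # [0.0, 0.5, 1.0]
print(min_max_normalize(turbo_speed))   # [0.0, 0.5, 1.0]
```

After scaling, both features occupy the same range, so neither dominates the loss purely because of its units.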

After the aforementioned processing, the data were divided into training and test sets in a 7:3 ratio for training and testing each model. Specifically, for the constant-speed navigation, maneuvering navigation, and anchoring conditions, the training and test sets contained 20,160 and 8640, 5600 and 2400, and 28,000 and 12,000 samples, respectively.

3.2. Evaluation Metrics

We evaluated the predictive performance of the model under different operating conditions using Root-Mean-Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and Mean Bias Error (MBE), and compared this model with other models. These metrics are defined as follows:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^{2}}

(25)

\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left|\hat{y}_{i} - y_{i}\right|

(26)

\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left|\frac{\hat{y}_{i} - y_{i}}{y_{i}}\right|

(27)

\mathrm{MBE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_{i} - y_{i})

(28)

where n represents the number of prediction points, while y_{i} and \hat{y}_{i} denote the actual and predicted values of the i-th data point, respectively. These metrics are commonly used as evaluation indicators in time series forecasting. Lower RMSE, MAE, and MAPE values indicate better predictions, while an MBE closer to zero indicates less systematic bias.
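Equations (25) to (28) translate directly into code. In the sketch below, the function name and the four sample values are hypothetical; the values were chosen so that positive and negative errors cancel.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, MAPE (in percent), and MBE per Eqs. (25)-(28)."""
    n = len(y_true)
    errors = [p - a for a, p in zip(y_true, y_pred)]
    return {
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 / n * sum(abs(e / a) for e, a in zip(errors, y_true)),
        "MBE": sum(errors) / n,
    }


# Hypothetical EGT values in degrees Celsius.
actual = [300.0, 310.0, 320.0, 330.0]
predicted = [303.0, 307.0, 324.0, 326.0]
metrics = regression_metrics(actual, predicted)
```

For these values the MAE is 3.5 while the MBE is exactly 0, which shows why MBE is read as a measure of bias (systematic over- or under-prediction) rather than of overall error size.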

3.3. Results and Discussion

3.3.1. Prediction Results Using the Proposed Method

In this study, to verify the effectiveness of the proposed method, training and prediction were performed using datasets from three distinct operating conditions. Based on the same training dataset, an LSTM model, an LSTM-Attention model, and a CNN-LSTM model were selected for comparison with the method presented here. For the aforementioned deep learning methods, we used MAE as the performance evaluation metric and employed Bayesian optimization to determine the optimal hyperparameters. Each of the four methods was run 30 times on the constant-speed navigation condition dataset, yielding 30 sets of hyperparameter combinations. The combination with the smallest MAE value was selected as the final hyperparameter set and applied to the EGT prediction, and the results are shown in Table 3. When predicting the EGT under the other conditions, the optimal hyperparameters were also determined using Bayesian optimization. Due to space limitations, these results are not listed individually. The overall and local prediction results for the test datasets of the three operating conditions are shown in Figure 6, Figure 7 and Figure 8. Additionally, Figure 9 displays scatter density maps of the predictions and observed results using different models. The distributions of the scatter points provide an intuitive visualization of the fitting accuracy of each model. It is evident that, across various operating conditions, the method proposed here consistently outperformed the other models. Further details are provided below.

Prediction results for the constant-speed navigation dataset: Under constant-speed navigation conditions, the heading remained stable and the engine operated under high load with an elevated EGT. Consequently, the dataset for constant-speed navigation conditions demonstrated the least fluctuation among the three datasets, and its overall prediction performance is shown in Figure 6, which illustrates that the predicted values obtained using the method proposed in this paper nearly overlapped with the actual values, signifying satisfactory prediction performance. Furthermore, as depicted in the scatter density map in Figure 9, compared to the other three deep learning methods, our method exhibited the least deviation between the predicted and actual values.

Figure 6. Comparison curves of actual and predicted values for different methods under the constant-speed navigation condition.

Prediction results for the maneuvering navigation dataset: The ship frequently steers and operates at relatively low navigation speeds and engine loads under maneuvering navigation conditions. As the experimental data for this condition primarily originated from the deceleration phase before the ship dropped anchor, this dataset was the smallest among the three but exhibited the largest fluctuations. The overall prediction performance is shown in Figure 7. The prediction results of the four methods were basically the same as the actual EGT trend. However, the CNN-LSTM hybrid model, which fuses multi-layer attention mechanisms, demonstrated a distinct advantage over the other models. It yielded results that were numerically closer to the actual values and more accurately reflected the changes in EGT under maneuvering conditions. Similarly, the corresponding scatter density map shown in Figure 9 indicates that the method proposed in this paper offered the best prediction performance.

Figure 7. Comparison curves of actual and predicted values for different methods under the maneuvering navigation condition.

Prediction results for the anchoring condition dataset: Since the studied diesel engine serves as the prime mover of an electric propulsion ship, it continues to operate under a certain load even when the ship is at anchor, despite there being no actual speed. This dataset was the largest among the three datasets, and its overall prediction performance is shown in Figure 8. Although all four methods maintained trends consistent with the actual values, the LSTM model exhibited the most significant deviation. In contrast, the model proposed in this paper closely matched the actual values, and enlarged charts demonstrate this model’s superior prediction accuracy at peaks and troughs. Similarly, the corresponding scatter density map is displayed in Figure 9.

Figure 8. Comparison curves of actual and predicted values for different methods under the anchoring condition.

Figure 9. Scatter density maps of predictions and observed results using different methods under three conditions. (a) Constant-speed navigation. (b) Maneuvering navigation. (c) Anchoring.

3.3.2. Prediction Results Characterized by Metrics

To test the performance of the CNN-LSTM-MLA model more comprehensively, in addition to the EGT prediction curves, we also used four metrics (RMSE, MAE, MAPE, and MBE) to evaluate the results obtained by the proposed model on the test datasets. The final prediction results for the test datasets under three conditions (constant-speed navigation, maneuvering navigation, and anchoring) are detailed in Table 4. The optimal values for all methods are highlighted in bold.

As shown in Table 4, under various operating conditions, the proposed method consistently outperformed the other methods on the test datasets. Specifically, in comparison to LSTM, LSTM-Attention, and CNN-LSTM, the RMSE accuracy of our method on the constant-speed navigation dataset improved by 51.7%, 23.1%, and 20.6%, respectively. For maneuvering navigation predictions, our method increased RMSE accuracy by 59.9%, 50.0%, and 40.8%, respectively, compared to the other models. In predictions of EGT during anchoring, our method’s RMSE accuracy showed improvements of 47.9%, 24.9%, and 27.3%, respectively, compared to LSTM, LSTM-Attention, and CNN-LSTM. The rankings for the other metrics were similar, demonstrating that our proposed method achieved superior accuracy.

Generally, the more complex the engine operating conditions, the more challenging it becomes to make accurate predictions, and the performances of various methods tend to decline. It is evident that under multiple typical conditions, the reductions in performance of the proposed method were the smallest. Consequently, we conclude that compared to existing time series prediction methods, the CNN-LSTM-MLA model proposed in this study demonstrated higher accuracy in predicting the EGT of a marine diesel engine. To visually demonstrate the performance of each model, the histograms in Figure 10 display the comprehensive results of four evaluation metrics across datasets from various operating conditions.

3.3.3. Analysis of Results of the Method without the Attention Mechanism

To demonstrate the efficacy of the multi-layer attention mechanism, we conducted experiments by either excluding or substituting components and compared three structurally similar models, CNN-LSTM, CNN-LSTM-SA, and CNN-LSTM-MHA, with our proposed model to observe changes in model performance. Among them, CNN-LSTM represents a model without an attention mechanism, while CNN-LSTM-SA and CNN-LSTM-MHA, respectively, denote the exclusion of multi-head attention and self-attention.

We conducted an ablation study using the anchoring dataset, and the prediction results are shown in Figure 11. Table 5 summarizes the comparative results of the models. It is evident that compared to CNN-LSTM-MLA, the RMSE prediction accuracy of CNN-LSTM, CNN-LSTM-SA, and CNN-LSTM-MHA decreased by 27.29%, 12.49%, and 8.63%, respectively. Similar trends were noted across the other evaluation metrics, underscoring the significance of the attention mechanism.

Additionally, we discovered that different attention mechanisms had varying impacts on the results. Model 2 (CNN-LSTM-SA, without multi-head attention) exhibited a greater decline in performance than Model 3 (CNN-LSTM-MHA, without self-attention). Networks with multi-head attention mechanisms demonstrated higher accuracy, indicating that selectively fusing different time steps is more important than fusing different features. These findings affirm that the multi-layer attention mechanism in the CNN-LSTM-MLA model substantially improves the prediction accuracy of the EGT.

3.4. Fault Prediction Research

We conducted a predictive study on progressive faults in marine diesel engines. Initially, this type of fault presents with minor abnormalities that cause a gradual increase or decrease in the EGT. As the fault progresses, these variations intensify, culminating in distinct and severe issues. Furthermore, should the EGT rise or fall beyond a certain threshold, it could potentially lead to engine shutdown.

For this type of fault, predictions of the EGT were made under various engine operating conditions using the optimal prediction method, CNN-LSTM-MLA. Subsequently, an Exponentially Weighted Moving Average (EWMA) control chart was constructed based on the distribution of residuals between the predicted and actual values. The EWMA control chart is a process control mechanism that is commonly used to monitor changes in time series and achieve sequence stability. It does not require the assumption that data follow a normal distribution and therefore provides robustness in time series anomaly detection. This method employs adaptive upper and lower boundaries as thresholds, assigns different weights to data, incorporates a smoothing factor to calculate the moving average at the current moment, and classifies data points that exceed these boundaries as anomalies. For the time series T = (x_{1}, x_{2}, x_{3}, \dots, x_{t}), the EWMA at time i is calculated as follows:

e_{i} = β x_{i} + (1 - β) e_{i - 1} = β \sum_{j = 0}^{i - 1} {(1 - β)}^{j} x_{i - j} + {(1 - β)}^{i} e_{0}

(29)

where x_{i} represents the actual data; e_{i-1} is the EWMA value at the previous moment; e_{0} is x_{1} from the time series T; and β (where β ∈ (0, 1]) serves as the smoothing factor, which controls how strongly the historical data influence the current statistic. The weights of the historical data diminish exponentially over time: the more distant the data from the current moment, the lower their weight in the calculation. A lower β yields a smoother control chart, incorporating more historical data into the weighted average, thereby enhancing sensitivity to minor deviations in the data; conversely, a higher β amplifies the impact of the current data on the statistic. When the smoothing factor is set to 1, the historical data have no influence on the current statistic. The upper and lower threshold boundaries of the EWMA are defined as follows:

({UCL}_{i}, {LCL}_{i}) = μ_{0} \pm h \cdot σ_{0} \cdot \sqrt{\frac{β}{2 - β} \cdot \left[1 - (1 - β)^{2i}\right]}

(30)

where μ_{0} and σ_{0} represent the mean and standard deviation, respectively, of a dataset from an engine operating correctly. The adjustment factor (h) is employed to control the sensitivity of the EWMA to anomalies. If the EWMA value at time i (e_{i}) exceeds the upper control limit ({UCL}_{i}) or falls below the lower control limit ({LCL}_{i}), x_{i} is classified as an anomaly.
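The EWMA recursion and the control limits can be combined into a short monitoring routine, sketched below. The control limits use the standard time-varying EWMA form, in which the exponent 2i applies to (1 - β). The function name, the residual values, and the healthy-state statistics mu0 and sigma0 are hypothetical; β = 0.3 and h = 3 follow the settings reported in Section 3.4.1.

```python
import math


def ewma_chart(residuals, mu0, sigma0, beta=0.3, h=3.0):
    """Return one (e_i, LCL_i, UCL_i, is_anomaly) tuple per residual."""
    rows = []
    e = residuals[0]  # e_0 is initialised to x_1
    for i, x in enumerate(residuals, start=1):
        e = beta * x + (1.0 - beta) * e  # EWMA recursion, Eq. (29)
        width = h * sigma0 * math.sqrt(
            beta / (2.0 - beta) * (1.0 - (1.0 - beta) ** (2 * i))
        )
        lcl, ucl = mu0 - width, mu0 + width
        rows.append((e, lcl, ucl, e > ucl or e < lcl))
    return rows


# Hypothetical residuals: eight healthy samples followed by a steady drift.
residuals = [0.1, -0.2, 0.15, -0.1, 0.05, -0.15, 0.2, -0.05,
             0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
rows = ewma_chart(residuals, mu0=0.0, sigma0=0.5)
first_alarm = next(i for i, row in enumerate(rows) if row[3])
print(first_alarm)  # 10 (zero-based index of the first flagged sample)
```

The drift starts at index 8, but the alarm is raised at index 10: the smoothing delays detection by a few samples while suppressing isolated spikes, which is the trade-off governed by β.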

Since the EWMA control chart possesses the ability to ‘memorize’ historical data, it primarily reflects trend changes in time series. Consequently, it is not sensitive to point anomalies, but it demonstrates excellent detection performance for collective anomalies that occur over a certain time span. Therefore, it is well suited for detecting progressive faults in engines caused by degradation or wear.

3.4.1. Analysis of Normal Operation of the Marine Diesel Engine

When calculating the distribution of the residuals, all datasets represented the healthy operating state of the engine, with the smoothing factor (β) set to 0.3. The adjustment factor (h) controls the distance between the UCL and the LCL. To ensure that the smoothed residual statistics remained within these threshold boundaries, h was set to 3 after iterative calculations. Figure 12 displays the residuals of the EGT of the marine diesel engine under three different operating conditions, plotted on an EWMA control chart. It is noted that the residual sequence became significantly smoother after EWMA processing and the fluctuation amplitude decreased, effectively mitigating the impacts of disturbances and noise in the original residuals. It is evident that although the residual values of the diesel engine varied under different operating conditions, they consistently remained within the control limits. Despite fluctuations under all three conditions, these limits were not continuously exceeded, indicating that the diesel engine had not experienced a fault.

3.4.2. Analysis of Abnormal Operation of the Marine Diesel Engine

Considering that the dataset in this study did not contain fault data, to verify the effectiveness of the proposed method, historical monitoring samples of EGT under various operating conditions were linearly adjusted and expanded, with white noise added to simulate fault conditions. Table 6 provides a detailed description of the adjustments made to the dataset. Based on relevant field knowledge and the summary of marine diesel engine fault types by Cheliotis et al. [46], we conducted experimental validation of engine failure prediction using injector nozzle wear and exhaust valve corrosion as examples. Injector nozzle wear is a typical fuel system fault, while exhaust valve corrosion is a common exhaust system fault. Both cases represented progressive faults in an engine and were thus depicted as sets of points that exceeded the upper and lower limits. Figure 13 and Figure 14 display the respective EWMA fault prediction results for these two cases.
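One way to generate such simulated fault data is sketched below. The actual adjustments are those listed in Table 6, which is not reproduced here, so the onset index, slope, and noise level in this example are purely hypothetical, as is the function name.

```python
import random


def inject_progressive_fault(egt, onset, slope, noise_std, seed=0):
    """Add a linear drift plus Gaussian white noise to egt from index onset onward.

    A positive slope mimics a gradually rising EGT and a negative slope a
    gradually falling EGT. All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    faulty = list(egt)
    for k in range(onset, len(faulty)):
        faulty[k] += slope * (k - onset) + rng.gauss(0.0, noise_std)
    return faulty


# Hypothetical healthy EGT record: 100 samples at 350 degrees Celsius.
healthy = [350.0] * 100
rising = inject_progressive_fault(healthy, onset=60, slope=0.2, noise_std=0.5)
falling = inject_progressive_fault(healthy, onset=60, slope=-0.2, noise_std=0.5)
```

Samples before the onset are left unchanged, so the residuals of the simulated record stay inside the control limits until the drift accumulates, which matches the progressive fault behaviour the EWMA chart is intended to detect.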

Figure 13 displays the EWMA-smoothed residual signals for the injector fault under various operating conditions. This type of fault is caused by wear of the injector nozzle, which leads to excessive fuel injection and a significant increase in EGT. Taking the constant-speed navigation condition in the figure as an example, the residuals remained within the threshold limits for the first 2250 s; after that, they began to trend upward, rapidly exceeded the upper control limit, and thereafter increased at a higher rate. This suggests that the diesel engine was operating abnormally and could soon have encountered a severe fault. In practice, increasing wear of the injector nozzle results in a surge in fuel injection, leading to a continuously rising EGT, which, in turn, affects engine operation stability. The EWMA control chart clearly shows that the prediction results using the proposed method aligned with the actual situation, effectively facilitating anomaly monitoring of the engine’s operating status.

Figure 14 displays the EWMA smoothed residual signals for exhaust valve fault under various operating conditions. The primary role of the exhaust valve is to provide high-temperature sealing. With prolonged use, this valve is susceptible to wear and corrosion, which directly impacts the overall performance of the engine. The figure reveals that initially, the residuals fluctuated within the normal range. However, as the step length increased, the deviation between the monitored and predicted values gradually widened, eventually surpassing the lower threshold limit, and was successfully detected. This signifies that the EGT was significantly lower than usual at this time, which was a result of the deterioration in the sealing of the exhaust valve due to corrosion. This led to air leaks and a consequent deviation in the EGT from the normal values. Consequently, it is crucial for engine maintenance personnel to perform necessary maintenance and repair work to prevent the emergence of more severe faults.

4. Conclusions

In this paper, we propose a hybrid deep learning framework based on a multi-layer attention mechanism, termed CNN-LSTM-MLA, for predicting the EGT of a marine diesel engine under different operating conditions. This method builds on the CNN-LSTM model by incorporating multiple layers of attention mechanisms to increase the weights of important features, thereby enhancing prediction accuracy. The proposed method was validated using datasets from three distinct operating conditions. A comprehensive analysis from multiple perspectives demonstrated that the prediction method described in this paper achieved satisfactory accuracy compared to commonly used time series prediction methods. Additionally, utilizing the model’s prediction results, an EWMA control chart was employed to analyze the distribution of the residuals between the observed and predicted values to predict faults in the marine diesel engine. The main conclusions are as follows:

We conducted extensive experiments with three typical datasets of marine engine operating conditions to thoroughly validate the effectiveness of the proposed method. The results show that CNN-LSTM-MLA outperformed the state-of-the-art methods under various engine operating conditions, exhibiting strong applicability and robustness.
Results from the ablation study indicated that the self-attention and multi-head attention modules played a crucial role in improving the predictive accuracy. The multi-layer attention mechanism effectively captured the internal relevance and long-term dependencies of features, significantly enhancing the model’s predictive capabilities.
Under the complex operating conditions of the engine, the predictive performance of each model generally deteriorated; however, the CNN-LSTM-MLA model continued to exhibit the best performance, which further demonstrated its advantages in handling complex conditions.
The analysis of the residual distribution during normal engine operation revealed that the warning thresholds differed significantly under various operating conditions. Traditional fixed threshold methods may result in false alarms or missed detections.
The operational status of the engine can be monitored in real time, and potential faults can be predicted by analyzing the residuals between actual inputs and the expected EGT using the EWMA control chart. The experimental results of the abnormal operation cases demonstrate that this method can promptly detect anomalies in the engine under various operating conditions and offer support for predictive maintenance of marine diesel engines.

In this study, the adoption of a hybrid architecture in the model inevitably resulted in neural network redundancy. Future work will explore using techniques such as transfer learning to refine the model, allowing it to operate efficiently with limited computational resources, thereby achieving real-time and accurate fault prediction for marine machinery. Additionally, marine engines may experience multiple simultaneous faults during practical operations, but this paper primarily focused on the prediction of potential faults and early warnings of abnormalities during engine operations. Subsequent research could further investigate fault diagnosis and localization.

Author Contributions

Conceptualization, J.S.; methodology, J.S., H.R., Y.D. and X.Y.; software, J.S. and D.W.; validation, J.S., H.R. and H.T.; resources, Y.D. and D.W.; data curation, J.S.; writing—original draft preparation, J.S. and Y.D.; writing—review and editing, H.R., X.Y. and H.T.; funding acquisition, H.R. and X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science Foundation of China (Grant No. 52071312), the Key Science and Technology Projects in the Transportation Industry (Grant No. 2022-ZD3-035), the Applied Basic Research Program Project of Liaoning Province (Grant No. 2023JH2/101300144), the Guangxi Key Research and Development Plan (Grant No. GUIKE AB22080106), and the Dalian Science and Technology Innovation Fund Project (Grant No. 2022JJ12GX035).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset is not publicly available due to confidentiality agreements with the data provider.

Conflicts of Interest

The authors declare no conflicts of interest.

References

UNCTAD. Review of Maritime Transport 2021, 2021st ed.; United Nations: New York, NY, USA, 2021.
Ančić, I.; Theotokatos, G.; Vladimir, N. Towards Improving Energy Efficiency Regulations of Bulk Carriers. Ocean Eng. 2018, 148, 193–201.
Taljegard, M.; Brynolf, S.; Grahn, M.; Andersson, K.; Johnson, H. Cost-Effective Choices of Marine Fuels in a Carbon-Constrained World: Results from a Global Energy Model. Environ. Sci. Technol. 2014, 48, 12986–12993.
Reddy, S.M.; Sharma, N.; Gupta, N.; Agarwal, A.K. Effect of Non-Edible Oil and Its Biodiesel on Wear of Fuel Injection Equipment Components of a Genset Engine. Fuel 2018, 222, 841–851.
Cai, C.; Weng, X.; Zhang, C. A Novel Approach for Marine Diesel Engine Fault Diagnosis. Cluster Comput. 2017, 20, 1691–1702.
Liu, Y.; Zhang, J.; Ma, L. A Fault Diagnosis Approach for Diesel Engines Based on Self-Adaptive WVD, Improved FCBF and PECOC-RVM. Neurocomputing 2016, 177, 600–611.
Bai, H.; Zhan, X.; Yan, H.; Wen, L.; Jia, X. Combination of Optimized Variational Mode Decomposition and Deep Transfer Learning: A Better Fault Diagnosis Approach for Diesel Engines. Electronics 2022, 11, 1969.
Wang, J.; Sun, X.; Zhang, C.; Ma, X. An Integrated Methodology for System-Level Early Fault Detection and Isolation. Expert Syst. Appl. 2022, 201, 117080.
Velasco-Gallego, C.; Lazakis, I. RADIS: A Real-Time Anomaly Detection Intelligent System for Fault Diagnosis of Marine Machinery. Expert Syst. Appl. 2022, 204, 117634.
Malm, L.A.; Enstrom, J.; Hager, L.M.; Stalberg, P. Main Engine Damage Study; The Swedish Club: Hong Kong, China, 2020.
Raptodimos, Y.; Lazakis, I. Application of NARX Neural Network for Predicting Marine Engine Performance Parameters. Ships Offshore Struct. 2020, 15, 443–452.
Apostolidis, A.; Bouriquet, N.; Stamoulis, K.P. AI-Based Exhaust Gas Temperature Prediction for Trustworthy Safety-Critical Applications. Aerospace 2022, 9, 722.
Kandemir, C.; Celik, M. A Human Reliability Assessment of Marine Auxiliary Machinery Maintenance Operations under Ship PMS and Maintenance 4.0 Concepts. Cogn. Technol. Work 2020, 22, 473–487.
Ahmad, R.; Kamaruddin, S. An Overview of Time-Based and Condition-Based Maintenance in Industrial Application. Comput. Ind. Eng. 2012, 63, 135–149.
Straub, J. Expert System Gradient Descent Style Training: Development of a Defensible Artificial Intelligence Technique. Knowl.-Based Syst. 2021, 228, 107275.
Jiang, W.; Hu, W.; Xie, C. A New Engine Fault Diagnosis Method Based on Multi-Sensor Data Fusion. Appl. Sci. 2017, 7, 280.
Xu, X.; Yan, X.; Sheng, C.; Yuan, C.; Xu, D.; Yang, J. A Belief Rule-Based Expert System for Fault Diagnosis of Marine Diesel Engines. IEEE Trans. Syst. Man Cybern. Syst. 2020, 50, 656–672.
Gharib, H.; Kovács, G. Development of a New Expert System for Diagnosing Marine Diesel Engines Based on Real-Time Diagnostic Parameters. Stroj. Vestn. J. Mech. Eng. 2022, 68, 642–655.
Fu, C.; Lu, K.; Li, Q.; Xu, Y.; Gu, F.; Ball, A.D.; Zheng, Z. Physics-Based Modelling for On-Line Condition Monitoring of a Marine Engine System. J. Mar. Sci. Eng. 2023, 11, 1241.
Scappin, F.; Stefansson, S.H.; Haglind, F.; Andreasen, A.; Larsen, U. Validation of a Zero-Dimensional Model for Prediction of NOx and Engine Performance for Electronically Controlled Marine Two-Stroke Diesel Engines. Appl. Therm. Eng. 2012, 37, 344–352.
Llamas, X.; Eriksson, L. Control-Oriented Modeling of Two-Stroke Diesel Engines with Exhaust Gas Recirculation for Marine Applications. Proc. Inst. Mech. Eng. Part M J. Eng. Marit. Environ. 2019, 233, 551–574.
Moussa Nahim, H.; Younes, R.; Shraim, H.; Ouladsine, M. Modeling with Fault Integration of the Cooling and the Lubricating Systems in Marine Diesel Engine: Experimental Validation. IFAC-PapersOnLine 2016, 49, 570–575.
Pagán Rubio, J.A.; Vera-García, F.; Hernandez Grau, J.; Muñoz Cámara, J.; Albaladejo Hernandez, D. Marine Diesel Engine Failure Simulator Based on Thermodynamic Model. Appl. Therm. Eng. 2018, 144, 982–995.
Lang, X.; Wu, D.; Tian, W.; Zhang, C.; Ringsberg, J.W.; Mao, W. Fatigue Assessment Comparison between a Ship Motion-Based Data-Driven Model and a Direct Fatigue Calculation Method. J. Mar. Sci. Eng. 2023, 11, 2269.
Thurston, M.G.; Sullivan, M.R.; McConky, S.P. Exhaust-Gas Temperature Model and Prognostic Feature for Diesel Engines. Appl. Therm. Eng. 2023, 229, 120578.
García, E.; Quiles, E.; Correcher, A.; Morant, F. Marine NMEA 2000 Smart Sensors for Ship Batteries Supervision and Predictive Fault Diagnosis. Sensors 2019, 19, 4480.
Tan, W.L.; Nor, N.M.; Abu Bakar, M.Z.; Ahmad, Z.; Sata, S.A. Optimum Parameters for Fault Detection and Diagnosis System of Batch Reaction Using Multiple Neural Networks. J. Loss Prev. Process Ind. 2012, 25, 138–141.
Tamilselvan, P.; Wang, P. Failure Diagnosis Using Deep Belief Learning Based Health State Classification. Reliab. Eng. Syst. Saf. 2013, 115, 124–135.
Hong, C.W.; Kim, J. Exhaust Temperature Prediction for Gas Turbine Performance Estimation by Using Deep Learning. J. Electr. Eng. Technol. 2023, 18, 3117–3125.
Kumar, A.; Srivastava, A.; Goel, N.; McMaster, J. Exhaust Gas Temperature Data Prediction by Autoregressive Models. In Proceedings of the 2015 IEEE 28th Canadian Conference on Electrical and Computer Engineering (CCECE), Halifax, NS, Canada, 3–6 May 2015; pp. 976–981.
Cha, J.; Ko, S.; Park, S.-Y.; Jeong, E. Fault Detection and Diagnosis Algorithms for Transient State of an Open-Cycle Liquid Rocket Engine Using Nonlinear Kalman Filter Methods. Acta Astronaut. 2019, 163, 147–156. [Google Scholar] [CrossRef]
Ren, D.; Zeng, H.; Wang, X.; Pang, S.; Wang, J. Fault Diagnosis of Diesel Engine Lubrication System Based on Bayesian Network. In Proceedings of the 2020 6th International Conference on Control, Automation and Robotics (ICCAR), Singapore, 20–23 April 2020; pp. 423–429. [Google Scholar]
Wang, Y.; Huang, Y.; Xiao, M.; Zhou, S.; Xiong, B.; Jin, Z. Medium-Long-Term Prediction of Water Level Based on an Improved Spatio-Temporal Attention Mechanism for Long Short-Term Memory Networks. J. Hydrol. 2023, 618, 129163. [Google Scholar] [CrossRef]
Lazakis, I.; Raptodimos, Y.; Varelas, T. Predicting Ship Machinery System Condition through Analytical Reliability Tools and Artificial Neural Networks. Ocean Eng. 2018, 152, 404–415. [Google Scholar] [CrossRef]
Tan, Y.; Tian, H.; Jiang, R.; Lin, Y.; Zhang, J. A Comparative Investigation of Data-Driven Approaches Based on One-Class Classifiers for Condition Monitoring of Marine Machinery System. Ocean Eng. 2020, 201, 107174. [Google Scholar] [CrossRef]
Tan, Y.; Zhang, J.; Tian, H.; Jiang, D.; Guo, L.; Wang, G.; Lin, Y. Multi-Label Classification for Simultaneous Fault Diagnosis of Marine Machinery: A Comparative Study. Ocean Eng. 2021, 239, 109723. [Google Scholar] [CrossRef]
Khan, A.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A Survey of the Recent Architectures of Deep Convolutional Neural Networks. Artif. Intell. Rev. 2020, 53, 5455–5516. [Google Scholar] [CrossRef]
Shahid, S.M.; Ko, S.; Kwon, S. Real-Time Abnormality Detection and Classification in Diesel Engine Operations with Convolutional Neural Network. Expert Syst. Appl. 2022, 192, 116233. [Google Scholar] [CrossRef]
Zhou, Y.; Wang, Z.; Zuo, X.; Zhao, H. Identification of Wear Mechanisms of Main Bearings of Marine Diesel Engine Using Recurrence Plot Based on CNN Model. Wear 2023, 520–521, 204656. [Google Scholar] [CrossRef]
Han, P.; Ellefsen, A.L.; Li, G.; Asoy, V.; Zhang, H. Fault Prognostics Using LSTM Networks: Application to Marine Diesel Engine. IEEE Sens. J. 2021, 21, 25986–25994. [Google Scholar] [CrossRef]
Zhou, R.; Cao, J.; Zhang, G.; Yang, X.; Wang, X. Heat Load Forecasting of Marine Diesel Engine Based on Long Short-Term Memory Network. Appl. Sci. 2023, 13, 1099. [Google Scholar] [CrossRef]
Velasco-Gallego, C.; Lazakis, I. Mar-RUL: A Remaining Useful Life Prediction Approach for Fault Prognostics of Marine Machinery. Appl. Ocean. Res. 2023, 140, 103735. [Google Scholar] [CrossRef]
Li, F.; Gui, Z.; Zhang, Z.; Peng, D.; Tian, S.; Yuan, K.; Sun, Y.; Wu, H.; Gong, J.; Lei, Y. A Hierarchical Temporal Attention-Based LSTM Encoder-Decoder Model for Individual Mobility Prediction. Neurocomputing 2020, 403, 153–166. [Google Scholar] [CrossRef]
Brauwers, G.; Frasincar, F. A General Survey on Attention Mechanisms in Deep Learning. IEEE Trans. Knowl. Data Eng. 2023, 35, 3279–3298. [Google Scholar] [CrossRef]
Kiranyaz, S.; Gastli, A.; Ben-Brahim, L.; Al-Emadi, N.; Gabbouj, M. Real-Time Fault Detection and Identification for MMC Using 1-D Convolutional Neural Networks. IEEE Trans. Ind. Electron. 2019, 66, 8760–8771. [Google Scholar] [CrossRef]
Cheliotis, M.; Lazakis, I.; Theotokatos, G. Machine Learning and Data-Driven Fault Detection for Ship Systems Operations. Ocean Eng. 2020, 216, 107968. [Google Scholar] [CrossRef]

Figure 1. Basic layout of marine diesel engine.

Figure 2. Line graphs and descriptive statistics of operational parameters of the marine engine.

Figure 3. A visual representation of the relationships between the engine parameters.

Figure 5. Structure of CNN-LSTM-MLA model.

Figure 10. Performance comparison of various prediction methods for three different datasets.

Figure 11. Prediction curves of our proposed method and its variants.

Figure 12. EWMA control charts during the normal operation of the engine. (a) Constant-speed navigation. (b) Maneuvering navigation. (c) Anchoring.

Figure 13. EWMA control charts during the injector wear fault condition. (a) Constant-speed navigation. (b) Maneuvering navigation. (c) Anchoring.

Figure 14. EWMA control charts during the exhaust valve fault condition. (a) Constant-speed navigation. (b) Maneuvering navigation. (c) Anchoring.
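
Figures 12–14 show EWMA control charts. As a rough illustration of how such a chart flags a fault, the sketch below uses the standard textbook EWMA statistic z_t = λ·x_t + (1 − λ)·z_{t−1} with time-varying control limits. The smoothing factor `lam`, the limit width `width`, the in-control mean `mu0` and standard deviation `sigma`, and the choice of monitoring prediction residuals are all assumptions made for illustration; they are not values taken from this paper.

```python
import math


def ewma_chart(residuals, mu0, sigma, lam=0.2, width=3.0):
    """Return (z, lcl, ucl) lists for a textbook EWMA control chart.

    residuals: monitored values (assumed here to be prediction residuals)
    mu0, sigma: in-control mean and standard deviation (assumed known)
    lam, width: smoothing factor and limit width (hypothetical defaults)
    """
    z_prev = mu0  # the EWMA statistic starts at the in-control mean
    z, lcl, ucl = [], [], []
    for t, x in enumerate(residuals, start=1):
        z_prev = lam * x + (1.0 - lam) * z_prev
        # Time-varying half-width; it converges to the steady-state limit.
        variance_factor = lam / (2.0 - lam) * (1.0 - (1.0 - lam) ** (2 * t))
        half_width = width * sigma * math.sqrt(variance_factor)
        z.append(z_prev)
        lcl.append(mu0 - half_width)
        ucl.append(mu0 + half_width)
    return z, lcl, ucl


def first_alarm(z, lcl, ucl):
    """Index of the first sample outside the control limits, or None."""
    for i, (value, low, high) in enumerate(zip(z, lcl, ucl)):
        if value < low or value > high:
            return i
    return None


if __name__ == "__main__":
    # Synthetic residuals: 20 in-control samples, then a sustained shift.
    data = [0.0] * 20 + [10.0] * 5
    z, lcl, ucl = ewma_chart(data, mu0=0.0, sigma=1.0)
    print("first alarm index:", first_alarm(z, lcl, ucl))
```

A smaller `lam` smooths more heavily, which helps with slow drifts but delays the response to abrupt changes; the value used here is only a common textbook default.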

Table 1. Technical specifications of Wartsila 9L34DF dual-fuel engine.

Technical Specifications	Details
Number of cylinders	9
Stroke	4
Power	4050 kW
Engine speed	750 rpm
Bore	340 mm
Piston stroke	400 mm
Firing order	1-7-4-2-8-6-3-9-5

Table 2. Description of experimental datasets.

Dataset	Constant-Speed Navigation	Maneuvering Navigation	Anchoring
Location	Beibu Gulf	Beibu Gulf	Yangpu Port
Time interval	1 s	1 s	1 s
Speed over ground	[10, 10.5]	[0.5, 5.5]	[0, 1]
Number of samples	28,800	8000	40,000
Engine data collection
$T_{e}$ (℃)	[514.1, 519.3]	[345.5, 460.4]	[386.9, 401.1]
$T_{c}$ (℃)	[89.1, 90.7]	[85.7, 88.6]	[86.1, 86.9]
$L$ (%)	[48.6, 54.7]	[6.0, 28.7]	[12.4, 14.4]
$R$ (r/min)	[15,410, 16,150]	[6930, 12,130]	[7896, 8267]

Table 3. Value list of hyperparameters of the deep learning methods.

Method	Number of Convolution Kernels	Hidden State Size	Number of Attention Heads	Dropout	Learning Rate	Batch Size	Epochs
LSTM	/	256	/	0.1	0.0045	62	150
LSTM-ATT	/	132	/	0.2	0.001	55	120
CNN-LSTM	75	168	/	0.4	0.0033	39	140
CNN-LSTM-MLA	50	135	4	0.2	0.0015	64	100

Table 4. Average RMSE, MAE, MAPE, and MBE values of different deep learning methods.

Operating Condition	Method	RMSE	MAE	MAPE	MBE
Constant-speed navigation	LSTM	0.1355	0.1196	0.0231	0.1179
	LSTM-ATT	0.0850	0.0817	0.0158	−0.0815
	CNN-LSTM	0.0824	0.0755	0.0146	0.0749
	CNN-LSTM-MLA	0.0654	0.0543	0.0105	0.0298
Maneuvering navigation	LSTM	1.9141	1.5361	0.3665	1.2690
	LSTM-ATT	1.5634	1.0299	0.2535	−0.2324
	CNN-LSTM	1.2938	1.0249	0.2507	0.1845
	CNN-LSTM-MLA	0.7665	0.6036	0.1469	−0.1207
Anchoring	LSTM	0.2582	0.2288	0.0581	−0.2049
	LSTM-ATT	0.1790	0.1512	0.0387	0.1462
	CNN-LSTM	0.1850	0.1294	0.0329	0.1220
	CNN-LSTM-MLA	0.1345	0.0824	0.0209	−0.0154
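
The four error metrics reported in Table 4 can be computed as in the minimal sketch below, which uses their usual definitions. Two conventions are assumptions here rather than statements from this section: MBE is taken as the mean of (predicted − actual), and MAPE is scaled by 100 so that it reads as a percentage. The function name `regression_metrics` is hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute RMSE, MAE, MAPE (in percent) and MBE for two equal-length lists.

    Assumed conventions: error = predicted - actual, so a positive MBE
    means over-prediction; MAPE is multiplied by 100.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined when an actual value is zero; such data would
    # raise ZeroDivisionError here.
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mbe = sum(errors) / n
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "MBE": mbe}


if __name__ == "__main__":
    # Synthetic values, not data from the paper.
    print(regression_metrics([100.0, 200.0], [110.0, 190.0]))
```

Unlike the other three metrics, MBE keeps the sign of the error, which is why some entries in its column are negative: positive and negative errors cancel, so a value near zero indicates little systematic bias rather than small errors.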

Table 5. Performances of our model and its variants on the anchoring condition dataset.

Model	CNN	LSTM	SA	MHA	RMSE	MAE	MAPE
1 CNN-LSTM	✓	✓			0.1850	0.1294	0.0329
2 CNN-LSTM-SA	✓	✓	✓		0.1537	0.0940	0.0238
3 CNN-LSTM-MHA	✓	✓		✓	0.1472	0.0889	0.0225
4 CNN-LSTM-MLA	✓	✓	✓	✓	0.1345	0.0824	0.0209
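
To make the ablation in Table 5 easier to read, the short calculation below expresses each variant's RMSE as a relative reduction against the CNN-LSTM baseline. The RMSE values are copied from Table 5; the helper name `relative_reduction` is hypothetical.

```python
# RMSE values copied from Table 5 (anchoring condition dataset).
rmse = {
    "CNN-LSTM": 0.1850,
    "CNN-LSTM-SA": 0.1537,
    "CNN-LSTM-MHA": 0.1472,
    "CNN-LSTM-MLA": 0.1345,
}


def relative_reduction(baseline, value):
    """Percentage by which `value` is lower than `baseline`."""
    return 100.0 * (baseline - value) / baseline


baseline = rmse["CNN-LSTM"]
reductions = {
    name: relative_reduction(baseline, value)
    for name, value in rmse.items()
    if name != "CNN-LSTM"
}

for name, pct in reductions.items():
    print(f"{name}: RMSE {pct:.1f}% lower than CNN-LSTM")
```

On these numbers, adding self-attention alone lowers the RMSE by about 16.9%, multi-head attention alone by about 20.4%, and both together by about 27.3%.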

Table 6. Description of verification cases.

Operating Condition	Case	Alteration	Starting Point of the Alteration	Number of Altered Samples
Constant-speed navigation	Injector nozzle wear	Increased	2200	1400
Constant-speed navigation	Exhaust valve corrosion	Decreased	1125	2475
Maneuvering navigation	Injector nozzle wear	Increased	1200	1200
Maneuvering navigation	Exhaust valve corrosion	Decreased	855	1545
Anchoring	Injector nozzle wear	Increased	4500	2700
Anchoring	Exhaust valve corrosion	Decreased	3180	4020
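
Table 6 specifies, for each case, a direction of alteration, a starting index, and a number of altered samples. Within each operating condition the starting point plus the number of altered samples is the same for both cases (3600, 2400, and 7200), which suggests that the alteration runs to the end of the series. The sketch below shows one way such a case could be simulated. The linear ramp shape, the `total_shift` magnitude, the baseline value, and the function name `inject_drift` are hypothetical, since this section does not state how the alteration is applied.

```python
def inject_drift(series, start, count, total_shift):
    """Return a copy of `series` with a linear ramp added to a window.

    The ramp covers series[start : start + count] and reaches `total_shift`
    at the last altered sample. A positive shift models an "Increased"
    case, a negative shift a "Decreased" case. The ramp shape is an
    assumption made for illustration only.
    """
    if start < 0 or count <= 0 or start + count > len(series):
        raise ValueError("altered window must lie inside the series")
    altered = list(series)
    for k in range(count):
        altered[start + k] += total_shift * (k + 1) / count
    return altered


if __name__ == "__main__":
    # Synthetic flat signal of 3600 samples; 400.0 is a placeholder value.
    baseline = [400.0] * 3600
    # Window taken from Table 6, constant-speed navigation, injector nozzle wear.
    worn = inject_drift(baseline, start=2200, count=1400, total_shift=20.0)
    print(worn[2199], worn[2200], worn[-1])
```

Feeding the altered series to a monitoring chart then shows how many samples after the starting point the first alarm is raised.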

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
