Article

Machine Learning Approaches for Fault Detection in Internal Combustion Engines: A Review and Experimental Investigation

1 Department of Mechanical Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore 641112, India
2 Department of Electronics and Communication Engineering, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore 641112, India
* Authors to whom correspondence should be addressed.
Informatics 2025, 12(1), 25; https://doi.org/10.3390/informatics12010025
Submission received: 16 December 2024 / Revised: 5 February 2025 / Accepted: 12 February 2025 / Published: 21 February 2025
(This article belongs to the Section Machine Learning)

Abstract
Fault diagnostics in internal combustion engines (ICEs) is vital for optimal operation and for avoiding costly breakdowns. This paper reviews methodologies for ICE fault detection, including model-based and data-driven approaches. The former uses physical models of engine components to diagnose defects, while the latter employs statistical analysis of sensor data to identify patterns indicating faults. Various methods for ICE fault identification, such as vibration analysis, thermography, acoustic analysis, and optical approaches, are reviewed. This paper also explores the latest approaches for detecting ICE faults, highlighting the challenges in the diagnostic process and ways to enhance result accuracy and reliability. This paper concludes with a review of the progress in fault identification in ICE components and future prospects, highlighted by an experimental investigation using 16 machine learning algorithms with seven feature selection techniques under three load conditions to detect faults in a four-cylinder ICE. Additionally, this study incorporates advanced deep learning techniques, including a deep neural network (DNN), a one-dimensional convolutional neural network (1D-CNN), a Transformer, and a hybrid Transformer and DNN model, which demonstrate superior performance in fault detection compared to traditional machine learning methods.

1. Introduction

Fault detection in internal combustion engines (ICEs) is an important task to ensure the reliability and efficiency of automobiles and power production systems. There are multiple techniques available for identifying defects in ICEs, which include the following:
Statistical pattern recognition, which uses statistical methods, such as principal component analysis (PCA) and artificial neural networks (ANNs), to examine engine sensor data and detect patterns that signal a failure. Signal-based fault diagnosis methods primarily rely on sensor data analysis (e.g., vibration, pressure, temperature, acoustic signals) and use statistical, spectral, or time-frequency analysis techniques to detect faults. Many of these techniques fall under statistical pattern recognition [1].
Model-based diagnostics, which involves using mathematical models of the engine and components to replicate the engine’s behavior. By comparing the model’s predictions with the real sensor data, defects can be found based on any disparities. These include techniques like Deterministic Fault Diagnosis Methods, Stochastic Fault Diagnosis Methods, Fault Diagnosis for Discrete-Events, and Fault Diagnosis for Networked and Distributed Systems [1].
Expert systems, which employ a knowledge-based system that uses the experience of humans to detect and study faults. These include techniques like Task-based diagnosis, Qualitative trend analysis, and non-statistical analysis such as fuzzy logic [2].
Hybrid methods, which involve the integration of various techniques, such as statistical pattern recognition and model-based diagnosis, in order to enhance the accuracy and reliability of problem identification. These include techniques like signal-based and data-driven methods such as frequency-domain features extracted through Fast Fourier Transform (FFT), and vibration analysis with data-driven techniques [2].
Current studies in this area have concentrated on creating more precise and effective techniques for identifying faults, while also merging these techniques with sophisticated control systems for immediate usage in cars and power generation systems.
A classification of the reviewed research articles into statistical pattern recognition, model-based diagnostics, expert systems, and hybrid methods is shown in Table 1.
The system-based diagnostics of various ICE components, as explored by researchers, cover topics such as air–fuel ratio, pistons, valves, bearings, ignition, injection, sensors, hybrid systems, engine load, and combustion. Additionally, the authors’ latest contributions to deep learning (DL) in the fault diagnosis of ICEs, particularly work on Deep Neural Network (DNN) and Convolutional Neural Network (CNN) models and advanced DL concepts such as Digital Twins, Transformers, and intelligent fault diagnosis, are discussed in later sections.

2. Literature Review

A detailed review of recent researchers’ contributions is laid out under the ICE system-based approach, highlighting machine learning, data analysis, and deep learning based fault diagnosis techniques in the following subsections.

2.1. Air–Fuel Ratio

Ensuring the ideal air–fuel ratio in an ICE requires a delicate equilibrium between reducing emissions, enhancing performance, and ensuring smooth operation. However, this balance is easily disturbed by sensor and actuator faults, leading researchers to rely on sophisticated algorithms for reliable air–fuel ratio control.
Adaptive observers (Leon, 2018) [6] offer a viable solution to sensor redundancy: by predicting the value of a defective MAF (Mass Air Flow) sensor from data supplied by the other functioning sensors, the engine can continue to operate despite the sensor’s failure. Adopting sensor-less designs offers potential improvements in both reliability and cost, along with reduced downtime. Modified Triple Modular Redundancy (MTMR) combined with dual actuator redundancy (Amin, 2019) [68] keeps the system stable even in the event of a two-sensor failure, thereby reducing production losses and improving operational resilience.
The combination of Kalman filters and a dedicated PI controller (Amin, 2019) [69] enhances active fault detection and isolation capabilities. It further offers robust air–fuel ratio control through the PI controller, leading to stable operation and accurate air–fuel ratio control even in the presence of sensor and actuator faults. The algorithm also accounts for exhaust gas temperature (EGT).
Genetic Algorithm (GA)-based observer models further optimize fault detection and isolation, particularly for the MAP sensor (Iqbal, 2022) [3]. Advances in fault-tolerant air–fuel ratio control algorithms indicate an assured route to the dependable and effective functioning of ICEs. Preventing sensor malfunctions reduces downtime, improves operational efficiency, and keeps emissions within limits; left undetected, such malfunctions can result in expensive periods of inactivity and pose environmental risks.
To address this issue, Alsuwian (2022) [70] proposed a hybrid fault-tolerant control system (HFTC) that integrates a genetic algorithm-based active fault-tolerant control system (AFTCS) with a higher-order sliding mode control-based passive fault-tolerant control system (PFTCS), a notable advancement for the future of internal combustion technology. Figure 1 presents a function block diagram that comprehensively outlines the system’s components and their interactions, as derived from various researchers’ work.

2.2. Piston

Researchers are currently examining multiple methods to detect and identify malfunctions in pistons. The objective is to employ alternative techniques, apart from Continuous Wavelet Transform (CWT) and Short-Time Fourier Transform (STFT), for the detection of piston scuffing and scratching problems. These methods concentrate on finding precise frequency ranges that are impacted by the damage. In a study conducted by Zhang (2019) [9], the focus was on diagnosing invasive piston defects, with the aim of improving engine maintenance and performance. Cao (2018) [12] used Fast Fourier Transforms (FFTs), wavelet transforms, and Support Vector Machines (SVMs) to analyze vibration data and identify frequencies that indicate particular problems, such as broken or worn rings, attaining a competent degree of precision in distinguishing between standard and defective conditions. Chen (2016) [71] used a three-stage NN to analyze vibrations, efficiently identifying and assessing the location and intensity of piston slap issues. Moosavian (2016) [15] and Moosavian (2017) [8] proposed a fault detection method for the cylinder–piston assembly which utilized partial least squares in a layer-wise manner, allowing for the comprehensive detection and separation of defects across many levels. These studies highlight the promise of several techniques, such as SVM, ANN, and wavelet transforms, in obtaining accurate and impartial outcomes. Figure 2 presents a function block diagram that comprehensively outlines the system’s components and their interactions, as derived from various researchers’ work.

2.3. Valve

Multiple methods employing diverse algorithms have been studied for the detection of valve malfunctions in IC engines. Time-frequency analysis is a frequent theme in Zheng’s (2016) [18] work and in research conducted by Ftoutou et al. (2017) [14]. This analysis involves converting vibration or acoustic data into the time-frequency domain, frequently using techniques such as the S-transform or wavelet transforms. These characteristics are then input into fault classifiers such as multilayer perceptron NNs (Zheng, 2016) [18] or Support Vector Machines (Jiang (2017) [21]; Gritsenko et al. (2020) [24]) to perform fault classification. Zheng (2016) [18] and Jiang (2017) [21] specifically investigate valve clearance issues and achieve good accuracy (95% and 94%, respectively) by employing MLP and SVM algorithms on acoustic and vibration inputs, respectively. Gritsenko et al. (2020) [24] broadened the focus to include problems in the gas distribution mechanism (GDM), achieving a 90% accuracy rate by combining information from vibration, cylinder pressure, and crankshaft angle and using SVM classification. Figlus et al. (2016) [26] also report 93% accuracy. Ghajar et al. (2016) [31] employ an alternative method, constructing a semi-empirical framework to calculate the volumetric efficiency of engines equipped with variable valve timing systems, using engine speed, load, and valve timings as input parameters. These studies demonstrate the potential of different algorithms and modalities for precise and efficient diagnosis of valve faults in ICEs. Figure 3 presents a function block diagram that comprehensively outlines the system’s components and their interactions, as derived from various researchers’ work.

2.4. Bearing

IC engine bearing health assessment has utilized dynamic simulation models that incorporate multibody dynamics and wear formulae. Chen et al. (2016) [72] created a model that can forecast wear patterns by taking into account the influence of lubrication and utilizing engine characteristics, bearing geometry, and lubricant qualities as input variables. Haneef (2017) [5] improved on this method by integrating numerical simulation with vibration analysis approaches: a multibody dynamics model was paired with simulations of the interaction between surfaces in motion, taking various lubrication conditions into account, and envelope analysis was employed to identify repetitive impacts in the vibration signals. This facilitated the ongoing updating of wear profiles and enhanced the ability to predict wear and vibration. ANNs have shown promise in the field of bearing defect diagnostics. Khoualdia (2023) [20] trained an ANN on simulated vibration data containing distinct features, with the objective of achieving precise identification and categorization of bearing knock faults; the trained ANN was then used to analyze real-time vibration signals and make fault predictions. Ates (2023) [25] used convolutional auto-encoder algorithms to examine vibration patterns from bearings and identify signs of wear. Zhou (2023) [36] employed a novel approach to detect wear in marine diesel engine bearings, feeding recurrence plots derived from vibration signals into a CNN model that accurately identifies and categorizes different wear types; this approach shows promise for improving maintenance in marine engines. Rameshkumar (2017) [30] extracts time-domain statistical features from acoustic emission, vibration, and sound sensor signals to predict lubricant solid particle contamination in spherical roller bearings. Figure 4 presents a function block diagram that comprehensively outlines the system’s components and their interactions, as derived from various researchers’ work.

2.5. Sensor

Using ANN, Komorska et al. (2019) [35] developed a method to identify sensor malfunctions in combustion engine control systems with accuracy. Their methodology consisted of deliberately inducing faults in six crucial sensors of a four-cylinder GDI engine and using an NN to precisely categorize these problems. This method produced a commendable success rate of over 99%. Meanwhile, Zeng et al. (2017) [73] addressed the issue of fault detection using support vector data description (SVDD) and Dempster–Shafer evidence theory. Their research centered on integrating data from several vibration sensors to obtain enhanced precision. The researchers showcased the efficacy of multi-sensor data fusion in enhancing fault detection reliability and minimizing uncertainty by first evaluating single-sensor data using SVDD and then integrating the findings from three sensors using Dempster–Shafer theory. Figure 5 presents a function block diagram that comprehensively outlines the system’s components and their interactions, as derived from various researchers’ work.

2.6. Ignition

Recent improvements in ignition fault detection have driven a sharp rise in the use of advanced algorithms, making diagnosis more accurate and effective than ever before. Machine learning has produced strong results. For example, Mulay et al. (2018) [29] applied ARMA and SVM to engine vibration data to achieve 92.2% success in detecting misfires. Tao et al. (2019) [41], who combined time-domain and high-frequency vibration data with Extreme Gradient Boosting (XGBoost), achieved a 99.3% success rate. Shahid et al. (2022) [44] recently showed that a Convolutional Neural Network (CNN) can quickly and accurately detect misfires and engine load problems in real time, demonstrating the impact of deep learning on the field.
Beyond machine learning, advanced signal processing and optimization methods have also proven useful for ignition fault detection. Xu and colleagues (2019) [74] proposed a method for detecting misfires in different types of combustion engines that combines discrete spectrum interpolation with generalized force analysis at the engine’s center of gravity, making it possible to pinpoint misfires with great accuracy. Yang and colleagues (2022) [40] presented a condition monitoring approach that combines adaptive Variational Mode Decomposition (VMD), Grey Wolf Optimization (GWO), and dictionary learning; this method removes the need for manual signal decomposition and also improves the accuracy of signal reconstruction.
This variety of algorithmic methods shows that research into ignition fault detection is continually evolving. The best method depends on factors such as the engine type, the amount of data available, and the required level of accuracy. Continued study and refinement of these methods could greatly improve engine performance and reliability. Figure 6 presents a function block diagram that comprehensively outlines the system’s components and their interactions, as derived from various researchers’ work.

2.7. Injection

Researchers have made substantial progress in diagnosing injection defects and engine performance concerns using a wide variety of sensors and algorithms. Various methodologies have been developed for injection event detection and fault classification, ranging from Ferrari et al.’s (2019) [34] utilization of FFT on turbocharger speed for precise injection event detection, to Ftoutou et al.’s (2018) [51] categorization of injection faults such as misfires, leaks, and clogs using vibration signals and unsupervised fuzzy clustering. Awad et al. (2019) [75] used wavelet analysis on in-cylinder pressure to study variations in combustion cycle variability (CCV) in gasoline direct injection (GDI) engines. Vichi et al. (2016) [39] and Becciani et al. (2019) [43] utilized FFT and turbocharger speed to estimate and rectify differences in fuel injection between cylinders in diesel engines. Stojanovic et al. (2016) [11] used cepstrum analysis on rate tube measurements for distortion source detection, while Li et al. (2016) [55] integrated orthogonal vibration signals and GRNNs to improve the identification of fuel supply faults in diesel engines. Agnieszka Merkisz-Guranowska and Marek Waligórski (2016) [46] conducted a study on continuous engine assessment using vibration and acoustic signal analysis. Taghizadeh-Alisaraei and Mahdavian (2019) [58] applied different time-frequency representation (TFR) approaches to vibration data to predict faults and detect knock in real time. Wang et al. (2016) [76] introduced an Elman neural network observer to diagnose faults in MAP sensors by simulating intake pressure. Ftoutou and Chouchane (2017) [77] demonstrated the efficacy of angle-frequency domain vibration analysis in detecting early injection faults. These studies exemplify the capabilities of different sensors and algorithms in transforming engine fault detection and optimization. Figure 7 presents a function block diagram that comprehensively outlines the system’s components and their interactions, as derived from various researchers’ work.

2.8. Hybrid

In this section, we discuss numerous new hybrid fault diagnosis methods that combine the best features of various algorithms and sensors to accurately and reliably detect faults. This differs from the previous sections, which focused on individual component-based fault diagnosis in engines. Finding the main causes of engine faults is best achieved with methods like 3CA (Zuo et al. (2023) [61]), while signal decomposition, separation, and classification are best performed with VMD-ICA-FCM (Bi et al. (2019) [64]). Various techniques combine signal decomposition methods like EMD, ITD, and VMD with machine learning algorithms like SVM and ICA to find faults in engines (Wang et al. (2022) [78]), bearings (Zhang et al. (2022) [49]), and induction motors (Liu et al. (2021) [66]). The Dislocation Superimposed Method (DSM), a statistical approach adopted by Chen et al. (2023) [47] for fault isolation by extracting impulsive fault components from acoustic signals in gasoline engines, works well. Gammatone filter banks, used by Waligorski (2020) [53] and Yao et al. (2017) [56], are newer methods for studying engine operation and detecting noise sources. These techniques display the role of hybrid methods in solving difficult fault diagnosis problems, increasing accuracy and speed across various systems. Figure 8 presents a function block diagram that comprehensively outlines the system’s components and their interactions, as derived from various researchers’ work.

2.9. Engine Load

Unal et al. (2023) [79] proposed a new means of classifying engine load. The algorithm uses a Multilayer Perceptron Neural Network (MLPNN) with one hidden layer, trained on vibration data obtained from a magnetic pickup sensor. The data are then translated into the crank-angle domain (CAD) to extract relevant features. This approach attains a classification accuracy of 100% for engine loads of both 100% and 75%. It stands apart from other methods in use in robustness, as it provides a deep understanding of the physical aspects, and in efficiency, as it utilizes easily accessible sensors. This creates opportunities for a wide range of applications in engine monitoring, fault detection, and optimization. Figure 9 presents a function block diagram that comprehensively outlines the system’s components and their interactions, as derived from various researchers’ work.

2.10. Others

Xie et al. (2018) [50] used a BP neural network to fuse sensor data with exhaust emissions, achieving a diagnostic accuracy of 98.33%, higher than that obtained from either source separately. Dayong et al. (2016) [59] applied the Dislocation Superimposed Method (DSM) to separate fault components from acoustic signals, increasing fault identification accuracy for diesel engines. Chiliński and Zawisza (2016) [80] used numerical models to study crankshaft vibrations, validating their model against experimental data and deepening understanding of this important system. Hofmann et al. (2016) [17] used a multi-net neural network along with geometric classification to correctly identify diesel injectors showing signs of aging, enabling better preventative maintenance. Using Mel-frequency cepstral coefficients (MFCCs) and dynamic time warping (DTW), Kemalkar and Bairagi (2016) [23] proposed a fault identification method with 93% accuracy, demonstrating the value of non-invasive acoustic analysis. These varied methods reflect steady progress in detecting problems in ICEs, which could lead to better engine performance and care.
Machine learning and signal processing techniques are becoming increasingly popular in IC engine fault diagnostics. Ensemble Empirical Mode Decomposition (EEMD) is an effective way to separate vibration signals into their intrinsic mode functions, revealing hidden fault information; EEMD and Support Vector Machines (SVMs) were used successfully by Xu (2020) [62] and Wang et al. (2021) [28] to find a wide range of engine faults with great accuracy. Kumar et al. (2019) [33] investigated tree-family algorithms and achieved a 97% success rate in fault detection. Nasha Wei et al. (2019) [81] used the Wavelet Packet Transform (WPT) to improve monitoring of the lubrication state by examining acoustic emission signals alongside vibration analysis; these improvements enabled precise, non-destructive fault detection and opened the door to advanced engine health tracking systems. Ramteke et al. (2019) [65] used the FFT and statistical feature extraction to study vibrations and noise emissions from diesel engines and detect liner scuffing faults. Moosavian et al. (2017) [82] applied both time-domain and frequency-domain methods to vibration data to estimate engine friction, developing an ANN-based method that performed very well. Sugumaran (2011) [54] used decision tree algorithms to select the most prominent statistical features extracted from vibration signals of a rotational mechanical system; these features were then classified using c-SVC and nu-SVC models of Support Vector Machines (SVMs) with four different kernel functions to compare their fault classification accuracies, with the c-SVC model using the Radial Basis Function (RBF) kernel providing the best classification accuracy at 97.833%. In 2016, Guranowska [57] developed new ways of measuring vibroacoustic signals and advanced methods of separating and labeling fault sources in off-road diesel engines, making it easier to gauge fault severity.
Sakthivel (2010) [4] investigated the use of Support Vector Machines (SVMs) and Proximal Support Vector Machines (PSVMs) for fault classification in mono block centrifugal pumps using statistical features from vibration signals. Gao Fan (2020) [83] used stress wave analysis for fault detection in steam turbines, identifying mechanical issues with the rotor and excessive differential expansion in the low-pressure (LP) cylinder. The diverse methodologies presented underscore the efficacy of signal analysis in identifying and diagnosing engine faults. These unique research approaches, each targeting specific anomalies, highlight the potential for enhanced monitoring and maintenance of engine health. The findings from various studies demonstrate that integrating signal processing techniques with fault diagnosis can significantly improve the accuracy and reliability of engine diagnostics, leading to better preventative maintenance strategies and overall engine performance. Figure 10 presents a function block diagram that comprehensively outlines the system’s components and their interactions, as derived from various researchers’ work.

2.11. Combustion

In the domain of combustion engine fault diagnosis, various studies have explored the efficacy of different algorithms. Jafarian et al. (2018) [10] employed a multi-sensor vibration signal monitoring approach utilizing FFT, eigenvalue analysis, and classifiers like ANN, SVM, and kNN to detect misfire and valve clearance faults. McMahan (2018) [16] investigated the impact of valve clearance on vibration characteristics using FFT, while Lilo (2016) [60] combined vibration and exhaust gas signals with FFT and fuzzy logic for fault diagnosis, utilizing an ANN classifier. Further, Moosavian (2014) [22] explored SVM and kNN classifiers for fault diagnosis based on vibration data analyzed through FFT. These studies highlight the potential of various algorithms, including FFT, ANN, SVM, and kNN, in effectively diagnosing combustion engine faults with high accuracy.
Sripakagorn (2004) [84] employed high-resolution numerical simulations for studying unsteady flame extinction and reignition, while Prieler et al. (2022) [85] used machine learning (ANNs) to predict flame speed, offering a faster alternative to detailed simulations. Liu et al. (2019) [86] conducted experiments with laser-induced incandescence to investigate soot formation in oxygenated fuels, while Xu et al. (2021) [67] modeled NOx formation in swirl flames using CFD with detailed NOx chemistry. Finally, Masri, A. R. (2016) [87] developed a multi-dimensional turbulent combustion model for spark ignition engines, incorporating detailed chemistry, turbulence, and spray characteristics. These diverse approaches, ranging from numerical simulations to experimental techniques and machine learning, highlight the ongoing efforts to understand and optimize combustion processes for improved efficiency, reduced emissions, and cleaner burning.
Kumar et al. (2019) [33] proposed using machine learning algorithms like J48 and Hoeffding tree to achieve 97% classification accuracy for combustion fault detection in CI engines based on vibration signal analysis. Huang and Liu (2022) [88] found that an ANN model accurately predicted engine responses such as pressure, combustion phasing, and exhaust-related parameters; their ANN-based approach includes finding the optimal spark timing for maximum brake torque, sensitivity analysis to improve engine performance, and the creation of virtual sensors for engine diagnosis with acceptable errors. Zabihi-Hesari et al. (2022) [27] used a fault detection and diagnosis technique for a 12-cylinder diesel engine based on vibration signature analysis, using a combination of FFT, DWT, and an artificial neural network to analyze vibration signals captured from both the intake manifold and cylinder heads of the engine in the time, frequency, and time-frequency domains to detect and locate combustion faults. X. Xu (2020) [89] and L. Chang (2021) [90] both employed Belief Rule-Based (BRB) expert systems for concurrent fault diagnosis in marine diesel engines. They explored a novel approach using multiple, concurrently activated BRB subsystems to model the complex relationships between fault features and fault modes; the latter proposed a BRB prediction approach with customized attribute weights to represent the relevance of indicative factors to specific faults. Both studies utilized BRB inference and tradeoff analysis to identify concurrent faults based on predetermined thresholds, demonstrating the effectiveness of BRB in handling complex and non-mutually exclusive fault scenarios.
Firmino et al. (2020) [32] compared vibration and acoustic analysis using ANNs, achieving high accuracy (99.30% and 98.70%, respectively). Mu et al. (2021) [37] combined MICEEMD-PWVD and TD-2DPCA for valve clearance fault diagnosis with an accuracy of 98.7%. Roy et al. (2019) [7] used a threshold-based algorithm on wavelet packet transformed (WPT) mobile sound recordings to detect combustion events in diesel engines with 94.7% accuracy. Shahbaz et al. (2021) [63] proposed an ANN-based observer for air–fuel ratio (AFR) control, demonstrating stable AFR control even under severe faults. Muhammad et al. (2022) [91] employed a peak detection algorithm on STFT and Hilbert transformed smartphone sound recordings to achieve 93.8% accuracy in combustion event detection. M. N. V. R. S. S. Sumanth (2019) [38] varied engine speed and load conditions to investigate the effect of wall wetting on hydrocarbon emissions during cold start and steady-state operations. Finally, Kumar et al. (2023) [13] used a threshold-based algorithm on WVD and HHT applied to Android mobile sound recordings to detect misfires with 96.5% accuracy. These studies showcase the effectiveness of various algorithms, including ANNs, MICEEMD-PWVD, and WPT, for combustion event detection and fault diagnosis in ICEs.

2.12. Deep Learning Approaches in Fault Diagnosis

AI techniques, particularly artificial neural networks (ANNs), are increasingly used in fault diagnosis applications for internal combustion engines (ICEs) because of their intrinsic capacity to learn complex non-linear relationships between input data and output classifications [92]. This makes them well suited to detecting fault signatures in noisy sensor measurements without hand-crafted feature extraction. For example, Ahmed et al. [93] successfully trained different ANNs to identify and classify commonly occurring ICE faults (defective lash adjuster, piston chirp, etc.) using vibration data. They discovered that the smooth variable structure filter (SVSF) was superior to a conventional approach such as backpropagation, reaching a fault identification precision of 97%. Traditional shallow machine learning techniques, however, may not perform well as the interactions within ICEs grow more complex. To resolve this, Al-Zeyadi et al. [94] considered Deep Neural Networks (DNNs) as a potential means of enhancing fault diagnosis, finding that an algorithmic approach known as Deep-SBM, based on a DNN, can accurately predict about 3000 different fault types from vehicle features and self-reported symptoms. These results demonstrate the ability of DNNs to handle the large-scale, high-dimensional characteristics of modern ICE fault diagnosis. Although vibration signals are frequently employed in ANN-based fault diagnosis, Venkatesh et al. [95] demonstrated the detection of misfiring states in a four-cylinder petrol engine using vibration data plots as input to pre-trained DNNs, viz. AlexNet, VGG-16, and GoogLeNet. Their work applied transfer learning and image-based analysis with architectures from the ResNet, VGG, and MobileNet families, with VGG-16 performing best on their test set at a top classification accuracy of 98.7%. In addition to DNNs [96], a two-stage approach has been proposed consisting of an Auto-associative Neural Network (AANN) for fault detection followed by a multi-class Support Vector Machine (SVM) for fault classification [93]. The AANN is trained on normal operating data, allowing it to generate residuals that identify deviations from expected behavior, while the SVM uses these residuals to classify the fault type. This procedure was verified on real historical diagnostic data provided by Cognitran Ltd. (Essex, UK).
Qin et al. (2023) introduced MSCNN-LSTMNet, a noise-robust network for engine diagnostics [97]. The network featured a Residual-CNN Denoising Module that predicted and subtracted noise from input signals, enhancing feature extraction. The Multi-scale CNN-LSTM Module employed convolutional layers with varied kernel sizes to capture features across multiple timescales, learning both transient and long-term dependencies. The addition of the LSTM layer improved sequential learning of engine operations, boosting robustness to noise and generalization.
Wang et al. (2023) considered Long Short-Term Memory Recurrent Neural Networks (LSTM RNNs) for misfire detection, applying them to crank speed signals [98]. They implemented LSTM networks, specifically designed to learn temporal dependencies in sequential data, demonstrating strong capability in modeling engine behavior and detecting crank speed variations caused by misfires. This method overcame the limitations of traditional misfire detection methods, which often failed at high engine speeds and low loads [99]. The researchers explored various data partition strategies and network topologies to maximize the performance of the LSTM RNNs, achieving high diagnostic accuracy.
Zhang et al. (2023) [42] provided a detailed analysis on the application of 1-D and 2-D Convolutional Neural Networks (CNNs) for misfire detection in diesel engines. They highlighted the advantages of CNNs in automatically identifying relevant features from raw data, reducing the need for manual feature engineering, and adapting to various engine operating conditions. The 1-D CNNs captured the temporal information directly from crank speed signals as time series data, while the 2-D CNNs processed a mapped version of the data, offering a fresh interpretation. The study demonstrated the ability of both architectures to recognize all types of misfires (complete and partial misfire events, with and without individual cylinder sensors) across multiple engine contexts. It also explored the performance of the models in non-stationary conditions such as accelerations and decelerations, highlighting some limitations at high speeds [42].
The selection of a CNN architecture depends on the use case, the available data, and the required complexity. The literature also attests to the adaptability of CNNs for diagnosing faults in internal combustion engines, as each architecture offers its own advantages for addressing challenges such as noise and data variability, as well as the necessity of establishing effective feature extraction methods [100].

Advanced Deep Learning Technique

This section provides an in-depth review of the latest advancements in deep learning techniques, focusing on their application in mechanical systems, power systems, and ICEs. It highlights emerging trends and state-of-the-art methodologies driving research in these domains.
The Transformer model applies self-attention mechanisms to weigh the importance of different parts of the input data, allowing it to capture contextual relationships effectively. In addition, it applies parallel computation to process multiple parts of the data at the same time, significantly speeding up training and inference [101]. Weng et al. (2023) [102] proposed a Multisensory Fusion Transformer (MsFT) for rotating machinery fault diagnosis with a local learning unit (LLU) to enhance both global and local feature extraction, and analyzed faults in gearboxes and cylindrical roller bearings, demonstrating superior accuracy (98.26% and 99.34% on two datasets) and robustness against noise compared to other deep learning methods. Li et al. (2022) [103] used a Transformer based on a focal loss stacked sparse noise reduction auto-encoder for diagnosis and long short-term memory (LSTM) networks for prediction, achieving high accuracy rates of 97.5% during training and 92.5% during testing in predicting gas concentrations and providing early warnings for power grid electric transformer faults. Zhang et al. (2024) [104] combined the Transformer architecture with multi-head dilated convolution to effectively extract both local and global fault features from rotating parts in complex equipment; the model achieved diagnostic accuracy of up to 97.95%, outperforming existing models by 10.97%. Nascimento et al.’s (2022) [105] Transformer for Predictive Maintenance (T4PdM) model used vibration sensor data from bearings and rotor systems, achieving 99.98% accuracy on the Machinery Fault Database (MaFaulDa) dataset and 98% on the Case Western Reserve University (CWRU) dataset. Kumar et al.’s (2024) [106] Transformer-based deep neural network (DNN) was designed for fault classification and prediction in diesel engines using multivariate time-series sensor data; the model, trained with 27 input features, 64 hidden dimensions, two layers, and nine attention heads, achieved 70.01% accuracy on a simulated diesel engine dataset, helping detect faults early in engine components like compressors, intercoolers, intake manifolds, and exhaust manifolds. Cui et al. (2023) [107] developed a Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)-Swin Transformer model for marine two-stroke diesel engines; they converted the vibration signals into two-dimensional (2D) images using a signal-to-image transformation technique and then processed them with a Swin Transformer-based deep neural network (DNN) for fault classification in engine cylinders, air supply systems, and combustion processes, achieving 98.3% accuracy. Zhou et al. (2023) [108] proposed FaultFormer, which utilizes a Transformer encoder to process sequential vibration data with a multi-headed self-attention mechanism to learn complex relationships within the data, achieving an accuracy of 97.91% on the Paderborn dataset when using Fourier downsampling and data augmentation, demonstrating its effectiveness in fault prediction without relying solely on FFT.
A Digital Twin (DT) is a virtual representation of a physical mechanical system that continuously updates using real-time sensor data, physics-based models, and machine learning to detect, diagnose, and predict faults for proactive maintenance and failure prevention [109,110]. Wang et al. (2024) [111] proposed a DT-assisted dual-channel parallel convolutional neural network (CNN)-Transformer model for rolling bearing fault diagnosis: a high-fidelity DT was developed to generate balanced simulation data, which were combined with real test data using the continuous wavelet transform, and the dual-channel CNN-Transformer model achieved 99.87% accuracy. Dong et al. (2024) [112] integrated a DT with a multiscale parallel one-dimensional convolutional neural network (CNN) for fault diagnosis in CNC machine tool table feed systems, which simulated mechanical operations, generated synthetic fault data, and bridged the gap between real and simulated signals using the Wasserstein distance; the CNN, extracting fault features from both simulated and real signals, achieved an accuracy above 95%. Reitenbach et al. (2024) [113] studied DT-integrated Computational Fluid Dynamics (CFD) and Computational Structural Mechanics (CSM) to detect deviations, assess aerodynamic impacts, and optimize predictive maintenance strategies, with the turbine blade geometry captured via high-resolution cameras. Acanfora et al. (2024) [114] proposed a DT technology utilizing an ANN and a DT simulation model for marine diesel engines: the ANN analyzed synthetic data to detect failure patterns, while the simulation model estimated operational parameters indicating degradation. Liu et al. (2023) [115] implemented a high-dimensional fully connected GAN (HDFC-GAN) to extract deep data features, while an LSTM-based advanced Digital Twin framework captured temporal patterns, enhancing model reliability; transfer learning (TL) bridges multiple Digital Twins, improving generalization and diagnostic accuracy across varying conditions, and applied to triplex pump fault detection the approach achieved 89.28% accuracy.
Federated Learning (FL) is a decentralized learning approach in which multiple devices train a shared model without sharing raw data, making it useful for fault detection across different machines while maintaining privacy [116,117]. Vijayalakshmi et al. (2024) [118] proposed a fault data origination, acquisition, classification, optimization, and standardization (FLOACOS) methodology applied at local facilities before sharing only model updates with a central server. The study evaluated MLP, CNN, RNN, and LSTM models, showing that applying FLOACOS improves accuracy, with the CNN achieving 98.56% on rotating machines. Zhao et al. (2024) [19] proposed a Federated Distillation Domain Generalization (FDDG) framework to enhance the generalization ability of federated learning-based fault diagnosis models, particularly for new clients with different data distributions. The framework integrates knowledge distillation, generative learning, and domain invariance discovery to improve fault identification; tested on gearbox and bearing datasets, FDDG achieved 86.86% and 88.76% accuracy. Ge et al. (2024) [119] presented an FL approach incorporating variational auto-encoders (VAEs) for fault diagnosis, particularly in data-limited target domains, to extract domain-invariant features while maintaining decentralized learning; tested on cross-working-condition and cross-device scenarios, the proposed VAE-FEDAvg method outperformed the baseline approach. Zhang et al. (2022) [120] introduced Federated Similarity Collaboration (FedSC) for motors, pumps, and turbines; the method is adaptable to diagnosing bearing wear, gearbox failures, vibration anomalies, and electrical issues. FedSC addresses data heterogeneity by allowing multiple clients to train personalized models while maintaining data privacy, and its potential for real-time industrial fault detection makes it a significant advancement in intelligent predictive maintenance.
Federated Transfer Learning (FTL) builds on FL by adding transfer learning, allowing knowledge transfer between different but related datasets; this is helpful when machines or industries have different data distributions [121,122]. Yang et al. (2024) [123] proposed an FTL framework for rolling bearing fault diagnosis, integrating a domain adversarial neural network (DANN) with maximum mean discrepancy (MMD) to enhance feature alignment, together with global model aggregation that dynamically updates weighting parameters to improve knowledge transfer across domains. Han et al. (2021) [124] integrated supervised classification with multiple adversarial domain adaptation, ensuring robust feature alignment between source and target domains while mitigating negative transfer effects; applied to wind turbines and bearings, the approach demonstrated notable accuracy improvements, with performance gains of up to 13.7% over the traditional method. Wang et al. (2023) [125] addressed challenges related to low-quality data and domain shifts across clients, integrating dynamic filtering to refine pseudo-labels, a Batch Normalized Maximum Mean Discrepancy (BN-MMD) loss function for domain alignment, and dynamic model aggregation to prioritize high-quality local models; applied to rotating machinery fault detection, their FTL achieved over 96% accuracy across multiple datasets. Yang et al. (2022) [126] introduced an FTL model integrating multi-task learning (MTL) and dynamic adaptive weight adjustment (DAWA) to enhance knowledge transfer without centralizing data; applied to wind turbine gearbox fault diagnosis, the approach leveraged convolutional neural networks (CNNs) for local model training, and experimental results demonstrate superior fault recognition accuracy, with MMT-DAWA outperforming FedAvg and ensemble learning by effectively managing outliers.

3. Experimental Investigations

The experiments performed in this study investigate the effectiveness of machine learning algorithms in diagnosing faults within a four-stroke four-cylinder petrol engine using vibration data. Detailed specifications of the IC engine are given in Table 2. Simulated cylinder cutoff faults were induced in each cylinder and compared to normal operation conditions. The objective is to evaluate the performance of various classifiers in discriminating between different fault types based on vibration data transformed using various techniques.
Table 2 provides detailed specifications for the Ambassador engine used in the experimental setup. This information is crucial for understanding the engine’s operational characteristics and performance capabilities.
The data acquisition process involved collecting vibration data under normal engine operation and under simulated cylinder injection cutoff faults in Cylinder 01, Cylinder 02, Cylinder 03, and Cylinder 04, with the healthy condition labeled Cylinder ALL. Statistical features like mean, median, and standard deviation were extracted from the raw vibration data, as sketched below. Principal Component Analysis (PCA) was used to reduce dimensionality, while various feature selection methods (MRMR, Chi2, ReliefF, ANOVA, Kruskal–Wallis) were employed to identify the most relevant features.
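As an illustration of the feature extraction step, the sketch below computes such statistics from windowed accelerometer samples; the window length, the random stand-in signal, and the use of NumPy/SciPy are assumptions for illustration, not the study's actual implementation.

```python
import numpy as np
from scipy import stats

def extract_features(window):
    """Statistical features for one window of raw vibration samples."""
    q75, q25 = np.percentile(window, [75, 25])
    return {
        "mean": np.mean(window),
        "median": np.median(window),
        "std": np.std(window, ddof=1),
        "kurtosis": stats.kurtosis(window),
        "skewness": stats.skew(window),
        "range": np.ptp(window),
        "mad": np.mean(np.abs(window - np.mean(window))),  # mean absolute deviation
        "iqr": q75 - q25,
    }

# Hypothetical usage: split one accelerometer axis into fixed-length windows.
signal = np.random.randn(50_000)        # stand-in for a recorded vibration signal
windows = signal.reshape(-1, 1000)      # assumed window length of 1000 samples
feature_rows = [extract_features(w) for w in windows]
```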
A total of 16 classifiers were applied to the raw data, the PCA-transformed data, and the data processed with each feature selection method. The performance of each classifier was evaluated using metrics like accuracy, F1 score, and ROC AUC calculated from the confusion matrix; a sample calculation is shown below for clarity.
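The sketch below shows one way such metrics follow from a confusion matrix; the matrix values are invented purely for illustration.

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class.
cm = np.array([[95,  5],
               [ 8, 92]])

tp = np.diag(cm).astype(float)      # true positives, per class
fp = cm.sum(axis=0) - tp            # false positives, per class
fn = cm.sum(axis=1) - tp            # false negatives, per class

accuracy = tp.sum() / cm.sum()
precision = tp / (tp + fp)          # positive predictive value (PPV)
recall = tp / (tp + fn)             # true positive rate (TPR)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy = {accuracy:.4f}")
print("F1 per class =", np.round(f1, 4))
```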

3.1. Experimental Setup

This experiment is conducted to investigate the effectiveness of various machine learning classifiers in identifying the engine’s normal operation and each cylinder cutoff condition. The experimental setup for evaluating the vibration and acoustic characteristics of a 4-cylinder Ambassador engine (AITEC Triplee Lab Experiments, Pune, Maharashtra, India) comprises a tri-axial accelerometer (Dytran Instruments Inc., Chatsworth, CA, USA) for vibration, a microphone for sound pressure (GRAS Acoustics, Chennai, Tamil Nadu, India), a dynamometer (AITEC Triplee Lab Experiments, Pune, Maharashtra, India), and a data acquisition (DAQ) system (National Instruments, Bangalore, Karnataka, India) interfaced with LabVIEW software. The engine is securely mounted on a test bench, with the tri-axial accelerometer affixed at the top head center to capture vibration data across three orthogonal axes. The engine is coupled to a dynamometer that applies load conditions of 0, 15, and 30% through an electric current-controlled bulb attachment with a load cell; the dynamometer allows precise load control and simulates various operational conditions. The accelerometer data are collected via a DAQ system connected to a computer running LabVIEW (National Instruments, Bangalore, Karnataka, India), enabling real-time data monitoring and recording; the DAQ system ensures accurate sampling rates and consistent data collection across different load conditions. Each cylinder is cut off using a cylinder cutoff switch that controls the fuel supply to the cylinder of choice. This setup facilitates detailed analysis of the engine’s vibration behavior and performance under varying loads, providing valuable insights for engine diagnostics and optimization. Figure 11 illustrates the experimental setup.

3.2. Methodology

The methodology for evaluating the vibration characteristics of a 4-cylinder Ambassador engine and classifying engine conditions using machine learning is presented in Figure 12, which provides an overview of the adopted methodology. Initially, a tri-axial accelerometer captures vibration data from the engine under varying load conditions (0%, 15%, and 30%). The decision to focus on low load conditions was driven by the need to detect faults in scenarios where engines operate under partial or minimal loads, such as idling or low-speed cruising, which have reduced signal-to-noise ratios due to subtler fault signatures [127,128,129]. The raw data undergo feature extraction to compute statistical metrics such as mean, median, mode, standard deviation, kurtosis, skewness, range, mean absolute deviation (MAD), and interquartile range (IQR). Post-feature extraction, dimensionality reduction and feature selection techniques such as PCA, MRMR, Chi-Square test, ReliefF, ANOVA, and Kruskal–Wallis are applied to refine the dataset; a sketch of this stage follows below. These processed data are then used to train multiple classifiers, including decision trees, discriminant functions, Naïve Bayes, SVM, KNN, ensemble methods, various NN architectures, the proposed deep learning methods (a DNN and a 1D-CNN), and advanced DL techniques (a Transformer and a hybrid Transformer and DNN model). The classifiers’ performance is evaluated using metrics like accuracy, ROC, PPV, FDR, TPR, FNR, and F1 score. This comprehensive setup aims to identify normal engine operation and conditions where individual cylinders are cut off, providing essential insights for engine diagnostics and optimization.
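As a rough sketch of the feature selection stage, the snippet below applies two of the filters named above (ANOVA and chi-square) with scikit-learn; MRMR, ReliefF, and Kruskal–Wallis have no built-in scikit-learn equivalents, and the stand-in data and the choice of k = 10 retained features are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2, f_classif
from sklearn.preprocessing import MinMaxScaler

# Stand-in data; in the study, X holds the statistical features and y the
# engine condition labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 27))
y = rng.integers(0, 5, size=500)

# ANOVA F-test filter, keeping an assumed k = 10 features.
X_anova = SelectKBest(score_func=f_classif, k=10).fit_transform(X, y)

# The chi-square filter requires non-negative inputs, so rescale first.
X_scaled = MinMaxScaler().fit_transform(X)
X_chi2 = SelectKBest(score_func=chi2, k=10).fit_transform(X_scaled, y)
```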

3.3. Sensor Signal Graph

A signal graph is a visual representation of a signal as it varies over time. Figure 13a–c shows the acceleration in X, Y, and Z over time, in seconds, as the engine runs continuously through the normal condition and the first, second, third, and fourth cylinder cutoff conditions, each under the three load conditions of 0, 15, and 30%. Normal engine operation shows a smaller amplitude range than the cylinder cutoff conditions, owing to the mechanical vibration caused by imbalanced operation. Each cylinder cutoff stage is traversed under all load conditions, and the maximum vibration peaks occur at the highest load condition of this study, 30%, followed by 15% and 0%, respectively.

3.4. Classifiers Details

Cross-validation is a strategy used in machine learning to gauge a model’s ability to generalize to data it has not encountered before. The available data are segmented into multiple subsets, often referred to as ‘folds’. In each iteration, one fold is set aside for validation while the model is trained on the remaining folds. This procedure is repeated, with each round selecting a different fold for validation. The fundamental aim of cross-validation is to curb overfitting, the phenomenon in which a model is excessively tailored to the training data and consequently performs poorly on new, unseen data.
For this study, 10-fold cross-validation is used: the dataset is divided into 10 subsets, and the model is trained 10 times, each time leaving out one of the subsets from training and using it as the test set. Setting the value to 10 provides a balance between robust evaluation and computational efficiency, offering a good trade-off between comprehensive assessment and training cost.
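A minimal sketch of this evaluation scheme with scikit-learn is shown below; the synthetic data and the choice of a decision tree are placeholders, not the study's actual pipeline.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Stand-in data: 500 feature vectors across 5 classes (Cylinder 01-04
# cutoff and Cylinder ALL); in the study these come from the extracted features.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 9))
y = rng.integers(0, 5, size=500)

# 10-fold cross-validation: each fold serves once as the held-out test set.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=cv, scoring="accuracy")
print(f"10-fold accuracy: {scores.mean():.4f} +/- {scores.std():.4f}")
```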

3.4.1. Decision Tree Classifier

A Decision Tree Classifier is a type of machine learning model that makes decisions based on a series of questions or tests. The tree structure consists of nodes, which represent the questions or tests, and branches, which represent the possible answers. Each node tests a specific attribute of the data, and the branches represent the outcomes of that test. The process starts at the root node at the top of the tree, and data are passed down the tree along the branches that correspond to the outcomes of the tests.
At the bottom of the tree, we have leaf nodes, which represent the final decisions or classifications. Once the data reach a leaf node, they are assigned the class label associated with that node.
The goal of the Decision Tree Classifier is to create a model that predicts the value of a target variable by learning simple decision rules inferred from the data features.
The mathematics behind decision trees involves several concepts:
Entropy: Entropy is the amount of information needed to accurately describe data. If data are homogeneous (all elements are similar), then entropy is 0 (pure). If elements are equally divided, then entropy moves towards 1 (impure). Mathematically, it is represented as follows:
H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)
where p is the probability of success.
Gini Index/Impurity: Measures impurity in the node. It has a value between 0 and 1. The Gini index of value 0 means the sample is perfectly homogeneous and all elements are similar, whereas a Gini index of value 1 means maximal inequality among elements. Mathematically, it is represented as:
\mathrm{Gini}(p) = 1 - \left( p^2 + (1 - p)^2 \right)
where p is the probability of success.
These concepts are used to decide the best split at each node of the tree, as the sketch below illustrates.
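The toy sketch below evaluates both impurity measures for a binary split, assuming p is the fraction of one class within a node; the sample values of p are invented for illustration.

```python
import numpy as np

def entropy(p):
    """Binary entropy: H(p) = -p*log2(p) - (1-p)*log2(1-p)."""
    if p in (0.0, 1.0):
        return 0.0           # a pure node carries no uncertainty
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def gini(p):
    """Binary Gini impurity: 1 - (p^2 + (1-p)^2)."""
    return 1 - (p ** 2 + (1 - p) ** 2)

# A pure node (p = 1) scores 0 under both measures; an evenly mixed
# node (p = 0.5) scores the maximum: entropy 1.0 and Gini 0.5.
for p in (1.0, 0.75, 0.5):
    print(f"p = {p}: entropy = {entropy(p):.3f}, gini = {gini(p):.3f}")
```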

3.4.2. Discriminant Classifiers

Discriminant Analysis is a statistical technique used in pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events.
Linear Discriminant, also known as Fisher’s Discriminant, is a simple and effective method that assumes different classes generate data based on different Gaussian distributions. The fitting function estimates the parameters of a Gaussian distribution for each class. The trained classifier then finds the class with the smallest misclassification cost.
The mathematical formula behind LDA is given by
\delta_k(x) = x^T \Sigma^{-1} \mu_k - \frac{1}{2} \mu_k^T \Sigma^{-1} \mu_k + \log \pi_k
where
  • x—input vector
  • \Sigma—covariance matrix (assumed shared across classes)
  • \mu_k—mean vector of class k
  • \pi_k—prior probability of class k
  • \delta_k(x)—discriminant score of class k; the predicted class is the one with the largest score.
Optimizable Discriminant allows for automatic hyperparameter tuning, which can improve the model’s performance. It is considered for automatically finding the best model parameters.
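A small numerical sketch of how the discriminant score above could be evaluated is given below; the covariance matrix, class means, and priors are invented for illustration.

```python
import numpy as np

def lda_score(x, mu_k, sigma_inv, prior_k):
    """delta_k(x) = x^T S^-1 mu_k - 0.5 * mu_k^T S^-1 mu_k + log(pi_k)."""
    return x @ sigma_inv @ mu_k - 0.5 * mu_k @ sigma_inv @ mu_k + np.log(prior_k)

# Hypothetical two-class, two-feature problem with a shared covariance.
sigma_inv = np.linalg.inv(np.array([[1.0, 0.2],
                                    [0.2, 1.0]]))
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
priors = [0.5, 0.5]

x = np.array([1.8, 1.5])
scores = [lda_score(x, mu, sigma_inv, p) for mu, p in zip(means, priors)]
print("predicted class:", int(np.argmax(scores)))  # class with the largest score
```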

3.4.3. Naive Bayes Classifier

The Naive Bayes classifier is a probabilistic machine learning algorithm based on Bayes’ Theorem. The “Naive” part of the name comes from the assumption that all features (or attributes) used in the model are independent of each other. In other words, changing the value of one feature does not directly influence or change the value of any of the other features used in the algorithm.
The general mathematical formula behind the Naive Bayes classifier is Bayes’ theorem:
P(y \mid x) = \frac{P(x \mid y) \, P(y)}{P(x)}
where
  • P(y|x)—posterior probability of class (target) given predictor (attribute)
  • P(y)—prior probability of class
  • P(x|y)—likelihood which is the probability of predictor given class
  • P(x)—prior probability of predictor.
Gaussian Naive Bayes (GNB) Classifier: The GNB classifier is a type of Naive Bayes method for continuous attributes in which each feature is assumed to follow a Gaussian distribution within each class. The algorithm is based on Bayes’ Theorem and assumes that all the features are independent of each other. The posterior is again given by Bayes’ theorem above, with the per-feature likelihood modeled as
P(x_i \mid y) = \frac{1}{\sqrt{2\pi\sigma_y^2}} \exp\left( -\frac{(x_i - \mu_y)^2}{2\sigma_y^2} \right)
where
  • \mu_y, \sigma_y^2—mean and variance of feature x_i estimated from the training samples of class y
  • P(x_i \mid y)—likelihood of feature value x_i given class y.
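A brief sketch with scikit-learn's GaussianNB on invented toy data is shown below; the fitted theta_ and var_ attributes hold the per-class Gaussian parameters used in the likelihood above.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy data: two classes whose features are drawn from different Gaussians.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(50, 2)),
               rng.normal(3.0, 1.0, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

gnb = GaussianNB().fit(X, y)
print(gnb.theta_)                  # per-class feature means (mu_y)
print(gnb.var_)                    # per-class feature variances (sigma_y^2)
print(gnb.predict([[2.5, 2.0]]))   # class with the highest posterior
```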
Kernel Naive Bayes Classifier: The Kernel Naive Bayes classifier, like the Gaussian Naive Bayes, is based on Bayes’ Theorem but uses Kernel Density Estimation (KDE) for the likelihood estimation. KDE is a non-parametric way to estimate the probability density function of a random variable. The mathematical formula behind Kernel Naive Bayes is like the Gaussian Naive Bayes, but the likelihood estimation P(x|y) is replaced by the kernel density estimation. The formula for KDE is given by
KDE(x) = (1/(mh)) Σr=1..m K((x − xr)/h)
where
  • K—kernel function (e.g., Gaussian, Epanechnikov, etc.)
  • h—bandwidth
  • m—number of observations.
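A hedged sketch of a Kernel Naive Bayes classifier follows, using scikit-learn's KernelDensity to estimate one per-class, per-feature likelihood; the bandwidth value, data layout, and function names are placeholders for illustration:

```python
# Sketch: Kernel Naive Bayes via per-class, per-feature KDE likelihoods
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_kde_nb(X, y, bandwidth=0.5):
    """Fit one Gaussian KDE per (class, feature), plus class priors."""
    classes = np.unique(y)
    models = {
        c: [KernelDensity(bandwidth=bandwidth).fit(X[y == c][:, [j]])
            for j in range(X.shape[1])]
        for c in classes
    }
    priors = {c: np.mean(y == c) for c in classes}
    return classes, models, priors

def predict_kde_nb(x, classes, models, priors):
    """Naive Bayes decision: argmax_c [log P(c) + sum_j log p(x_j | c)]."""
    log_post = [np.log(priors[c]) +
                sum(models[c][j].score_samples(x[[j]].reshape(1, 1))[0]
                    for j in range(len(x)))
                for c in classes]
    return classes[int(np.argmax(log_post))]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)
classes, models, priors = fit_kde_nb(X, y)
print(predict_kde_nb(np.array([0.8, -0.2]), classes, models, priors))
```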

3.4.4. Support Vector Machine (SVM) Classifiers

Support Vector Machines (SVMs) are powerful machine learning algorithms used for both classification and regression. They are particularly effective because they can handle high-dimensional data and manage non-linear relationships. The main objective of the SVM algorithm is to find the optimal hyperplane in an N-dimensional space that maximally separates the data points in different classes.
Linear SVM: Linear SVM is a variant of SVM used for linearly separable data. If a dataset can be classified into two classes by a single straight line, the data are termed linearly separable, and the classifier used is called a Linear SVM classifier. Training maximizes the margin between the classes while minimizing the norm of the weight vector; the resulting decision function is
f(X) = w^T X + b
where
  • w—weight vector to minimize
  • X—data to classify
  • b—bias term, or the linear coefficient estimated from the training data.
Fine Gaussian SVM: Fine Gaussian SVM is an SVM classifier that uses a Gaussian radial basis function (RBF) kernel with a fine kernel scale, enabling it to capture highly non-linear class boundaries. The most important hyperparameters are C and gamma: C controls the trade-off between maximizing the margin of the separating hyperplane and penalizing misclassifications, while gamma sets the width of the Gaussian kernel. The kernel is given by
K(X1, X2) = exp(−γ‖X1 − X2‖²)
where
  • X1 and X2 are the data points
  • γ—parameter that defines how far the influence of a single training example reaches.
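A short sketch with synthetic data (the feature dimensions and labeling rule below are invented for illustration) shows how C and gamma enter an RBF SVM in scikit-learn:

```python
# Sketch: RBF-kernel SVM; C trades margin width vs. violations, gamma sets
# how far the influence of a single training example reaches
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                        # stand-in vibration features
y = (np.linalg.norm(X, axis=1) > 1.5).astype(int)    # non-linear class boundary

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(clf.score(X, y))
```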

3.4.5. K-Nearest Neighbors (KNN) Classifier

The KNN is a simple yet effective algorithm used in supervised learning. It classifies an object based on the majority class of its ‘K’ nearest neighbors.
KNN works by finding the distances between a query and all the examples in the data, selecting the specified number of examples (K) closest to the query, and then voting for the most frequent label (in the case of classification) or averaging the labels (in the case of regression).
The Fine KNN, also known as the standard KNN, uses a distance metric to find the K nearest neighbors. The most common distance metric used is the Euclidean distance, defined as
d(x, y) = √(Σi=1..n (xi − yi)²)
where
  • x and y are two points in the n-dimensional space
  • xi and yi are the coordinates of points x and y, respectively.
Once the distances are calculated, the algorithm selects the K instances closest to the new point and then classifies the new point based on the majority class of these K instances.
The Coarse KNN is a variation of the standard KNN designed to make coarser, faster decisions. One common acceleration strategy is to first group the points into a smaller number of clusters and calculate the distance from the query point to the centroids of these clusters; the cluster whose centroid is closest to the query point is selected, the distances from the query point to the points within this cluster are calculated, and the K nearest neighbors are then selected from this subset of points. In this study, the Coarse KNN uses a large neighborhood of 100 neighbors.
The formula for calculating the centroid of a cluster is
C = (1/n) Σi=1..n xi
where
  • C—centroid
  • n—number of points in the cluster
  • xi—coordinate of the i-th point in the cluster.
The Coarse KNN can be much faster than the Fine KNN when dealing with large datasets, as it reduces the number of distance calculations required. However, it may not be as accurate, as it makes an approximation by only considering the points within the nearest cluster.
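The contrast between the two variants can be sketched in scikit-learn by varying only the neighborhood size; the dataset below is synthetic, and K = 1 vs. K = 100 mirrors the fine/coarse settings reported later in this study:

```python
# Sketch: "Fine" (K = 1) vs. "Coarse" (K = 100) KNN on synthetic data
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for k in (1, 100):  # fine vs. coarse neighborhood
    knn = KNeighborsClassifier(n_neighbors=k, metric="euclidean").fit(X_tr, y_tr)
    print(k, knn.score(X_te, y_te))
```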

3.4.6. Ensemble Machine Learning Classifier

An Ensemble Classifier is a type of machine learning model that combines the predictions from multiple other models to make a final prediction. The idea behind ensemble methods is that combining the predictions of several models can often produce better results than any single model could. This is because different models may capture different patterns in the data and combining them can provide a more holistic view.
Boosting Trees: Boosting involves training models in sequence, where each new model is trained to correct the errors made by the previous models. An example of a boosting method is the Gradient Boosting algorithm. Boosting is a sequential technique where the first algorithm is trained on the entire dataset and the subsequent algorithms are built by fitting the residuals of the first algorithm, thus giving more weight to those observations that were poorly predicted by the previous model. It relies heavily on the concept of weights.
In the context of Boosted Trees, each tree is built on the residuals (i.e., the difference between the observed and predicted values) of the previous tree. So, each tree is learning from the mistakes of the previous tree. The final prediction is a weighted sum of the predictions of each tree.
The mathematical formula for the final prediction of a boosted model is
ŷ = Σi=1..K wi fi(x)
where
  • y ^ —predicted output
  • K—number of trees
  • wi—weight of the i-th tree
  • fi(x) is the prediction of the i-th tree.
Bagging Trees: Bagging, or Bootstrap Aggregating, involves creating multiple subsets of the original data, training a model on each subset, and combining the predictions. An example of a bagging method is the Random Forest algorithm.
We create multiple subsets of the original dataset by sampling with replacement, and then train a model (like a decision tree) on each subset. The final prediction is an average of the predictions from all models (for regression problems) or a majority vote (for classification problems).
The mathematical formula for the final prediction of a bagged model can be represented as follows:
Generally, for regression problems,
ŷ = (1/B) Σb=1..B Tb(x)
For the classification problem in our study,
ŷ = majority vote{Tb(x)}, b = 1, …, B
where
  • y ^ —predicted output
  • B is the number of trees
  • Tb(x) is the prediction of the b-th tree.
In the case of regression, the final prediction is the average of the predictions from all the trees (N. Radhika, 2024 [45]). For classification, the final prediction is the class that receives the most votes from all the trees.
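A brief sketch comparing the two ensemble styles on synthetic data; the model choices below (Random Forest for bagging, Gradient Boosting for boosting) are illustrative, not the study's exact configurations:

```python
# Sketch: bagged trees vs. boosted trees evaluated on the same task
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

bagged = RandomForestClassifier(n_estimators=100, random_state=0)        # bagging
boosted = GradientBoostingClassifier(n_estimators=100, random_state=0)   # boosting

for name, model in [("bagged", bagged), ("boosted", boosted)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```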

3.4.7. Neural Network Classifiers

An NN Classifier is a type of ANN used for categorizing input data into one of several known classes. It operates under supervised learning, where the network is trained on a dataset containing inputs and corresponding correct outputs.
Structure, Mathematical Formulas, and Properties:
Forward Pass:
  • Input Layer: the input layer passes the input data to the next layer without any transformation.
  • Hidden Layers: for each neuron j in the hidden layer,
Activation:
aj = σ(zj)
Weighted Sum:
zj = Σi=1..n Wij Xi + bj
where
  • zj is the weighted sum of inputs plus bias for neuron j
  • Xi is the input from the previous layer
  • Wij is the weight connecting neuron i in the previous layer to neuron j in the current layer
  • bj is the bias for neuron j
  • σ is the activation function
  • aj is the output (activation) of neuron j.
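A minimal NumPy sketch of this forward pass (the array shapes and the ReLU activation are assumptions for illustration):

```python
# Sketch: one hidden-layer forward pass, z_j = sum_i W_ij x_i + b_j, a_j = sigma(z_j)
import numpy as np

def forward(X, W, b, sigma=lambda z: np.maximum(z, 0.0)):
    """X: (n_inputs,), W: (n_inputs, n_hidden), b: (n_hidden,)."""
    z = X @ W + b          # weighted sums z_j
    return sigma(z)        # activations a_j

X = np.array([0.2, -1.0, 0.5])
W = np.ones((3, 4)) * 0.1
b = np.zeros(4)
print(forward(X, W, b))
```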
The 16 classifiers and the rationale for their selection are listed in Table 3; Shan (2024) [48,52].

Proposed DNN Architecture

The Deep Neural Network (DNN) architecture, depicted in Figure 14, is designed to classify the operational state of engine cylinders based on mathematical features derived from tri-axial accelerometer signals. It capitalizes on fully connected layers to learn complex patterns in the feature space. The architecture consists of an input layer for feature ingestion, multiple fully connected hidden layers for feature transformation, and a SoftMax output layer for classification. Detailed configurations, including layer dimensions, activation functions, and dropout rates, are provided in Table 4 for reference.
The key features and design choices include noise-filtered inputs, where the input layer processes pre-selected features to mitigate noise in the vibration signals, ensuring that the model focuses on fault-specific information. Multiple dense hidden layers progressively capture complex relationships within the data, equipped with ReLU activation functions to introduce non-linearity and improve learning capacity. Dropout regularization is applied to the hidden layers to prevent overfitting, ensuring better generalization to unseen data. The final layer employs a SoftMax activation function to generate probabilities for the five classes, enabling reliable classification.
For training and optimization, the loss function used is Categorical Cross-Entropy, which measures the deviation between predicted and actual labels, guiding the model in multi-class classification tasks. The Adam optimizer facilitates adaptive learning with default parameters (learning rate = 0.001, beta1 = 0.9, beta2 = 0.999). The network is trained using mini-batches of size 32 for 50–100 epochs, with early stopping implemented based on validation performance to prevent overfitting.
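A hedged PyTorch sketch of such a DNN is given below; the layer widths and dropout rates are placeholders rather than the study's exact values, which are listed in Table 4:

```python
# Sketch: fully connected fault classifier with ReLU, dropout, and Adam
import torch
import torch.nn as nn

class FaultDNN(nn.Module):
    def __init__(self, n_features: int, n_classes: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 128), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(64, n_classes),   # logits; softmax is applied in the loss
        )

    def forward(self, x):
        return self.net(x)

model = FaultDNN(n_features=30)
loss_fn = nn.CrossEntropyLoss()   # categorical cross-entropy over 5 classes
opt = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.999))
```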

Proposed 1D-CNN Architecture

The 1D Convolutional Neural Network (1D-CNN) architecture, depicted in Figure 15, was designed for efficient feature extraction and classification, leveraging CNNs’ ability to handle structured temporal data while reducing noise. The architecture comprised an input layer, convolutional layers for feature extraction, pooling layers for dimensionality reduction, fully connected layers for integration, and a final SoftMax output layer for classification. The details of the architecture, including layer dimensions, activation functions, and dropout rates, are visually represented in Figure 15 and detailed in Table 5 for clarity and reproducibility.
Key features and design choices included noise-resilient inputs, where the input layer processed mathematically selected features derived from vibration signals, ensuring robustness against noise and enhancing fault detection accuracy. The CNN employed two 1D convolutional layers to extract hierarchical features from the input data, designed to identify both low-level patterns and higher-order relationships associated with cylinder conditions. Max pooling layers reduced feature dimensionality while preserving key information, improving computational efficiency and reducing overfitting risks. Fully connected layers consolidated features and enabled the network to classify input data into one of five predefined categories. The final layer used a SoftMax activation function to generate class probabilities for multi-class classification.
For training and optimization, the loss function used was Categorical Cross-Entropy, quantifying the discrepancy between predicted and actual labels, and optimizing the network for multi-class classification tasks. The Adam optimizer ensured efficient and adaptive learning with default parameters (learning rate = 0.001, beta1 = 0.9, beta2 = 0.999). The model was trained using mini-batches of size 32 over 100–500 epochs, with early stopping implemented based on validation accuracy to prevent overfitting.
Performance metrics such as accuracy, precision, recall, and F1 score were evaluated during training to assess the model’s classification performance. The architecture was tuned iteratively using these metrics to ensure high generalization capacity and robustness. The proposed 1D-CNN model effectively processed time-series data to classify cylinder faults with precision.
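A compact PyTorch sketch of a comparable 1D-CNN follows; channel counts, kernel sizes, and the input length are placeholders, with the study's exact configuration given in Table 5:

```python
# Sketch: two 1D conv + max-pool blocks, then a dense classifier head
import torch
import torch.nn as nn

class Fault1DCNN(nn.Module):
    def __init__(self, n_channels: int = 3, n_classes: int = 5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(n_channels, 16, kernel_size=5), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(16, 32, kernel_size=5), nn.ReLU(), nn.MaxPool1d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.LazyLinear(64), nn.ReLU(), nn.Linear(64, n_classes),
        )

    def forward(self, x):              # x: (batch, channels, time)
        return self.classifier(self.features(x))

out = Fault1DCNN()(torch.randn(8, 3, 256))   # tri-axial signal window (assumed)
print(out.shape)                             # torch.Size([8, 5])
```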

Proposed Transformer Architecture

Transformers excel at modeling correlations and relationships among features, and are popularly used in large language models for their capability to understand human context. The transformer comprises two main components, the encoder and the decoder, which mainly utilize the multi-headed attention mechanism for feature correlation to extract a rich set of features [131]. Figure 16 shows the architecture of the Transformer model developed for engine cylinder fault classification. Input features are projected into a 64-dimensional space and combined with positional encoding. A two-layer transformer encoder with multi-head attention processes the data, followed by a feed-forward network. Finally, a linear classifier maps the transformed features to the output classes. The layers of the transformer are as follows:

Positional Encoding

Signal data depend on the order in which information arrives, which is why positional encoding is a necessary step: it provides the model with additional information about the position of each datapoint. Positional encoding is calculated once at the initial step and used throughout model training and evaluation; its standard mathematical form is
PE(p, 2i) = sin(p/10,000^(2i/dmodel))
PE(p, 2i+1) = cos(p/10,000^(2i/dmodel))
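These formulas can be implemented directly; the following NumPy sketch (sequence length and model dimension are arbitrary, and d_model is assumed even) fills the even indices with sines and the odd indices with cosines:

```python
# Sketch: sinusoidal positional encoding PE(p,2i), PE(p,2i+1)
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    p = np.arange(seq_len)[:, None]            # positions
    i = np.arange(d_model // 2)[None, :]       # dimension pairs
    angle = p / np.power(10000.0, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angle)                # even indices
    pe[:, 1::2] = np.cos(angle)                # odd indices
    return pe

print(positional_encoding(4, 8).shape)         # (4, 8)
```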

Self-Attention

The self-attention mechanism, used in natural language processing due to its ability to establish connections between different segments of an input sequence and assign priorities for parts of sequences, proves resourceful in processing accelerometer data for fault diagnosis in internal combustion engines. The following is a brief overview of the self-attention mechanism in transformer networks.
  • The input matrix is multiplied by the trainable weight matrices Wq, Wk, and Wv to form the Query, Key, and Value matrices.
  • A scaled dot product between the Query and Key matrices is computed and passed through a SoftMax to obtain the attention weights, which are then multiplied by the Value matrix.
Q = Xf W^Q
K = Xf W^K
V = Xf W^V
Y = SA(Q, K, V) = SoftMax(A) · V = SoftMax(Q · K^T/√d) · V
  • Q—Query matrix
  • Wq—Trainable weight matrix for query
  • K—Key matrix
  • Wk—trainable weight matrix for key
  • V—Value matrix
  • Wv—trainable weight matrix for value
  • d—the model dimension (dmodel) used for scaling
  • Y—computed attention matrix.
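A single-head NumPy sketch of this computation, with random matrices standing in for the trained weights W^Q, W^K, and W^V:

```python
# Sketch: scaled dot-product self-attention Y = softmax(QK^T / sqrt(d)) V
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Xf, Wq, Wk, Wv):
    Q, K, V = Xf @ Wq, Xf @ Wk, Xf @ Wv
    d = Q.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d))   # attention weights
    return A @ V                        # weighted values Y

rng = np.random.default_rng(0)
Xf = rng.normal(size=(10, 64))          # 10 tokens, d_model = 64 (as in Fig. 16)
Wq, Wk, Wv = (rng.normal(size=(64, 64)) for _ in range(3))
print(self_attention(Xf, Wq, Wk, Wv).shape)   # (10, 64)
```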

Multi-Headed Attention Mechanism

The original Q, K, and V matrices are divided into multiple heads, attention is computed on each head, and the results are concatenated back to the original dimensions. This keeps the computational cost in check during training and inference while allowing the model to focus on crucial information in different representation subspaces and to synthesize larger amounts of information in parallel without bottlenecking computation.

Encoder Layer

The encoder is a stack of identical layers, each comprising a multi-headed attention mechanism followed by a feed-forward neural network. Skip (residual) connections are implemented around the sub-layers, followed by layer normalization, which eases training and evaluation.

Decoder Layer

The decoder is a stack of identical layers. Each layer consists of three sub-layers: two for self-attention and feed-forward processing, and a third for multi-head attention over the encoder’s output. As with the encoder, the decoder applies residual connections around each sub-layer followed by layer normalization. Additionally, the self-attention mechanism in the decoder is modified to prevent attending to future positions by masking, so that the prediction for position i depends only on the known outputs from positions earlier than i, with the output embeddings offset by one position.

Proposed Hybrid Model Combining Transformers and Deep Neural Networks (DNNs) Architecture for Engine Fault Diagnosis

In this study, a hybrid machine learning model that integrates Transformer architectures with DNNs to enhance fault diagnosis in engines is proposed. This model leverages the Transformer’s ability to capture long-range dependencies and sequential patterns within time-series sensor data through self-attention mechanisms, identifying subtle temporal anomalies indicative of faults [132]. Concurrently, the DNN component excels at modeling complex, non-linear relationships in high-dimensional data, processing the enriched representations from the Transformer to improve the classification and prediction accuracy of fault conditions.
The Transformer model proposed in this study, discussed in the section Proposed Transformer Architecture, serves as the baseline for our extended Transformer + DNN framework. Figure 17 gives an overview of the Hybrid model integration; the Transformer output is fed as the input to the DNN model. Transformer architecture parameters are represented in Table 6.
Figure 18 represents the DNN architecture, and Table 7 provides the DNN architecture parameters. Table 8 provides the hyperparameters applied for the hybrid transformer and DNN model architecture.

3.5. Importance of Data Transformation and Feature Selection

Engine vibration data from sensors are difficult to interpret because they are high-dimensional and often redundant. Data transformation and feature selection techniques are therefore essential for extracting useful insights. They improve diagnostic accuracy by isolating the most important differences between normal and abnormal cylinder states, support the accurate identification of distinguishing faults by focusing on the key features that govern cylinder behavior, and help pinpoint the exact cylinder at fault, enabling more targeted maintenance. In early fault detection, a small change in the recorded vibration signature is often the first sign of a developing fault; choosing the right features helps reveal these subtle changes so that preventative maintenance can be scheduled and expensive downtime avoided.
The computational cost of processing high-dimensional sensor data can also be substantial. Dimensionality reduction methods such as PCA shrink the dataset while retaining key information, which speeds up data exploration and model training, and smaller datasets require fewer computing resources, improving cost effectiveness. Finally, high dimensionality hinders interpretability, so selecting an appropriate, smaller set of informative features simplifies data visualization; clearer visualizations help researchers understand the physical processes that cause the different cylinder states.

3.5.1. Feature Ranking

Filter-based methods: These techniques assign scores to features based on intrinsic properties like correlation with the target variable (e.g., cylinder state). Chi2 and ANOVA are examples that calculate statistics to identify features with a statistically significant relationship to the class labels.

Chi2

The Chi-squared statistic (χ2) is calculated for each feature individually, comparing the observed and expected frequencies of different outcomes for each category of the feature,
χ² = Σ ((O − E)²/E)
where
  • Σ = sum over all categories of the feature
  • O = observed frequency of a particular outcome (e.g., cylinder cutoff) for a specific category
  • E = expected frequency of the same outcome, calculated based on the null hypothesis that the feature and target variable are independent.
Higher Chi-squared values indicate a stronger deviation from the null hypothesis, suggesting a more significant relationship between the feature and the target variable.
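A short scikit-learn sketch of Chi-squared feature scoring on synthetic data (note that chi2 requires non-negative inputs, hence the min-max scaling; all data below are invented for illustration):

```python
# Sketch: rank features by Chi-squared statistic against the class labels
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2
from sklearn.preprocessing import MinMaxScaler

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_pos = MinMaxScaler().fit_transform(X)   # make features non-negative
scores, p_values = chi2(X_pos, y)
print(scores.argsort()[::-1][:3])         # indices of the top-3 features
```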

ANOVA

Analysis of Variance (ANOVA) is another tool for filter-based feature selection in engine vibration analysis. It assesses the statistical significance of differences between group means for various features across different cylinder states (e.g., Cylinder normal operation vs. Cylinder cutoff).
ANOVA decomposes the total variance within the data into two components:
  • Between-group variance (SSB): measures the variability between the means of different groups (cylinder states).
  • Within-group variance (SSE): measures the variability within each group.
The F-statistic, calculated as the ratio of the between-group variance to the within-group variance (in practice, each sum of squares is normalized by its degrees of freedom), tests the null hypothesis that all group means are equal.
F = SSB/SSE
Higher F-statistics indicate a larger between-group variance compared to within-group variance, suggesting a higher likelihood that the differences between group means are not due to chance.
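An equivalent scikit-learn sketch for ANOVA-based ranking, using f_classif on synthetic data:

```python
# Sketch: ANOVA F-statistic feature ranking via SelectKBest
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
selector = SelectKBest(score_func=f_classif, k=3).fit(X, y)
print(selector.get_support(indices=True))   # indices of the top-3 features
```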
Wrapper-based methods: These methods evaluate feature subsets by training a model with each subset and selecting the one that performs best. ReliefF is often discussed alongside them, although it is strictly a filter-style algorithm: it assesses feature importance based on how well feature values distinguish between neighboring data points of different classes.

ReliefF Iterates Through Data Points

For each data point, find its k nearest neighbors of the same class (Nearest Hits, NH) and k nearest neighbors of the opposite class (Nearest Misses, NM).
For each feature j,
  • Calculate the difference in the feature value between the data point and each NH (diff_hitj).
  • Calculate the difference in the feature value between the data point and each NM (diff_missj).
The feature weight (Wj) is updated based on the difference in feature values:
Wj = Wj − (Σ diff_hitj − Σ diff_missj)/(m · k)
  • m = number of sampled data points (iterations)
  • k = number of nearest neighbors.
Features with higher weight values are considered more relevant, as their values differ more between nearest neighbors of different classes than between nearest neighbors of the same class.
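A simplified Relief-style sketch is given below (k = 1, binary classes, Manhattan distance; names and defaults are ours); the full ReliefF algorithm additionally averages over k hits and misses per sampled point and handles multiple classes:

```python
# Sketch: Relief-style feature weights from near-hit/near-miss differences
import numpy as np

def relief_weights(X, y, m=50, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = np.zeros(d)
    for _ in range(m):                          # m sampled data points
        i = rng.integers(n)
        dists = np.abs(X - X[i]).sum(axis=1)    # Manhattan distances
        dists[i] = np.inf                       # exclude the point itself
        hit = np.argmin(np.where(y == y[i], dists, np.inf))
        miss = np.argmin(np.where(y != y[i], dists, np.inf))
        W -= np.abs(X[i] - X[hit]) / m          # near-hit differences penalize
        W += np.abs(X[i] - X[miss]) / m         # near-miss differences reward
    return W

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.1 * rng.normal(size=200) > 0).astype(int)
print(relief_weights(X, y))   # feature 0 should receive the largest weight
```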
Embedded methods: These methods integrate feature selection within the model training process, leading to a more optimized model and selected feature set simultaneously. Minimum Redundancy Maximum Relevance (MRMR) is a filter-based feature selection method that chooses features both highly relevant to the target variable (e.g., cylinder state) and minimally redundant with other selected features.

MRMR Utilizes Two Measures to Select Features

Relevance (D): measures the mutual information between a feature and the target variable. Higher mutual information indicates stronger relevance.
D(F, C) = I(F, C) − [Σ I(Fi, C)/(m − 1)]
  • F—Feature
  • C—Target variable
  • Fi—other selected features
  • m—number of features.
Redundancy (R): measures the average mutual information between a feature and other already selected features. Lower redundancy is desired.
R(F, S) = [Σ I(Fi, F)/(s − 1)]
  • Fi—other selected features
  • s—number of already selected features.
MRMR iteratively selects features that maximize the relevance-to-redundancy ratio (D/R) until a desired number of features is reached.
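A greedy MRMR-style sketch using scikit-learn's mutual information estimators is shown below; the difference form (relevance minus mean redundancy) is one common variant, standing in for the ratio form described above:

```python
# Sketch: greedy MRMR-style selection with mutual information
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
relevance = mutual_info_classif(X, y, random_state=0)

selected = [int(np.argmax(relevance))]          # start with the most relevant
while len(selected) < 4:
    scores = {}
    for f in range(X.shape[1]):
        if f in selected:
            continue
        # redundancy: mean MI between candidate f and already selected features
        red = np.mean([mutual_info_regression(X[:, [s]], X[:, f],
                                              random_state=0)[0]
                       for s in selected])
        scores[f] = relevance[f] - red          # difference form of MRMR
    selected.append(max(scores, key=scores.get))
print(selected)
```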

Kruskal–Wallis’s Test

Another statistical method used for feature selection, particularly in non-parametric scenarios, is the Kruskal–Wallis test. It assesses whether there are statistically significant differences between the distributions of the feature values across different classes.
Rank transformation: For each feature, assign ranks to data points across all groups (cylinder states) based on their values. Ties receive average ranks.
H statistic calculation:
H = [12/(N(N + 1))] Σ (Rj²/nj) − 3(N + 1)
where
  • H—Kruskal–Wallis statistic
  • N—total number of samples
  • nj—number of samples in group j (e.g., running cylinder)
  • Rj—sum of ranks in group j
  • K—number of groups (cylinder states).
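A short SciPy sketch of the test applied to one feature across three hypothetical cylinder-state groups (the group means below are invented):

```python
# Sketch: Kruskal-Wallis H-test on one feature across cylinder-state groups
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
groups = [rng.normal(loc=mu, size=40) for mu in (0.0, 0.3, 0.8)]  # 3 states
H, p = kruskal(*groups)
print(H, p)   # large H / small p => feature distributions differ across states
```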
Table 9 provides an overview of these feature selection algorithms and the techniques adopted for the classifiers. For the mathematical feature extraction from the RAW signal data, refer to Appendix A, Figure A1.
Table 9 shows feature selection techniques for classifiers, each enhancing model performance by focusing on informative features. RAW data, being unprocessed, serve as a baseline. Principal Component Analysis (PCA) reduces dimensionality while preserving variance, improving efficiency. Minimum Redundancy Maximum Relevance (MRMR) selects features with high relevance and low redundancy, boosting classification accuracy. The Chi-squared test identifies features with significant statistical influence. ReliefF assesses significant feature variations across conditions. The ANOVA (parametric) and Kruskal–Wallis (non-parametric) tests detect significant differences in feature distributions, aiding robust feature selection in diverse datasets.

4. Result Metrics Calculation and Discussion

The classifiers are run, and a confusion matrix is plotted for each classifier in Table 3 under each data transformation technique adopted (Table 9). Below, sample calculations of the required result metrics, such as accuracy, total cost, F1 score, and RoC-AuC, and their performance graphs are discussed in detail.

4.1. Confusion Matrix

A confusion matrix is a table that is used to evaluate the performance of a classification model. It provides a summary of the predictions made by the model compared to the actual ground truth across different classes. Figure 19 illustrates the confusion matrix.
In a typical confusion matrix:
  • The rows represent the actual classes or labels.
  • The columns represent the predicted classes made by the model.

4.1.1. Accuracy

Understanding the accuracy of classifiers is essential for assessing the efficacy of machine learning models, since it measures how correctly a classifier detects and categorizes instances within a dataset. Accuracy serves as the primary measure for evaluating performance: higher accuracy indicates better predictive capability and dependability, while lower accuracy points to shortcomings and opportunities for model improvement. A complete picture of classifier accuracies is therefore needed to make well-informed decisions that improve model performance and ensure the reliability of classification outcomes in diverse scenarios; Kannan (2023) [38]. Mathematically, the accuracy is calculated as follows:
Accuracy = (TP + TN)/(TP + TN + FP + FN)
  • TP (True Positives): correctly predicted positive cases.
  • TN (True Negatives): correctly predicted negative cases.
  • FP (False Positives): incorrectly predicted positive cases (Type I error).
  • FN (False Negatives): incorrectly predicted negative cases (Type II error).
The accuracy score provides an overall measure of how well the classifier performs across all classes in the dataset.
From the Figure 20 accuracy heatmaps for the three load conditions (0%, 15%, and 30%), we can compare the 16 classifiers. The interpretation of the Figure 20 accuracy heatmap across the loads, with 16 classifiers and seven feature selection techniques, is used to build Table 10.

4.1.2. Total Cost

Analyzing engine vibrations to accurately distinguish normal and faulty cylinders while maximizing overall accuracy is important; misidentifying a faulty cylinder as normal could have far more severe consequences than the other way around.
This is where "total cost" comes in; by building a custom cost matrix (Table 11), we can assign different penalties to different misclassifications. In our case, the cost of missing a faulty cylinder (False Negative) might be significantly higher than that of mistakenly labeling a normal cylinder as faulty (False Positive).
Using this tailored approach, we prioritize minimizing the errors with the greatest impact. The “total cost” across a set of predictions then reflects the overall penalty based on these weighted mistakes. By analyzing this metric, we gain deeper insights into how well our model performs in differentiating healthy and faulty cylinders, beyond just basic accuracy.
This allows us to continuously improve and fine-tune the model, ensuring it prioritizes accuracy where it truly matters: identifying potential problems without raising unnecessary alarms. Rather than applying a one-size-fits-all notion of accuracy, the total cost metric lets us tailor our machine learning models to deliver results that matter in practice, optimizing engine analysis for both safety and efficiency.
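As a minimal illustration (the confusion counts and penalty values below are invented, not the study's), the total cost is the element-wise product of the confusion matrix and the cost matrix, summed:

```python
# Sketch: total misclassification cost with asymmetric penalties
import numpy as np

conf = np.array([[90, 10],    # rows: actual (0 = faulty, 1 = normal)
                 [5, 95]])    # cols: predicted
cost = np.array([[0, 10],     # missing a faulty cylinder costs 10
                 [1, 0]])     # a false alarm costs 1

total_cost = np.sum(conf * cost)
print(total_cost)             # 10*10 + 5*1 = 105
```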
From Figure 21's total cost heatmaps for the three load conditions (0%, 15%, and 30%), we can compare the 16 classifiers. The interpretation of Figure 21's total cost heatmap across the loads, with 16 classifiers and seven feature selection techniques, is used to build Table 12.
We can now combine the accuracy and total cost analyses to understand which classifier is best suited to predicting the condition of the IC engine, as summarized in Table 13.

4.1.3. F1 Score

Evaluating multiple classifiers can be tricky, especially when dealing with imbalanced data or when precision and recall are equally important. The F1 score, a harmonic mean of precision and recall, offers a single metric that captures the trade-off between accurate positive predictions (precision) and finding the most actual positives (recall). This makes it valuable for comparing classifiers and understanding their performance balance in scenarios where both aspects matter.
Precision: let us consider cylinder cutoff and normal operation; this metric reflects how accurate the model is in its “cutoff” predictions. High precision means the model rarely flags normal cylinders as faulty, minimizing unnecessary alarms and potential maintenance costs. However, a trade-off exists—a model with overly high precision might miss some actual cutoff cylinders.
Recall: On the other hand, recall focuses on how well the model identifies all true cutoff cylinders. High recall ensures we catch most, if not all, faulty cylinders. However, if recall is too high, the model might start flagging normal cylinders as faulty.
The F1 score can be calculated using the following formula:
F1 = 2 ∗ (Precision ∗ Recall)/(Precision + Recall)
where
  • Precision: proportion of true positives among all predicted positives (TP/(TP + FP))
  • Recall: proportion of actual positives that are correctly identified (TP/(TP + FN))
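A short scikit-learn sketch computing all three quantities from toy labels (the label vectors below are invented for illustration):

```python
# Sketch: precision, recall, and F1 for a binary cutoff-vs-normal task
from sklearn.metrics import precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # 1 = cylinder cutoff, 0 = normal
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
p, r, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(p, r, f1)
```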
From Figure 22 and Table 14, analyzing the performance of the 16 classifiers across the different feature selection methods reveals clear patterns. Linear Discriminant and Linear SVM consistently perform well, achieving high scores across most methods, including ANOVA, Kruskal–Wallis, ReliefF, and MRMR. This suggests their effectiveness in extracting relevant information from various feature sets for cylinder cutoff classification. This is supported by the nature of the algorithms: Linear Discriminant is effective on multivariate data, and our data source is a tri-axial accelerometer whose signals are analyzed over time, with each time point treated as a separate dimension. The resulting dataset is high-dimensional, a setting in which Linear SVM excels at identifying the optimal hyperplane separating the data points of different classes.
In contrast, Coarse KNN consistently appears as the least effective classifier under methods such as ANOVA, Kruskal–Wallis, MRMR, and ReliefF, indicating its limitations in utilizing the selected features for accurate classification in this task. The Fine KNN outperformed the Coarse KNN: the coarse variant uses a large neighborhood (set to 100 neighbors in this study), while the Fine KNN uses a single neighbor. The large neighborhood makes the Coarse KNN more robust to noise and outliers but oversimplifies the decision boundary.
It is important to note that other classifiers exhibit varying performance depending on the specific method used. For example, under Chi2, Optimizable Discriminant shows exceptional performance alongside Linear Discriminant, while Kernel Naive Bayes struggles. Similarly, Bagged Trees excels alongside Linear SVM under Kruskal–Wallis.
Finally, the impact of PCA is noteworthy. While Fine Tree performs well with RAW, it shows the lowest performance when using PCA-processed data. This highlights the potential for PCA to negatively impact the performance of some classifiers in certain scenarios, emphasizing the need for careful evaluation of different data preprocessing techniques. Overall, the analysis highlights the importance of considering both classifier characteristics and feature selection methods when choosing the optimal approach for a specific task. Consistent performers like Linear Discriminant and Linear SVM offer reliable choices, while understanding the limitations of others like Coarse KNN is crucial. Additionally, careful evaluation of the impact of data preprocessing techniques is essential to ensure they enhance rather than hinder classification performance.
Figure 23 serves as the basis for Table 15. Coarse KNN was found to be the bottom performer; it degrades in high-dimensional spaces due to its large number of neighbors and its sensitivity to outliers, which influence the distance calculation and lead to incorrect classification or prediction. In high-dimensional spaces, all points appear almost equidistant, making it hard for the algorithm to find meaningful nearest neighbors.
Linear SVM and Linear Discriminant excel across all the feature selection methods under the 15% load condition; notably, under Chi2, Bagged Trees showed high classification performance. Bagging draws many bootstrap copies of the data and builds a decision tree on each one; the final answer is obtained by aggregating the results from all the trees, which adapts well to complex datasets.
Figure 24 serves as the basis for Table 16. Coarse KNN steadily remains the bottom performer across all three load conditions, while Linear Discriminant and Linear SVM outperform in all three. Notably, the MRMR feature selection method aims to choose features that are highly correlated with the target (maximum relevance) but uncorrelated with each other (minimum redundancy), which helps reduce overfitting and improve model interpretability. Under MRMR, the Kernel Naïve Bayes and the Narrow NN-01 had the top classification performance. Refer to Appendix A for the individual F1 score values of the individual classifiers.

4.1.4. RoC Curve

The Receiver Operating Characteristic (ROC) curve is a graphical representation of the performance of a binary classifier system as its discrimination threshold is varied. It plots the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The TPR is also known as sensitivity or recall, while the FPR is the complement of the true negative rate (TNR) and is also known as the fall-out.
The area under the ROC curve (AuC) is a single scalar value that summarizes the performance of the classifier across all possible thresholds. The AuC value ranges between 0 and 1, where a higher value indicates better performance. An AuC of 0.5 suggests that the classifier performs no better than random guessing, while an AuC of 1 indicates perfect discrimination between positive and negative classes.
The ROC curve and its AuC provide valuable insights into the trade-off between the true positive rate and false positive rate, allowing for the evaluation and comparison of different classifiers’ performance.
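A brief scikit-learn sketch computing the ROC curve and its AuC from toy scores (the labels and probabilities below are invented for illustration):

```python
# Sketch: ROC curve points and AUC from classifier scores
from sklearn.metrics import roc_auc_score, roc_curve

y_true = [0, 0, 1, 1, 1, 0, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5]   # predicted probabilities
fpr, tpr, thresholds = roc_curve(y_true, scores)
print(roc_auc_score(y_true, scores))   # 1.0 = perfect, 0.5 = random guessing
```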

0% Load Condition ROC Curve AuC Study

Studying the minimum performing classifier in machine learning provides a worst-case scenario perspective, offering valuable insights into potential failures and robustness improvement opportunities. It helps identify areas for enhancement in feature selection or model training by revealing features that may not contribute to or may even hinder predictive performance. This approach also aids in avoiding overfitting, especially when the performance difference between classifiers is small, ensuring results are not overly optimistic. Furthermore, if the performance gap between the minimum and maximum performing classifiers is minimal, using a simpler or faster model could be more resource efficient.
The interpretation of Figure 25 is used to build Table 17.
For the 0% load condition heatmap plots across the seven feature selection techniques, all the classifiers have an AuC value greater than 0.5, indicating better-than-random classification. The lowest AuC value for each technique is as follows: Fine KNN at 0.9147 in RAW (cylinder-04 dataset); Fine Tree at 0.8057 in PCA (cylinder-04); Narrow NN-03 at 0.9123 in MRMR (cylinder-03); Fine KNN at 0.9272 in Chi2 (cylinder-03); Fine KNN at 0.9063 in ReliefF (cylinder-04); Fine Tree at 0.8966 in ANOVA (cylinder-04); and, lastly, Fine KNN at 0.9272 in Kruskal–Wallis (cylinder-03).

15% Load Condition RoC Curve AuC Study

All the classifiers have an Area under the Curve (AuC) greater than 0.5. The lowest AuC value for each technique is as follows: Kernel Naïve Bayes at 0.9205 in RAW (cylinder-01); Fine KNN at 0.7982 in PCA (cylinder-04); Kernel Naïve Bayes at 0.9285 in MRMR (cylinder-02); Kernel Naïve Bayes at 0.919 in Chi2 (cylinder-01); Optimizable Discriminant at 0.9148 in ReliefF (cylinder-01); Kernel Naïve Bayes at 0.9205 in ANOVA (cylinder-01); and, lastly, Kernel Naïve Bayes at 0.9192 in Kruskal–Wallis (cylinder-04). These values are read from Figure 26.
The interpretation of Figure 26 is used to build Table 18.

30% Load Condition RoC Curve AuC Study

All the classifiers have an AuC value greater than 0.5, indicating better-than-random classification. The lowest AuC value for each technique is as follows: Fine Tree at 0.7787 in PCA (cylinder-04); Fine KNN at 0.8839 in RAW (cylinder-02); Narrow NN-03 at 0.8108 in MRMR (cylinder-02); Fine KNN at 0.8839 in Chi2 (cylinder-02); Fine KNN at 0.8820 in ReliefF (cylinder-02); Fine Tree at 0.8879 in ANOVA (cylinder-02); and, lastly, Fine KNN at 0.8961 in Kruskal–Wallis (cylinder-02). These values are read from Figure 27.
The interpretation of Figure 27 is used to build Table 19.

4.2. Deep Learning Approach Result Discussion

The performance of the deep learning (DL) approaches, namely the proposed DNN, 1D-CNN, Transformer, and hybrid Transformer and DNN architectures, is discussed in this section.

4.2.1. Proposed DNN Architecture Performance Evaluation

The performance of the proposed Deep Neural Network (DNN) model for engine fault diagnosis, as shown in Figure 28, demonstrates exceptional classification capabilities. The confusion matrix highlights near-perfect classification with high values along the diagonal and minimal misclassifications. The F1 score, indicative of the model’s precision and recall balance, approaches 1, reflecting outstanding accuracy. The low total cost of misclassifications underscores the model’s efficiency, while the ROC AUC further confirms the model’s strong discriminatory power. These metrics collectively affirm the robustness and reliability of the DNN model in practical applications.

4.2.2. Proposed 1D-CNN Architecture Performance Evaluation

The performance of the proposed 1D Convolutional Neural Network (1D-CNN) for engine fault diagnosis is illustrated in Figure 29, which showcases the confusion matrices under various load conditions. The model demonstrated superior classification capabilities, with high values along the diagonal and minimal misclassifications, indicative of its robustness in fault detection. The F1 score approached 1, highlighting the model’s accuracy. Additionally, the low total cost of misclassifications and the high ROC AUC further validated the model’s efficiency and reliability.
In comparison with the Deep Neural Network (DNN) model, the 1D-CNN exhibited better performance metrics, underscoring its advantage in handling structured temporal data. Given the feature-selected data, the 1D-CNN’s predictions were superior to those of the 16 empirical classification methods studied. This comprehensive performance evaluation confirms the 1D-CNN as the most effective model for real-time fault detection in engine systems among the ones studied.

4.2.3. Proposed Transformer Architecture Performance Evaluation

The confusion matrices for the proposed Transformer-based classification of engine faults are shown in Figure 30. At 30% load the model performs very well, while at the 15% and 0% load conditions there is a slight increase in misclassification, particularly among Cylinders 2, 3, and 4, suggesting a potential overlap in feature representations at lower loads. Notably, under all load conditions, the normal operation class (Cylinder ALL) is classified well, which reliably distinguishes faulty from non-faulty engine operation. To illustrate the model's success, a comparative study across loads is presented in Figure 31.
The Transformer model performs best at 15% load, achieving an average accuracy of 99.02%, with three folds reaching a perfect 100% accuracy. At 30% load, the accuracy slightly drops to 97.46%, showing some variation across folds but still maintaining high reliability. The no-load condition (0%) has the lowest accuracy at 95.05%, with more noticeable variations among the folds. Comparing performance across conditions, the model improves by 3.97% when moving from no load to 15% load, while 30% load is 2.41% better than no load but 1.56% lower than 15% load. These results suggest that fault diagnosis is most reliable at 15% load, while accuracy decreases slightly at no load and 30% load, likely due to changes in fault signal clarity and feature representation. The Transformer architecture maintains robust performance across all load conditions.

4.2.4. Proposed Hybrid—Transformer and DNN Architecture Performance Evaluation

The confusion matrices for the proposed hybrid Transformer and DNN-based classification of engine faults are shown in Figure 32. The model performs best at 15% load, with minimal misclassifications, showing strong fault separation. At 30% load, there are a few more misclassifications, especially in Cylinder 2, where some samples are confused with Cylinders 3 and 4. The zero-load condition has the highest misclassification rate, particularly in Class 2 and Class 3, suggesting that fault patterns are harder to distinguish without load. Overall, accuracy improves with some load, with 15% load being the most reliable, while zero load shows the most confusion among fault classes. To illustrate the model's success, a comparative study across loads is presented in Figure 33.
The hybrid Transformer-DNN model demonstrates its highest accuracy of 98.70% at 15% load, indicating optimal fault classification under partial load conditions. Performance slightly declines at 30% load (95.87%), with an observed increase in misclassification, particularly affecting Class 2 and Class 4. The no-load condition (93.40%) shows the most variation in accuracy across folds, with the highest standard deviation (0.0499), suggesting that fault patterns are less distinguishable when the engine operates without load. The comparative analysis reveals a 5.30% accuracy improvement from no load to 15% load, while 30% load still outperforms no load by 2.47% but falls 2.83% short of 15% load performance. These findings highlight that the model is most stable and reliable at 15% load, whereas zero load presents the most classification challenges, emphasizing the need for enhanced feature extraction or adaptive learning techniques under no-load conditions. Overall, the model performance is higher than the traditional classification models and DNN models.

5. Future Scope

Future research endeavours should aim to extend the findings of this study through a series of more comprehensive experimental evaluations. These evaluations should involve testing the proposed models across a wider spectrum of engine types, diverse fault scenarios, and varying environmental conditions. Such an approach would facilitate a more thorough assessment of the models’ robustness and adaptability.
Moreover, integrating model-based and signal processing approaches with machine learning techniques holds promise for enhancing diagnostic accuracy and providing deeper insights into engine fault diagnosis. The development of hybrid methodologies that combine knowledge-based, model-based, and signal processing methods could lead to more holistic diagnostic frameworks. Adapting techniques such as federated learning (FL) and federated transfer learning (FTL) to different types of engines and engine-based datasets will require extensive experimental validation and new models for fault diagnosis.
Further exploration into the comparative performance of these diverse diagnostic approaches, as well as their potential integration, would significantly contribute to the advancement of the field of engine fault diagnosis. These efforts would not only improve diagnostic precision but also pave the way for more reliable and efficient fault detection mechanisms in various engine systems.

6. Conclusions

Between 2016 and 2024, there was notable advancement in the research of diagnosing faults in ICEs. Nevertheless, there are still deficiencies and constraints that must be resolved to enhance our comprehension of defect diagnostics in this domain. ICEs are intricate devices that may encounter a range of malfunctions, thereby necessitating the development of thorough methodologies for detecting and diagnosing these problems.
Our work encompasses empirical and deep learning approaches for fault detection in internal combustion engines, including a broad range of algorithms and sensor data analysis techniques. The empirical evaluation used 16 different machine learning classifiers tested across three load conditions (0%, 15%, and 30%) and seven feature selection methods (RAW, PCA, MRMR, Chi2, ReliefF, ANOVA, and Kruskal–Wallis). In general, the Linear Discriminant and Linear SVM classifiers performed very well, and their combination with specific feature selection methods was successful. The worst performer was Coarse KNN, showing that this algorithm has limitations in this application. The analysis demonstrated that the selection of appropriate classifiers and feature selection methods is crucial, as differing selections alter performance considerably.
From the assessment of the 16 classifiers, many interesting observations were gained. Linear Discriminant and Linear SVM proved to be very stable and accurate for different feature sets and load conditions. Bagged Trees was well suited to the Kruskal–Wallis method, meaning it is applicable to some characteristics of data. PCA is generally helpful in reducing the dimensionality but detrimental to some classifiers such as Fine Tree, indicating that preprocessing techniques should be selected carefully. This study further presented a DNN architecture for cylinder fault diagnosis, showing a better performance than traditional machine learning methods. The proposed 1D-CNN architecture enhanced the fault detection capabilities and surpassed even the DNN model in terms of accuracy, total cost, and ROC AUC.
The proposed 1D-CNN is effective for processing vibration sensor time-series data to extract meaningful features; the approach may well be suited to real-time monitoring and diagnosis of an engine. The Transformer model proves to be highly effective in engine fault diagnosis, particularly under partial load conditions, where it demonstrates strong fault differentiation. The hybrid Transformer-DNN model further enhances classification performance by combining the strengths of both architectures. It maintains reliability across varying load conditions, showing improved fault separation.
Possible avenues for further work include fine-tuning deep architectures and testing the effectiveness of combining the inputs from various sensor sources towards higher fault detection accuracy and robustness.

Author Contributions

Conceptualization N.R.S. and B.B.N., Methodology N.R.S. and B.B.N., Formal Analysis A.S., Investigation A.S., Writing—Original Draft Preparation A.S., Supervision N.R.S. and B.B.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are available from the corresponding authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

ICEs: Internal Combustion Engines
PCA: Principal Component Analysis
ANNs: Artificial Neural Networks
MAF: Mass Air Flow
MTMR: Modified Triple Modular Redundancy
PI: Proportional-Integral (controller)
EGT: Exhaust Gas Temperature
GA: Genetic Algorithm
MAP: Manifold Absolute Pressure
HFTC: Hybrid Fault-Tolerant Control System
AFTCS: Active Fault-Tolerant Control System
PFTCS: Passive Fault-Tolerant Control System
CWT: Continuous Wavelet Transform
STFT: Short-Time Fourier Transform
FFT: Fast Fourier Transform
SVM: Support Vector Machine
NN: Neural Network
GDM: Gas Distribution Mechanism
CNN: Convolutional Neural Network
ARMA: Autoregressive Moving Average
XGBoost: Extreme Gradient Boosting
VMD: Variational Mode Decomposition
GWO: Grey Wolf Optimization
SVDD: Support Vector Data Description
GRNN: General Regression Neural Network
TFR: Time-Frequency Representation
ICA: Independent Component Analysis
FCM: Fuzzy C-Means
EMD: Empirical Mode Decomposition
ITD: Intrinsic Time-Scale Decomposition
DSM: Damped Sinusoid Model
MLPNN: Multilayer Perceptron Neural Network
CAD: Crank-Angle Degree
CCV: Combustion Cycle Variability
MRMR: Minimum Redundancy Maximum Relevance
Chi2: Chi-Square Test
ReliefF: Relief Feature Selection Algorithm
ANOVA: Analysis of Variance
ROC-AUC: Receiver Operating Characteristic (Area Under the Curve)
BP: Backpropagation
MFCC: Mel-Frequency Cepstral Coefficients
DTW: Dynamic Time Warping
EEMD: Ensemble Empirical Mode Decomposition
WPT: Wavelet Packet Transform
WVD: Wigner-Ville Distribution
HHT: Hilbert-Huang Transform
LIASSR: Least Instantaneous Angular Speed Reduction
AFR: Air–Fuel Ratio
DAQ: Data Acquisition
MAD: Mean Absolute Deviation
IQR: Interquartile Range
KNN: K-Nearest Neighbors
ROC: Receiver Operating Characteristic
PPV: Positive Predictive Value
FDR: False Discovery Rate
TPR: True Positive Rate
FNR: False Negative Rate
LDA: Linear Discriminant Analysis
KDE: Kernel Density Estimation
GNB: Gaussian Naive Bayes
SSB: Between-group Sum of Squares
SSE: Within-group Sum of Squares
NH: Nearest Hits
NM: Nearest Misses
TP: True Positives
TN: True Negatives
FP: False Positives
FN: False Negatives
RAW: Unprocessed (raw) sensor data
FPR: False Positive Rate
TNR: True Negative Rate
AuC: Area under the Curve

Appendix A

Figure A1. Feature extraction from the RAW sensor data.
The F1 score data points for the different load conditions, along with the mean and median, are provided below for each feature selection technique.
Figure A2. F1 Score Heatmap plot with 0% load condition for (a) ANOVA, (b) CHi2, (c) Kruskal–Wallis, (d) MRMR, (e) PCA, (f) ReliefF, (g) RAW.
Figure A3. F1 Score Heatmap plot with 15% load condition for (a) ANOVA, (b) CHi2, (c) Kruskal–Wallis, (d) MRMR, (e) PCA, (f) ReliefF, (g) RAW.
Figure A4. F1 Score Heatmap plot with 30% load condition for (a) ANOVA, (b) CHi2, (c) Kruskal–Wallis, (d) MRMR, (e) PCA, (f) ReliefF, (g) RAW.

References

  1. Gao, Z.; Cecati, C.; Ding, S.X. A Survey of Fault Diagnosis and Fault-Tolerant Techniques—Part I: Fault Diagnosis with Model-Based and Signal-Based Approaches. IEEE Trans. Ind. Electron. 2015, 62, 3757–3767. [Google Scholar] [CrossRef]
  2. Gao, Z.; Cecati, C.; Ding, S.X. A Survey of Fault Diagnosis and Fault-Tolerant Techniques—Part II: Fault Diagnosis with Knowledge-Based and Hybrid/Active Approaches. IEEE Trans. Ind. Electron. 2015, 62, 3768–3774. [Google Scholar] [CrossRef]
  3. Iqbal, M.S.; Amin, A.A. Genetic algorithm based active fault-tolerant control system for air fuel ratio control of internal combustion engines. Meas. Control. 2022, 55, 703–716. [Google Scholar] [CrossRef]
  4. Sakthivel, N.R.; Sugumaran, V.; Nair, B.B. Application of Support Vector Machine (SVM) and Proximal Support Vector Machine (PSVM) for fault classification of monoblock centrifugal pump. Int. J. Data Anal. Tech. Strateg. 2010, 2, 38–61. [Google Scholar] [CrossRef]
  5. Haneef, M.D.; Randall, R.B.; Smith, W.A.; Peng, Z. Vibration and Wear Prediction Analysis of IC Engine Bearings by Numerical Simulation. Wear 2017, 384, 15–27. [Google Scholar] [CrossRef]
  6. León, P.G.; García-Morales, J.; Escobar-Jiménez, R.; Gómez-Aguilar, J.; López-López, G.; Torres, L. Implementation of a fault tolerant system for the internal combustion engine’s MAF sensor. Measurement 2018, 122, 91–99. [Google Scholar] [CrossRef]
  7. Roy, S.K.; Mohanty, A.R.; Kumar, C.S. In-cylinder combustion detection based on logarithmic instantaneous angular successive speed ratio. J. Phys. Conf. Ser. 2019, 1240, 012037. [Google Scholar] [CrossRef]
  8. Moosavian, A.; Najafi, G.; Ghobadian, B.; Mirsalim, M.; Jafari, S.M.; Sharghi, P. Piston scuffing fault and its identification in an IC engine by vibration analysis. Appl. Acoust. 2016, 103, 108–120. [Google Scholar] [CrossRef]
  9. Zhang, Y.; Yang, Y.; Li, H.; Han, H.; He, Z. Layer-Wise Fault Diagnosis Based on Key Performance Indicator for Cylinder-Piston Assembly of Diesel Engine. In Proceedings of the IEEE 15th International Conference on Control and Automation (ICCA), Edinburgh, UK, 16–19 July 2019; pp. 307–312. [Google Scholar]
  10. Jafarian, K.; Mobin, M.; Jafari-Marandi, R.; Rabiei, E. Misfire and valve clearance faults detection in the combustion engines based on a multi-sensor vibration signal monitoring. Measurement 2018, 128, 527–536. [Google Scholar]
  11. Stojanovic, S.; Tebbs, A.; Samuel, S.; Durodola, J. Cepstrum Analysis of a Rate Tube Injection Measurement Device; SAE Technical Paper 2016-01-2196; SAE International: Detroit, MI, USA, 2016. [Google Scholar]
  12. Cao, W.; Dong, G.; Xie, Y.-B.; Peng, Z. Prediction of wear trend of engines via on-line wear debris monitoring. Tribol. Int. 2018, 120, 510–519. [Google Scholar] [CrossRef]
  13. Kumar, A.; Gupta, R.; Singh, P. Hybrid methods for misfire detection using threshold-based algorithm on WVD and HHT with Android mobile sound recording. J. Combust. Emiss. 2023, 19, 302–314. [Google Scholar]
  14. Ftoutou, E.; Chouchane, M. Feature Extraction Using S-Transform and 2DNMF for Diesel Engine Faults Classification. In Applied Condition Monitoring, Proceedings of the International Conference on Acoustics and Vibration (ICAV2016), Hammamet, Tunisia, 21–23 March 2016; Fakhfakh, T., Chaari, F., Walha, L., Abdennadher, M., Abbes, M., Haddar, M., Eds.; Springer: Berlin/Heidelberg, Germany, 2017; Volume 5. [Google Scholar]
  15. Moosavian, A.; Najafi, G.; Ghobadian, B.; Mirsalim, M. The effect of piston scratching fault on the vibration behavior of an IC engine. Appl. Acoust. 2017, 126, 91–100. [Google Scholar] [CrossRef]
  16. McMahan, J.B. Functional Principal Component Analysis of Vibrational Signal Data: A Functional Data Analytics Approach for Fault Detection and Diagnosis of Internal Combustion Engines. Ph.D. Thesis, Mississippi State University, Starkville, MS, USA, 2018. [Google Scholar]
  17. Hofmann, O.; Strauß, P.; Schuckert, S.; Huber, B.; Rixen, D.; Wachtmeister, G. Identification of Aging Effects in Common Rail Diesel Injectors Using Geometric Classifiers and Neural Networks; SAE Technical Paper 2016-01-0813; SAE International: Detroit, MI, USA, 2016. [Google Scholar]
  18. Zheng, T.; Tan, R.; Li, Y.; Yang, B.; Shi, L.; Zhou, T. Fault diagnosis of internal combustion engine valve clearance: The survey of the-state-of-the-art. In Proceedings of the 12th World Congress on Intelligent Control and Automation (WCICA), Guilin, China, 12–15 June 2016; pp. 2614–2619. [Google Scholar]
  19. Zhao, C.; Shen, W. A Federated Distillation Domain Generalization Framework for Machinery Fault Diagnosis with Data Privacy. Eng. Appl. Artif. Intell. 2024, 130, 107765. [Google Scholar] [CrossRef]
  20. Khoualdia, T.; Lakehal, A.; Chelli, Z. Practical investigation on bearing fault diagnosis using massive vibration data and artificial neural network. In Big Data and Networks Technologies; Springer: Berlin/Heidelberg, Germany, 2019; pp. 110–116. [Google Scholar]
  21. Jiang, Z.; Mao, Z.; Wang, Z.; Zhang, J. Fault Diagnosis of Internal Combustion Engine Valve Clearance Using the Impact Commencement Detection Method. Sensors 2017, 17, 2916. [Google Scholar] [CrossRef] [PubMed]
  22. Moosavian, A.; Ahmadi, H.; Sakhaei, B.; Labbafi, R. Support vector machine and K-nearest neighbour for unbalanced fault detection. J. Qual. Maint. Eng. 2014, 20, 65–75. [Google Scholar] [CrossRef]
  23. Kemalkar, A.K.; Bairagi, V.K. Engine fault diagnosis using sound analysis. In Proceedings of the 2016 International Conference on Automatic Control and Dynamic Optimization Techniques (ICACDOT), Pune, India, 9–10 September 2016; pp. 943–946. [Google Scholar]
  24. Gritsenko, A.; Shepelev, V.; Zadorozhnaya, E.; Almetova, Z.; Burzev, A. The advancement of the methods of vibro-acoustic control of the ICE gas distribution mechanism. FME Trans. 2020, 48, 127–136. [Google Scholar] [CrossRef]
  25. Ates, C.; Höfchen, T.; Witt, M.; Koch, R.; Bauer, H.J. Vibration-based wear condition estimation of journal bearings using convolutional autoencoders. Sensors 2023, 23, 9212. [Google Scholar] [CrossRef] [PubMed]
  26. Figlus, T.; Gnap, J.; Skrúcaný, T.; Šarkan, B.; Stoklosa, J. The Use of Denoising and Analysis of Acoustic Signal Entropy in Diagnosing Engine Valve Clearance. Entropy 2016, 18, 253. [Google Scholar] [CrossRef]
  27. Zabihi-Hesari, A.; Ansari-Rad, S.; Shirazi, F.A.; Ayati, M. Fault detection and diagnosis of a 12-cylinder trainset diesel engine based on vibration signature analysis and neural network. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 2019, 233, 1910–1923. [Google Scholar] [CrossRef]
  28. Wang, A.; Li, Y.; Du, X.; Zhong, C. Diesel Engine Gearbox Fault Diagnosis Based on Multi-features Extracted from Vibration Signals. IFAC-Pap. 2021, 54, 33–38. [Google Scholar] [CrossRef]
  29. Mulay, S.; Sugumaran, V.; Devasenapati, S.B. Misfire detection in I.C. engine through ARMA features using machine learning approach. Prog. Ind. Ecol. Int. J. 2018, 12, 93–111. [Google Scholar]
  30. Rameshkumar, K.; Natarajan, K.; Krishnakumar, P.; Saimurugan, M. Machine Learning Approach for Predicting the Solid Particle Lubricant Contamination in a Spherical Roller Bearing. IEEE Access 2024, 12, 78680–78700. [Google Scholar] [CrossRef]
  31. Ghajar, M.; Kakaee, A.H.; Mashadi, B. Semi-empirical modeling of volumetric efficiency in engines equipped with variable valve timing system. J. Cent. South Univ. 2016, 23, 3132–3142. [Google Scholar] [CrossRef]
  32. Firmino, R.; da Silva, F.G.; dos Santos, A.C. A comparative study of acoustic and vibration signals for bearing fault detection and diagnosis based on MSB analysis. Measurement 2020, 154, 107465. [Google Scholar]
  33. Kumar, N.; Sakthivel, G.; Jegadeeshwaran, R.; Sivakumar, R.; Kumar, S. Vibration Based IC Engine Fault Diagnosis Using Tree Family Classifiers—A Machine Learning Approach. In Proceedings of the 2019 IEEE International Symposium on Smart Electronic Systems (iSES) (Formerly iNiS), Rourkela, India, 16–18 December 2019; pp. 225–228. [Google Scholar]
  34. Ferrari, A.; Paolicelli, F. A virtual injection sensor by means of time frequency analysis. Mech. Syst. Signal Process. 2019, 116, 832–842. [Google Scholar] [CrossRef]
  35. Komorska, I.; Wołczyński, Z.; Borczuch, A. Diagnosis of sensor faults in a combustion engine control system with the artificial neural network. Diagnostyka 2019, 20, 19–25. [Google Scholar] [CrossRef]
  36. Zhou, Y.; Wang, Z.; Zuo, X.; Zhao, H. Identification of wear mechanisms of main bearings of marine diesel engine using recurrence plot based on CNN model. Wear 2023, 520, 204656. [Google Scholar] [CrossRef]
  37. Mu, J.; Liu, X.; Chen, Y.; Zhang, H. Fault diagnosis using MICEEMD-PWVD and TD-2DPCA for valve clearance in combustion engines. J. Sound Vib. 2021, 500, 116003. [Google Scholar]
  38. Sumanth, M.N.; Murugesan, S. Experimental Investigation of Wall Wetting Effect on Hydrocarbon Emission in Internal Combustion Engine. IOP Conf. Ser. Mater. Sci. Eng. 2019, 577, 012029. [Google Scholar] [CrossRef]
  39. Vichi, G.; Becciani, M.; Stiaccini, I.; Ferrara, G.; Ferrari, L.; Bellissima, A.; Asai, G. Analysis of the Turbocharger Speed to Estimate the Cylinder-to-Cylinder Injection Variations-Part 2-Frequency Domain Analysis; SAE Technical Paper 2016-32-0085; SAE International: Detroit, MI, USA, 2016. [Google Scholar]
  40. Kang, J.; Zhang, X.; Zhang, D.; Liu, Y. A Condition-Monitoring Approach for Diesel Engines Based on an Adaptive VMD and Sparse Representation Theory. Energies 2022, 15, 3315. [Google Scholar] [CrossRef]
  41. Tao, J.; Qin, C.; Li, W.; Liu, C. Intelligent Fault Diagnosis of Diesel Engines via Extreme Gradient Boosting and High-Accuracy Time–Frequency Information of Vibration Signals. Sensors 2019, 19, 3280. [Google Scholar] [CrossRef] [PubMed]
  42. Yang, X.; Bi, F.; Cheng, J.; Tang, D.; Shen, P.; Bi, X. A Multiple Attention Convolutional Neural Networks for Diesel Engine Fault Diagnosis. Sensors 2024, 24, 2708. [Google Scholar] [CrossRef] [PubMed]
  43. Becciani, M.; Romani, L.; Vichi, G.; Bianchini, A.; Asai, G.; Minamino, R.; Bellissima, A.; Ferrara, G. Innovative Control Strategies for the Diagnosis of Injector Performance in an Internal Combustion Engine via Turbocharger Speed. Energies 2019, 12, 1420. [Google Scholar] [CrossRef]
  44. Shahid, S.M.; Ko, S.; Kwon, S. Real-time abnormality detection and classification in diesel engine operations with convolutional neural network. Expert Syst. Appl. 2022, 192, 116233. [Google Scholar]
  45. Radhika, N.; Sabarinathan, M.; Jen, T.-C. Machine learning based prediction of Young’s modulus of stainless steel coated with high entropy alloys. Results Mater. 2024, 23, 100607. [Google Scholar] [CrossRef]
  46. Merkisz-Guranowska, A.; Waligórski, M. Recognition and Separation Technique of Fault Sources in Off-Road Diesel Engine Based on vibroacoustic Signal. J. Vib. Eng. Technol. 2018, 6, 263–271. [Google Scholar] [CrossRef]
  47. Chen, L.; Zhang, Y.; Wang, M.; Li, P.; Liu, Q. Statistical pattern recognition using DSM (Dislocation Superimposed Method) for fault isolation: Extracting impulsive fault components from acoustic signals in gasoline engines. Hybrid J. Engine Fault Anal. 2013. [Google Scholar]
  48. Shan, C.; Chin, C.S.; Mohan, V.; Zhang, C. Review of various machine learning approaches for predicting parameters of lithium-ion batteries in electric vehicles. Batteries 2024, 10, 181. [Google Scholar] [CrossRef]
  49. Zhang, K.; Wu, J.; Li, S.; Chen, Y.; Zhao, X. Hybrid methods using EMD-SVM for fault diagnosis: Combining signal decomposition with machine learning for rolling bearing fault diagnosis. J. Roll. Bear. Fault Diagn. 2022. [Google Scholar]
  50. Xie, C.; Wang, Y.; MacIntyre, J.; Sheikh, M.; Elkady, M. Using Sensors Data and Emissions Information to Diagnose Engine’s Faults. Int. J. Comput. Intell. Syst. 2018, 11, 1142–1152. [Google Scholar] [CrossRef]
  51. Ftoutou, E.; Chouchane, M. Diesel Engine Injection Faults’ Detection and Classification Utilizing Unsupervised Fuzzy Clustering Techniques. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 2019, 233, 5622–5636. [Google Scholar] [CrossRef]
52. Kannan, N.; Saimurugan, M.; Sowmya, S.; Edinbarough, I. Enhanced quadratic discriminant analysis with sensor signal fusion for speed-independent fault detection in rotating machines. IOP Conf. Ser. Meas. Sci. Technol. 2013, 34, 12. [Google Scholar]
  53. Waligórski, M.; Batura, K.; Kucal, K.; Merkisz, J. Empirical assessment of thermodynamic processes of a turbojet engine in the process values field using vibration parameters. Measurement 2020, 158, 107702. [Google Scholar] [CrossRef]
  54. Saimurugan, M.; Ramachandran, K.I.; Sugumaran, V.; Sakthivel, N.R. Multi component fault diagnosis of rotational mechanical system based on decision tree and support vector machine. Expert Syst. Appl. 2011, 38, 3819–3826. [Google Scholar] [CrossRef]
55. Li, N. Determination of knock characteristics in spark ignition engines: An approach based on ensemble empirical mode decomposition. Meas. Sci. Technol. 2016, 27, 045109. [Google Scholar] [CrossRef]
  56. Yao, X.; Wang, L.; Chen, M.; Li, J.; Zhang, Q. Hybrid methods using Gammatone filter bank and Robust ICA for noise source identification: Separating and identifying noise sources in internal combustion engines. J. Intern. Combust. Engine Noise Anal. 2017. [Google Scholar]
  57. Merkisz-Guranowska, A.; Waligórski, M. Analysis of vibroacoustic estimators for a heavy-duty diesel engine used in sea transport in the aspect of diagnostics of its environmental impact. J. Vibroengineering 2016, 18, 1346–1357. [Google Scholar] [CrossRef]
  58. Taghizadeh-Alisaraei, A.; Mahdavian, A. Fault detection of injectors in diesel engines using vibration time-frequency analysis. Appl. Acoust. 2019, 143, 48–58. [Google Scholar] [CrossRef]
  59. Dayong, N.; Changle, S.; Yongjun, G.; Zengmeng, Z.; Jiaoyi, H. Extraction of fault component from abnormal sound in diesel engines using acoustic signals. Mech. Syst. Signal Process. 2016, 75, 544–555. [Google Scholar] [CrossRef]
  60. Lilo, M.A.; Latiff, L.A.; Abu, A.B.; Mashhadany, Y.I. Vibration fault detection and classification based on the FFT and fuzzy logic. ARPN J. Eng. Appl. Sci. 2016, 11, 4633–4637. [Google Scholar]
  61. Zhao, H.; Zhang, J.; Jiang, Z.; Wei, D.; Zhang, X.; Mao, Z. A New Fault Diagnosis Method for a Diesel Engine Based on an Optimized Vibration Mel Frequency under Multiple Operation Conditions. Sensors 2019, 19, 2590. [Google Scholar] [CrossRef] [PubMed]
  62. Xu, Y.; Huang, B.; Yun, Y.; Cattley, R.; Gu, F.; Ball, A.D. Model Based IAS Analysis for Fault Detection and Diagnosis of IC Engine Powertrains. Energies 2020, 13, 565. [Google Scholar] [CrossRef]
  63. Shahbaz, M.; Iqbal, J.; Mehmood, A.; Aslam, M. Model-based diagnosis using an ANN-based observer for fault-tolerant control of air-fuel ratio in internal combustion engines. J. Electr. Eng. Technol. 2021, 16, 287–299. [Google Scholar]
  64. Bi, X.; Cao, S.; Zhang, D. A Variety of Engine Faults Detection Based on Optimized Variational Mode Decomposition–Robust Independent Component Analysis and Fuzzy C-Mean Clustering. IEEE Access 2019, 7, 27756–27768. [Google Scholar] [CrossRef]
  65. Ramteke, S.M.; Chelladurai, H.; Amarnath, M. Diagnosis of Liner Scuffing Fault of a Diesel Engine via Vibration and Acoustic Emission Analysis. J. Vib. Eng. Technol. 2019, 8, 815–833. [Google Scholar] [CrossRef]
  66. Liu, R.; Chen, X.; Zhang, Y.; Hu, W.; Wang, J. Hybrid methods using WPD-SVM for effective fault identification: Integrating wavelet packet decomposition and support vector machines for induction motor fault diagnosis. J. Induction Mot. Fault Diagn. 2021. [Google Scholar]
67. Xu, H.; Liu, F.; Wang, Z.; Ren, X.; Chen, J.; Li, Q.; Zhu, Z. A Detailed Numerical Study of NOx Kinetics in Counterflow Methane Diffusion Flames: Effects of Fuel-Side versus Oxidizer-Side Dilution. J. Combust. 2021, 2021, 6642734. [Google Scholar] [CrossRef]
  68. Amin, A.A.; Mahmood-Ul-Hasan, K. Advanced Fault Tolerant Air-Fuel Ratio Control of Internal Combustion Gas Engine for Sensor and Actuator Faults. IEEE Access 2019, 7, 17634–17643. [Google Scholar] [CrossRef]
  69. Amin, A.; Mahmood-ul-Hasan, K. Hybrid fault tolerant control for air–fuel ratio control of internal combustion gasoline engine using Kalman filters with advanced redundancy. Meas. Control. 2019, 52, 473–492. [Google Scholar] [CrossRef]
70. Alsuwian, T.; Tayyeb, M.; Amin, A.A.; Qadir, M.B.; Almasabi, S.; Jalalah, M. Design of a hybrid fault-tolerant control system for air–fuel ratio control of internal combustion engines using genetic algorithm and higher-order sliding mode control. Energies 2022, 15, 5666. [Google Scholar] [CrossRef]
  71. Chen, J.; Randall, R.B.; Peeters, B. Advanced diagnostic system for piston slap faults in IC engines, based on the non-stationary characteristics of the vibration signals. Mech. Syst. Signal Process. 2016, 75, 434–454. [Google Scholar] [CrossRef]
  72. Chen, J.; Randall, R.B. Intelligent diagnosis of bearing knock faults in internal combustion engines using vibration simulation. Mech. Mach. Theory 2016, 104, 161–176. [Google Scholar] [CrossRef]
  73. Zeng, R.; Zhang, L.; Mei, J.; Shen, H.; Zhao, H. Fault detection in an engine by fusing information from multivibration sensors. Int. J. Distrib. Sens. Netw. 2017, 13. [Google Scholar] [CrossRef]
  74. Kang, J.; Zhang, X.; Zhang, D.; Liu, Y. Pressure drops and mixture friction factor for gas–liquid two-phase transitional flows in a horizontal pipe at low gas flow rates. Chem. Eng. Sci. 2021, 246, 117011. [Google Scholar] [CrossRef]
  75. Awad, O.I.; Zhang, Z.; Kamil, M.; Ma, X.; Ali, O.M.; Shuai, S. Wavelet Analysis of the Effect of Injection Strategies on Cycle to Cycle Variation GDI Optical Engine under Clean and Fouled Injector. Processes 2019, 7, 817. [Google Scholar] [CrossRef]
  76. Wang, Y.; Zhang, F.; Cui, T.; Zhou, J. Fault diagnosis for manifold absolute pressure sensor (MAP) of diesel engine based on Elman neural network observer. Chin. J. Mech. Eng. 2016, 29, 386–395. [Google Scholar] [CrossRef]
  77. Ftoutou, E.; Chouchane, M. Injection Fault Detection of a Diesel Engine by Vibration Analysis. In Design and Modeling of Mechanical Systems—III. CMSM 2017; Lecture Notes in Mechanical Engineering; Haddar, M., Chaari, F., Benamara, A., Chouchane, M., Karra, C., Aifaoui, N., Eds.; Springer: Cham, Switzerland, 2018. [Google Scholar] [CrossRef]
  78. Wang, J.; Zhang, Y.; Li, M.; Gao, X.; Liu, H. Statistical pattern recognition using SVESS-ITD-ICA for robust fault identification: Integrating signal denoising, decomposition, and separation methods for improved diesel engine fault diagnosis. J. Engine Fault Diagn. 2022. [Google Scholar]
  79. Unal, H.; Yilmaz, A.; Demir, T.; Kaya, E. Statistical pattern recognition using multilayer perceptron neural network (MLPNN) with CAD signals: Achieving 100% accuracy, robustness to noise, and efficiency leveraging readily available sensor data. J. Engine Load Anal. 2023. [Google Scholar]
  80. Chiliński, B.; Zawisza, M. Analysis of bending and angular vibration of the crankshaft with a torsional vibrations damper. J. Vibroengineering 2016, 18, 5353–5363. [Google Scholar] [CrossRef]
  81. Wei, N.; Liu, J.; Zhang, H.; Li, X.; Wang, Y. Statistical pattern recognition using wavelet packet transform (WPT) and optimized WPT spectrum for advanced lubrication condition monitoring: Acoustic emission (AE)-based analysis for successful identification of different lubrication conditions. Unique J. Lubr. Cond. Monit. 2019. [Google Scholar]
  82. Moosavian, A.; Najafi, G.; Nadimi, H.; Arab, M. Estimation of Engine Friction Using Vibration Analysis and Artificial Neural Network. In Proceedings of the 2017 International Conference on Mechanical, System and Control Engineering (ICMSC), St. Petersburg, Russia, 19–21 May 2017; pp. 130–135. [Google Scholar] [CrossRef]
  83. Gao, F.; Li, H.; Li, X. Application of Stresswave Analysis Technology in Turbine Generator Condition Monitoring and Fault Diagnosis. In Proceedings of the 2020 Asia Energy and Electrical Engineering Symposium (AEEES), Chengdu, China, 29–31 May 2020; pp. 198–202. [Google Scholar]
  84. Sripakagorn, P.; Mitarai, S.; Kosály, G.; Pitsch, H. Extinction and reignition in a diffusion flame: A direct numerical simulation study. Combust. Theory Model. 2004, 8, 657–674. [Google Scholar] [CrossRef]
  85. Prieler, R.; Moser, M.; Eckart, S.; Krause, H.; Hochenauer, C. Machine learning techniques to predict the flame state, temperature and species concentrations in counter-flow diffusion flames operated with CH4/CO/H2-air mixtures. Fuel 2022, 326, 124915. [Google Scholar] [CrossRef]
  86. Ma, X.; Ma, Y.; Zheng, L.; Li, Y.; Wang, Z.; Shuai, S.; Wang, J. Measurement of soot distribution in two cross-sections in a gasoline direct injection engine using laser-induced incandescence with the laser extinction method. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2019, 233, 211–223. [Google Scholar] [CrossRef]
  87. Masri, A.R. Turbulent Combustion of Sprays: From Dilute to Dense. Combust. Sci. Technol. 2016, 188, 1619–1639. [Google Scholar] [CrossRef]
  88. Huang, Q.; Liu, J.; Ulishney, C.; Dumitrescu, C.E. On the use of artificial neural networks to model the performance and emissions of a heavy-duty natural gas spark ignition engine. Int. J. Engine Res. 2022, 23, 1879–1898. [Google Scholar] [CrossRef]
  89. Xu, X.; Yan, X.; Sheng, C.; Yuan, C.; Xu, D.; Yang, J. A Belief Rule-Based Expert System for Fault Diagnosis of Marine Diesel Engines. IEEE Trans. Syst. Man Cybern. Syst. 2017, 50, 656–672. [Google Scholar] [CrossRef]
  90. Chang, L.; Xu, X.; Liu, Z.-G.; Qian, B.; Xu, X.; Chen, Y.-W. BRB Prediction with Customized Attributes Weights and Tradeoff Analysis for Concurrent Fault Diagnosis. IEEE Syst. J. 2021, 15, 1179–1190. [Google Scholar] [CrossRef]
  91. Muhammad, S.; Rehman, A.U.; Ali, M.; Iqbal, M. Combustion event detection using a peak detection algorithm on STFT and Hilbert transform with smartphone sound recording. J. Combust. Sci. Technol. 2022, 29, 245–259. [Google Scholar]
  92. Ahmed, R.; El Sayed, M.; Gadsden, S.A.; Tjong, J.; Habibi, S. Automotive internal-combustion-engine fault detection and classification using artificial neural network techniques. IEEE Trans. Veh. Technol. 2015, 64, 21–33. [Google Scholar] [CrossRef]
  93. Al-Zeyadi, A.R.; Al-Kabi, M.N.; Jasim, L.A. Deep learning towards intelligent vehicle fault diagnosis. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020. [Google Scholar] [CrossRef]
  94. Nieto González, J.P. Vehicle fault detection and diagnosis combining an AANN and multiclass SVM. Int. J. Interact. Des. Manuf. 2018, 12, 273–279. [Google Scholar] [CrossRef]
  95. Venkatesh, S.N.; Chakrapani, G.; Senapti, S.B.; Annamalai, K.; Elangovan, M.; Indira, V.; Sugumaran, V.; Mahamuni, V.S. Misfire detection in spark ignition engine using transfer learning. Comput. Intell. Neurosci. 2022, 2022, 7606896. [Google Scholar] [CrossRef]
  96. Canal, R.; Riffel, F.K.; Bonomo, J.P.A.; de Carvalho, R.S.; Gracioli, G. Misfire Detection in Combustion Engines Using Machine Learning Techniques. In Proceedings of the 2023 XIII Brazilian Symposium on Computing Systems Engineering (SBESC), Porto Alegre, Brazil, 21–24 November 2023. [Google Scholar]
  97. Qin, C.; Jin, Y.; Zhang, Z.; Yu, H.; Tao, J.; Sun, H.; Liu, C. Anti-noise diesel engine misfire diagnosis using a multi-scale CNN-LSTM neural network with denoising module. CAAI Trans. Intell. Technol. 2023, 8, 963–986. [Google Scholar] [CrossRef]
  98. Wang, X.; Zhang, P.; Gao, W.; Li, Y.; Wang, Y.; Pang, H. Misfire detection using crank speed and long short-term memory recurrent neural network. Energies 2022, 15, 300. [Google Scholar] [CrossRef]
99. Zhang, P.; Gao, W.; Li, Y.; Wang, Y. Misfire detection of diesel engine based on convolutional neural networks. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2021, 235, 2148–2165. [Google Scholar] [CrossRef]
  100. Okwuosa, C.N.; Hur, J.-W. An intelligent hybrid feature selection approach for SCIM inter-turn fault classification at minor load conditions using supervised learning. IEEE Access 2023, 11, 89907–89920. [Google Scholar] [CrossRef]
  101. Allen-Zhu, Z.; Li, Y. Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning. arXiv 2023, arXiv:2012.09816v3. [Google Scholar]
  102. Weng, C.; Lu, B.; Gu, Q.; Zhao, X. A Novel Multisensor Fusion Transformer and Its Application into Rotating Machinery Fault Diagnosis. IEEE Trans. Instrum. Meas. 2023, 72, 3507512. [Google Scholar] [CrossRef]
  103. Li, X. Construction of transformer fault diagnosis and prediction model based on deep learning. J. Comput. Inf. Technol. 2022, 30, 223–238. [Google Scholar] [CrossRef]
  104. Zhang, Z.; Deng, Y.; Liu, X.; Liao, J. Research on fault diagnosis of rotating parts based on transformer deep learning model. Appl. Sci. 2024, 14, 10095. [Google Scholar] [CrossRef]
  105. Nascimento, E.G.S.; Liang, J.S.; Figueiredo, I.S.; Guarieiro, L.L.N. T4PDM: A Deep Neural Network Based on the Transformer Architecture for Fault Diagnosis of Rotating Machinery. arXiv 2022, arXiv:2204.03725. [Google Scholar]
  106. Kumar, P. AI-driven Transformer Model for Fault Prediction in Non-Linear Dynamic Automotive System. arXiv 2024, arXiv:2408.12638. [Google Scholar]
  107. Cui, D.; Hu, Y. Fault Diagnosis for Marine Two-Stroke Diesel Engine Based on CEEMDAN-Swin Transformer Algorithm. J. Fail. Anal. Preven. 2023, 23, 988–1000. [Google Scholar] [CrossRef]
  108. Zhou, A.; Barati Farimani, A. Faultformer: Transformer-Based Prediction of Bearing Faults. SSRN [Preprint]. 2021. Available online: https://ssrn.com/abstract=4620618 (accessed on 10 February 2025). [CrossRef]
  109. Zhong, D.; Xia, Z.; Zhu, Y.; Duan, J. Overview of Predictive Maintenance Based on Digital Twin Technology. Heliyon 2023, 9, e14534. [Google Scholar] [CrossRef] [PubMed]
  110. Tran, V.D.; Sharma, P.; Nguyen, L.H. Digital Twins for Internal Combustion Engines: A Brief Review. J. Emerg. Sci. Eng. 2023, 1, 29–35. [Google Scholar] [CrossRef]
111. Wang, D.; Li, Y.; Lu, C.; Men, Z.; Zhao, X. Research on Digital Twin-Assisted Dual-Channel Parallel Convolutional Neural Network-Transformer Rolling Bearing Fault Diagnosis Method. Proc. Inst. Mech. Eng. Part B J. Eng. Manuf. 2024. [Google Scholar] [CrossRef]
  112. Dong, Y.; Li, Y. Fault Diagnosis of Computer Numerical Control Machine Tools Table Feed System Based on Digital Twin and Machine Learning. Diagnostyka 2024, 25, 2024414. [Google Scholar] [CrossRef]
  113. Reitenbach, S.; Ebel, P.-B.; Grunwitz, C.; Siggel, M.; Pahs, A. Digital Transformation in Aviation: An End-to-End Digital Twin Approach for Aircraft Engine Maintenance–Exploration of Challenges and Solutions. In Proceedings of the Global Power Propulsion Society Conference, Chania, Greece, 4–6 September 2024. Paper gpps24-tc-206. [Google Scholar] [CrossRef]
  114. Acanfora, M.; Mocerino, L.; Giannino, G.; Campora, U. A Comparison between Digital- Twin Based Methodologies for Predictive Maintenance of Marine Diesel Engine. In Proceedings of the 2024 International Symposium on Power Electronics, Electrical Drives, Automation and Motion (SPEEDAM), Napoli, Italy, 19–21 June 2024; pp. 1254–1259. [Google Scholar] [CrossRef]
  115. Liu, S.; Qi, Y.; Gao, X.; Liu, L.; Ma, R. Transfer Learning-Based Multiple Digital Twin-Assisted Intelligent Mechanical Fault Diagnosis. Meas. Sci. Technol. 2023, 35, 025133. [Google Scholar] [CrossRef]
  116. Ogundokun, R.O.; Misra, S.; Maskeliunas, R.; Damasevicius, R. A Review on Federated Learning and Machine Learning Approaches: Categorization, Application Areas, and Blockchain Technology. Information 2022, 13, 263. [Google Scholar] [CrossRef]
  117. Li, T.; Sahu, A.K.; Talwalkar, A.; Smith, V. Federated Learning: Challenges, Methods, and Future Directions. IEEE Signal Process. Mag. 2020, 37, 50–60. [Google Scholar] [CrossRef]
  118. Vijayalakshmi, K.; Amuthakkannan, R.; Ramachandran, K.; Rajkavin, S.A. Federated Learning-Based Futuristic Fault Diagnosis and Standardization in Rotating Machinery. SSRG Int. J. Electron. Commun. Eng. 2024, 11, 223–236. [Google Scholar] [CrossRef]
  119. Ge, Y.; Ren, Y. Federated Transfer Fault Diagnosis Method Based on Variational Auto-Encoding with Few-Shot Learning. Mathematics 2024, 12, 2142. [Google Scholar] [CrossRef]
  120. Zhang, Y.; Xue, X.; Zhao, X.; Wang, L. Federated Learning for Intelligent Fault Diagnosis Based on Similarity Collaboration. Meas. Sci. Technol. 2022, 34, 045103. [Google Scholar] [CrossRef]
  121. Li, Z.; Li, Z.; Gu, F. Intelligent Diagnosis Method for Machine Faults Based on Federated Transfer Learning. Appl. Soft Comput. 2024, 163, 111922. [Google Scholar] [CrossRef]
  122. Guo, Y.; Zhang, J.; Sun, B.; Wang, Y. Adversarial Deep Transfer Learning in Fault Diagnosis: Progress, Challenges, and Future Prospects. Sensors 2023, 23, 7263. [Google Scholar] [CrossRef] [PubMed]
  123. Yang, G.; Su, J.; Du, S.; Duan, Q. Federated Transfer Learning-Based Distributed Fault Diagnosis Method for Rolling Bearings. Meas. Sci. Technol. 2024, 35, 126111. [Google Scholar] [CrossRef]
  124. Han, T.; Liu, C.; Wu, R.; Jiang, D. Deep transfer learning with limited data for machinery fault diagnosis. Appl. Soft Comput. J. 2021, 103, 107150. [Google Scholar] [CrossRef]
  125. Wang, R.; Yan, F.; Yu, L.; Shen, C.; Hu, X.; Chen, J. A Federated Transfer Learning Method with Low-Quality Knowledge Filtering and Dynamic Model Aggregation for Rolling Bearing Fault Diagnosis. Mech. Syst. Signal Process. 2023, 198, 110413. [Google Scholar] [CrossRef]
  126. Yang, W.; Yu, G. Federated Multi-Model Transfer Learning-Based Fault Diagnosis with Peer-to-Peer Network for Wind Turbine Cluster. Machines 2022, 10, 972. [Google Scholar] [CrossRef]
127. Khelfi, H.; Hamdani, S.; Nacereddine, K.; Chibani, Y. Stator current demodulation using Hilbert transform for inverter-fed induction motor at low load conditions. In Proceedings of the 2018 International Conference on Electrical Sciences and Technologies in Maghreb (CISTEM), Algiers, Algeria, 28–31 October 2018; pp. 1–5. [Google Scholar] [CrossRef]
128. Faleh, A.; Laib, A.; Bouakkaz, A.; Mennai, N. New induction motor fault detection method at no-load level employed to start diagnosis of anomalies. In Proceedings of the 2023 2nd International Conference on Electronics, Energy and Measurement (IC2EM), Medea, Algeria, 28–29 November 2023; pp. 1–6. [Google Scholar] [CrossRef]
  129. Kochukrishnan, P.; Rameshkumar, K.; Srihari, S. Piston slap condition monitoring and fault diagnosis using machine learning approach. SAE Int. J. Engines 2023, 16, 923–942. [Google Scholar] [CrossRef]
130. LeNail, A. NN-SVG: Publication-ready neural network architecture schematics. J. Open Source Softw. 2019, 4, 747. [Google Scholar] [CrossRef]
131. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
  132. Ma, C.; Gao, J.; Wang, Z.; Liu, M.; Zou, J.; Zhao, Z.; Yan, J.; Guo, J. Data-Driven Feature Extraction-Transformer: A Hybrid Fault Diagnosis Scheme Utilizing Acoustic Emission Signals. Processes 2024, 12, 2094. [Google Scholar] [CrossRef]
Figure 1. Function block diagram of air–fuel ratio fault-tolerant system of ICEs.
Figure 2. Function block diagram of piston fault diagnosis system of ICEs.
Figure 3. Valve fault diagnosis process of ICEs.
Figure 4. Functional block diagram of bearing fault diagnosis of ICEs.
Figure 5. Sensor fault diagnosis of ICEs.
Figure 6. Ignition fault diagnosis in ICEs.
Figure 7. Injection fault diagnosis of ICEs.
Figure 8. Hybrid fault diagnosis of ICEs.
Figure 9. Engine load classifiers based on ANN.
Figure 10. Other researchers’ unique research contributions in the fault diagnosis of ICEs [17,23,33,50,57,59,65,80,81,82,83].
Figure 11. Engine experimental setup. (a) A—Computer for acquiring data, B—NI data acquisition (DAQ) hardware, C—Ambassador four-cylinder engine, D—engine electric dynamometer, E—electric load cell for setting the 0, 15, and 30% load, F—cylinder cutoff switch for each cylinder; (b) G—tri-accelerometer (vibration) sensor at the middle of the engine head, H—microphone.
Figure 12. Methodology workflow for machine learning-based vibration analysis and classification of a 4-cylinder Ambassador engine cylinder cutoff fault diagnosis.
Figure 13. Tri-accelerometer signal graph: (a) 0% load condition, (b) 15% load condition, (c) 30% load condition vs. normal, first, second, third, fourth cylinder cutoff condition.
Figure 14. Proposed DNN architecture: (a) neural network diagram representation [130], (b) neural network block diagram representation.
Figure 15. Proposed 1D-CNN architecture diagram [130].
Figure 16. Proposed transformer architecture: (a) complete architecture block diagram representation, (b) detailed transformer architecture [131].
Figure 17. Proposed hybrid Transformer-DNN architecture block diagram representation.
Figure 18. Proposed hybrid DNN model architecture neural network diagram representation [130].
Figure 19. Confusion matrix.
Figure 20. Accuracy heatmap for three load conditions: (a) 0%, (b) 15%, and (c) 30% with 16 classifiers and 7 feature selection techniques.
Figure 21. Total cost heatmap for three load conditions: (a) 0%, (b) 15%, and (c) 30% with 16 classifiers and 7 feature selection techniques.
Figure 22. Feature extraction F1 score for 0% load condition: (a) ANOVA, (b) Chi2, (c) Kruskal–Wallis, (d) MRMR, (e) PCA, (f) ReliefF, (g) RAW.
Figure 23. Feature extraction F1 score for 15% load condition: (a) ANOVA, (b) Chi2, (c) Kruskal–Wallis, (d) MRMR, (e) PCA, (f) ReliefF, (g) RAW.
Figure 24. Feature extraction F1 score for 30% load condition: (a) ANOVA, (b) Chi2, (c) Kruskal–Wallis, (d) MRMR, (e) PCA, (f) ReliefF, (g) RAW.
Figure 25. Heatmap plot with 0% load condition for (a) ANOVA, (b) Chi2, (c) Kruskal–Wallis, (d) MRMR, (e) PCA, (f) ReliefF, (g) RAW.
Figure 26. Heatmap plot with 15% load condition for (a) ANOVA, (b) Chi2, (c) Kruskal–Wallis, (d) MRMR, (e) PCA, (f) ReliefF, (g) RAW.
Figure 27. Heatmap plot with 30% load condition for (a) ANOVA, (b) Chi2, (c) Kruskal–Wallis, (d) MRMR, (e) PCA, (f) ReliefF, (g) RAW.
Figure 28. Confusion matrix of DNN architecture: (a) 0% load, (b) 15% load, (c) 30% load.
Figure 29. Confusion matrix of 1D-CNN architecture: (a) 0% load, (b) 15% load, (c) 30% load.
Figure 30. Confusion matrix of Transformer architecture: (a) 0% load, (b) 15% load, (c) 30% load.
Figure 31. Accuracy vs. load plot for the proposed Transformer architecture.
Figure 32. Confusion matrix of hybrid Transformer + DNN architecture: (a) 0% load, (b) 15% load, (c) 30% load.
Figure 33. Accuracy vs. load plot for the proposed hybrid Transformer + DNN architecture.
Table 1. Classification of the research papers based on the methods for diagnosing faults in ICEs.
Statistical Pattern Recognition | Model-Based Diagnosis | Expert Systems | Hybrid Methods
Iqbal 2022 [3] | Sakthivel 2010 [4] | Haneef 2017 [5] | Leon 2018 [6] | Roy 2019 [7] | Moosavian 2016 [8]
Zhang 2019 [9] | Jafarian 2018 [10] | Sripakagorn 2004 [11] | Cao 2018 [12] | Kumar 2023 [13] | Ftoutou 2017 [14]
Moosavian 2017 [15] | McMahan 2018 [16] | Hofmann 2016 [17] | Zheng 2016 [18] | Zhao 2024 [19] | Khoualdia 2019 [20]
Jiang 2017 [21] | Moosavian 2014 [22] | Kemalkar 2016 [23] | Gritsenko 2020 [24] | Ates 2023 [25]
Figlus 2016 [26] | Hesari 2022 [27] | Wang 2021 [28] | Mulay 2018 [29] | Rameshkumar 2024 [30]
Ghajar 2016 [31] | Firmino 2020 [32] | Kumar 2019 [33] | Ferrari 2019 [34] | Komorska 2019 [35]
Zhou 2023 [36] | Mu 2021 [37] | Sumanth 2019 [38] | Vichi 2016 [39] | Kang 2022 [40]
Tao 2019 [41] | Zhang 2023 [42] | Becciani 2019 [43] | Stojanovic 2016 [11]
Shahid 2022 [44] | Radhika 2024 [45] | Guranowska 2018 [46] | Chen 2013 [47]
Yang 2022 [40] | Shan 2024 [48] | Wu 2022 [49] | Xie 2018 [50]
Ftoutou 2018 [51] | Kannan 2013 [52] | Waligorski 2020 [53] | Sugumaran 2011 [54]
Li 2016 [55] | Yao 2017 [56] | Guranowska 2016 [57]
Alisaraei 2019 [58] | Dayong 2016 [59] | Lilo 2016 [60]
Zhao 2019 [61] | Xu 2020 [62] | Shahbaz 2021 [63]
Bi 2019 [64] | Ramteke 2019 [65]
Liu 2021 [66] | Xu 2021 [67]
Table 2. Technical specification of the engine under study.
Specification | Value
Manufacturer | Hindustan Motors—Ambassador
Brake Power | 10 hp or 7.35 kW
Speed Range | 500–5500 rpm
Number of Cylinders | Four
Bore | 73.02 mm
Stroke | 88.9 mm
Cycle of Operation | Four strokes
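As a quick plausibility check on Table 2 (the computation is ours, not the paper's), the listed bore, stroke, and cylinder count imply a total swept volume of roughly 1.5 L:

```python
import math

# Bore 73.02 mm and stroke 88.9 mm from Table 2, converted to centimetres.
bore_cm, stroke_cm, cylinders = 7.302, 8.89, 4
swept_volume_cc = math.pi / 4 * bore_cm**2 * stroke_cm * cylinders
print(f"Total displacement = {swept_volume_cc:.0f} cc")  # about 1489 cc, i.e., ~1.5 L
```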
Table 3. Classifiers used for the Ambassador engine cylinder fault experimental investigations.
Count | Classifier | Classifier Name | Classifier Description | Preferred Algorithm | Justification
1 | Decision tree | Fine tree (maximum number of splits: 100) | Tree-based classifier using fine splitting | Fine | Fine splitting provides more detailed and accurate decision boundaries compared to coarse or medium.
2, 3 | Discriminant | Linear | Linear discriminant analysis classifier | Linear | Linear discriminant is simpler and often works well with linearly separable data.
 | | Optimizable | Discriminant analysis classifier | N/A | Optimization enhances the discriminant's performance without favouring a specific variant.
4, 5 | Naive Bayes | Gaussian naive Bayes | Assuming Gaussian distribution | Gaussian | Assumes normal distribution of features, suitable for Gaussian-distributed data.
 | | Kernel naive Bayes | Using kernel methods | Kernel | Kernel methods allow for non-linear decision boundaries, useful for non-linearly separable data.
6, 7 | SVM | Linear SVM | Linear support vector machine classifier | Linear | Suitable for linearly separable data and provides good generalization.
 | | Fine Gaussian SVM (Gaussian kernel with kernel scale: sqrt(P)/4) | Support vector machine classifier | Fine Gaussian | Fine Gaussian kernel provides detailed decision boundaries for complex data distributions.
8, 9 | KNN | Fine KNN (k: 1) | K-nearest neighbors classifier | Fine | Fine tuning offers more precise classification by considering a smaller neighborhood.
 | | Coarse KNN (k: 100) | K-nearest neighbors classifier | Coarse | Coarse tuning considers a larger neighborhood, useful for smoother decision boundaries.
10, 11 | Ensemble | Boosted trees (learners: 30) | Ensemble classifier using boosted decision trees | N/A | Boosted trees combine weak learners to improve overall classification accuracy.
 | | Bagged trees (learners: 30) | Ensemble classifier using bagged decision trees | N/A | Bagged trees reduce overfitting by averaging predictions from multiple trees.
12–16 | Neural network | Narrow NN-01 (layers: 1, neurons: 10) | NN classifier | Narrow | Narrow architectures are simpler and less prone to overfitting, suitable for small datasets.
 | | Narrow NN-03 (layers: 3, neurons: 10) | NN classifier | Narrow | Three-layered narrow networks capture more complex patterns while remaining relatively simple.
 | | Wide NN (layers: 1, neurons: 100) | NN classifier | Wide | Wide architectures capture complex patterns by having more neurons in the hidden layer.
 | | Bilayer NN (layers: 2, neurons: 10) | NN classifier | Bilayer | Bilayer networks strike a balance between complexity and generalization.
 | | Tri-layered NN (layers: 3, neurons: 10) | NN classifier | Tri-layered | Tri-layered networks can capture highly complex patterns but may be prone to overfitting.
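The classifier names in Table 3 follow MATLAB Classification Learner conventions. As a rough illustration only, the scikit-learn analogues below show how a few of the listed hyperparameters could map onto a Python toolkit; the mapping (e.g., maximum number of splits approximated by max_leaf_nodes) is an assumption, not the study's exact configuration.

```python
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Approximate scikit-learn counterparts of selected Table 3 classifiers.
classifiers = {
    "Fine tree": DecisionTreeClassifier(max_leaf_nodes=101),  # ~100 splits
    "Linear Discriminant": LinearDiscriminantAnalysis(),
    "Gaussian naive Bayes": GaussianNB(),
    "Linear SVM": SVC(kernel="linear"),
    "Fine KNN": KNeighborsClassifier(n_neighbors=1),
    "Coarse KNN": KNeighborsClassifier(n_neighbors=100),
}
# Typical usage: clf.fit(X_train, y_train); accuracy = clf.score(X_test, y_test)
```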
Table 4. DNN architecture parameters.
Layer Name | Output Size | Parameters
Input | 36 features | Feature input
FC1 | 16 | Fully connected layer
Dropout1 | - | 10% dropout
LayerNorm1 | - | Layer normalization
ReLU1 | - | ReLU activation
FC2 | 15 | Fully connected layer
Dropout2 | - | 10% dropout
LayerNorm2 | - | Layer normalization
FC3 | 10 | Fully connected layer
ReLU2 | - | ReLU activation
FC4 | 5 | Fully connected layer
SoftMax | - | SoftMax activation
Output | - | Classification output (cross-entropy loss)
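A minimal PyTorch sketch of the Table 4 stack follows; the class name EngineFaultDNN is ours, and the five outputs are assumed to correspond to the five cylinder-cutoff classes.

```python
import torch.nn as nn

class EngineFaultDNN(nn.Module):
    """Layer stack from Table 4: 36 input features -> 5 class logits."""
    def __init__(self, n_features: int = 36, n_classes: int = 5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 16),  # FC1
            nn.Dropout(0.1),            # Dropout1 (10%)
            nn.LayerNorm(16),           # LayerNorm1
            nn.ReLU(),                  # ReLU1
            nn.Linear(16, 15),          # FC2
            nn.Dropout(0.1),            # Dropout2 (10%)
            nn.LayerNorm(15),           # LayerNorm2
            nn.Linear(15, 10),          # FC3
            nn.ReLU(),                  # ReLU2
            nn.Linear(10, n_classes),   # FC4
        )

    def forward(self, x):
        # nn.CrossEntropyLoss applies log-softmax internally, so the SoftMax
        # row of Table 4 is folded into the loss rather than the forward pass.
        return self.net(x)
```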
Table 5. CNN architecture parameters.
Layer Name | Output Size | Parameters
Input Layer | - | Dense (144)
Dense1 | 144 | Linear
Reshape | (18, 8, 16) | Converts Dense1 output to 3D tensor (height, width, channels)
Conv1 | (18, 8, 16) | 1D convolution: 161 parameters
Average Pool | - | Adaptive average pooling
Conv2 | (9, 16, 16) | 1D convolution: 3.1 K parameters
Conv3 | (9, 16, 16) | 1D convolution: 3.1 K parameters
Conv4 | (9, 16, 16) | 1D convolution: 161 parameters
Average Pool | - | Average pooling
Flatten | - | Converts 3D tensor to 1D
BatchNorm2 | - | Batch normalization: 576 parameters
Dense2 | 1.4 K | Fully connected layer
Loss Function | - | BCEWithLogitsLoss
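The Table 5 pipeline mixes dense, reshape, and convolutional stages whose exact tensor layout is not fully recoverable from the flattened table. The sketch below is therefore a loose PyTorch interpretation in which the reshape target (16 channels of 9 samples), kernel sizes, and pooling lengths are our assumptions rather than the paper's exact design.

```python
import torch.nn as nn

class Engine1DCNN(nn.Module):
    """Loose interpretation of Table 5: dense front-end, 1D convolutions, dense head."""
    def __init__(self, n_features: int = 36, n_classes: int = 5):
        super().__init__()
        self.dense1 = nn.Linear(n_features, 144)          # Dense1 (144 values)
        self.conv = nn.Sequential(
            nn.Conv1d(16, 16, kernel_size=3, padding=1),  # Conv1
            nn.AdaptiveAvgPool1d(9),                      # adaptive average pooling
            nn.Conv1d(16, 16, kernel_size=3, padding=1),  # Conv2
            nn.Conv1d(16, 16, kernel_size=3, padding=1),  # Conv3
            nn.Conv1d(16, 16, kernel_size=1),             # Conv4
            nn.AdaptiveAvgPool1d(4),                      # final average pooling
        )
        self.bn = nn.BatchNorm1d(64)                      # BatchNorm2 on the flat vector
        self.dense2 = nn.Linear(64, n_classes)            # Dense2 -> logits

    def forward(self, x):
        z = self.dense1(x).view(-1, 16, 9)  # reshape 144 -> (channels, length)
        z = self.conv(z).flatten(1)         # (batch, 16 * 4 = 64)
        return self.dense2(self.bn(z))      # logits for BCEWithLogitsLoss/CrossEntropy
```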
Table 6. Transformer architecture parameters of the hybrid Transformer and DNN model.
Component | Value | Details
Input Projection | input_size → 64 | Linear projection to d_model
Embedding Dimension | 64 | d_model dimension
Number of Heads | 4 | Multi-head attention
Transformer Layers | 2 | Number of encoder layers
Feed-Forward Dimension | 128 | Transformer feed-forward dim
Dropout Rate | 0.1 | In transformer encoder
Positional Encoding | max_len = 5000 | Sinusoidal encoding
Table 7. DNN architecture of the hybrid Transformer and DNN model.
Layer | Output Shape | Parameters | Activation
Linear-1 | 128 | d_model × 128 + 128 | ReLU
Dropout-1 | 128 | - | (p = 0.2)
Linear-2 | 64 | 128 × 64 + 64 | ReLU
Dropout-2 | 64 | - | (p = 0.1)
Linear-3 | 32 | 64 × 32 + 32 | ReLU
Linear-4 | num_classes | 32 × num_classes | -
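Tables 6 and 7 together specify the hybrid model. The following PyTorch sketch wires them up; input_size = 36, num_classes = 5, mean-pooling over the (length-one) sequence, and the class names PositionalEncoding and HybridTransformerDNN are our assumptions, not the paper's released code.

```python
import math
import torch
import torch.nn as nn

class PositionalEncoding(nn.Module):
    """Sinusoidal positional encoding (Table 6: max_len = 5000)."""
    def __init__(self, d_model: int, max_len: int = 5000):
        super().__init__()
        pos = torch.arange(max_len).unsqueeze(1)
        div = torch.exp(torch.arange(0, d_model, 2) * (-math.log(10000.0) / d_model))
        pe = torch.zeros(max_len, d_model)
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, x):                    # x: (batch, seq_len, d_model)
        return x + self.pe[: x.size(1)]

class HybridTransformerDNN(nn.Module):
    """Transformer encoder (Table 6) followed by the DNN head (Table 7)."""
    def __init__(self, input_size: int = 36, num_classes: int = 5, d_model: int = 64):
        super().__init__()
        self.proj = nn.Linear(input_size, d_model)        # input projection
        self.pos = PositionalEncoding(d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=4, dim_feedforward=128,
            dropout=0.1, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Sequential(
            nn.Linear(d_model, 128), nn.ReLU(), nn.Dropout(0.2),  # Linear-1/Dropout-1
            nn.Linear(128, 64), nn.ReLU(), nn.Dropout(0.1),       # Linear-2/Dropout-2
            nn.Linear(64, 32), nn.ReLU(),                         # Linear-3
            nn.Linear(32, num_classes),                           # Linear-4
        )

    def forward(self, x):                    # x: (batch, input_size)
        z = self.proj(x).unsqueeze(1)        # sequence length 1 (Table 8)
        z = self.encoder(self.pos(z))
        return self.head(z.mean(dim=1))      # pool the length-one sequence
```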
Table 8. Training parameters of the hybrid Transformer and DNN model.
Parameter | Value
Optimizer | AdamW
Learning Rate | 0.001
Weight Decay | 0.00001
Loss Function | CrossEntropyLoss
LR Scheduler | ReduceLROnPlateau
Scheduler Patience | 10
Scheduler Factor | 0.5
Batch Size | 32
Number of Epochs | 2000
Input Sequence Length | 1 (unsqueezed input)
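A training-loop sketch matching Table 8, reusing the HybridTransformerDNN class from the sketch above; the feature and label tensors here are random placeholders standing in for the extracted vibration features.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 300 samples of 36 features, 5 cylinder-state classes.
X, y = torch.randn(300, 36), torch.randint(0, 5, (300,))
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = HybridTransformerDNN()                      # sketch from Tables 6 and 7
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=10)             # Table 8 scheduler settings
loss_fn = torch.nn.CrossEntropyLoss()

for epoch in range(2000):                           # number of epochs from Table 8
    epoch_loss = 0.0
    for xb, yb in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item()
    scheduler.step(epoch_loss)                      # plateau-based learning-rate decay
```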
Table 9. Feature selection techniques adopted for the classifiers.
S. No. | Feature | Feature Description | Significance of Selection | Advantages | Algorithm Background
1 | RAW | Unprocessed data | Baseline comparison, captures all information (potentially redundant) | Useful for initial exploration, but high dimensionality can hinder analysis and model performance | N/A
2 | PCA | Principal Component Analysis | Reduces dimensionality while preserving essential variance, improves computational efficiency and interpretability | Reduces noise and redundancy, focuses on informative features, enhances visualization and classification | Eigenvectors and eigenvalues of the covariance matrix
3 | MRMR | Minimum Redundancy Maximum Relevance | Selects features with high relevance to class labels (cylinder state) and low redundancy amongst themselves | Improves classification accuracy by focusing on discriminative features, avoids overfitting with redundant information | Maximizes mutual information between features and class labels while minimizing redundancy between selected features
4 | Chi-squared (χ2) | Chi-squared test | Measures statistical independence between features and class labels, identifies features with significant influence | Highlights features directly impacting cylinder state, aids in understanding feature importance | Computes the χ2 statistic for each feature, selects features with high χ2 values indicating dependence on class labels
5 | ReliefF | Relief feature filter | Weights features by how well they separate samples of different cylinder states among neighboring instances | Robust to noise and feature interactions, identifies locally discriminative features | Updates feature weights from distances to nearest hits (same class) and nearest misses (other classes)
6 | ANOVA | Analysis of Variance | Assesses statistical significance of feature variations across different cylinder states (normal operation vs. cutoff) | Identifies features with statistically significant differences between cylinder conditions, supports understanding of contributing factors | Computes the F-statistic to test for significant differences in feature means among different classes
7 | Kruskal–Wallis | Kruskal–Wallis test | Non-parametric test for significant differences in feature distributions between multiple cylinder states (normal operation, cutoff for each cylinder) | Detects non-linear relationships and outliers that might be missed by ANOVA, useful for robust feature selection with diverse data | Calculates the Kruskal–Wallis H statistic to test for significant differences in feature ranks across multiple groups
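For orientation, the sketch below exercises several of the Table 9 techniques through their common Python implementations on hypothetical feature arrays; the study's own tooling and parameter choices may differ.

```python
import numpy as np
from scipy.stats import kruskal
from sklearn.decomposition import PCA
from sklearn.feature_selection import SelectKBest, chi2, f_classif

# Hypothetical stand-ins: 300 samples x 36 statistical features, 5 classes.
rng = np.random.default_rng(0)
X = rng.random((300, 36))
y = rng.integers(0, 5, 300)

anova = SelectKBest(f_classif, k=10).fit(X, y)           # ANOVA F-test ranking
chi = SelectKBest(chi2, k=10).fit(X - X.min(axis=0), y)  # chi2 needs non-negative inputs
pca = PCA(n_components=10).fit(X)                        # variance-preserving projection
h_stat, p_val = kruskal(*[X[y == c, 0] for c in np.unique(y)])  # Kruskal-Wallis, feature 0

print(anova.get_support(indices=True))  # indices of the 10 top-ranked features
```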
Table 10. Accuracy trend analysis under different load conditions and feature selection techniques.
Classifier | Accuracy Trend (0–15–30% Load) | Highest Accuracy: Load % (Feature Selection)
Fine tree | Decrease | 0% (Chi2, ReliefF)
Linear Discriminant | High and Stable | 15% (PCA)
Optimizable Discriminant | Slight Decrease | 0% (RAW, Chi2)
Gaussian naive Bayes | Relatively High and Stable | 15% and 30% (RAW)
Kernel naive Bayes | Slight Increase | 30% (Chi2)
Linear SVM | High and Stable | 15% (PCA)
Fine Gaussian SVM | Slight Increase | 30% (MRMR)
Fine KNN | Decrease | 0% (Chi2, ReliefF)
Coarse KNN | Decrease | 0% (MRMR)
Boosted trees | Decrease | 15% (RAW, Chi2, ReliefF, ANOVA)
Bagged trees | High and Stable | 15% (RAW, Chi2, ReliefF, ANOVA)
Narrow NN-01 | Increase | 30% (MRMR)
Narrow NN-03 | Increase | 30% (Chi2)
Wide NN | High and Stable | 15% (MRMR)
Bilayer NN | Slight Increase | 30% (MRMR)
Tri-layered NN | Increase | 30% (Kruskal–Wallis)
Table 11. Cost (misclassification) matrix for all classifiers for the total cost calculation (rows: true class; columns: predicted class).
True Class | Cyl_01 | Cyl_02 | Cyl_03 | Cyl_04 | Cyl_ALL
Cyl_01 | 0 | 1 | 1 | 1 | 1
Cyl_02 | 1 | 0 | 1 | 1 | 1
Cyl_03 | 1 | 1 | 0 | 1 | 1
Cyl_04 | 1 | 1 | 1 | 0 | 1
Cyl_ALL | 1 | 1 | 1 | 1 | 0
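With Table 11's zero-one costs, the total cost plotted in Figure 21 reduces to the number of misclassified samples: the element-wise product of the cost matrix with a confusion matrix sums every off-diagonal count once. A minimal sketch with a made-up confusion matrix:

```python
import numpy as np

# Table 11 cost matrix: 0 on the diagonal, 1 for every misclassification.
cost = np.ones((5, 5)) - np.eye(5)

# Hypothetical confusion matrix (rows: true class, columns: predicted class).
confusion = np.array([
    [48, 1, 0, 1, 0],
    [2, 45, 2, 0, 1],
    [0, 1, 49, 0, 0],
    [1, 0, 0, 47, 2],
    [0, 0, 1, 1, 48],
])

total_cost = int((cost * confusion).sum())  # each off-diagonal prediction costs 1
print(f"Total cost = {total_cost}")         # 13 for this example
```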
Table 12. Total cost trend analysis under different load conditions and feature selection techniques.
Classifier | Cost Trend (0–15–30% Load) | Lowest Cost (Load, Feature Selection)
Fine tree | Slight Increase | 0% (Chi2, ReliefF)
Linear Discriminant | Decrease | 15% (PCA)
Optimizable Discriminant | Increase (0–15%), Decrease (30%) | 0% (RAW, Chi2)
Gaussian naive Bayes | Relatively Constant | 0% (MRMR)
Kernel naive Bayes | Decrease (0–15%), Increase (30%) | 15% (MRMR)
Linear SVM | Decrease | 15% (PCA)
Fine Gaussian SVM | Decrease (0–15%), Increase (30%) | 15% (MRMR)
Fine KNN | Increase | 0% (MRMR)
Coarse KNN | Increase | 0% (MRMR)
Boosted trees | Increase | 0% (RAW, Chi2, ReliefF, ANOVA)
Bagged trees | Decrease (0–15%), Increase (30%) | 15% (RAW, Chi2, ReliefF, ANOVA)
Narrow NN-01 | Decrease (0–15%), Increase (30%) | 15% (MRMR)
Narrow NN-03 | Decrease (0–15%), Increase (30%) | 15% (Chi2)
Wide NN | Decrease | 15% (PCA)
Bilayer NN | Decrease (0–15%), Increase (30%) | 15% (MRMR)
Tri-layered NN | Decrease (0–15%), Increase (30%) | 15% (Kruskal–Wallis)
Table 13. Overall performance summary of classifiers—accuracy and total cost.
Classifier | Accuracy Trend | Lowest Cost (Load, Feature Selection) | Overall Performance
Fine tree | Decrease | 0% (Chi2, ReliefF) | Mid performer (moderate accuracy and cost)
Linear Discriminant | High and Stable | 15% (PCA) | Best performer (high accuracy and low cost)
Optimizable Discriminant | Slight Decrease (0–15%), Increase (30%) | 0% (RAW, Chi2) | Mid performer (moderate accuracy and cost)
Gaussian naive Bayes | Relatively Constant | 0% (MRMR) | Mid performer (moderate accuracy and cost)
Kernel naive Bayes | Slight Decrease (0–15%), Increase (30%) | 15% (MRMR) | Mid performer (moderate accuracy and cost)
Linear SVM | High and Stable | 15% (PCA) | Best performer (high accuracy and low cost)
Fine Gaussian SVM | Slight Decrease (0–15%), Increase (30%) | 15% (MRMR) | Mid performer (moderate accuracy and cost)
Fine KNN | Decrease | 0% (MRMR) | Low performer (low accuracy and high cost)
Coarse KNN | Decrease | 0% (MRMR) | Low performer (low accuracy and high cost)
Boosted trees | Increase | 0% (RAW, Chi2, ReliefF, ANOVA) | Low performer (lower accuracy and higher cost)
Bagged trees | Decrease (0–15%), Increase (30%) | 15% (RAW, Chi2, ReliefF, ANOVA) | Mid performer (moderate accuracy and cost)
Narrow NN-01 | Decrease (0–15%), Increase (30%) | 15% (MRMR) | Mid performer (moderate accuracy and cost)
Narrow NN-03 | Decrease (0–15%), Increase (30%) | 15% (Chi2) | Mid performer (moderate accuracy and cost)
Wide NN | Decrease | 15% (PCA) | Mid performer (moderate accuracy and cost)
Bilayer NN | Decrease (0–15%), Increase (30%) | 15% (MRMR) | Mid performer (moderate accuracy and cost)
Tri-layered NN | Decrease (0–15%), Increase (30%) | 15% (Kruskal–Wallis) | Mid performer (moderate accuracy and cost)
Table 14. Overview of the 0% load classifier analysis.
Feature Selection Method | Top Performers | Bottom Performer
ANOVA | Linear SVM, Linear Discriminant | Coarse KNN
Chi2 | Linear Discriminant, Optimizable Discriminant | Kernel naive Bayes
Kruskal–Wallis | Linear SVM, Bagged Trees | Coarse KNN
MRMR | Linear Discriminant, Linear SVM | Coarse KNN
ReliefF | Linear Discriminant, Linear SVM | Coarse KNN
PCA | Linear SVM, Linear Discriminant | Fine tree
RAW | Linear Discriminant, Optimizable Discriminant | Kernel naive Bayes
Table 15. Overview of the 15% load classifier analysis.
Feature Selection Method | Top Performers | Bottom Performer
ANOVA | Linear SVM, Linear Discriminant | Coarse KNN
Chi2 | Linear SVM, Bagged Trees | Coarse KNN
Kruskal–Wallis | Linear Discriminant, Linear SVM | Coarse KNN
MRMR | Linear Discriminant, Linear SVM | Coarse KNN
ReliefF | Linear Discriminant, Linear SVM | Coarse KNN
PCA | Linear Discriminant, Linear SVM | Fine tree
RAW | Linear Discriminant, Linear SVM | Coarse KNN
Table 16. Overview of the 30% load classifier analysis.
Feature Selection Method | Top Performers | Bottom Performer
ANOVA | Linear Discriminant, Linear SVM | Coarse KNN
Chi2 | Linear Discriminant, Linear SVM | Coarse KNN
Kruskal–Wallis | Linear Discriminant, Linear SVM | Coarse KNN
MRMR | Kernel naive Bayes, Narrow NN-01 | Coarse KNN
ReliefF | Linear Discriminant, Linear SVM | Coarse KNN
PCA | Linear Discriminant, Linear SVM | Fine tree
RAW | Linear Discriminant, Linear SVM | Coarse KNN
Table 17. Minimum performing classifier and cylinder identification with respect to the feature selection of the 0% load RoC AuC study.
Feature Selection Method | Classifier | AuC Value | Cylinder
RAW | Fine KNN | 0.9147 | Cylinder-04
PCA | Fine tree | 0.8057 | Cylinder-04
MRMR | Narrow NN-03 | 0.9123 | Cylinder-03
Chi2 | Fine KNN | 0.9272 | Cylinder-03
ReliefF | Fine KNN | 0.9063 | Cylinder-04
ANOVA | Fine tree | 0.8966 | Cylinder-04
Kruskal–Wallis | Fine KNN | 0.9001 | Cylinder-03
Table 18. Minimum performing classifier and cylinder identification with respect to the feature selection of the 15% load RoC AuC study.
Feature Selection Method | Classifier | AuC Value | Cylinder
RAW | Fine KNN | 0.9205 | Cylinder-01
PCA | Fine tree | 0.7982 | Cylinder-04
MRMR | Narrow NN-03 | 0.9285 | Cylinder-02
Chi2 | Fine KNN | 0.9190 | Cylinder-01
ReliefF | Fine KNN | 0.9148 | Cylinder-01
ANOVA | Fine tree | 0.9205 | Cylinder-01
Kruskal–Wallis | Fine KNN | 0.9192 | Cylinder-04
Table 19. Minimum performing classifier and cylinder identification with respect to the feature selection of the 30% load RoC AuC study.
Feature Selection Method | Classifier | AuC Value | Cylinder
RAW | Fine KNN | 0.8839 | Cylinder-02
PCA | Fine tree | 0.7787 | Cylinder-04
MRMR | Narrow NN-03 | 0.8108 | Cylinder-02
Chi2 | Fine KNN | 0.8839 | Cylinder-02
ReliefF | Fine KNN | 0.8820 | Cylinder-02
ANOVA | Fine tree | 0.8879 | Cylinder-02
Kruskal–Wallis | Fine KNN | 0.8961 | Cylinder-02
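The per-cylinder values in Tables 17–19 are one-vs-rest RoC AuC scores. A sketch of the computation follows; the label and score arrays here are random placeholders, not the study's data.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

classes = ["Cyl_01", "Cyl_02", "Cyl_03", "Cyl_04", "Cyl_ALL"]
rng = np.random.default_rng(1)
y_true = rng.integers(0, 5, 200)             # placeholder class labels
y_score = rng.dirichlet(np.ones(5), 200)     # placeholder class probabilities

y_bin = label_binarize(y_true, classes=list(range(5)))
auc = {c: roc_auc_score(y_bin[:, i], y_score[:, i]) for i, c in enumerate(classes)}
worst = min(auc, key=auc.get)  # analogous to the minimum-performing entries above
print(worst, round(auc[worst], 4))
```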
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
