1. Introduction
Car faults are an unfortunate reality for many drivers, with unexpected vehicle repairs being a common occurrence in the automotive industry. These faults can range from minor issues (e.g., a blown fuse, a flat tire) to more serious problems that can impact vehicle safety and performance. For instance, a faulty brake system or a malfunctioning engine can pose significant risks to drivers and passengers and can result in costly repairs. In fact, according to a report by the American Automobile Association, the average expenditure on unexpected vehicle repairs in the United States was between USD 500 and USD 600 per vehicle in 2017 [1]. However, this figure can vary widely based on several factors, such as the age and model of the vehicle, as well as driving habits and conditions.
Traditional car fault diagnosis methods have relied heavily on the expertise of mechanics or diagnostic tools, which may be available at service centers [2,3]. These methods often involve manual inspections, computerized scanning, or diagnostic tests to identify and address potential issues in vehicles, such as engine faults, front-end assembly faults, airflow faults, spark plug faults, and electrical faults. However, these traditional methods are often costly and time-consuming, and they may not always provide accurate results [4]. Recently, there has been a surge of interest in machine learning-based fault diagnosis systems [5,6,7]. Leveraging advancements in artificial intelligence and data analytics, these systems aim to automate fault detection processes, enhance accuracy, and enable proactive maintenance strategies [8,9].
The general structure of machine learning-based fault diagnosis methods encompasses several stages, namely data acquisition, preprocessing, feature extraction, and classification [10,11]. Data acquisition involves gathering relevant information from vehicles, such as sensor readings and diagnostic codes, and recording relevant signals [12]. Preprocessing focuses on cleaning and organizing the data to eliminate noise and inconsistencies [13]. Feature extraction aims to identify key parameters or characteristics indicative of potential faults [14]. Finally, classification algorithms are applied to categorize the data into different fault types or states [15].
Two primary types of signals, vibration and sound, are employed in fault diagnosis [16]. Vibration signals capture mechanical movements and dynamics within the vehicle and provide insights into structural integrity and component performance. Sound signals, on the other hand, reflect acoustic emissions associated with engine operation and component interactions and offer valuable clues about the health and functionality of various vehicle parts.
Several studies have explored fault diagnosis using vibration signals and demonstrated the versatility and effectiveness of this approach. For example, Jegadeeshwaran and Sugumaran [17] presented a vibration-based continuous monitoring and analysis system using a machine learning approach. Their study focused on fault diagnosis in hydraulic braking systems by acquiring vibration signals from a piezoelectric transducer under both good and faulty brake conditions. They employed decision tree algorithms to identify the most relevant features among different faulty conditions and achieved a classification accuracy of 97.45%. Similarly, Barbieri et al. [18] aimed to identify damage and diagnose damaged components in automotive gearboxes by comparing vibration signals of damaged and undamaged systems. They employed various signal analysis techniques (e.g., wavelet transform and mathematical morphology) to verify damage presence and used a signal processing technique combining pattern spectrum and selective filtering for component failure identification. Jafarian et al. [19] explored vibration analysis for fault detection in an internal combustion engine and focused on detecting faults related to poppet valve clearance and incomplete combustion. They utilized four accelerometers on the engine body, applied the Principal Component Analysis (PCA) technique for data analysis, and achieved high efficiency in fault classification and detection. These studies collectively highlight the wide applicability and effectiveness of vibration-based fault diagnosis in diverse automotive systems. A summary of studies employing vibration signals is given in Table 1.
Although high success rates have been reported with vibration signals, vibration-based methods are hard to implement in a real-world system. Therefore, a more practical method is required to distinguish the faults. Sound signals offer distinct advantages in fault diagnosis because they are easy to record using commonly available devices such as cell phones or microphones, which makes sound analysis a practical and cost-effective approach to diagnosing vehicle faults. Studies conducted by various researchers further exemplify the potential of sound-based fault diagnosis in automotive systems. For instance, Madain et al. [23] identified distinct sounds associated with specific engine malfunctions and developed an algorithm using sound techniques in diagnosis. They reported high error detection rates through analysis of engine sound samples collected from a laboratory environment. Similarly, Jian-Da Wu and Chiu-Hong Liu [24] developed a fault diagnosis system for internal combustion engines based on the discrete wavelet transform technique applied to sound emission signals and showcased its effectiveness in fault recognition under diverse engine operating conditions. Another study, which delved into the application of acoustic signal processing methods for assessing the technical condition of internal combustion engines, proposed new algorithms for automatic detection of valve clearance issues based on acoustic signal components [25]. Additionally, Mofleh et al. [26] conducted a study aimed at detecting faults in spark-ignition engines using acoustic signals and an Artificial Neural Network (ANN) system. The study highlighted the high potential of ANN-based fault detection in internal combustion engines using acoustic signals, particularly in identifying simulated spark plug and misfire faults. These studies collectively underscore the practicality and efficacy of sound-based fault diagnosis methods in the automotive industry and offer valuable insights for developing reliable diagnostic systems. A summary of such studies employing sound signals is provided in Table 2.
Despite recent advancements, current studies in vehicle fault diagnosis often encounter significant drawbacks. These limitations include the reliance on data recorded in controlled laboratory settings, which may not fully represent real-world vehicle conditions. Furthermore, many studies are constrained to specific car models or fault types, limiting the generalizability and applicability of their findings. There is a pressing need for a comprehensive dataset that encompasses diverse vehicle models, real-world vehicle conditions, and a wide range of fault scenarios to enhance the effectiveness and accuracy of fault diagnosis systems.
Our study directly addresses these challenges by leveraging a diverse and extensive dataset comprising real-life vehicle sounds. This dataset captures a wide array of vehicle conditions, various vehicle models, and an extensive range of fault scenarios encountered in everyday driving. We employ advanced signal processing techniques, including Mel-Frequency Cepstral Coefficients (MFCCs), Wavelet Transform, and Relief-F methods, for robust feature extraction and feature selection, and use Extreme Learning Machines (ELM) for classification.
The summary of our study’s approach and key contributions is as follows:
Instead of employing a laboratory-collected dataset, comprehensive data were collected from real-life vehicle conditions and diverse vehicle models.
To increase the success of the proposed approach, advanced signal processing techniques (MFCC, Wavelet Transform, and Relief-F) were employed.
A thorough frequency analysis was conducted for each fault type, and the specific frequency components associated with different types of faults were identified.
This paper is structured as follows. Section 2 gives details about the data collection methods and focuses on how sound signals were acquired from vehicles under various operating conditions. Section 3 elaborates on the methodology, encompassing the signal processing techniques and feature extraction methods used in vehicle fault diagnosis based on sound signals. In Section 4, we present the experimental results, which include performance evaluations of the employed diagnostic system, comparisons with existing methods in the literature, and a detailed frequency analysis of each identified fault type. Furthermore, the implications of our findings and insights gained from the experimental outcomes are discussed. Finally, Section 5 concludes the study, summarizes its key contributions, and proposes directions for future research and development in the field of vehicle fault diagnosis by sound signal analysis.
2. Dataset
Audio recordings were collected from vehicles serviced at official Ford or Toyota service centers, which ensured a diverse range of cars from these brands. A cellphone served as the recording device and captured sounds while the cars were stationary with their engines idling at ideal operating temperatures. As seen in Figure 1, the cellphone was positioned 15 cm above the hood and centered, with the hood closed to mimic real-world conditions. Engine sounds were recorded for 30 s each at a sampling frequency of 48 kHz to capture detailed acoustic information.
Professional mechanics diagnosed the cars as either healthy or with one of the following faults: spark plug issues, airflow irregularities, electrical malfunctions, engine/turbo problems, or front-end problems. The distribution of each diagnostic class is outlined in Table 3.
Spark Plug Issue: Typically related to ignition problems, which result in misfires, rough idling, and decreased engine performance.
Airflow Irregularities: Pertaining to issues with the air intake system that affect engine combustion and efficiency.
Electrical Malfunctions: Encompassing faults within the vehicle’s electrical system that directly impact engine performance. This may include issues with sensors, wiring, or other electrical components that affect engine operation and efficiency.
Engine/Turbo Problems: Referring to issues within the engine or turbocharger system that impact power delivery and overall engine performance.
Front-End Problems: Including issues with steering, suspension, or other components affecting the vehicle’s front-end operation.
The dataset comprises audio signals collected from a wide range of gasoline-engine vehicles, such as Ford Focus (2014–2021), Ford Kuga (2020–2021), Ford Ecosport (2021), Ford Mondeo (2016), Toyota Corolla (2015), and Toyota Auris (2010). The aim was to ensure diversity in vehicle models in order to capture a comprehensive range of engine sounds and fault types.
In addition, an example signal for each vehicle diagnostic class is provided in Figure 2 in order to demonstrate the characteristics of the recorded audio signals for each class.
4. Results and Discussion
In this section, the results of the vehicle fault diagnosis framework are presented and discussed. Performance metrics and analysis are provided for models utilizing MFCC-based features and DWT-based features. Additionally, a detailed frequency analysis is given to identify the most relevant frequency components for each fault type. The obtained results are also compared with existing studies to highlight the effectiveness and improvements achieved by the employed methodology.
4.1. Results for MFCC-Based Features
The performance of the vehicle fault diagnosis model using MFCC-based features is evaluated and summarized in Table 5. Various configurations of MFCC parameters, including the number of coefficients and window length, were tested to identify the optimal settings. The table also presents the best hyperparameters found through Grid Search for the ELM classifier, such as the activation function and number of neurons, alongside the resulting precision, recall, F1-score, and accuracy metrics. A 50% window overlap was used in all experiments to enhance feature extraction.
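To make the framing parameters concrete, the following minimal sketch computes how a 30 s recording sampled at 48 kHz (the values given in Section 2) is split into frames for the tested window lengths with 50% overlap. The function name `frame_layout` and the choice to drop a trailing partial frame are assumptions made for illustration; the actual MFCC implementation may pad or handle edges differently.

```python
def frame_layout(duration_s, sample_rate_hz, window_s, overlap=0.5):
    """Return (window_samples, hop_samples, n_frames) for fixed-length framing."""
    n_samples = int(round(duration_s * sample_rate_hz))
    window = int(round(window_s * sample_rate_hz))
    hop = int(round(window * (1.0 - overlap)))
    # Assumption: only complete frames are counted; a trailing partial frame is dropped.
    n_frames = 1 + (n_samples - window) // hop if n_samples >= window else 0
    return window, hop, n_frames


# Window lengths discussed in the text (in seconds), 30 s recording at 48 kHz.
for window_s in (0.02, 0.03, 0.04):
    window, hop, n_frames = frame_layout(30, 48_000, window_s)
    print(f"window={window_s:.2f} s -> {window} samples, hop={hop}, frames={n_frames}")
```

Under these assumptions, a 0.03 s window corresponds to 1440 samples with a hop of 720 samples, so shorter windows yield more frames per recording at the cost of coarser frequency resolution per frame.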
The results, which are given in Table 5, indicate that using 20 MFCC coefficients with a window length of 0.03 s and a sine activation function yielded the highest performance with a precision of 92.24%, recall of 92.22%, F1-score of 92.10%, and accuracy of 92.14%. This demonstrates that the choice of MFCC parameters and ELM hyperparameters significantly impacts the classification performance.
From the table, it can be observed that the number of MFCC coefficients and the window length significantly affect the performance of the proposed method up to a certain point. Specifically, increasing the number of MFCC coefficients from 5 to 20 generally leads to an improvement in precision, recall, F1-score, and accuracy. However, further increasing the number of coefficients to 30 does not seem to result in any significant improvement. Similarly, increasing the window length from 0.02 s to 0.03 s or 0.04 s generally leads to an improvement in performance, with the best results achieved at a window length of 0.03 s for most MFCC parameter configurations. However, the choice of activation function and the number of neurons in the ELM classifier also seems to play a role in achieving the best performance.
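The role of the activation function and the number of hidden neurons can be illustrated with a generic Extreme Learning Machine. The sketch below is a textbook-style ELM written with NumPy, not the implementation used in this study: the class name `ELM`, the Gaussian weight initialization, and the synthetic two-class data are all assumptions for illustration. A grid search would simply repeat the fit over candidate values of `n_hidden` and `activation`.

```python
import numpy as np


class ELM:
    """Single-hidden-layer ELM: random hidden weights, least-squares output weights."""

    def __init__(self, n_hidden=100, activation=np.sin, seed=0):
        self.n_hidden = n_hidden
        self.activation = activation
        self.rng = np.random.default_rng(seed)

    def fit(self, X, y):
        n_classes = int(y.max()) + 1
        # Hidden-layer weights and biases are drawn once at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self.activation(X @ self.W + self.b)
        T = np.eye(n_classes)[y]  # one-hot targets
        # Output weights: least-squares solution via the Moore-Penrose pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def predict(self, X):
        H = self.activation(X @ self.W + self.b)
        return np.argmax(H @ self.beta, axis=1)


# Hypothetical stand-in data: two well-separated Gaussian blobs, not MFCC features.
rng = np.random.default_rng(42)
X_demo = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
                    rng.normal(3.0, 0.5, size=(20, 2))])
y_demo = np.array([0] * 20 + [1] * 20)

model = ELM(n_hidden=80, activation=np.sin).fit(X_demo, y_demo)
train_acc = float(np.mean(model.predict(X_demo) == y_demo))
print(f"training accuracy: {train_acc:.2f}")
```

Because only the output weights are solved for, each candidate configuration is cheap to evaluate, which is what makes an exhaustive grid over activation functions and neuron counts practical.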
The confusion matrix for the model with 20 MFCC coefficients and a window length of 0.02 s, shown in Figure 5, provides additional insights into the model’s performance. Each cell in the matrix represents the number of instances for which the true class is represented by the row and the predicted class is represented by the column.
Overall, the confusion matrix confirms that the model performs well across most fault types, with particularly high accuracy for airflow, front-end, and spark plug faults. Specifically, the model achieves the highest performance in detecting airflow faults with an accuracy of 98.1%, correctly identifying 51 out of 52 cases. However, the model shows the lowest performance in identifying healthy cases, with an accuracy of 80.0%, correctly predicting 40 out of 50 instances. This suggests that healthy cases are more prone to being misclassified as faults, indicating an area for further refinement.
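With rows as true classes and columns as predicted classes, the per-class accuracy quoted above corresponds to the diagonal count divided by the row total (that is, row-wise recall). The following minimal sketch shows this calculation together with precision and F1-score. The function name `per_class_metrics` and the three-class matrix are hypothetical: only the two row totals (40 of 50 and 51 of 52) echo figures from the text, and every off-diagonal count is invented for illustration.

```python
def per_class_metrics(cm):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    rows = []
    for k in range(n):
        tp = cm[k][k]
        actual = sum(cm[k])                          # row sum: true members of class k
        predicted = sum(cm[i][k] for i in range(n))  # column sum: predicted as class k
        recall = tp / actual if actual else 0.0
        precision = tp / predicted if predicted else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1))
    return rows, correct / total


# Hypothetical 3-class matrix; off-diagonal counts are invented, not taken from Figure 5.
labels = ["healthy", "airflow", "other fault"]
cm_demo = [
    [40, 6, 4],   # 40 of 50 healthy cases correct
    [1, 51, 0],   # 51 of 52 airflow cases correct
    [2, 3, 45],
]

metrics, overall_acc = per_class_metrics(cm_demo)
for name, (p, r, f1) in zip(labels, metrics):
    print(f"{name:12s} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
print(f"overall accuracy={overall_acc:.3f}")
```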
The results of the experiments using MFCC-based features demonstrate the effectiveness of the proposed method for fault diagnosis in vehicles. The method can accurately diagnose different types of faults using the extracted MFCC features and the ELM classifier. The results also provide insights into the optimal combination of MFCC parameters and ELM hyperparameters, which can be used to improve the performance of the method in future studies, and highlight areas for further optimization.
4.2. Results for DWT-Based Features
The performance of the vehicle fault diagnosis model using DWT-based features is evaluated and summarized in Table 6. Various configurations of DWT parameters, including the decomposition level and widely used wavelets such as db4, db8, db20, sym3, and sym8, were tested to identify the optimal settings [24,40]. The table also presents the best hyperparameters found through Grid Search for the ELM classifier, such as the activation function and number of neurons, alongside the resulting precision, recall, F1-score, and accuracy metrics.
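To illustrate what a decomposition level means, the sketch below implements a multi-level DWT and derives one energy value per sub-band. Two loudly stated assumptions apply: the Haar wavelet is swapped in for the db and sym families named above to keep the example dependency-free, and sub-band energy is used as a stand-in feature, since the exact DWT features are not specified here. The names `haar_dwt`, `haar_wavedec`, and `subband_energies` are hypothetical. In practice, a library such as PyWavelets (`pywt.wavedec`) would be used with the wavelets listed in the text.

```python
import math


def haar_dwt(signal):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    s = 1.0 / math.sqrt(2.0)
    pairs = range(0, len(signal) - 1, 2)  # an odd trailing sample is ignored
    approx = [(signal[i] + signal[i + 1]) * s for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * s for i in pairs]
    return approx, detail


def haar_wavedec(signal, level):
    """Multi-level decomposition ordered as [cA_level, cD_level, ..., cD_1]."""
    coeffs = []
    approx = list(signal)
    for _ in range(level):
        # Each level splits the current approximation into a coarser
        # approximation and a detail band.
        approx, detail = haar_dwt(approx)
        coeffs.insert(0, detail)
    coeffs.insert(0, approx)
    return coeffs


def subband_energies(signal, level=3):
    """Sum of squared coefficients per sub-band (assumed feature, for illustration)."""
    return [sum(c * c for c in band) for band in haar_wavedec(signal, level)]


# Hypothetical test signal: a slow sine wave, not an engine recording.
demo_signal = [math.sin(2 * math.pi * k / 16) for k in range(64)]
print(subband_energies(demo_signal, level=3))
```

A level-3 decomposition yields four sub-bands (one approximation and three detail bands), so the feature vector length grows with the decomposition level.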
From the table, it is clear that different combinations of DWT parameters and ELM hyperparameters result in varying levels of performance. The highest accuracy was achieved using a decomposition level of 3 and a sym8 wavelet with a sine activation function, yielding a precision of 84.17%, recall of 83.60%, F1-score of 83.72%, and accuracy of 83.93%. This suggests that the choice of wavelet and decomposition level significantly influences the classification performance.
The confusion matrix in Figure 6 provides further insights into the classification performance for the best configuration (decomposition level of 3 and sym8 wavelet). The model achieves high accuracy for electrical faults (93.5%) and front-end faults (91.7%), indicating strong performance in these categories. However, the model shows lower accuracy for spark plug faults (65.0%), suggesting that distinguishing spark plug faults from other fault types remains a challenge. Additionally, healthy cases are identified with an accuracy of 74.0%, indicating some misclassification into fault categories, which highlights an area for further improvement.
Overall, the results of the experiments using DWT-based features demonstrate the effectiveness of the proposed method for fault diagnosis in vehicles. While the method shows strong performance in certain fault categories, it achieved lower performance overall compared to MFCC-based features. These findings provide insights into the optimal combination of DWT parameters and ELM hyperparameters and highlight areas for further optimization to enhance the performance of the method in future studies.
4.3. Results for Frequency Analysis
In this section, the results of the frequency analysis, which was conducted to identify the most relevant frequency components for each fault type, are presented. Different types of engine faults often exhibit distinct frequency components. These components emerge due to the engine’s physical structure and operational principles, with each fault generating unique sounds or vibrations at specific frequency ranges. Therefore, examining these frequency components is crucial for accurate fault diagnosis using sound analysis.
Table 7 comprehensively outlines the relationship between the fault categories and their associated frequency groups.
For spark plug issues, the most relevant frequency bin is 5.2 to 5.3 kHz, indicating that monitoring this high-frequency range is crucial for accurate detection. Airflow irregularities are most prominently indicated by the 3.0 to 3.1 kHz bin, highlighting the need for precise analysis in this mid-frequency range. Engine/turbo problems are best identified by the 0.7 to 0.8 kHz bin, suggesting these faults manifest through specific low-frequency sounds. Front-end problems are primarily associated with the 6.6 to 6.7 kHz bin, which is essential for identifying issues related to components such as the suspension or chassis. Electrical malfunctions affecting the engine show the highest relevance at the 2.1 to 2.2 kHz bin, possibly due to distinctive noise patterns. These findings underscore the importance of focusing on these key frequencies for accurate fault detection.
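The 0.1 kHz bins quoted above can be illustrated by grouping a signal's power spectrum into 100 Hz bands. The following sketch is an assumption about how such band features might be computed with NumPy, not the study's actual procedure: the function name `band_energies` is hypothetical, the input is a synthetic 5.25 kHz tone rather than an engine recording, and the relevance ranking step (e.g., via Relief-F) is not shown.

```python
import numpy as np


def band_energies(signal, sample_rate_hz, bin_width_hz=100.0):
    """Sum the power spectrum of `signal` into contiguous bands of `bin_width_hz`."""
    power = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate_hz)
    n_bins = int(np.ceil((sample_rate_hz / 2) / bin_width_hz))
    # Map each FFT frequency to its band index; the Nyquist component is
    # clipped into the last band.
    idx = np.minimum((freqs // bin_width_hz).astype(int), n_bins - 1)
    return np.bincount(idx, weights=power, minlength=n_bins)


# Synthetic stand-in: a 1 s pure tone at 5.25 kHz sampled at 48 kHz.
fs = 48_000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 5250 * t)

energies = band_energies(tone, fs)
k = int(np.argmax(energies))
print(f"dominant band: {k * 0.1:.1f} to {(k + 1) * 0.1:.1f} kHz")
```

With a 48 kHz sampling rate, this yields 240 bands up to the 24 kHz Nyquist limit, each of which could serve as a candidate feature for a relevance analysis.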
Overall, the frequency analysis confirms that different fault types are associated with specific and most relevant frequency bins. The identification of these relevant frequency bins provides valuable guidance for future diagnostic efforts, suggesting that including these specific frequencies in the analysis can lead to better and more reliable fault detection outcomes. This insight into the frequency components of various faults enhances our understanding and ability to diagnose vehicle issues more accurately and efficiently.
4.4. Comparison with Other Studies
Numerous publications have addressed the diagnosis of vehicle malfunctions through engine sound analysis, employing a diverse array of methods. For instance, Navea and Sybingco [27] used Fourier transform and power spectral density to detect engine starting issues, drive belt problems, and valve-related faults, achieving a detection accuracy of 56% using recordings from the 1996–2000 Honda Civic model. Siegel et al. [28] focused on misfire faults in four vehicles, using Fourier transform, wavelet transform, and MFCC with SVM, resulting in a 99% accuracy rate. Wang et al. [5] recorded audio from a Santana 2000’s engine cylinder head, using Hilbert–Huang transform and SVMs, achieving up to 90% accuracy. Kemalkar and Bairagi [41] studied lubrication, chain, crank, and valve faults in Honda Unicorn and Bajaj Pulsar motorcycles, employing MFCC and achieving accuracy rates between 50% and 75%. The reported results in the literature are summarized in Table 8.
A significant drawback of previous works is their reliance on data from controlled laboratory settings, often using the same brand of vehicles or a limited number of fault types. These controlled conditions do not adequately capture the variability and complexity of real-world scenarios, leading to models that may not perform well outside the specific conditions under which they were trained. Furthermore, the homogeneity of vehicle models in these studies limits the generalizability of their findings, as the diagnostic methods may not be applicable to different vehicle makes and models.
In contrast, the dataset used in this study includes various types of faults and different environmental settings, thereby enhancing the robustness and generalizability of our fault diagnosis framework. By utilizing data from multiple vehicle brands and real-world conditions, the employed approach is designed to reflect the true complexity and variability of vehicle faults.