Article

Toward Inclusive Smart Cities: Sound-Based Vehicle Diagnostics, Emergency Signal Recognition, and Beyond

1 Department of Communications and Electronics Engineering, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
2 School of Computational Sciences and Artificial Intelligence (CSAI), Zewail City of Science and Technology, Giza 12578, Egypt
3 Department of Electrical Engineering, College of Engineering, Taif University, P.O. Box 11099, Taif 21944, Saudi Arabia
4 King Salman Center for Disability Research, Riyadh 11614, Saudi Arabia
5 Computer Science and Information Department, Applied College, Taibah University, Medinah 41461, Saudi Arabia
6 Department of Computer Science, College of Computer Science and Engineering, Taibah University, Yanbu 46421, Saudi Arabia
7 Computers and Control Systems Engineering Department, Faculty of Engineering, Mansoura University, Mansoura 35516, Egypt
* Author to whom correspondence should be addressed.
Machines 2025, 13(4), 258; https://doi.org/10.3390/machines13040258
Submission received: 17 February 2025 / Revised: 15 March 2025 / Accepted: 19 March 2025 / Published: 21 March 2025
(This article belongs to the Special Issue Recent Developments in Machine Design, Automation and Robotics)

Abstract
Sound-based early fault detection for vehicles is a critical yet underexplored area, particularly within Intelligent Transportation Systems (ITSs) for smart cities. Despite the clear necessity for sound-based diagnostic systems, the scarcity of specialized publicly available datasets presents a major challenge. This study addresses this gap by contributing in multiple dimensions. Firstly, it emphasizes the significance of sound-based diagnostics for real-time detection of faults through analyzing sounds directly generated by vehicles, such as engine or brake noises, and the classification of external emergency sounds, like sirens, relevant to vehicle safety. Secondly, this paper introduces a novel dataset encompassing vehicle fault sounds, emergency sirens, and environmental noises specifically curated to address the absence of such specialized datasets. A comprehensive framework is proposed, combining audio preprocessing, feature extraction (via Mel Spectrograms, MFCCs, and Chromagrams), and classification using 11 models. Evaluations using both compact (52 features) and expanded (126 features) representations show that several classes (e.g., Engine Misfire, Fuel Pump Cartridge Fault, Radiator Fan Failure) achieve near-perfect accuracy, though acoustically similar classes like Universal Joint Failure, Knocking, and Pre-ignition Problem remain challenging. Logistic Regression yielded the highest accuracy of 86.5% for the vehicle fault dataset (DB1) using compact features, while neural networks performed best for datasets DB2 and DB3, achieving 88.4% and 85.5%, respectively. In the second scenario, a Bayesian-Optimized Weighted Soft Voting with Feature Selection (BOWSVFS) approach is proposed, significantly enhancing accuracy to 91.04% for DB1, 88.85% for DB2, and 86.85% for DB3. These results highlight the effectiveness of the proposed methods in addressing key ITS limitations and enhancing accessibility for individuals with disabilities through auditory-based vehicle diagnostics and emergency recognition systems.

1. Introduction

Intelligent Transportation Systems (ITSs) play a crucial role in developing smart cities through advanced technologies that enhance the efficiency and sustainability of transportation networks [1,2]. These systems integrate data from sensors, cameras, and GPS devices that provide real-time information on traffic flow, weather conditions, and other relevant factors. ITSs can use these data to adjust traffic signals dynamically, manage toll road usage, and give drivers personalized route recommendations to reduce congestion and travel time. Additionally, ITSs can support autonomous vehicles and shared mobility services, further improving overall urban transportation system performance.
In smart cities, ITSs can also contribute to air quality and greenhouse gas emission reduction by making public transportation, cycling, and walking real options instead of private car travel [3]. This can be achieved through incentives for using sustainable modes of transportation by implementing smart parking systems and congestion pricing schemes, thus decreasing the overall demand for fossil fuel-powered vehicles. This would further allow easy intermodal connections, making urban mobility and the general urban environment even more accessible, sustainable, and inclusive.
Vision-based systems have become an integral part of modern infrastructure, especially in ITSs, where they further both safety and efficiency. However, these systems have a significant limitation: they cannot “hear” necessary auditory signals such as emergency sirens, mechanical faults, and environmental hazards [4]. Vision-based systems suffer from several limitations, including the inability to detect auditory signals [5], susceptibility to environmental noise interference [6], and weak mechanical fault detection [7]. Beyond emergency signals and mechanical faults, environmental hazards such as construction noise, falling debris, or wildlife may create serious dangers. Vision-based systems cannot detect such hazards until they become visually apparent, which may be too late to act effectively. Sound-based diagnostics can thus provide early warnings against such dangers, enhance safety, and considerably improve the reliability and efficiency of existing systems. Sound-based diagnosis is highly valued for several reasons, including enhanced situational awareness [8,9], real-time monitoring and alerts [10], cost-effectiveness [11], accessibility for hearing-impaired people [12], and improved emergency response [12].
ITSs are designed to improve the functionality and safety of transport systems using advanced technologies [13]. Although these systems are designed to streamline travel, people with disabilities face unique barriers to movement and access to essential services in such contexts. These include, but are not limited to, inaccessible information, lack of warnings in emergencies, navigation barriers, inadequate access to vehicles, and communication barriers. Sound-based systems are essential complements that increase access within ITS environments by offering alternative means of communication and information sharing. Possible solutions include audio cues for navigation, visual alerts for the hearing-impaired, sound detection for vehicle fault detection, emergency sirens and alerts, and improved communication systems [14]. Sound detection has a wide range of ITS applications. In emergency response, it supports quick and timely mitigation of hazards. In public transportation, it can increase safety and efficiency for passengers. Smart cars can use it to enhance driver awareness and monitor vehicle behavior.
Recent advances in artificial intelligence (AI) have paved the way for sophisticated data analysis techniques that drive innovative applications across various fields, including sound-based diagnostics. Within this AI framework, machine learning (ML)—defined as using algorithms and statistical models that enable computer systems to learn from data and improve their performance on specific tasks without explicit programming—has emerged as a key enabler. By leveraging ML, our study can analyze complex auditory signals from vehicle faults and environmental sounds, transforming raw data into actionable insights for Intelligent Transportation Systems (ITSs).
We begin our approach with foundational ML models that provide robust baseline performance. Techniques such as Logistic Regression (LR) and k-nearest neighbors (kNN) are utilized to establish initial classification capabilities, forming the groundwork for further enhancements. These basic models are critical for understanding the underlying patterns in the audio data and serve as benchmarks against which more sophisticated methods can be compared.
The study introduces three novel datasets that capture various sounds relevant to Intelligent Transportation Systems. The first dataset (DB1) consists of 27 distinct vehicle fault classes featuring critical sounds such as Engine Misfire, Fuel Pump Cartridge Fault, Radiator Fan Failure, and Strut Mount Failure, all directly generated by the vehicle. The second dataset (DB2) comprises 22 environmental sound classes, including emergency signals like sirens and various transportation-related and ambient environmental noises. These datasets provide a rich collection of auditory signals that form the basis for robust sound-based diagnostic systems. The third dataset (DB3) also merges DB1 and DB2 to create a comprehensive collection of 49 classes. This unified dataset enables the framework to classify any sound from vehicle faults or external environmental events into the correct category. The study lays the groundwork for advancing machine learning research in sound-based diagnostics by addressing the scarcity of specialized, publicly available auditory datasets. It contributes to the development of more inclusive and responsive ITS applications.
Building on these fundamentals, our framework incorporates advanced ML variants to address the challenge of differentiating acoustically similar classes. Ensemble methods such as AdaBoost, Random Forest (RF), and Gradient Boosting (GB) enhance accuracy by combining the strengths of multiple weak learners. Additionally, Support Vector Machines (SVMs) and Stochastic Gradient Descent (SGD) optimize decision boundaries in complex feature spaces, while Decision Trees (DTs) provide interpretable classification logic. Complementing these are the CN2 algorithm and Naive Bayes (NB), which handle rule-based classification and probabilistic inference. Together, these diverse ML techniques form a comprehensive diagnostic system capable of robust performance in real-world ITS applications.
Integrating auditory intelligence into ITSs addresses several key research challenges, notably the difficulty of distinguishing acoustically similar classes—such as Universal Joint Failure versus Bad CV Joint and Knocking versus Pre-ignition Problem. These challenges demand advanced feature extraction techniques and robust machine learning models that capture subtle differences in sound signatures. Additionally, the scarcity of specialized, publicly available auditory datasets has historically hindered progress in this area; this research overcomes that barrier by introducing a comprehensive, curated dataset that serves as a benchmark for future work.
This study addresses a critical gap in Intelligent Transportation Systems (ITSs) by explicitly defining its aim to detect faults directly from sounds generated by vehicles, such as engine or brake noises, and to classify external alert sounds, including emergency sirens. The intended applications of these predictive outputs are articulated, emphasizing their role in real-time diagnostics for smart vehicle systems and providing auditory-to-visual alert conversions to assist sound-impaired drivers. Additionally, the study highlights the potential of auditory capabilities to enhance vehicle fault detection and accessibility for individuals with disabilities while addressing the scarcity of specialized datasets in this domain. This is achieved through the following key contributions:
  • Introducing a novel dataset comprising vehicle fault sounds, emergency sirens, and environmental noises, filling a critical gap in publicly available resources.
  • Developing a comprehensive methodology for audio preprocessing, including normalization, resampling, and segmentation.
  • Proposing robust feature extraction techniques, such as Mel Spectrograms, MFCCs, and Chromagrams, enabling compact and expanded feature representations.
  • Evaluating multiple ML models in the first scenario, including neural networks, Logistic Regression, and Random Forests.
  • Proposing a Bayesian-Optimized Weighted Soft Voting with Feature Selection (BOWSVFS) approach in the second scenario, achieving a classification accuracy of 91.04% on the car fault dataset (DB1) and outperforming the first scenario’s results.
  • Demonstrating the relevance of sound-based ITSs in promoting accessibility by offering real-time alerts and auditory-to-visual conversion solutions for individuals with disabilities.
  • Aligning sound-based diagnostics with broader smart city goals, contributing to the development of safer and more inclusive transportation systems.
This research establishes a strong foundation for integrating auditory intelligence into ITSs, with significant implications for safety, accessibility, and inclusivity in smart cities. The framework demonstrates strong performance overall, with several fault classes being recognized with near-perfect accuracy. For example, classes such as Engine Misfire, Fuel Pump Cartridge Fault, Radiator Fan Failure, Strut Mount Failure, and Suspension Arm Fault consistently achieve 100% accuracy in many cases. This indicates that the framework effectively captures the distinct acoustic signatures associated with these faults. However, challenges remain for acoustically similar classes. Universal Joint Failure, for instance, is occasionally misclassified—often confused with Bad CV Joint—while Bad Wheel Bearing also shows minor misclassifications. More notably, the Knocking and Pre-ignition Problem classes face significant difficulties, with Pre-ignition Problem instances frequently being predicted as Engine Misfire. These misclassifications highlight the areas where further refinement in feature extraction or model tuning may be necessary to better differentiate between closely related acoustic patterns.
The structure of this paper is as follows: A summary of the current literature is given in Section 2, along with potential directions for further research. Section 3 introduces the datasets. Section 4 focuses on materials and explains the proposed methodology. Section 5 provides an overview of the experiments, including the experimental setup, methods, and findings collected, focusing on the performance metrics attained. The overall discussion in Section 6 wraps up with conclusions and future work, summarizing the paper’s key contributions and suggesting directions for subsequent research in this domain.

2. Literature Review

This section discusses earlier attempts at sound recognition and sound-based defect detection in machinery, vehicles, trains, and aircraft systems. The majority of the approaches assessed were developed using classical ML methods; some more recent ones rely on deep learning or vision transformers.
Nasim et al. introduced a sound-based early fault detection system for vehicles utilizing ML technology [15]. This system is specifically designed to target faults in vehicles at their initial stages by analyzing the sound emitted by the vehicle. The system first performs binary classification to decide whether the vehicle is faulty or healthy. They utilized time domain, frequency domain, and time–frequency domain features to detect normal and abnormal vehicle conditions effectively. Additionally, they employed abnormal vehicle data to classify them into fifteen other typical vehicle issues. Through experimentation, the random forest algorithm yielded the best accuracy of 97% for fault detection and 92% for problem classification when utilizing time–frequency features. Hamad et al. proposed a rule-based ML technique that automatically detects engine problems [16]. The generalizability of the system is supported by time domain, frequency domain, and time–frequency domain features. The robustness of the developed system is evaluated using noisy sound data collected under various normal and abnormal conditions. The experimental results demonstrated that the approach outperformed other techniques by 2.6–6.0% and yielded the highest performance accuracy of 98.6%. Yildirim et al. proposed a testing and evaluation procedure for the sound quality of two types of cars [17]. The sound quality is analyzed through road running tests on a prepared surface at varying running speeds. They proposed a neural network predictor to model the system for possible experimental applications. In their experiments, only objective factors of loudness, sharpness, speech intelligibility, and sound pressure level are considered essential for sound quality. The computer simulations and experiments show evidence that the neural predictor algorithm provides reasonable accommodation in different cases and allows superior prediction in the two-car sound analysis.
Mel-Frequency Cepstral Coefficients (MFCCs), DWT-based features, and the Extreme Learning Machine (ELM) classifier were employed in the vehicle problem diagnostic system that Akbalik et al. presented [18]. The proposed framework uses a big, diversified dataset that includes many vehicle models and real-world operating situations. The experiment results show that the MFCC-based features combined with the ELM classifier outperform the others in terms of accuracy, precision, recall, F1-score, macro F1-score, and weighted F1-score, which are 92.17%, 92.24%, 92.22%, 92.10%, and 92.06%, respectively. Murovec et al. created an acquisition system using the Zero-Crossing Signature (ZCS) technique [19]. To accomplish precise engine type classification, the study used a unique level-crossing (ZCS) feature that demonstrated excellent performance in differentiating engine sounds from surrounding noise. A dataset of 417 vehicle recordings was examined, and the classification performance of the ZCS was compared to the traditional Zero-Crossing (ZC) technique utilizing a Self-Organizing Map (SOM) with a 1D grid of nine neurons. Wang et al. proposed a method for diagnosing engine acoustic signal faults using multi-level supervised learning and time-frequency transformation [20]. First, it decomposes the fault diagnostic problem into feature augmentation, fault detection, and identification. Second, based on several time–frequency studies, it proposes an adaptive fault feature band extraction approach aimed at distinct features from different vehicle data. Finally, a frequency band attention module was designed to focus on the most meaningful frequency range to the characteristics of engine failure.
Boztas et al. proposed a learning model for improving machine fault classification using handcrafted attributes [21]. The approach utilized texture and statistical features in classifying faults with high performance. They developed a hybrid and multilevel feature extraction technique that maintains high efficiency while lowering the complexity associated with deep learning frameworks. Using a Chi2 feature selector to eliminate redundant features, the model focused on the most informative features throughout the classification step. In the MIMII (noisy) dataset, the proposed model effectively classified more than 90% of the five cases. A Variational Autoencoder/Convolutional Neural Network (VAE-CNN) was created by Wang et al. to diagnose rolling bearing faults [22]. The model was developed to extract complex vibration signal features to detect and categorize faults. While the CNN component increases the expressiveness of signal data and successfully handles issues like gradient vanishing and explosion, the VAE component improves noise robustness. The diagnostic accuracy of the VAE-CNN model for various fault types at varying rotational speeds typically exceeded 90%, yielding generally satisfactory diagnostic results. Xinwen Guo developed a defect diagnostic approach based on feature extraction and a word bag model using acoustics and vibration engineering science theories [23]. This approach mainly expands the three-layer structure of the word bag model and constructs codebooks for each layer’s feature vectors based on this model. Thereafter, it develops the failure detection system of a rolling bearing based on the adaptive extended word bag model. The findings revealed that the defect detection technique has excellent diagnostic accuracy and stability, offering dependable technical assistance for regular operation and safe mechanical equipment maintenance.
Li et al. developed a defect diagnosis system for railway turnout switch machines based on sound signals [24]. The method used Eigenmode Decomposition to improve the sound signal, reduce noise, and extract important statistical information from the time and frequency domains. The ReliefF algorithm is used for feature selection, dimension reduction, and fault classification with weighted parameters to address redundant information in high-dimensional features. The selected feature parameters are then utilized to train a Support Vector Machine. The results showed a defect diagnostic accuracy of 98% in the positioning work mode and 95.67% in the reversing work mode. Kreuzer et al. proved that diagnosing bearing defects in railway vehicles using airborne sound data is possible, even in complex real-world settings [25]. To that purpose, many characteristics are investigated, including Mel Frequency Cepstral Coefficients (MFCCs), which are best suited for diagnosing bearing problems by analyzing airborne sound. The MFCCs were utilized to train an MLP classifier. The suggested technique is assessed using real-world data from a cutting-edge commuter train car in a dedicated measurement campaign. The classification results showed that the chosen MFCC features allowed for the reliable detection of bearing defects, including those not included in the training. Eunsun Yun and Minjoong Jeong proposed a feature extraction technique for fault sound identification in EPS motors [26]. This technique reduced the feature dimensionality while preserving the original raw waveform, which is crucial to maintaining the essential features in the waveform for anomaly detection. They combined DFMT with MFCC to optimize feature extraction. They applied an LSTM-AE to classify data by segregating standard data from abnormal ones using reconstruction error metrics. The proposed method proved efficient, with an accuracy of 99.2%, recall of 94.0%, precision of 95.6%, and F1-score of 94.7%.
A sound-based engine classification was proposed by Shajie et al. to detect flaws in engine ball bearings [27]. They used sound-based component extraction techniques to find recurring patterns across time. They proposed modifications to the ResNet and hybrid CNN models based on the NASA bearing dataset. To identify TIM-bearing faults, they employed time and frequency features inferred from the signals and their spectra. The experiments considered realistic scenarios found in real-world industrial settings, and the method achieved reasonable accuracy rates. To improve industrial productivity and minimize machinery downtime, Khan et al. developed a technique for diagnosing robotic manipulator faults using motor sound analysis [28]. They investigated the efficiency of deep learning and conventional ML in detecting motor abnormalities using a dataset created with a specially designed robotic manipulator. Their custom CNN and 1D-CNN models obtained an F1-score of over 92%, significantly outperforming traditional methods and proving the potential of sound analysis for automatic defect identification in robotic systems. Kim et al. proposed a deep denoising autoencoder method to filter out various industrial noise levels from audio data [29]. They applied unsupervised learning models for rapid and accurate anomaly detection. They preprocessed audio data to adapt the denoising technique to the noise levels of different industrial contexts. Several experiments using different industrial equipment types demonstrated the proposed technique’s effectiveness, efficiency, and processing speed. Senanayaka et al. diagnosed machinery defects by isolating audio sources from complex mixtures of sound waves [30]. First, they activated fault sound isolation and separated distinct fault noises from a complicated blend of sound signals. Then, the isolated fault noises were passed through a 1D-CNN classifier to ensure correct classification. A machine fault simulator by Spectra Quest equipped with a condenser mic was employed to evaluate the proposed model. To improve early vehicle defect recognition, Hameed et al. investigated the application of ML for real-time engine knocking detection [31]. They analyzed several machine-learning techniques and retrieved frequency modulation amplitude demodulation features from engine sound data. With a classification accuracy of 66.01%, the coarse decision tree approach proved the most successful among them. The accuracy was then increased by employing deep learning models; an LSTM-based recurrent neural network (RNN) attained 90% accuracy.
Naryanto et al. developed a deep learning model to detect and classify damage or defects in diesel engines using artificial neural networks and convolutional neural networks [32]. They utilized the DEFault dataset, which has 3500 rows of data organized under four distinct labels. Results showed that the ANN outperformed the CNN on noisier datasets, whereas the CNN performed better on less noisy datasets. Yuan et al. proposed a defect detection approach for new energy vehicle engines using wavelet transforms and Support Vector Machines [33]. First, an abnormal noise signal identification model for vehicle engine faults is developed, and the time–frequency parameters of the basis function are adaptively changed. The engine surface radiation noise is then split into its inner mechanical and battery excitation components. The new energy vehicle engine failure signal was decomposed using feature decomposition and multiscale separation. Furthermore, fuzzy clustering and time–frequency analysis of fault signals in the fractional Fourier domain were used to detect faults in new energy vehicle engines. Chu et al. proposed an intelligent identification model for diesel engine faults based on mixed attention [34]. They proposed a multi-cylinder whole-machine fault diagnosis model that integrates 1D-CNN with self- and mutual attention mechanisms. Single-cylinder sensor data were integrated using self-attention in the model, and signal features of each cylinder were fused using the mutual attention mechanism. Simultaneously considering the mechanism knowledge of cylinder structural consistency and signal time delay similarity, this approach utilized single-cylinder fault data to develop a comprehensive fault recognition model for all cylinders. The average diagnosis accuracy reached 100% on known fault data and about 96.65% on unknown fault data.
Lee et al. proposed a bearing failure detection method using an LSTM autoencoder with self-attention based on graph convolution networks [35]. They trained their model using data from a Fault Simulator Testbed and the Case Western Reserve University (CWRU) dataset. Results demonstrated that the proposed model attained accuracies of 97.3% and 99.9% on the CWRU dataset and the Fault Simulator dataset, respectively. Using a single microphone and a data-driven approach, Spadini et al. developed a model for intelligent fault diagnosis in rotating equipment that successfully identified 42 classes of defect types and severities [36]. They considered reliable data from the unbalanced MaFaulDa dataset to balance high performance and minimal resource consumption. The model achieved remarkable performance in terms of time, frequency, mel-frequency, and statistical parameters, with an accuracy of 99.54% and an F-Beta of 99.52%. Using sound samples, Gantert et al. proposed a multiclass method for identifying anomalous samples in industrial machinery [37]. Integrating the binary models commonly found in the literature aims to improve the model’s generality while decreasing the number of classifiers. Using MIMII and ToyADMOS, two industrial sound datasets, they compared the proposed multiclass models with the binary alternative. Experiments revealed that 98% of the ToyADMOS dataset and 93% of the MIMII dataset were correctly classified. Table 1 summarizes the papers discussed in this section, highlighting the main characteristics and problems of each one.
Research gap: While Intelligent Transportation Systems have advanced significantly through vision-based technologies, a critical gap exists in integrating sound-based fault detection mechanisms. This gap is particularly evident in three areas: (1) the limited development of audio-based diagnostic systems that utilize real-time analysis of vehicle-generated sounds (e.g., engine or brake noises) and external emergency alert sounds (e.g., sirens), (2) the scarcity of comprehensive public datasets designed explicitly for vehicle sound analysis, and (3) insufficient attention to accessibility needs for individuals with disabilities within ITS frameworks. These limitations hinder the development of more inclusive and comprehensive transportation monitoring systems. To address these limitations, this study aims to develop a comprehensive dataset that serves as a “conscious ear” for intelligent systems in modern cities, vehicles, and transportation networks. This effort seeks to enhance the auditory capabilities of smart systems, enabling them to respond effectively to complex auditory scenarios, thereby enhancing safety and functionality in various applications.

3. Dataset Creation

The main problem with transportation sound-based fault diagnosis is the availability of datasets. Therefore, data from various sources are collected to create a tailored dataset for car sound analysis. Reliable audio samples are built by downloading videos from YouTube related to car faults, animal sounds, car crashes, siren sounds, etc. The videos are then split into segments, the sections that may contain the target audio are extracted, and the files are converted into audio format. To expand the dataset, additional audio is supplemented from Kaggle datasets: FSC22 [38], Google AudioSet [39], Audio Classifier Dataset [40], Sound Classification of Animal Voice [41], and Vehicle Sounds Dataset [42]. Lastly, every sample is labeled and verified.
In this vein, the dataset was created and reviewed using a combination of publicly available datasets and real-world recordings, covering a wide range of vehicle faults, crashes, emergency sirens (police, ambulance, fire truck), wild animal sounds, car and truck horns, and other environmental road sounds. This approach ensures a diverse and realistic dataset that enhances model performance in detecting road-related events. The dataset creation procedure involves different stages as shown in Figure 1.

3.1. Data Collection and Annotation

We carefully selected publicly available datasets that include real traffic scenarios and vehicle fault cases. Additionally, we extracted relevant frames and sequences from YouTube videos, ensuring a diverse representation of traffic conditions and vehicle behaviors. Each data sample was manually labeled based on predefined criteria, focusing on vehicle states, traffic interactions, and specific fault conditions.

3.2. Expert Review and Validation

To enhance reliability, domain experts with extensive experience in automotive engineering and machine learning reviewed the dataset. The experts cross-checked and validated the labels to ensure accuracy and consistency with real-world vehicle behaviors.

3.3. Publicly Available Datasets Referenced

We utilized multiple datasets containing wild animal sounds, vehicle faults, and environmental noises to build a comprehensive dataset. The key datasets referenced include:
  • FSC22 Dataset: A collection focused on various sound categories, including vehicle sounds and environmental noises, useful for sound classification models [38].
  • Google AudioSet: A large-scale collection of audio data across thousands of categories, aimed at improving sound classification models [39].
  • Audio Classifier Dataset: A dataset containing diverse audio clips across various categories, useful for developing classification models [40].
  • Sound Classification of Animal Voice: A dataset containing sounds from different animals, useful for animal sound classification tasks [41].
  • DCASE 2024 Challenge: A dataset designed for the DCASE 2024 challenge, covering environmental sound classification tasks [43].
  • UrbanSound8K Dataset: Contains 8732 labeled sound excerpts from urban environments, categorized into 10 classes such as car horns and sirens [44].
  • AudioSet by Google Research: A vast dataset with over 2 million human-labeled 10 s sound clips spanning thousands of audio categories [45].
  • Vehicle Sounds Dataset: Contains various vehicle sounds useful for training models focused on transportation-related sound classification [42].
This dataset has been meticulously designed and validated to provide a diverse and realistic representation of road-related sounds, ensuring high-quality training data for machine learning models in the domain of automotive fault detection, traffic event classification, and environmental sound analysis. To standardize the process, all files should have the same duration; however, the raw recordings vary in length. Preprocessing is therefore performed to produce samples of equal duration. Algorithm 1 contains the pseudo-code for this stage. The primary steps in the algorithm are:
  • Repeat the audio until it achieves the required duration.
  • Normalize the audio to standardize levels.
  • Resample the audio to a consistent sampling rate (e.g., 16 kHz or 44.1 kHz).
  • Segment lengthy recordings into shorter clips (e.g., 2–5 s each).
Algorithm 1: Pseudo-code for Audio Preprocessing Script
Machines 13 00258 i001
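As a concrete illustration, the following is a minimal Python sketch of Algorithm 1 built on Pydub (one of the libraries listed in Section 5). The constants mirror the values stated in Section 4, while the peak-normalization method and the file-naming scheme are illustrative assumptions rather than the exact implementation.

```python
import os
from pydub import AudioSegment

MIN_DURATION_MS = 10_000      # minimum duration: 10 s (see Section 4)
TARGET_SAMPLE_RATE = 16_000   # target sample rate: 16 kHz
SEGMENT_DURATION_MS = 2_500   # fixed segment length: 2.5 s

def preprocess_audio(in_path: str, out_dir: str) -> None:
    audio = AudioSegment.from_file(in_path)

    # Repeat the audio until it reaches the minimum duration, then trim.
    if len(audio) < MIN_DURATION_MS:
        while len(audio) < MIN_DURATION_MS:
            audio += audio
        audio = audio[:MIN_DURATION_MS]

    # Normalize levels (peak normalization) and resample to mono at 16 kHz.
    audio = audio.apply_gain(-audio.max_dBFS)
    audio = audio.set_frame_rate(TARGET_SAMPLE_RATE).set_channels(1)

    # Segment into fixed-length clips, keeping only full-length segments.
    base = os.path.splitext(os.path.basename(in_path))[0]
    for i, start in enumerate(range(0, len(audio), SEGMENT_DURATION_MS)):
        clip = audio[start:start + SEGMENT_DURATION_MS]
        if len(clip) == SEGMENT_DURATION_MS:
            clip.export(os.path.join(out_dir, f"{base}_{i:03d}.wav"), format="wav")
```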

4. Methodology

This study aims to develop an advanced sound-based early diagnosis system to support Intelligent Transportation Systems (ITS) by enabling real-time detection of vehicle faults and identification of emergency sounds. The main steps of this system are illustrated in Figure 2. To achieve this objective, addressing the primary challenges encountered in this field is essential, starting with the absence of comprehensive public datasets specifically designed for vehicle sound analysis. The initial phase of the proposed model involves the creation of a dataset that contains recordings of car fault sounds, emergency sirens, and ambient noises. This process includes audio data collection and preprocessing. Subsequently, the most significant features are extracted in two versions: a compact version with 52 features and an expanded one with 126 features.
In the final step, both sets of extracted features are classified using 11 distinct ML models. The system then provides a further optimization phase to enhance classification accuracy, using the best ML models with the highest-ranked features to build an ensemble optimization model. The following subsections provide further details on each stage of this model.
The key steps in our audio preprocessing pipeline to provide further insight into how the system operates are as follows:
  • Fixed Time Windows for Feature Extraction:
    • We preprocess the raw audio files by extending them to a minimum duration of 10 s (MIN_DURATION_MS = 10,000 ms), normalizing their levels, and resampling them to a target sample rate of 16 kHz (TARGET_SAMPLE_RATE = 16,000 Hz).
    • The preprocessed audio is then segmented into fixed-length clips of 2.5 s (SEGMENT_DURATION_MS = 2500 ms).
    • Only segments meeting the required length are retained for further processing.
  • Time Window Length Determination:
    • The nature of vehicle fault sounds, which are typically periodic and repetitive over short durations, guided the choice of segment duration.
    • To ensure consistency across all samples, we set a minimum duration of 10 s for all audio files. If an audio file is shorter than this threshold, it is repeated and trimmed to match the minimum duration.
    • The fixed 2.5 s time window used for feature extraction ensures that features such as MFCCs, Mel Spectrogram, and Chroma Features capture sufficient temporal and spectral characteristics of the sounds.
  • Sliding Factor Consideration:
    • A fixed windowing approach is used instead of overlapping sliding windows during segmentation. This ensures non-redundant segments while maintaining dataset balance.
    • However, future work could explore the impact of using overlapping windows to capture more temporal variations while controlling data redundancy.
This structured approach ensures that the extracted sound features represent the fault categories well while maintaining computational efficiency.
By the end of this phase, audio files are sampled, labeled, and normalized to build the dataset. Three datasets are created: the first one contains car faults (DB1) with 133 audio files and 27 distinct classes, the second dataset contains other sounds (DB2) with 1031 audio files and 22 distinct classes, and the third dataset is a merged version between the latter two (DB3) with 1164 audio files and 49 different classes. Table 2 and Table 3 show the labels and the corresponding file counts for DB1 and DB2, respectively.

4.1. Feature Extraction

After preparing the datasets, the next stage in the proposed system is feature extraction. Audio feature extraction is a significant task in processing an audio signal for the purpose of sound classification. From an audio signal, meaningful features can be extracted to analyze and understand the content of the audio. Figure 3 shows some key features commonly extracted from audio signals.
The essential features in our study are extracted in two versions: a compact version with 52 features and an expanded one with 126 features. In the compact version, Mel Spectrogram [46], MFCCs [47], and Chroma Features [48] were used. Figure 4 shows an example of a Mel Spectrogram.
For generating the expanded version, Spectral Features [49], Zero-Crossing Rate [19], Root Mean Square Energy (RMSE) [50], Chroma Features, MFCCs, and Extended MFCCs [51] were used. Table 4 defines these features, including their counts and the version(s), Compact (C) or Expanded (E), in which they appear. Figure 5 shows a two-dimensional projection of the DB1 compact features.
The pseudo-code for extracting the compact and expanded feature lists from audio files is listed in Algorithms 2 and 3, respectively.
Algorithm 2: Pseudo-code for extracting Compact feature list
Machines 13 00258 i002
Algorithm 3: Pseudo-code for extracting Expanded feature list
Machines 13 00258 i003
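Since Table 4 holds the authoritative feature definitions, the following Python sketch of Algorithm 2 assumes one plausible composition of the 52-dimension compact vector that is consistent with the stated count: 13 MFCCs (mean and standard deviation), 12 chroma bins (mean and standard deviation), and global Mel Spectrogram statistics, giving 26 + 24 + 2 = 52 features.

```python
import numpy as np
import librosa

def extract_compact_features(path: str, sr: int = 16_000) -> np.ndarray:
    """Sketch of Algorithm 2: a 52-dimensional compact feature vector.

    Assumed composition: 13 MFCCs (mean + std), 12 chroma bins (mean + std),
    and overall Mel Spectrogram statistics (mean + std): 26 + 24 + 2 = 52.
    """
    y, sr = librosa.load(path, sr=sr, mono=True)

    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # shape (13, T)
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)     # shape (12, T)
    mel = librosa.feature.melspectrogram(y=y, sr=sr)     # shape (n_mels, T)

    return np.concatenate([
        mfcc.mean(axis=1), mfcc.std(axis=1),       # 26 MFCC statistics
        chroma.mean(axis=1), chroma.std(axis=1),   # 24 chroma statistics
        [mel.mean(), mel.std()],                   # 2 Mel Spectrogram statistics
    ])
```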

4.2. Classification

In this proposed system, the input audio is classified using ML techniques. The two versions of feature lists are used to test eleven different models on the three datasets created.
Neural Network (NN): A computational model consisting of interconnected neurons [52]. It is used for both regression and classification tasks. A neural network can be formulated by:
$Y = f(WX + b)$
where $f$ is an activation function, $W$ are the weights, $X$ is the input, and $b$ is the bias.
Naive Bayes (NB): A probabilistic classifier based on Bayes’ theorem, assuming independence among predictors [53]. The NB equation is given by:
$P(C \mid X) = \dfrac{P(X \mid C)\,P(C)}{P(X)}$
where C is the class and X is the feature vector.
Logistic Regression (LR): A statistical method for predicting binary classes [54]. The outcome is modeled using a logistic function, which outputs probabilities. Logistic Regression is formulated as follows:
$P(Y = 1 \mid X) = \dfrac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_n X_n)}}$
Stochastic Gradient Descent (SGD): An iterative method for optimizing an objective function, commonly used in training ML models, particularly neural networks [55].
$\theta = \theta - \eta \nabla J(\theta)$
where $\eta$ is the learning rate and $\nabla J(\theta)$ is the gradient of the loss function.
k-Nearest Neighbors (kNN): A non-parametric method used for classification and regression by finding the k nearest data points in the feature space [56].
$\hat{y} = \operatorname{mode}(y_i) \text{ over the } k \text{ nearest neighbors } i$
Decision Tree (DT): A flowchart-like structure where each internal node represents a feature test, each branch represents an outcome, and each leaf node represents a class label [57].
Class = leaf node based on features
Random Forest (RF): An ensemble learning method that constructs multiple decision trees during training and outputs the mode of their predictions for classification tasks [58].
$H(x) = \operatorname{mode}(h_1(x), h_2(x), \dots, h_T(x))$
Support Vector Machine (SVM): A supervised learning model that finds the optimal hyperplane that best separates different classes in the feature space [59].
$f(x) = \operatorname{sign}(W \cdot X + b)$
CN2 Rule Induction: An algorithm for inducing classification rules from examples [60]. It generates rules based on the attributes of the training data.
$\text{Class} = \text{if } (A_1 \wedge A_2 \wedge \dots \wedge A_n) \text{ then } C$
where A i are conditions based on attributes and C is the class label.
Adaptive Boosting (AdaBoost): An ensemble method that combines multiple weak classifiers to create a strong classifier by focusing on errors made by previous classifiers [61].
$H(x) = \sum_{t=1}^{T} \alpha_t h_t(x)$
where α t is the weight of the classifier h t .
Gradient Boosting (GB): An ensemble technique that builds models sequentially, with each new model correcting errors made by the previous ones [62].
$F_m(x) = F_{m-1}(x) + \nu h_m(x)$
where ν is the learning rate and h m is the new model.
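A sketch of this first classification scenario using scikit-learn is shown below. CN2 rule induction is omitted because it comes from the Orange toolkit rather than scikit-learn, and the synthetic data merely stands in for the extracted feature matrices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic placeholder standing in for the 52-feature compact representation.
X, y = make_classification(n_samples=300, n_features=52, n_informative=20,
                           n_classes=5, random_state=0)

models = {
    "NN": MLPClassifier(max_iter=1000),
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
    "SGD": SGDClassifier(),
    "kNN": KNeighborsClassifier(),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
    "SVM": SVC(),
    "AdaBoost": AdaBoostClassifier(),
    "GB": GradientBoostingClassifier(),
}

# 10-fold cross-validation, as used throughout the study (Section 5.2).
for name, model in models.items():
    scores = cross_val_score(make_pipeline(StandardScaler(), model), X, y, cv=10)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```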

4.3. Feature Ranking

Feature ranking is a crucial step in machine learning and data analysis, as it selects the relevant features that contribute most to predictive models [63]. Four feature ranking methods have been applied to estimate the importance of the features used in the previous classification experiments: information gain, analysis of variance, ReliefF, and the fast correlation-based filter. Information Gain (IG) measures the reduction in entropy, or uncertainty, after splitting a dataset based on a feature [64]. IG calculates the difference between the entropy of the target variable and the conditional entropy given the feature. Features with higher IG values are more informative.
$IG = H(Y) - H(Y \mid X)$
where H ( Y ) is the entropy of the target variable, and H ( Y | X ) is the conditional entropy.
Analysis of Variance (ANOVA) measures the ratio of between-class variance to within-class variance for a feature [65]. Features with higher ANOVA values are more discriminative.
$\text{ANOVA} = \dfrac{\text{variance between classes}}{\text{variance within classes}}$
ReliefF is an extension of the Relief algorithm, estimating feature relevance by measuring the difference between the feature’s values for nearest neighbours from different classes [66].
$W(F) \leftarrow W(F) + \dfrac{1}{k} \sum_{j=1}^{k} \left( \Delta(F, M_j) - \Delta(F, H_j) \right)$
where $W(F)$ is the current weight of feature $F$, $k$ is the number of nearest neighbors considered, $H_j$ are the nearest neighbors from the same class (hits), and $M_j$ are the nearest neighbors from different classes (misses); the weight grows when a feature separates misses and shrinks when it separates hits.
Fast Correlation-Based Filter (FCBF) evaluates feature relevance using correlation and redundancy [67]. It selects features with a high correlation to the target variable and low redundancy.
$\text{FCBF} = \dfrac{\text{Correlation}(X, Y)}{\text{Redundancy}(X)}$
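Two of these four criteria are available directly in scikit-learn, as sketched below: the ANOVA F-test (f_classif) and mutual information (mutual_info_classif) as an information-gain analogue. ReliefF and FCBF require third-party packages (e.g., skrebate) and are omitted from this sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif, mutual_info_classif

# Synthetic placeholder standing in for the 52-feature compact representation.
X, y = make_classification(n_samples=300, n_features=52, n_informative=20,
                           n_classes=5, random_state=0)

anova_scores, _ = f_classif(X, y)                       # ANOVA F-statistic per feature
ig_scores = mutual_info_classif(X, y, random_state=0)   # information-gain analogue

# Rank feature indices from most to least relevant under each criterion.
print("Top 5 by ANOVA:", np.argsort(anova_scores)[::-1][:5])
print("Top 5 by IG:   ", np.argsort(ig_scores)[::-1][:5])
```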

4.4. Bayesian-Optimized Weighted Soft Voting with Feature Selection (BOWSVFS)

Ensemble learning is currently one of the most powerful methods in machine learning, combining multiple models to produce predictive performance superior to that of standalone models [68]. Determining the optimal combination of model weights remains an optimization challenge. To address this issue, the proposed approach employs Bayesian Optimization combined with Weighted Soft Voting.
In this model, WSV is the central process: several classifiers vote on a final prediction, and a weight is assigned to each. Soft voting, unlike hard voting, does not involve direct class prediction; instead, it uses cross-class probability distributions [69]. Each prediction is weighted based on the perceived importance of each classifier in the entire ensemble, and the weighted probabilities are combined to obtain the final prediction.
The weights assigned to each classifier play a crucial role in the ensemble’s performance. Traditional approaches often use equal weights or weights determined through grid search. However, these methods can be computationally expensive and may not find the optimal weight configuration, especially when dealing with multiple classifiers and features. Algorithm 4 depicts the procedures for calculating WSV. Figure 6 shows the steps of Bayesian-Optimized Weighted Soft Voting procedure.
Algorithm 4: Pseudo-code for calculating WSV
Machines 13 00258 i004
Bayesian optimization provides a more systematic and efficient technique for determining optimal weights [70]. It uses a probabilistic model, typically a Gaussian Process variant, to capture the link between the hyperparameters (the weights and feature counts) and model performance. This strategy is particularly useful since it efficiently explores the hyperparameter space by creating a proxy model of the objective function. It strikes a compromise between exploring unknown regions and exploiting known favorable locations, requiring fewer iterations than grid or random search algorithms. The steps for implementing the proposed Bayesian Optimization for Weighted Soft Voting are given in Algorithm 5, and these steps are:
  • Feature Ranking Using ANOVA: Rank features based on ANOVA (analysis of variance) scores.
  • Data Preprocessing: Select the top-k features based on ANOVA ranking using the following equation.
    $F_k = \operatorname{SelectTopKFeatures}(F, \text{ANOVA}, k)$
  • Training Three Models: Train Logistic Regression (LR), Multilayer Perceptron (MLP), and AdaBoost using the selected features with 10-fold cross-validation to ensure robustness using the following equation.
    $p^{(j)}(x) = f_j(x \mid F_k), \quad j \in \{\mathrm{LR}, \mathrm{MLP}, \mathrm{AB}\}$
  • Computing Weighted Probabilities:
    • Calculate the prediction probabilities for each model.
    • Compute the weighted sum of these probabilities using the following equation.
      $P_{\mathrm{sum}}(c \mid x) = \sum_{j \in \{\mathrm{LR}, \mathrm{MLP}, \mathrm{AB}\}} w_j \, p^{(j)}(x)_c$
    • Apply softmax to the final weighted sum of probabilities using the following equation.
      $P(c \mid x) = \dfrac{\exp(P_{\mathrm{sum}}(c \mid x))}{\sum_{c'} \exp(P_{\mathrm{sum}}(c' \mid x))}$
  • Bayesian Optimization: Optimize the model weights ( w 1 , w 2 , w 3 ) and the feature count (k) using Bayesian optimization techniques to maximize accuracy using the following equation.
    $\{w_1^*, w_2^*, w_3^*, k^*\} = \arg\max_{w_1, w_2, w_3, k} \operatorname{Accuracy}(y, \hat{y}(x))$
  • Final Predictions Using Optimized Parameters: Use the optimized weights and features for the final soft voting decision using the following equation.
    $\hat{y}(x) = \arg\max_c P(c \mid x)$
Algorithm 5: Pseudo-code for Bayesian Optimization Weighted Soft Voting
Machines 13 00258 i005
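A compact sketch of Algorithm 5 is given below, assuming scikit-optimize’s Gaussian-process optimizer (gp_minimize) as the Bayesian optimization engine; any Gaussian Process-based optimizer would serve, and the synthetic data again stands in for the real feature matrices.

```python
import numpy as np
from skopt import gp_minimize
from skopt.space import Integer, Real
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.neural_network import MLPClassifier

# Synthetic placeholder standing in for the 52-feature compact representation.
X, y = make_classification(n_samples=300, n_features=52, n_informative=20,
                           n_classes=5, random_state=0)

base_models = [LogisticRegression(max_iter=1000),
               MLPClassifier(max_iter=1000),
               AdaBoostClassifier()]

def objective(params):
    w1, w2, w3, k = params
    # Steps 1-2: keep the top-k features ranked by the ANOVA F-score.
    X_k = SelectKBest(f_classif, k=int(k)).fit_transform(X, y)
    # Step 3: out-of-fold class probabilities via 10-fold cross-validation.
    probas = [cross_val_predict(m, X_k, y, cv=10, method="predict_proba")
              for m in base_models]
    # Step 4: weighted probability sum followed by softmax renormalization.
    weighted = w1 * probas[0] + w2 * probas[1] + w3 * probas[2]
    soft = np.exp(weighted) / np.exp(weighted).sum(axis=1, keepdims=True)
    accuracy = (soft.argmax(axis=1) == y).mean()
    return -accuracy  # gp_minimize minimizes, so negate the accuracy

# Step 5: Bayesian optimization over the three weights and the feature count.
space = [Real(0.0, 1.0, name="w1"), Real(0.0, 1.0, name="w2"),
         Real(0.0, 1.0, name="w3"), Integer(10, 52, name="k")]
result = gp_minimize(objective, space, n_calls=25, random_state=0)
print("Optimal (w1, w2, w3, k):", result.x, "accuracy:", -result.fun)
```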
In practice, the proposed approach can detect the onset of an alert or emergency sound through a preprocessing step such as sound event detection or voice activity detection (VAD). By distinguishing between background noise and relevant sound events, these techniques can help the system identify when a sound starts, even in noisy environments. Additionally, microphones or sensors can be strategically placed in or around the vehicle to capture sound more accurately, such as in isolated engine compartments with noise-canceling technology to improve sound capture quality. Also, the system can be integrated with existing vehicle monitoring systems to automatically trigger sound detection when certain conditions are met, such as abnormal engine behavior, sudden changes in vehicle speed, or other sensor data that might indicate a fault or emergency event.
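As one possible realization of the sound event detection step mentioned above, the sketch below flags active frames with a simple RMS-energy threshold; the threshold value is illustrative, and a production VAD would be more sophisticated.

```python
import numpy as np
import librosa

def detect_sound_events(path: str, threshold_db: float = -30.0) -> np.ndarray:
    """Return times (s) of frames whose energy rises above a relative threshold.

    A minimal energy-based stand-in for voice/sound activity detection; the
    threshold is relative to the recording's peak and needs per-deployment tuning.
    """
    y, sr = librosa.load(path, sr=16_000, mono=True)
    rms = librosa.feature.rms(y=y)[0]                  # per-frame RMS energy
    db = librosa.amplitude_to_db(rms, ref=np.max)      # dB relative to peak
    active = db > threshold_db                         # active-frame mask
    times = librosa.frames_to_time(np.arange(len(rms)), sr=sr)
    return times[active]
```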

5. Experiments and Discussion

In this study, we evaluated the performance of eleven distinct machine learning models on three datasets, utilizing two versions of feature lists: a compact version comprising 52 features and an expanded version consisting of 126 features. The models were assessed based on several performance metrics, including Area Under the Curve (AUC), Classification Accuracy (CA), F1-score (F1), Precision (Prec), Recall, Matthews Correlation Coefficient (MCC), Specificity (Spec), and Logarithmic Loss (LogLoss).
Data acquisition involved online tools for downloading YouTube videos, while segmentation and audio extraction utilized FFmpeg (v6.0) and Veed.io. Audio conversion was performed using Online Audio Converter. Audio processing, feature extraction, and analysis were conducted using Python (v3.12) (Jupyter Notebook and Spyder IDE) on a computer equipped with an Intel Core i7 processor and 16 GB RAM. Key libraries employed include Librosa (v0.10.1) and Pydub (v0.25.1). All processes were completed using standard software tools. To ensure transparency and reproducibility, all datasets and code are publicly available in our GitHub repository, along with comprehensive documentation and 72 references for dataset collection, including publicly available sources and YouTube audio samples.

5.1. Performance Metrics

This study used several performance indicators to analyze the efficiency of the models under evaluation. One significant metric utilized is the Area Under the Curve (AUC), which measures a model’s ability to distinguish between positive and negative classes by computing the Area under the Receiver Operating Characteristic (ROC) curve. The AUC is determined using the following formula:
$\mathrm{AUC} = \int_0^1 \mathrm{ROC}(x)\, dx$
where ROC is the true positive rate plotted against the false positive rate at various threshold settings.
Another important metric is Classification Accuracy (CA), calculated as the ratio of correctly predicted instances to the total number of instances in a dataset. The formula is as follows:
$\mathrm{CA} = \dfrac{TP + TN}{TP + TN + FP + FN}$
where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.
Precision is the ratio of true positive predictions to the total positive predictions, indicating how precise the model is when producing positive predictions. Precision can be calculated as:
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$
Recall, also known as sensitivity, is the ratio of true positive predictions compared to all actual positive instances. It is one of the key metrics for evaluating the performance of a predictive model by its ability to identify positive instances correctly. It is calculated using the following formula:
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$
Specificity, on the other hand, measures the proportion of true negative predictions among all actual negative instances and is calculated as:
$\mathrm{Specificity} = \dfrac{TN}{TN + FP}$
The F1-score comprehensively evaluates a model’s performance by calculating the harmonic mean of Precision and Recall. It is calculated with the following formula:
$F1 = 2 \times \dfrac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
The Matthews Correlation Coefficient (MCC) is a well-balanced measure that considers all four categories of the confusion matrix, providing a more comprehensive metric for binary classification. The MCC is calculated as follows:
$\mathrm{MCC} = \dfrac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
Logarithmic Loss (LogLoss) is a metric used to evaluate the performance of a classification model. It measures the accuracy of the probabilities assigned to each class. The LogLoss is calculated as:
$\mathrm{LogLoss} = -\dfrac{1}{N} \sum_{i=1}^{N} \left[ y_i \log(p_i) + (1 - y_i) \log(1 - p_i) \right]$
where $N$ is the total number of instances, $y_i$ is the actual label (0 or 1), and $p_i$ is the predicted probability of the positive class.
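All of these metrics are available in scikit-learn, as the following sketch shows on illustrative predictions; for the multi-class datasets used here, macro averaging is assumed for Precision, Recall, and F1, and a one-vs-rest scheme for AUC.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, matthews_corrcoef, log_loss,
                             roc_auc_score)

# Illustrative predictions for a small 3-class problem.
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 1, 2, 1, 1, 0, 2, 2])
y_prob = np.array([[0.8, 0.1, 0.1], [0.1, 0.7, 0.2], [0.1, 0.2, 0.7],
                   [0.2, 0.5, 0.3], [0.1, 0.8, 0.1], [0.9, 0.05, 0.05],
                   [0.2, 0.2, 0.6], [0.1, 0.4, 0.5]])

print("CA:      ", accuracy_score(y_true, y_pred))
print("Prec:    ", precision_score(y_true, y_pred, average="macro"))
print("Recall:  ", recall_score(y_true, y_pred, average="macro"))
print("F1:      ", f1_score(y_true, y_pred, average="macro"))
print("MCC:     ", matthews_corrcoef(y_true, y_pred))
print("LogLoss: ", log_loss(y_true, y_prob))
print("AUC:     ", roc_auc_score(y_true, y_prob, multi_class="ovr"))
```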

5.2. Hyperparameters

Hyperparameter tuning is one of the essential steps in any machine learning classification process [71,72]. It involves selecting the hyperparameters that best control the learning process of a model. These hyperparameters are fixed before training since they are not learned from the data. Well-chosen hyperparameters can significantly improve a model’s accuracy and its generalization to unseen data. They form the basis for finding a good trade-off between bias and variance, can speed up the training process for faster model development and deployment, and can optimize resource utilization, thus reducing training time and costs.
Table 5 lists the configurations used for each model. The following are some common hyperparameters and their effect:
  • Learning Rate: This hyperparameter controls the step size during gradient descent when moving toward the minimum. A very high learning rate results in instability, while a very low one slows down training.
  • Number of Trees/Estimators: The number of trees in ensembling techniques like Random Forest and Gradient Boosting. More trees provide higher accuracy, but training a model takes longer.
  • Tree Depth: The hyperparameter for tree-based models defines each tree’s maximum depth. Deep trees can easily capture complex patterns but tend to overfit much more.
  • Regularization: Methods such as L1 and L2 regularization prevent overfitting by penalizing large weights. The strength of regularization is a hyperparameter that needs tuning.
  • Number of Hidden Layers and Neurons: This governs the model’s architecture in neural networks.
Cross-validation is one of the best methods for hyperparameter tuning, and it was employed in this study. It evaluates model performance using techniques such as k-fold cross-validation to obtain a more accurate assessment of its performance. The sampling type used was a 10-fold cross-validation.
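As a sketch of how such tuning can be combined with 10-fold cross-validation, the example below grid-searches two Random Forest hyperparameters; the grid values are illustrative, not the exact configurations of Table 5.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic placeholder standing in for one of the extracted feature matrices.
X, y = make_classification(n_samples=300, n_features=52, n_informative=20,
                           n_classes=5, random_state=0)

# Hypothetical grid over two of the hyperparameters discussed above.
param_grid = {
    "n_estimators": [50, 100, 200],   # number of trees in the ensemble
    "max_depth": [5, 10, None],       # maximum tree depth (None = unlimited)
}
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=10, scoring="accuracy")
search.fit(X, y)
print("Best hyperparameters:", search.best_params_)
print("Best 10-fold CV accuracy:", round(search.best_score_, 3))
```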

5.3. Car Faults DB1 Evaluation

Various ML models were tested on the car faults dataset (DB1), analyzing two versions of feature lists: compact and expanded. Table 6 and Table 7 display the measured performance metrics in both cases.
Testing DB1 with the compact version of the extracted feature list showed that Logistic Regression achieved the highest classification accuracy and the lowest Log Loss among all evaluated models. With the expanded version of the feature list, Logistic Regression again achieved the best classification accuracy but only the second-lowest Log Loss.

5.4. Other Sounds DB2 Evaluation

ML models were tested on other sound datasets (DB2), analyzing two versions of feature lists: compact and expanded. Table 8 and Table 9 display the measured performance metrics in both cases.
Based on the DB2 test results for the compact version of the extracted feature list, the Neural Network model achieved the highest classification accuracy and the lowest Log Loss among all evaluated models. Using the expanded version of the feature list, the Neural Network obtained the second-highest accuracy after AdaBoost.

5.5. DB3 Evaluation

ML models were tested on the merged dataset (DB3), analyzing two versions of feature lists: compact and expanded. Table 10 and Table 11 display the measured performance metrics in both cases.
The DB3 test results for the compact version of the extracted feature list showed that the Neural Network model achieved the lowest Log Loss and the highest classification accuracy among the compared models. Using the expanded version of the feature list, the Neural Network reached the second-highest accuracy after AdaBoost.

5.6. Feature Ranking

Feature ranking was performed using the compact feature list on the DB1 dataset. Table 12 shows the rankings of the 52 features of the compact list. The table’s rankings demonstrate that the top features across approaches were MFCC features (mean_10, mean_3, mean_2, mean_4) and Mel Spectrogram features (mean). The dominance of MFCC methods is evident. MFCC mean features are statistically significant across all measures, consistently outperforming standard deviation features. Chromagram characteristics, notably standard deviations, have a lower overall relevance. However, a few exceptions, such as chromagram_mean_7, have moderate rankings.

5.7. Evaluation of BOWSV

To incorporate Bayesian Optimization and Weighted Soft Voting into the proposed model, the previously extracted features are first ranked by ANOVA F-score so that only the most relevant ones are retained, and the variables are standardized to a common scale. Multiple classifiers with diverse inductive biases form the ensemble. Optimization begins by defining bounds for the classifier weights and the number of selected features; an acquisition function then guides the search for optimal parameters. The objective function evaluates the ensemble with cross-validation so that the estimates of generalization performance are reliable, and the optimizer converges by iteratively proposing weight combinations and assessing how well they perform. Table 13 reports the metrics of the three datasets, DB1, DB2, and DB3, after ensemble optimization: for each iteration, the weights (w1, w2, w3), the number of features used, and the achieved accuracy are listed.
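A minimal sketch of this procedure follows, under stated assumptions: scikit-optimize supplies the Bayesian optimizer (the paper does not prescribe a library), the three base classifiers shown are placeholders for the diverse ensemble members, and the weight and feature-count bounds are chosen to mirror the ranges visible in Table 13.

```python
# Sketch of Bayesian-Optimized Weighted Soft Voting with Feature Selection
# (BOWSVFS); illustrative, not the authors' exact implementation.
from skopt import gp_minimize
from skopt.space import Real, Integer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def make_objective(X, y):
    def objective(params):
        w1, w2, w3, k = params
        vote = VotingClassifier(
            estimators=[
                ("lr", LogisticRegression(max_iter=1000)),
                ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
                ("nn", MLPClassifier(max_iter=1000, random_state=0)),
            ],
            voting="soft",                      # average class probabilities
            weights=[w1, w2, w3],
        )
        model = Pipeline([
            ("scale", StandardScaler()),                   # common scale
            ("select", SelectKBest(f_classif, k=int(k))),  # ANOVA top-k features
            ("ensemble", vote),
        ])
        acc = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
        return -acc                             # gp_minimize minimizes
    return objective

space = [Real(0.01, 1.0), Real(0.01, 1.0), Real(0.01, 1.0), Integer(20, 40)]
# result = gp_minimize(make_objective(X, y), space, n_calls=30, random_state=0)
# print(result.x, -result.fun)   # best (w1, w2, w3, k) and its CV accuracy
```

gp_minimize fits a Gaussian-process surrogate and uses an acquisition function to propose each new candidate point, matching the acquisition-guided search described above.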
Due to the large number of classes (27 for car faults, 22 for environmental sounds, and 49 for the merged dataset), a full confusion matrix would be impractical. Instead, key examples of classification performance have been summarized. Several classes—including Engine Misfire, Fuel Pump Cartridge Fault, Radiator Fan Failure, Strut Mount Failure, Suspension Arm Fault, and others—are classified with 100% accuracy, and the Bad CV Joint class achieves around 75% accuracy. Furthermore, the Bayesian-Optimized Weighted Soft Voting with Feature Selection (BOWSVFS) approach demonstrates the robustness of the model by achieving an overall accuracy of 91.04% on the car fault dataset (DB1).
However, some classes present challenges. For instance, the Universal Joint Failure or Steering class has an 80% correct classification rate, with misclassifications primarily as engine rattling noise. The Knocking class, in particular, exhibits significant difficulty, with only 40% of instances correctly classified and misclassifications distributed across categories such as bad wheel bearing, squeaky, and squeaky brake (or grinding brake). These examples highlight the strengths and areas for improvement within the proposed framework.
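For reference, the per-class rates quoted here correspond to the row-normalized diagonal of the confusion matrix; a small helper along the following lines (a sketch assuming scikit-learn) computes them.

```python
# Sketch: per-class accuracy (recall) from a confusion matrix.
# Rows of cm are true classes, so diagonal / row-sum is the fraction of
# each class classified correctly.
import numpy as np
from sklearn.metrics import confusion_matrix

def per_class_accuracy(y_true, y_pred, labels):
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    with np.errstate(divide="ignore", invalid="ignore"):
        rates = cm.diagonal() / cm.sum(axis=1)   # NaN for classes absent in y_true
    return dict(zip(labels, rates))

# e.g., per_class_accuracy(y_test, model.predict(X_test), class_names)
```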

5.8. Outlook and Future Perspectives

The practical implications of this research are far-reaching. The framework enhances overall transportation safety through timely interventions and improved emergency response by enabling early fault detection and real-time classification of vehicle and environmental sounds. Furthermore, the ability to accurately interpret auditory cues supports the development of more accessible and inclusive ITSs. For instance, auditory alerts can be transformed into visual or haptic signals, thereby assisting individuals with disabilities and ensuring that critical safety information is disseminated effectively. These advances pave the way for smarter, more responsive urban transportation systems that improve efficiency and significantly elevate the safety and quality of life in smart cities.
Although the proposed classification methods can achieve a high degree of accuracy in sound-based early fault detection for vehicles in ITSs, there remains potential for further enhancement through the incorporation of explicit user feedback, such as ratings of the classification results. The efficacy of machine learning systems can be notably augmented by fostering a collaborative relationship with users, improving the system’s accuracy and enhancing user understanding and trust in the system [73,74,75]. Users can contribute to the classification model by providing explicit collective feedback regarding its classification accuracy and the early detection of faults for vehicles in ITSs. This feedback can subsequently be utilized to refine the overall accuracy of the classification model. For instance, users might assign scores or ratings to the accuracy of detected faults. Nonetheless, sustaining user motivation for continuous feedback poses a challenge, as many users exhibit limited interest in participating in such evaluations [76].
Gamification is employed as a behavioral change strategy to enhance user motivation toward engaging in desired behaviors, such as providing feedback on the classification accuracy of detected vehicle faults in ITSs [77,78]. A prevalent application of gamification involves incorporating elements of video games, such as points and levels, into non-gaming contexts, such as educational settings [79]. Gamification has been implemented successfully across various domains, including the promotion of healthy lifestyle choices [80], the enhancement of student engagement in academic courses [81], and the improvement of quality and productivity within business environments [82]. There are four primary elements of gamification commonly utilized in non-gaming contexts [83]:
  • Points: Many gamification strategies rely on point systems, which may include features such as levels and leaderboards. The classification accuracy of detected faults can be quantified through user ratings of the quality of fault detection for vehicles in ITSs. Points accumulated or lost would subsequently inform the classification model’s training, enhancing its ability to detect sound-based vehicle faults early. Nevertheless, points should be integrated with other gamification elements to effectively motivate users [83].
  • Digital Badges: Users may receive digital badges as recognition for acquiring specific skills, knowledge, or achievements, thereby showcasing their accomplishments [84]. These badges are typically awarded based on predefined criteria [85,86,87]. For example, users might earn digital badges by reaching a specified number of points corresponding to their ratings on the classification accuracy of early detected sound-based faults for vehicles in ITSs.
  • Levels: Users must accumulate points to advance to higher levels. Upon reaching a predetermined point threshold, they can level up, thereby unlocking additional features within the system [88].
  • Leaderboards: Users can establish leaderboards to reflect their achievements or points earned or to track progress toward specific goals [86].
A recent study [89] identifies several factors that affect users’ perceptions of, and responses to, gamification elements utilized for feedback collection, revealing diverse preferences in this context. This underscores the necessity of systematically gathering users’ explicit and collective feedback, which can be instrumental in optimizing the proposed classification model to align with user preferences. Neglecting this aspect could result in overlooking critical factors that enhance classification accuracy. To address this, one can utilize the application-independent conceptual framework proposed by [89], which can be adapted to gamify the feedback collection process regarding the accuracy of our sound-based early fault detection for vehicles in ITSs. This framework articulates the variations in user perceptions and needs concerning gamification elements, aiming to motivate users to provide high-quality feedback on the classification accuracy of the proposed system. It serves as a guiding resource for software engineers in encouraging users to offer explicit and collective feedback, thereby facilitating further training of the classification model and potentially improving its early fault detection accuracy for vehicles in ITSs. Additionally, a category representing normal operational conditions or safe car sounds can be included to better differentiate faults from irrelevant auditory data.
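To make this feedback loop concrete, the sketch below shows one possible way to fold explicit user ratings back into training as per-sample weights; the five-point rating scale and the linear weighting rule are illustrative assumptions, not part of the proposed framework.

```python
# Sketch: retraining with gamified user feedback as sample weights.
# Assumes ratings in [1, 5]; samples whose detections users confirm with
# high ratings get proportionally more influence on the refit.
import numpy as np
from sklearn.linear_model import LogisticRegression

def retrain_with_feedback(X, y, ratings):
    sample_weight = np.asarray(ratings, dtype=float) / 5.0   # map 1-5 to 0.2-1.0
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y, sample_weight=sample_weight)
    return clf
```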

6. Conclusions and Future Work

This study highlights the critical significance of sound-based diagnostics in improving Intelligent Transportation Systems (ITSs) in smart cities. The potential for real-time vehicle fault identification and enhanced accessibility for people with disabilities was demonstrated by creating a new dataset of automotive malfunction sounds and combining audio processing techniques with ML. The high accuracy rates attained by various ML models demonstrate the efficacy of sound-based techniques as a complement to classic vision-based systems. Finally, this study leads to a more inclusive and responsive transportation infrastructure, which aligns with the overall goals of smart city development.
Future research includes (i) expanding the dataset to cover diverse vehicles, faults, and real-world scenarios through collaborations with the automotive and public transport sectors; (ii) integrating sound data with visual and environmental sensors to enhance system robustness; (iii) developing real-time sound-based detection systems for urban applications and exploring advanced machine learning methods, such as deep learning and transfer learning, to improve accuracy; (iv) incorporating a “no sound” class when implementing the proposed approach in practice; (v) investigating generative model-based data augmentation strategies to boost dataset diversity and model resilience; and (vi) exploring domain adaptation techniques, few-shot learning, or data augmentation strategies to enhance generalization across a wider range of vehicles.

Author Contributions

A.R.: Conceptualization, Methodology, Software, Writing—Original Draft. Y.A.: Data Curation, Methodology, Investigation, Writing—Original Draft. T.A.F.: Visualization, Software. M.A.: Conceptualization, Methodology, Writing—Original Draft. A.B.: Data Curation, Investigation, Writing—Original Draft. M.B.: Methodology, Writing—Review and Editing, Supervision. M.A.E.: Supervision, Methodology, Writing—Review and Editing, Project Administration. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by funds from the King Salman Centre for Disability Research (Group no.: KSRG-2024-240).

Data Availability Statement

In this study, data from various sources are collected to create a tailored dataset for car sound analysis: Reliable audio samples are built by downloading videos from YouTube related to car faults, animal sounds, car crashes, siren sounds, etc. Additional audio is supplemented from Kaggle datasets: FSC22 [38], Google AudioSet [39], Audio Classifier Dataset [40], Sound classification of animal voice [41], and Vehicle Sounds dataset [42]. The data collected by the authors are available at: https://github.com/amrrashed/Sound-Based-Vehicle-Diagnostics-Emergency-Signal-Recognition/tree/main (accessed on 31 January 2025). Code availability: The code used is available at: https://github.com/amrrashed/Sound-Based-Vehicle-Diagnostics-Emergency-Signal-Recognition/tree/main/codes (accessed on 31 January 2025).

Acknowledgments

The authors extend their appreciation to the King Salman Centre for Disability Research for funding this work through Research Group No. KSRG-2024-240.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Gong, T.; Zhu, L.; Yu, F.R.; Tang, T. Edge Intelligence in Intelligent Transportation Systems: A Survey. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8919–8944. [Google Scholar] [CrossRef]
  2. Khalil, R.A.; Safelnasr, Z.; Yemane, N.; Kedir, M.; Shafiqurrahman, A.; Saeed, N. Advanced Learning Technologies for Intelligent Transportation Systems: Prospects and Challenges. IEEE Open J. Veh. Technol. 2024, 5, 397–427. [Google Scholar] [CrossRef]
  3. Sarwatt, D.S.; Lin, Y.; Ding, J.; Sun, Y.; Ning, H. Metaverse for Intelligent Transportation Systems (ITS): A Comprehensive Review of Technologies, Applications, Implications, Challenges and Future Directions. IEEE Trans. Intell. Transp. Syst. 2024, 25, 6290–6308. [Google Scholar] [CrossRef]
  4. Wang, B.; Li, Q.; Mao, Q.; Wang, J.; Chen, C.L.P.; Shangguan, A.; Zhang, H. A Survey on Vision-Based Anti Unmanned Aerial Vehicles Methods. Drones 2024, 8, 518. [Google Scholar] [CrossRef]
  5. Masal, K.M.; Bhatlawande, S.; Shingade, S.D. Development of a visual to audio and tactile substitution system for mobility and orientation of visually impaired people: A review. Multimed. Tools Appl. 2024, 83, 20387–20427. [Google Scholar] [CrossRef]
  6. Liu, F.; Lu, Z.; Lin, X. Vision-based environmental perception for autonomous driving. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2023, 239, 39–69. [Google Scholar] [CrossRef]
  7. Kiranyaz, S.; Can Devecioglu, O.; Alhams, A.; Sassi, S.; Ince, T.; Avci, O.; Gabbouj, M. Exploring Sound Versus Vibration for Robust Fault Detection on Rotating Machinery. IEEE Sens. J. 2024, 24, 23255–23264. [Google Scholar] [CrossRef]
  8. Alqudaihi, K.S.; Aslam, N.; Khan, I.U.; Almuhaideb, A.M.; Alsunaidi, S.J.; Ibrahim, N.M.A.R.; Alhaidari, F.A.; Shaikh, F.S.; Alsenbel, Y.M.; Alalharith, D.M.; et al. Cough Sound Detection and Diagnosis Using Artificial Intelligence Techniques: Challenges and Opportunities. IEEE Access 2021, 9, 102327–102344. [Google Scholar] [CrossRef]
  9. Vranken, E.; Mounir, M.; Norton, T. Sound-Based Monitoring of Livestock. In Encyclopedia of Smart Agriculture Technologies; Springer International Publishing: Berlin/Heidelberg, Germany, 2023; pp. 1–12. [Google Scholar] [CrossRef]
  10. Pervez, F.; Shoukat, M.; Suresh, V.; Farooq, M.U.B.; Sandhu, M.; Qayyum, A.; Usama, M.; Girardi, A.; Latif, S.; Qadir, J. Medicine’s New Rhythm: Harnessing Acoustic Sensing via the Internet of Audio Things for Healthcare. IEEE Open J. Comput. Soc. 2024, 5, 491–510. [Google Scholar] [CrossRef]
  11. Kim, J.; Kim, J.; Kim, H. A Study on Gear Defect Detection via Frequency Analysis Based on DNN. Machines 2022, 10, 659. [Google Scholar] [CrossRef]
  12. Koh, P.; Kim, S. Designing a Augmented Reality Auditory Training Game for in-situ training and diagnostic tool for the hearing impaired. In Proceedings of the Audio Engineering Society Conference: AES 2024 International Audio for Games Conference, Tokyo, Japan, 27–29 April 2024. [Google Scholar]
  13. Oladimeji, D.; Gupta, K.; Kose, N.A.; Gundogan, K.; Ge, L.; Liang, F. Smart Transportation: An Overview of Technologies and Applications. Sensors 2023, 23, 3880. [Google Scholar] [CrossRef] [PubMed]
  14. Mohammed, H.B.M.; Cavus, N. Utilization of Detection of Non-Speech Sound for Sustainable Quality of Life for Deaf and Hearing-Impaired People: A Systematic Literature Review. Sustainability 2024, 16, 8976. [Google Scholar] [CrossRef]
  15. Nasim, F.; Masood, S.; Jaffar, A.; Ahmad, U.; Rashid, M. Intelligent Sound-Based Early Fault Detection System for Vehicles. Comput. Syst. Sci. Eng. 2023, 46, 3175–3190. [Google Scholar] [CrossRef]
  16. Hamad, A.A.; Nasim, M.F.; Jaffar, A.; Khalaf, O.I.; Ouahada, K.; Hamam, H.; Akram, S.; Siddique, A. Cognitive Inspired Sound-Based Automobile Problem Detection: A Step Toward Xai. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4814232 (accessed on 25 January 2025).
  17. Yildirim, S.; Bingol, M.S. Design of a Proposed Neural Network for Sound Quality Analysis of Different Types for Car Systems. Int. J. Mechatron. Appl. Mech. 2024, 16, 76–81. [Google Scholar]
  18. Akbalık, F.; Yıldız, A.; Ertuğrul, Ö.F.; Zan, H. Engine Fault Detection by Sound Analysis and Machine Learning. Appl. Sci. 2024, 14, 6532. [Google Scholar] [CrossRef]
  19. Murovec, J.; Prezelj, J.; Ćirić, D.; Milivojčević, M. Zero Crossing Signature: A Time-Domain Method Applied to Diesel and Gasoline Vehicle Classification. IEEE Sens. J. 2024, 25, 3. [Google Scholar] [CrossRef]
  20. Wang, S.; Xu, Q.; Zhu, S.; Wang, B. Making transformer hear better: Adaptive feature enhancement based multi-level supervised acoustic signal fault diagnosis. Expert Syst. Appl. 2025, 264, 125736. [Google Scholar] [CrossRef]
  21. Boztas, G.; Tuncer, T.; Aydogmus, O.; Yildirim, M. A DCSLBP based intelligent machine malfunction detection model using sound signals for industrial automation systems. Comput. Electr. Eng. 2024, 119, 109541. [Google Scholar] [CrossRef]
  22. Wang, Y.; Li, D.; Li, L.; Sun, R.; Wang, S. A novel deep learning framework for rolling bearing fault diagnosis enhancement using VAE-augmented CNN model. Heliyon 2024, 10, e35407. [Google Scholar] [CrossRef]
  23. Guo, X. Fault Diagnosis of Rolling Bearings Based on Acoustics and Vibration Engineering. IEEE Access 2024, 12, 139632–139648. [Google Scholar] [CrossRef]
  24. Li, Y.; Tao, X.; Sun, Y. A Fault Diagnosis Method for Turnout Switch Machines Based on Sound Signals. Electronics 2024, 13, 4839. [Google Scholar] [CrossRef]
  25. Kreuzer, M.; Schmidt, D.; Wokusch, S.; Kellermann, W. Real-World Airborne Sound Analysis for Health Monitoring of Bearings in Railway Vehicles. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4923626 (accessed on 19 January 2025).
  26. Yun, E.; Jeong, M. Acoustic Feature Extraction and Classification Techniques for Anomaly Sound Detection in the Electronic Motor of Automotive EPS. IEEE Access 2024, 12, 149288–149307. [Google Scholar] [CrossRef]
  27. Shajie, D.; Juliet, S.; Ezra, K.; Annie Flora, J.B. Diagnostic Sonance: Sound-Based Approach to Assess Engine Ball Bearing Health in Automobiles. Prz. Elektrotechniczny 2024, 1, 74–78. [Google Scholar] [CrossRef]
  28. Khan, F.A.; Jamil, A.; Khan, S.A.; Hameed, A.A. Enhancing robotic manipulator fault detection with advanced machine learning techniques. Eng. Res. Express 2024, 6, 025204. [Google Scholar] [CrossRef]
  29. Kim, S.M.; Soo Kim, Y. Enhancing Sound-Based Anomaly Detection Using Deep Denoising Autoencoder. IEEE Access 2024, 12, 84323–84332. [Google Scholar] [CrossRef]
  30. Senanayaka, A.; Lee, P.; Lee, N.; Dickerson, C.; Netchaev, A.; Mun, S. Enhancing the Accuracy of Machinery Fault Diagnosis through Fault Source Isolation of Complex Mixture of Industrial Sound Signals. Int. J. Adv. Manuf. Technol. 2024, 133, 5627–5642. [Google Scholar] [CrossRef]
  31. Hameed, U.; Masood, S.; Nasim, F.; Jaffar, A.; Ahmed, Z.; Khan, R.; Hussain, A.; Ali, S.; Mehmood, A.; Shah, R. Exploring the Accuracy of Machine Learning and Deep Learning in Engine Knock Detection. Bull. Bus. Econ. 2024, 13, 203–210. [Google Scholar] [CrossRef]
  32. Naryanto, R.F.; Delimayanti, M.K.; Naryaningsih, A.; Adi, R.; Setiawan, B.A. Fault Detection in Diesel Engines using Artificial Neural Networks and Convolutional Neural Networks. J. Theor. Appl. Inf. Technol. 2024, 102, 683–690. [Google Scholar]
  33. Yuan, G.; Yang, Y. Fault detection method of new energy vehicle engine based on wavelet transform and support vector machine. Int. J. Knowl. Based Intell. Eng. Syst. 2024, 28, 718–731. [Google Scholar] [CrossRef]
  34. Chu, S.; Zhang, J.; Liu, F.; Kong, X.; Jiang, Z.; Mao, Z. Fault identification model of diesel engine based on mixed attention: Single-cylinder fault data driven whole-cylinder diagnosis. Expert Syst. Appl. 2024, 255, 124769. [Google Scholar] [CrossRef]
  35. Lee, D.; Choo, H.; Jeong, J. GCN-Based LSTM Autoencoder with Self-Attention for Bearing Fault Diagnosis. Sensors 2024, 24, 4855. [Google Scholar] [CrossRef]
  36. Spadini, T.; Nose-Filho, K.; Suyama, R. Intelligent Fault Diagnosis of Type and Severity in Low-Frequency, Low Bit-Depth Signals. arXiv 2024, arXiv:2411.06299. [Google Scholar] [CrossRef]
  37. Gantert, L.; Zeffiro, T.; Sammarco, M.; Campista, M.E.M. Multiclass classification of faulty industrial machinery using sound samples. Eng. Appl. Artif. Intell. 2024, 136, 108943. [Google Scholar] [CrossRef]
  38. Bandara, M.; Jayasundara, R.; Ariyarathne, I.; Meedeniya, D.; Perera, C. FSC22 Dataset. 2022. Available online: https://www.kaggle.com/datasets/irmiot22/fsc22-dataset (accessed on 19 January 2025).
  39. Gemmeke, J.F.; Ellis, D.P.W.; Freedman, D.; Jansen, A.; Lawrence, W.; Moore, R.C.; Plakal, M.; Ritter, M. Audio Set: An ontology and human-labeled dataset for audio events. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017. [Google Scholar] [CrossRef]
  40. Jacob, I. Audio Classifier Dataset. 2025. Available online: https://www.kaggle.com/datasets/aklimarimi/audio-classifier-dataset (accessed on 19 January 2025).
  41. Putthewad, R.B. Sound Classification of Animal Voice. 2025. Available online: https://www.kaggle.com/datasets/rushibalajiputthewad/sound-classification-of-animal-voice (accessed on 19 January 2025).
  42. Abderrahim, J. Vehicle Sounds Dataset. 2025. Available online: https://www.kaggle.com/datasets/janboubiabderrahim/vehicle-sounds-dataset (accessed on 19 January 2025).
  43. Community, D. DCASE 2024 Challenge. 2024. Available online: https://dcase.community/challenge2024/index (accessed on 19 January 2025).
  44. Community, D. UrbanSound8K Dataset. DCASE 2024 Challenge. 2024. Available online: https://www.kaggle.com/code/prabhavsingh/urbansound8k-classification (accessed on 19 January 2025).
  45. Research, G. AudioSet. Available online: https://www.kaggle.com/datasets/akela91/google-audioset (accessed on 19 January 2025).
  46. Li, H.; Wang, Z. Anomaly identification of wind turbine blades based on Mel-Spectrogram Difference feature of aerodynamic noise. Measurement 2025, 240, 115428. [Google Scholar] [CrossRef]
  47. Lakdari, M.W.; Ahmad, A.H.; Sethi, S.; Bohn, G.A.; Clink, D.J. Mel-frequency cepstral coefficients outperform embeddings from pre-trained convolutional neural networks under noisy conditions for discrimination tasks of individual gibbons. Ecol. Inform. 2024, 80, 102457. [Google Scholar] [CrossRef]
  48. Pandeya, Y.R.; Lee, J. GlocalEmoNet: An optimized neural network for music emotion classification and segmentation using timbre and chroma features. Multimed. Tools Appl. 2024, 83, 74141–74158. [Google Scholar] [CrossRef]
  49. Constantinescu, C.; Brad, R. An Overview on Sound Features in Time and Frequency Domain. Int. J. Adv. Stat. IT&C Econ. Life Sci. 2023, 13, 45–58. [Google Scholar] [CrossRef]
  50. Balingbing, C.; Kirchner, S.; Siebald, H.; Van Hung, N.; Hensel, O. Determining the sound signatures of insect pests in stored rice grain using an inexpensive acoustic system. Food Secur. 2024, 16, 1529–1538. [Google Scholar] [CrossRef]
  51. Sanchez-Morillo, D.; Sales-Lerida, D.; Priego-Torres, B.; León-Jiménez, A. Cough Detection Using Acceleration Signals and Deep Learning Techniques. Electronics 2024, 13, 2410. [Google Scholar] [CrossRef]
  52. Rizvi, S.; Pettee, M.; Nachman, B. Learning likelihood ratios with neural network classifiers. J. High Energy Phys. 2024, 2024, 1–41. [Google Scholar] [CrossRef]
  53. Peretz, O.; Koren, M.; Koren, O. Naive Bayes classifier—An ensemble procedure for recall and precision enrichment. Eng. Appl. Artif. Intell. 2024, 136, 108972. [Google Scholar] [CrossRef]
  54. Khashei, M.; Etemadi, S.; Bakhtiarvand, N. A New Discrete Learning-Based Logistic Regression Classifier for Bankruptcy Prediction. Wirel. Pers. Commun. 2024, 134, 1075–1092. [Google Scholar] [CrossRef]
  55. Azimjonov, J.; Kim, T. Stochastic gradient descent classifier-based lightweight intrusion detection systems using the efficient feature subsets of datasets. Expert Syst. Appl. 2024, 237, 121493. [Google Scholar] [CrossRef]
  56. Sun, Y.; Liu, Q. Collaborative filtering recommendation based on K-nearest neighbor and non-negative matrix factorization algorithm. J. Supercomput. 2025, 81, 79. [Google Scholar] [CrossRef]
  57. Larisa, L. Optimized Composition of Business Process Web Services via QoS-Based Categorization Using Decision Tree Classifier and Knowledge-Based Decision Support. Am. J. Bus. Oper. Res. 2025, 12, 1–14. [Google Scholar] [CrossRef]
  58. Bouke, M.A.; Alramli, O.I.; Abdullah, A. XAIRF-WFP: A novel XAI-based random forest classifier for advanced email spam detection. Int. J. Inf. Secur. 2025, 24, 5. [Google Scholar] [CrossRef]
  59. Li, Y.; Xie, X. Two novel deep multi-view support vector machines for multiclass classification. Appl. Intell. 2025, 55, 1–17. [Google Scholar] [CrossRef]
  60. Maszczyk, C.; Sikora, M.; Wróbel, Ł. Classification, Regression, and Survival Rule Induction with Complex and M-of-N Elementary Conditions. Mach. Learn. Knowl. Extr. 2024, 6, 554–579. [Google Scholar] [CrossRef]
  61. Kumpf, K.; Protic, M.; Jovanovic, L.; Cajic, M.; Zivkovic, M.; Bacanin, N. Insider Threat Detection Using Bidirectional Encoder Representations From Transformers and Optimized AdaBoost Classifier. In Proceedings of the 2024 International Conference on Circuit, Systems and Communication (ICCSC), Fez, Morocco, 28–29 June 2024; pp. 1–6. [Google Scholar] [CrossRef]
  62. Theerthagiri, P. Liver disease classification using histogram-based gradient boosting classification tree with feature selection algorithm. Biomed. Signal Process. Control 2025, 100, 107102. [Google Scholar] [CrossRef]
  63. Aljohani, M.; AbdulAzeem, Y.; Balaha, H.M.; Badawy, M.; Elhosseini, M.A. Advancing feature ranking with hybrid feature ranking weighted majority model: A weighted majority voting strategy enhanced by the Harris hawks optimizer. J. Comput. Des. Eng. 2024, 11, 308–325. [Google Scholar] [CrossRef]
  64. Gao, J.; Wang, Z.; Jin, T.; Cheng, J.; Lei, Z.; Gao, S. Information gain ratio-based subfeature grouping empowers particle swarm optimization for feature selection. Knowl. Based Syst. 2024, 286, 111380. [Google Scholar] [CrossRef]
  65. Jamil, M.A.; Khanam, S. Influence of One-Way ANOVA and Kruskal–Wallis Based Feature Ranking on the Performance of ML Classifiers for Bearing Fault Diagnosis. J. Vib. Eng. Technol. 2024, 12, 3101–3132. [Google Scholar] [CrossRef]
  66. Yan, M.; Deng, J.; Zhang, S.; Chen, P. Feature Selection Method Based on Improved Differential Evolution and ReliefF. In Proceedings of the 2024 Guangdong-Hong Kong-Macao Greater Bay Area International Conference on Digital Economy and Artificial Intelligence, DEAI 2024, Dongguan, China, 19–21 January 2024; pp. 539–543. [Google Scholar] [CrossRef]
  67. Zhang, S.; Wang, T.; Worden, K.; Sun, L.; Cross, E.J. Canonical-correlation-based fast feature selection for structural health monitoring. Mech. Syst. Signal Process. 2025, 223, 111895. [Google Scholar] [CrossRef]
  68. Liu, Z. Ensemble Learning. In Artificial Intelligence for Engineers; Springer Nature: Cham, Switzerland, 2025; pp. 221–242. [Google Scholar] [CrossRef]
  69. Chhillar, I.; Singh, A. An improved soft voting-based machine learning technique to detect breast cancer utilizing effective feature selection and SMOTE-ENN class balancing. Discov. Artif. Intell. 2025, 5, 4. [Google Scholar] [CrossRef]
  70. Mahboubi, N.; Xie, J.; Huang, B. Point-by-point transfer learning for Bayesian optimization: An accelerated search strategy. Comput. Chem. Eng. 2025, 194, 108952. [Google Scholar] [CrossRef]
  71. Iturbe-Araya, J.I.; Rifà-Pous, H. Enhancing unsupervised anomaly-based cyberattacks detection in smart homes through hyperparameter optimization. Int. J. Inf. Secur. 2025, 24, 45. [Google Scholar] [CrossRef]
  72. Widardo, F.; Chowanda, A. Hyperparameter tuning for deep learning model used in multimodal emotion recognition data. Bull. Electr. Eng. Inform. 2025, 14, 261–267. [Google Scholar] [CrossRef]
  73. Stumpf, S.; Rajaram, V.; Li, L.; Burnett, M.; Dietterich, T.; Sullivan, E.; Drummond, R.; Herlocker, J. Toward harnessing user feedback for machine learning. In Proceedings of the 12th International Conference on Intelligent User Interfaces, IUI07, Honolulu, HI, USA, 28–31 January 2007; pp. 82–91. [Google Scholar] [CrossRef]
  74. Lee, T.Y.; Smith, A.; Seppi, K.; Elmqvist, N.; Boyd-Graber, J.; Findlater, L. The human touch: How non-expert users perceive, interpret, and fix topic models. Int. J. Hum. Comput. Stud. 2017, 105, 28–42. [Google Scholar] [CrossRef]
  75. Liao, Q.V.; Gruen, D.; Miller, S. Questioning the AI: Informing Design Practices for Explainable AI User Experiences. In Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI ’20, Honolulu, HI, USA, 25–30 April 2020; pp. 1–15. [Google Scholar] [CrossRef]
  76. Almaliki, M.; Ali, R. Persuasive and Culture-Aware Feedback Acquisition. In Persuasive Technology; Springer International Publishing: Berlin/Heidelberg, Germany, 2016; pp. 27–38. [Google Scholar] [CrossRef]
  77. Deterding, S.; Dixon, D.; Khaled, R.; Nacke, L. From game design elements to gamefulness: Defining “gamification”. In Proceedings of the 15th International Academic MindTrek Conference: Envisioning Future Media Environments, MindTrek ’11, Tampere, Finland, 28–30 September 2011; pp. 9–15. [Google Scholar] [CrossRef]
  78. Herzig, P.; Ameling, M.; Schill, A. A Generic Platform for Enterprise Gamification. In Proceedings of the 2012 Joint Working IEEE/IFIP Conference on Software Architecture and European Conference on Software Architecture, Helsinki, Finland, 20–24 August 2012; pp. 219–223. [Google Scholar] [CrossRef]
  79. Nicholson, S. A RECIPE for Meaningful Gamification. In Gamification in Education and Business; Reiners, T., Torstenand Wood, L.C., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 1–20. [Google Scholar] [CrossRef]
  80. Johnson, D.; Deterding, S.; Kuhn, K.A.; Staneva, A.; Stoyanov, S.; Hides, L. Gamification for health and wellbeing: A systematic review of the literature. Internet Interv. 2016, 6, 89–106. [Google Scholar] [CrossRef]
  81. Pløhn, T.; Aalberg, T. Using gamification to motivate smoking cessation. In Proceedings of the European Conference on Games Based Learning, Steinkjer, Norway, 8–9 October 2015; p. 431. [Google Scholar]
  82. Simões, J.; Redondo, R.D.; Vilas, A.F. A social gamification framework for a K-6 learning platform. Comput. Hum. Behav. 2013, 29, 345–353. [Google Scholar] [CrossRef]
  83. Lister, M. Gamification: The effect on student motivation and performance at the post-secondary level. Issues Trends Educ. Technol. 2015, 3, 2. [Google Scholar] [CrossRef]
  84. Abramovich, S.; Schunn, C.; Higashi, R.M. Are badges useful in education?: It depends upon the type of badge and expertise of learner. Educ. Technol. Res. Dev. 2013, 61, 217–232. [Google Scholar] [CrossRef]
  85. Ahn, J.; Pellicone, A.; Butler, B.S. Open badges for education: What are the implications at the intersection of open systems and badging? Res. Learn. Technol. 2014, 22, 563. [Google Scholar] [CrossRef]
  86. Domínguez, A.; Saenz-de Navarrete, J.; De-Marcos, L.; Fernández-Sanz, L.; Pagés, C.; Martínez-Herráiz, J.J. Gamifying learning experiences: Practical implications and outcomes. Comput. Educ. 2013, 63, 380–392. [Google Scholar] [CrossRef]
  87. Hanus, M.D.; Fox, J. Assessing the effects of gamification in the classroom: A longitudinal study on intrinsic motivation, social comparison, satisfaction, effort, and academic performance. Comput. Educ. 2015, 80, 152–161. [Google Scholar] [CrossRef]
  88. Goehle, G. Gamification and Web-based Homework. PRIMUS 2013, 23, 234–246. [Google Scholar] [CrossRef]
  89. Almaliki, M. Misinformation-Aware Social Media: A Software Engineering Perspective. IEEE Access 2019, 7, 182451–182458. [Google Scholar] [CrossRef]
Figure 1. Dataset creation.
Figure 2. Main system phases and steps.
Figure 3. Common audio features.
Figure 4. Example of a Mel Spectrogram of a bad CV joint.
Figure 5. Two-dimensional data projection using t-SNE for compacted features extracted from DB1.
Figure 6. Steps for Bayesian-Optimized Weighted Soft Voting.
Table 1. Comparison between literature review papers.

| Study | Domains of Usage | Underlying Methodologies | Dataset | Size | Type | Classification | Accuracy | Problems |
|---|---|---|---|---|---|---|---|---|
| [15] | Vehicles | ML | Recordings | 351 | Car sounds | 15 classes | 92% | Performance depends on the quality and representativeness of the training data. |
| [16] | Vehicles | ML | Recordings | 555 | Car sounds | 15 classes | 98.6% | Difficult to adapt to new or unexpected problems; require extensive manual tuning. |
| [17] | Vehicles | NN | Recordings | 2 | Two cars | Binary | R² = 0.99 | Limited to the specific sound quality factors considered; may not capture subjective aspects of sound quality; requires labeled data for training. |
| [18] | Vehicles | ML | Recordings | 280 | Car sounds | 6 classes | 92.17% | Sensitive to the choice of hidden nodes; require careful tuning; may not capture all relevant information. |
| [19] | Vehicles | NN | Recordings | 417 | Car sounds | Binary | F1 = 0.86 | Sensitive to noise and may not generalize well to unseen engine types; require careful parameter tuning. |
| [20] | Engines | DL | Recordings | 100 | Vehicles/Induction motors | 6 classes | F1 = 0.95 | Time-frequency transformation can be computationally expensive; performance depends on the effectiveness of the adaptive fault feature band extraction. |
| [21] | Industrial | ML | MIMII | 5101 | Machine sounds | 5 classes | 95% | Handcrafted features may not capture all relevant information; performance may be limited by the quality of the feature selector. |
| [22] | Rolling bearing | VAE-CNN | CWRU | 2048 | Rolling bearing | 10 classes | 96.62% | VAE-CNN models can be complex and computationally expensive to train; require a large amount of data to achieve good performance. |
| [24] | Railway | ML | Recordings | 1600 | Turnout switch machine | 10 classes | 98% | Eigenmode Decomposition can be computationally expensive; SVMs can be sensitive to the choice of kernel and parameters. |
| [25] | Railway | ML | Recordings | 25,000 | Commuter train dataset | Binary | 97.04% | MFCCs may not capture all relevant information in the sound signal; MLP classifiers can be prone to overfitting. |
| [26] | EPS motors | LSTM-AE | Recordings | 29,759 | Rolling bearing | Binary | 99.2% | LSTM-AE models can be complex and computationally expensive to train; performance depends on the quality of the reconstruction error metric. |
| [27] | Vehicles | CNN | NASA Bearing Dataset | 9463 | Engine ball bearing | Binary | 91% | CNN models can be data-hungry and computationally expensive to train; performance depends on the choice of network architecture and hyperparameters. |
| [28] | Robotic manipulator | CNN | Recordings | 181 | Motors | Binary | 92.34% | Custom CNN models may be difficult to generalize to other robotic manipulators; performance depends on the quality of the custom dataset. |
| [29] | Industrial | AE | MIMII | 5101 | Machine sounds | Binary | 96.51% | Denoising autoencoders may require careful tuning of the noise level; performance depends on the characteristics of the industrial noise. |
| [30] | Industrial | CNN | Recordings | 60 stem files | Engine ball bearing | 4 classes | 99.58% | Audio source isolation can be challenging in complex environments; 1D-CNN models may not capture all relevant spatial information in the sound field. |
| [31] | Vehicles | DL | Recordings | 153 | Engine sounds (knocking) | Binary | 90% | LSTM models can be computationally expensive; performance depends on the quality of the frequency modulation amplitude demodulation features. |
| [32] | Diesel engines | CNN | DEFault | 3500 | Engine sounds | 4 classes | 99.37% | ANN performance can be sensitive to parameter initialization and network architecture; CNN performance depends on the noise level and dataset size. |
| [33] | New energy vehicles | ML | Recordings | N/A | Engine sounds | Binary | 90% | Wavelet transforms can be computationally expensive; SVMs can be sensitive to the choice of kernel and parameters. |
| [34] | Diesel engines | Mixed attention | N/A | N/A | Engine sounds | Binary | 98.17% | Complexity, computational cost, reliance on single-cylinder data for whole-machine diagnosis. |
| [35] | Rolling bearing | AE with Self-Attention | CWRU | N/A | Bearing fault | N/A | 97.3% | Complexity, computational cost, potential overfitting, and reliance on specific datasets. |
| [36] | Industrial | ML | MaFaulDa | 1951 | Machine sounds | 6 classes | 99.54% | Complexity, computational cost, difficulty generalizing to new datasets. |
| [37] | Industrial | ML | MIMII | 5101 | Machine sounds | 4 classes | 93% | Possible overfitting for limited datasets, computational cost for multiclass integrations, and difficulty generalizing to new datasets. |
| [37] | Industrial | ML | ToyADMOS | N/A | Machine sounds | 3 classes | 98% | — |
Table 2. DB1 labels and file counts.

| Labels | Count | Labels | Count |
|---|---|---|---|
| Bad Wheel Bearing | 21 | Squeaky Belt | 4 |
| Universal Joint Failure or Steering Rack Failure | 10 | Seized Engine | 4 |
| Knocking | 5 | Pre-Ignition | 4 |
| Wheel Bearing, Transmission Whining Noise and Catalytic Converter Issues | 5 | Bad Transmission | 4 |
| Bad CV Joint | 4 | Strut Mount Failure | 4 |
| Radiator Fan Failure | 4 | Loose Exhaust Shield | 4 |
| Turning Front End Clicking Bad CV Axle | 4 | Lifter Ticking | 4 |
| Steering Noise | 4 | Flooded Engine | 4 |
| Steering Groaning Whining Low Power Steering Fluid | 4 | Engine Rattle Noise | 4 |
| Squeaky Brake/Grinding Brake | 4 | Engine Misfire | 4 |
| Muffler Running Loud Exhaust Leak | 4 | Thrown Rod | 4 |
| Clunking Over Bumps Bad Stabilizer Link Noise | 4 | Suspension Arm Fault | 4 |
| Engine Chirping/Squealing Belt | 4 | Vacuum Leak | 4 |
| Fuel Pump Cartridge Fault | 4 | Total (27) | 133 |
Table 3. DB2 labels and file counts.

| Group | Labels | Count |
|---|---|---|
| Animals | Cats | 200 |
| | Sheep | 80 |
| | Bear | 68 |
| | Dog | 68 |
| | Monkey | 60 |
| | Lions | 48 |
| | Wolf | 47 |
| | Horse | 40 |
| | Mouse | 28 |
| Vehicles and Transportation | Car Crashes | 103 |
| | Car Horn | 24 |
| | Motorcycle | 20 |
| | Bus | 20 |
| | Bike | 20 |
| | Train | 20 |
| | Truck | 20 |
| | Truck Horn | 19 |
| Emergency Vehicles | Police Car Siren | 41 |
| | Fire Truck Siren | 37 |
| | Ambulance Siren | 30 |
| Construction and Machinery | Drilling | 24 |
| Weapons and Explosions | Gunshot | 14 |
| Total (22) | | 1031 |
Table 4. Features extracted in the study. (Usage: C = compact feature list, 52 features; E = expanded feature list, 126 features.)

| Features Domain | Feature | Definition | Count | Usage |
|---|---|---|---|---|
| Time-Domain | Zero-Crossing Rate | How often the signal changes from positive to negative. | 2 | E |
| | RMSE | Measures the energy of the audio signal. | 2 | E |
| Frequency-Domain (Spectral Features) | Spectral Centroid | Indicates where the center of mass of the spectrum is located. | 2 | E |
| | Spectral Bandwidth | Measures the width of the spectrum around its centroid. | 2 | E |
| | Spectral Roll-off | The frequency below which 85% of the total spectral energy resides. | 2 | E |
| | Spectral Contrast | Refers to the difference in amplitude between peaks and valleys in the spectrum of an audio signal. | 14 | E |
| Time-Frequency | Mel Spectrogram | A visual representation of the frequency spectrum of an audio signal over time. | 2 | C |
| | Chroma Features | Represent the energy distribution across the 12 different pitch classes (notes) of the Western music scale. | 24 | C & E |
| | MFCCs | Capture spectral features related to the timbre of audio. They represent the short-term power spectrum of sound. | 26 | C & E |
| | MFCCs Delta | | 26 | E |
| | MFCCs Delta2 | | 26 | E |
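As a companion to Table 4, the sketch below shows how the 52-feature compact list (Mel Spectrogram mean and standard deviation, 12 chroma means and standard deviations, and 13 MFCC means and standard deviations) can be computed with librosa; the default frame and window settings are an assumption, as the study’s exact extraction parameters are not restated here.

```python
# Sketch: computing the 52-dimensional compact feature vector with librosa.
import numpy as np
import librosa

def compact_features(path):
    y, sr = librosa.load(path)                          # librosa default: 22,050 Hz
    mel = librosa.feature.melspectrogram(y=y, sr=sr)    # Mel Spectrogram
    chroma = librosa.feature.chroma_stft(y=y, sr=sr)    # 12 pitch classes
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # 13 MFCCs
    feats = [mel.mean(), mel.std()]                                  # 2
    feats += list(chroma.mean(axis=1)) + list(chroma.std(axis=1))    # 24
    feats += list(mfcc.mean(axis=1)) + list(mfcc.std(axis=1))        # 26
    return np.array(feats)                                           # 52 total
```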
Table 5. Hyperparameters for each model and their values.

| Model | Hyperparameter | Value |
|---|---|---|
| SGD | Classification loss function | Hinge |
| | Regression loss function | Squared Loss |
| | Regularization method | Ridge (L2) |
| | Regularization strength (α) | 0.00001 |
| | Learning rate | Constant |
| | Initial learning rate | 0.01 |
| CN2 Rule Inducer | Rule ordering | Ordered |
| | Covering algorithm | Exclusive |
| | Evaluation measure (rule search) | Entropy |
| | Beam width (rule search) | 5 |
| | Min. rule coverage (rule filtering) | 1 |
| | Max. rule length (rule filtering) | 5 |
| Neural Network | No. of hidden neurons | 100 |
| | Activation | ReLU |
| | Solver | SGD |
| | Regularization (α) | 0.0001 |
| | No. of iterations | 1000 |
| | Training | Replicable |
| SVM | Cost | 1 |
| | Regression loss (ε) | 0.1 |
| | Kernel | RBF |
| | Gamma (γ) | Auto |
| | Numerical tolerance | 0.001 |
| | Iteration limit | 100 |
| Decision Tree | Type | Induce binary tree |
| | Min. instances in leaves | 2 |
| | Split subsets | >5 |
| | Max. tree depth | 100 |
| | Stop when majority | 95% |
| AdaBoost | Base estimator | Tree |
| | No. of estimators | 50 |
| | Learning rate | 1 |
| | Classification algorithm | SAMME |
| | Regression loss function | Exponential |
| Random Forest | No. of trees | 10 |
| | Split subsets | >5 |
| Logistic Regression | Regularization type | Lasso (L1) |
| | Strength | 1 |
| kNN | No. of neighbors | 5 |
| | Metric | Euclidean |
| | Weight | Uniform |
| Gradient Boosting | No. of trees | 100 |
| | Learning rate | 0.1 |
| | Tree depth | 3 |
| | Training instances | 1 |
| Naive Bayes | Default settings | — |
Table 6. Performance metrics of DB1 with compact features.

| Model | Train | Test | AUC | CA | F1 | Prec | Recall | MCC | Spec | LogLoss |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 13.724 | 0.06 | 0.965 | 0.865 | 0.863 | 0.877 | 0.865 | 0.859 | 0.995 | 0.844 |
| NN | 5.466 | 0.129 | 0.975 | 0.842 | 0.833 | 0.854 | 0.842 | 0.834 | 0.989 | 0.852 |
| SGD | 0.387 | 0.118 | 0.89 | 0.789 | 0.783 | 0.792 | 0.789 | 0.779 | 0.988 | 7.271 |
| AdaBoost | 4.102 | 0.164 | 0.912 | 0.722 | 0.706 | 0.711 | 0.722 | 0.707 | 0.984 | 3.834 |
| kNN | 0.097 | 0.243 | 0.964 | 0.714 | 0.694 | 0.753 | 0.714 | 0.704 | 0.988 | 1.715 |
| RF | 0.255 | 0.06 | 0.946 | 0.684 | 0.671 | 0.707 | 0.684 | 0.668 | 0.983 | 2.047 |
| SVM | 0.468 | 0.136 | 0.92 | 0.639 | 0.6 | 0.663 | 0.639 | 0.628 | 0.952 | 1.909 |
| NB | 0.168 | 0.104 | 0.975 | 0.639 | 0.571 | 0.576 | 0.639 | 0.64 | 0.989 | 5.678 |
| DT | 0.637 | 0 | 0.834 | 0.624 | 0.618 | 0.673 | 0.624 | 0.606 | 0.98 | 10.742 |
| GB | 36.681 | 0.081 | 0.834 | 0.504 | 0.509 | 0.612 | 0.504 | 0.472 | 0.951 | 5.033 |
| CN2 | 71.023 | 0.064 | 0.68 | 0.406 | 0.402 | 0.435 | 0.406 | 0.376 | 0.968 | 2.848 |
Table 7. Performance metrics of DB1 with expanded features.

| Model | Train | Test | AUC | CA | F1 | Prec | Recall | MCC | Spec | LogLoss |
|---|---|---|---|---|---|---|---|---|---|---|
| LR | 131.735 | 0.151 | 0.926 | 0.692 | 0.671 | 0.669 | 0.692 | 0.676 | 0.98 | 1.837 |
| SGD | 0.6 | 0.299 | 0.812 | 0.647 | 0.638 | 0.653 | 0.647 | 0.626 | 0.974 | 12.205 |
| NN | 5.768 | 0.316 | 0.932 | 0.624 | 0.614 | 0.625 | 0.624 | 0.603 | 0.977 | 1.519 |
| AdaBoost | 13.698 | 0.259 | 0.874 | 0.624 | 0.611 | 0.655 | 0.624 | 0.602 | 0.971 | 4.449 |
| DT | 1.311 | 0 | 0.827 | 0.609 | 0.606 | 0.659 | 0.609 | 0.592 | 0.983 | 11.262 |
| NB | 0.413 | 0.157 | 0.965 | 0.594 | 0.52 | 0.48 | 0.594 | 0.594 | 0.987 | 10.384 |
| RF | 0.415 | 0.14 | 0.915 | 0.586 | 0.568 | 0.606 | 0.586 | 0.561 | 0.967 | 2.75 |
| kNN | 0.217 | 0.32 | 0.9 | 0.459 | 0.421 | 0.411 | 0.459 | 0.431 | 0.965 | 4.502 |
| GB | 73.66 | 0.17 | 0.823 | 0.436 | 0.428 | 0.477 | 0.436 | 0.398 | 0.945 | 5.396 |
| CN2 | 160.273 | 0.124 | 0.683 | 0.406 | 0.414 | 0.471 | 0.406 | 0.372 | 0.963 | 2.856 |
| SVM | 0.834 | 0.306 | 0.811 | 0.323 | 0.224 | 0.214 | 0.323 | 0.311 | 0.883 | 2.534 |
Table 8. Performance metrics of DB2 with compact features.

| Model | Train | Test | AUC | CA | F1 | Prec | Recall | MCC | Spec | LogLoss |
|---|---|---|---|---|---|---|---|---|---|---|
| NN | 25.526 | 0.143 | 0.99 | 0.884 | 0.884 | 0.886 | 0.884 | 0.874 | 0.992 | 0.462 |
| LR | 132.968 | 0.071 | 0.979 | 0.847 | 0.844 | 0.845 | 0.847 | 0.834 | 0.989 | 0.71 |
| AdaBoost | 27.621 | 0.236 | 0.966 | 0.845 | 0.843 | 0.844 | 0.845 | 0.831 | 0.989 | 2.206 |
| SVM | 3.519 | 0.336 | 0.987 | 0.831 | 0.827 | 0.844 | 0.831 | 0.817 | 0.985 | 0.636 |
| RF | 0.659 | 0.079 | 0.975 | 0.831 | 0.828 | 0.836 | 0.831 | 0.817 | 0.987 | 1.371 |
| SGD | 0.745 | 0.169 | 0.894 | 0.807 | 0.798 | 0.803 | 0.807 | 0.791 | 0.986 | 6.667 |
| kNN | 0.146 | 0.356 | 0.966 | 0.806 | 0.802 | 0.811 | 0.806 | 0.79 | 0.989 | 2.09 |
| GB | 232.345 | 0.224 | 0.976 | 0.805 | 0.805 | 0.814 | 0.805 | 0.788 | 0.985 | 0.933 |
| DT | 2.452 | 0.002 | 0.876 | 0.742 | 0.742 | 0.747 | 0.742 | 0.72 | 0.983 | 7.533 |
| CN2 | 666.881 | 0.102 | 0.887 | 0.682 | 0.68 | 0.686 | 0.682 | 0.654 | 0.977 | 1.818 |
| NB | 0.348 | 0.076 | 0.933 | 0.577 | 0.583 | 0.691 | 0.577 | 0.556 | 0.982 | 5.333 |
Table 9. Performance metrics of DB2 with expanded features.

| Model | Train | Test | AUC | CA | F1 | Prec | Recall | MCC | Spec | LogLoss |
|---|---|---|---|---|---|---|---|---|---|---|
| AdaBoost | 64.35 | 0.488 | 0.969 | 0.844 | 0.841 | 0.844 | 0.844 | 0.83 | 0.989 | 2.032 |
| NN | 32.243 | 0.435 | 0.983 | 0.84 | 0.839 | 0.842 | 0.84 | 0.826 | 0.989 | 0.7 |
| GB | 539.137 | 0.336 | 0.979 | 0.835 | 0.834 | 0.842 | 0.835 | 0.821 | 0.987 | 0.844 |
| SGD | 1.418 | 0.304 | 0.903 | 0.822 | 0.819 | 0.819 | 0.822 | 0.806 | 0.988 | 6.164 |
| LR | 191.823 | 0.198 | 0.98 | 0.815 | 0.811 | 0.817 | 0.815 | 0.799 | 0.987 | 0.707 |
| RF | 1.004 | 0.161 | 0.972 | 0.812 | 0.805 | 0.812 | 0.812 | 0.796 | 0.986 | 1.485 |
| SVM | 6.311 | 0.588 | 0.982 | 0.79 | 0.78 | 0.792 | 0.79 | 0.772 | 0.983 | 0.732 |
| DT | 5.391 | 0 | 0.871 | 0.726 | 0.726 | 0.732 | 0.726 | 0.702 | 0.982 | 7.822 |
| kNN | 0.287 | 0.446 | 0.941 | 0.696 | 0.681 | 0.68 | 0.696 | 0.67 | 0.979 | 3.288 |
| CN2 | 1256.612 | 0.18 | 0.893 | 0.687 | 0.688 | 0.694 | 0.687 | 0.659 | 0.979 | 1.8 |
| NB | 0.62 | 0.186 | 0.95 | 0.637 | 0.646 | 0.729 | 0.637 | 0.618 | 0.985 | 6.159 |
Table 10. Performance metrics of DB3 with compact features.

| Model | Train | Test | AUC | CA | F1 | Prec | Recall | MCC | Spec | LogLoss |
|---|---|---|---|---|---|---|---|---|---|---|
| NN | 33.206 | 0.156 | 0.988 | 0.855 | 0.851 | 0.857 | 0.855 | 0.845 | 0.993 | 0.6 |
| LR | 498.042 | 0.132 | 0.977 | 0.81 | 0.805 | 0.81 | 0.81 | 0.797 | 0.99 | 0.879 |
| AdaBoost | 52.826 | 0.326 | 0.955 | 0.786 | 0.779 | 0.782 | 0.786 | 0.772 | 0.99 | 2.775 |
| kNN | 0.118 | 0.368 | 0.966 | 0.766 | 0.758 | 0.768 | 0.766 | 0.751 | 0.99 | 2.205 |
| RF | 0.93 | 0.081 | 0.962 | 0.761 | 0.749 | 0.758 | 0.761 | 0.745 | 0.986 | 2.198 |
| SVM | 5.95 | 0.513 | 0.985 | 0.759 | 0.734 | 0.732 | 0.759 | 0.744 | 0.984 | 1.006 |
| GB | 1989.676 | 0.219 | 0.94 | 0.737 | 0.735 | 0.754 | 0.737 | 0.72 | 0.984 | 2.304 |
| SGD | 0.944 | 0.129 | 0.858 | 0.735 | 0.72 | 0.724 | 0.735 | 0.717 | 0.988 | 9.169 |
| DT | 3.696 | 0 | 0.858 | 0.676 | 0.67 | 0.673 | 0.676 | 0.654 | 0.984 | 8.976 |
| CN2 | 993.324 | 0.085 | 0.853 | 0.605 | 0.606 | 0.613 | 0.605 | 0.578 | 0.981 | 2.666 |
| NB | 0.255 | 0.057 | 0.937 | 0.206 | 0.195 | 0.359 | 0.206 | 0.207 | 0.996 | 14.929 |
Table 11. Performance metrics of DB3 with expanded features.

| Model | Train | Test | AUC | CA | F1 | Prec | Recall | MCC | Spec | LogLoss |
|---|---|---|---|---|---|---|---|---|---|---|
| AdaBoost | 126.229 | 0.446 | 0.953 | 0.799 | 0.792 | 0.804 | 0.799 | 0.786 | 0.989 | 2.525 |
| NN | 38.35 | 0.334 | 0.981 | 0.78 | 0.773 | 0.784 | 0.78 | 0.765 | 0.988 | 0.96 |
| RF | 1.37 | 0.252 | 0.962 | 0.772 | 0.755 | 0.753 | 0.772 | 0.757 | 0.987 | 2.217 |
| SGD | 2.012 | 0.311 | 0.874 | 0.764 | 0.755 | 0.753 | 0.764 | 0.748 | 0.991 | 8.16 |
| LR | 401.936 | 0.167 | 0.972 | 0.741 | 0.727 | 0.73 | 0.741 | 0.723 | 0.986 | 1.024 |
| GB | 3372.378 | 0.305 | 0.929 | 0.717 | 0.717 | 0.741 | 0.717 | 0.699 | 0.984 | 2.674 |
| SVM | 10.61 | 0.806 | 0.976 | 0.703 | 0.67 | 0.67 | 0.703 | 0.684 | 0.979 | 1.196 |
| DT | 7.743 | 0.002 | 0.843 | 0.655 | 0.653 | 0.66 | 0.655 | 0.632 | 0.985 | 9.877 |
| kNN | 0.261 | 0.464 | 0.927 | 0.631 | 0.612 | 0.609 | 0.631 | 0.606 | 0.981 | 4.366 |
| CN2 | 1853.444 | 0.183 | 0.864 | 0.621 | 0.619 | 0.628 | 0.621 | 0.596 | 0.983 | 2.614 |
| NB | 0.451 | 0.276 | 0.932 | 0.068 | 0.033 | 0.065 | 0.068 | 0.069 | 0.997 | 29.18 |
Table 12. Feature ranking for the compact feature list applied on DB1 (ordered by Information Gain).

| Type | Name | IG | ANOVA | ReliefF | FCBF |
|---|---|---|---|---|---|
| mfcc | mean_3 | 1.5308 | 10.3679 | 0.1138 | 0.8767 |
| mfcc | mean_2 | 1.4491 | 10.6247 | 0.1171 | 0.7928 |
| mel_spectrogram | std | 1.3937 | 1.1654 | 0.0103 | 0.0001 |
| mfcc | mean_4 | 1.3734 | 22.9451 | 0.107 | 0.7215 |
| mfcc | mean_1 | 1.3498 | 16.7839 | 0.1006 | 0.7004 |
| mfcc | mean_6 | 1.3418 | 12.0147 | 0.0832 | 0.6934 |
| mfcc | mean_10 | 1.2915 | 25.8226 | 0.0737 | 0.6505 |
| mfcc | mean_7 | 1.291 | 18.428 | 0.0588 | 0.6501 |
| mfcc | mean_5 | 1.2857 | 17.4085 | 0.085 | 0.6457 |
| mfcc | mean_11 | 1.2359 | 17.177 | 0.0705 | 0.6056 |
| mel_spectrogram | mean | 1.2194 | 17.6741 | 0.075 | 0.5926 |
| mfcc | mean_8 | 1.2028 | 14.3887 | 0.0551 | 0.5799 |
| mfcc | mean_0 | 1.2 | 2.7237 | 0.019 | 0.5778 |
| chromagram | mean_7 | 1.1776 | 8.5138 | 0.0678 | 0.561 |
| mfcc | mean_9 | 1.1709 | 11.2358 | 0.0494 | 0.556 |
| mfcc | mean_12 | 1.1682 | 18.0902 | 0.0914 | 0.554 |
| chromagram | mean_8 | 1.1526 | 10.7271 | 0.0623 | 0.5426 |
| chromagram | mean_0 | 1.1297 | 9.6163 | 0.0597 | 0.5262 |
| chromagram | mean_5 | 1.1043 | 7.3469 | 0.0648 | 0.5083 |
| chromagram | mean_6 | 1.0997 | 6.3285 | 0.0555 | 0.5051 |
| chromagram | mean_9 | 1.0812 | 7.6679 | 0.0542 | 0 |
| chromagram | mean_1 | 1.0713 | 8.7643 | 0.0605 | 0.4857 |
| chromagram | mean_3 | 1.0665 | 5.8873 | 0.0425 | 0.4825 |
| chromagram | std_7 | 1.0527 | 6.2475 | 0.0376 | 0.4733 |
| chromagram | std_9 | 1.0188 | 8.5498 | 0.056 | 0.4512 |
| chromagram | mean_11 | 1.0162 | 5.4859 | 0.061 | 0 |
| chromagram | mean_2 | 1.0131 | 5.0941 | 0.0402 | 0.4475 |
| chromagram | mean_4 | 1.0087 | 12.9994 | 0.0702 | 0.4447 |
| chromagram | mean_10 | 0.9987 | 6.0528 | 0.049 | 0.4384 |
| chromagram | std_10 | 0.9859 | 6.1512 | 0.047 | 0 |
| chromagram | std_8 | 0.9857 | 7.7565 | 0.0578 | 0 |
| chromagram | std_11 | 0.9716 | 5.3361 | 0.0417 | 0.4214 |
| chromagram | std_6 | 0.9697 | 5.4978 | 0.0306 | 0.4203 |
| chromagram | std_4 | 0.9529 | 7.7805 | 0.0421 | 0.41 |
| chromagram | std_3 | 0.9522 | 4.0328 | 0.0216 | 0.4096 |
| mfcc | std_4 | 0.9462 | 5.2936 | 0.055 | 0.406 |
| chromagram | std_5 | 0.9289 | 4.2578 | 0.0342 | 0 |
| chromagram | std_2 | 0.9077 | 7.5226 | 0.0397 | 0 |
| mfcc | std_1 | 0.874 | 3.2042 | 0.0382 | 0.3637 |
| mfcc | std_3 | 0.8657 | 2.9155 | 0.0373 | 0.359 |
| chromagram | std_1 | 0.8654 | 6.8358 | 0.0418 | 0.3588 |
| mfcc | std_2 | 0.8516 | 2.9743 | 0.0403 | 0.3511 |
| mfcc | std_7 | 0.8077 | 8.099 | 0.0446 | 0.3271 |
| mfcc | std_9 | 0.7995 | 7.6018 | 0.0328 | 0.3227 |
| chromagram | std_0 | 0.7954 | 6.62 | 0.0522 | 0.3205 |
| mfcc | std_6 | 0.7791 | 4.7404 | 0.0413 | 0.3119 |
| mfcc | std_8 | 0.7719 | 5.5775 | 0.0459 | 0.3082 |
| mfcc | std_5 | 0.736 | 4.8213 | 0.0417 | 0.2897 |
| mfcc | std_12 | 0.669 | 2.8505 | 0.0464 | 0.2565 |
| mfcc | std_11 | 0.6665 | 1.725 | 0.0312 | 0.2553 |
| mfcc | std_0 | 0.6607 | 0.4204 | −0.0035 | 0.2525 |
| mfcc | std_10 | 0.6427 | 2.6443 | 0.0556 | 0.244 |
Table 13. Metrics of datasets after ensemble optimization. For each iteration, columns 2–6 give the classifier weights (W1, W2, W3), the number of features used, and the achieved accuracy for DB1; columns 7–11 give the same quantities for DB2; and columns 12–16 for DB3.

| Iter | W1 | W2 | W3 | Feat. | Acc. | W1 | W2 | W3 | Feat. | Acc. | W1 | W2 | W3 | Feat. | Acc. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.95 | 0.73 | 0.60 | 24 | 0.8879 | 0.95 | 0.73 | 0.60 | 23.75 | 0.8788 | 0.95 | 0.73 | 0.60 | 27.49 | 0.8557 |
| 2 | 0.16 | 0.06 | 0.87 | 22 | 0.8429 | 0.16 | 0.06 | 0.87 | 21.56 | 0.8332 | 0.16 | 0.06 | 0.87 | 23.12 | 0.8058 |
| 3 | 0.71 | 0.02 | 0.97 | 26 | 0.8797 | 0.71 | 0.02 | 0.97 | 26.01 | 0.7934 | 0.71 | 0.02 | 0.97 | 32.02 | 0.7861 |
| 4 | 0.21 | 0.18 | 0.18 | 28 | 0.8797 | 0.21 | 0.18 | 0.18 | 28.32 | 0.8739 | 0.21 | 0.18 | 0.18 | 36.65 | 0.8668 |
| 5 | 0.52 | 0.43 | 0.29 | 23 | 0.8951 | 0.52 | 0.43 | 0.29 | 23.04 | 0.8778 | 0.52 | 0.43 | 0.29 | 26.08 | 0.8488 |
| 6 | 0.14 | 0.29 | 0.37 | 26 | 0.8802 | 0.14 | 0.29 | 0.37 | 26.12 | 0.8797 | 0.14 | 0.29 | 0.37 | 32.24 | 0.8685 |
| 7 | 0.79 | 0.20 | 0.51 | 25 | 0.8648 | 0.79 | 0.20 | 0.51 | 24.56 | 0.839 | 0.79 | 0.20 | 0.51 | 29.12 | 0.817 |
| 8 | 0.05 | 0.61 | 0.17 | 26 | 0.8797 | 0.05 | 0.61 | 0.17 | 25.92 | 0.8885 | 0.05 | 0.61 | 0.17 | 31.85 | 0.8556 |
| 9 | 0.95 | 0.97 | 0.81 | 21 | 0.8813 | 0.95 | 0.97 | 0.81 | 20.65 | 0.87 | 0.95 | 0.97 | 0.81 | 21.3 | 0.8419 |
| 10 | 0.10 | 0.68 | 0.44 | 23 | 0.9104 | 0.10 | 0.68 | 0.44 | 23.05 | 0.8807 | 0.10 | 0.68 | 0.44 | 26.09 | 0.8556 |
| 11 | 0.01 | 0.96 | 0.14 | 23 | 0.9033 | 0.93 | 0.68 | 0.62 | 23.81 | 0.8798 | 0.17 | 0.05 | 0.19 | 36.68 | 0.8341 |
| 12 | 0.05 | 0.85 | 0.89 | 23 | 0.9033 | 0.01 | 0.91 | 0.18 | 26.57 | 0.8807 | 0.02 | 0.60 | 0.72 | 20.98 | 0.8453 |
| 13 | 0.02 | 0.99 | 0.72 | 23 | 0.8731 | 0.02 | 0.97 | 0.97 | 23.87 | 0.8768 | 0.04 | 0.47 | 0.35 | 20.09 | 0.8496 |
| 14 | 0.13 | 0.69 | 0.48 | 23 | 0.9104 | 0.15 | 0.04 | 0.02 | 26.95 | 0.8312 | 0.71 | 0.99 | 0.37 | 39.42 | 0.8668 |
| 15 | 0.02 | 0.29 | 0.33 | 23 | 0.9033 | 0.28 | 1.00 | 0.02 | 23.64 | 0.8817 | 0.16 | 0.80 | 0.10 | 29.43 | 0.8591 |
| 16 | 0.15 | 0.07 | 0.98 | 23 | 0.8275 | 0.01 | 0.01 | 0.13 | 23.54 | 0.8254 | 0.71 | 0.45 | 1.00 | 38.77 | 0.8496 |
| 17 | 0.25 | 0.88 | 0.41 | 24 | 0.9027 | 0.59 | 0.86 | 0.99 | 23.08 | 0.8846 | 0.88 | 0.23 | 0.37 | 35.5 | 0.8281 |
| 18 | 0.22 | 0.50 | 0.05 | 23 | 0.9027 | 0.87 | 0.99 | 0.27 | 22.6 | 0.8807 | 0.62 | 0.25 | 0.77 | 23.87 | 0.8109 |
| 19 | 0.14 | 0.99 | 1.00 | 24 | 0.8577 | 0.26 | 0.33 | 0.38 | 29.4 | 0.8778 | 0.46 | 0.98 | 0.53 | 20.13 | 0.8487 |
| 20 | 0.62 | 0.99 | 0.15 | 23 | 0.9027 | 0.21 | 0.99 | 0.89 | 28.78 | 0.8768 | 0.53 | 0.30 | 0.02 | 34.85 | 0.853 |
| 21 | 0.99 | 0.99 | 0.11 | 23 | 0.8648 | 0.98 | 0.93 | 0.15 | 28.95 | 0.8729 | 0.53 | 0.26 | 0.53 | 37.74 | 0.8513 |
| 22 | 0.97 | 0.01 | 0.07 | 22 | 0.85 | 0.97 | 0.01 | 0.07 | 22.06 | 0.7847 | 0.97 | 0.01 | 0.07 | 24.12 | 0.7294 |
| 23 | 0.09 | 0.05 | 0.84 | 24 | 0.8126 | 0.84 | 0.05 | 0.98 | 28.97 | 0.8138 | 0.09 | 0.05 | 0.84 | 28.97 | 0.8445 |
| 24 | 0.96 | 0.97 | 0.30 | 27 | 0.8731 | 0.02 | 0.89 | 0.01 | 29.14 | 0.8768 | 0.80 | 0.92 | 0.97 | 21.66 | 0.847 |
| 25 | 0.85 | 0.41 | 0.89 | 30 | 0.872 | 0.02 | 0.99 | 0.63 | 29.95 | 0.8768 | 0.29 | 0.69 | 0.67 | 27.29 | 0.8582 |
| 26 | 0.02 | 0.86 | 0.08 | 20 | 0.8885 | 0.70 | 0.76 | 0.01 | 29.94 | 0.8788 | 0.61 | 0.90 | 0.76 | 39.74 | 0.8685 |
| 27 | 0.85 | 0.03 | 0.24 | 20 | 0.8505 | 0.01 | 0.98 | 0.29 | 28.04 | 0.8768 | 0.75 | 0.04 | 0.40 | 34.28 | 0.8032 |
| 28 | 0.10 | 0.04 | 0.94 | 27 | 0.8044 | 0.23 | 0.95 | 0.26 | 20 | 0.8768 | 0.72 | 0.30 | 0.54 | 36.91 | 0.8419 |
| 29 | 0.12 | 0.96 | 0.08 | 29 | 0.8725 | 0.19 | 0.13 | 0.00 | 29.99 | 0.873 | 0.81 | 0.69 | 0.61 | 21.69 | 0.8402 |
| 30 | 0.02 | 0.94 | 0.94 | 20 | 0.8885 | 0.29 | 0.02 | 1.00 | 20.01 | 0.7682 | 0.97 | 0.73 | 0.04 | 35.85 | 0.859 |