Search Results (198)

Search Parameters:
Keywords = Mel frequency cepstral coefficients (MFCC) features

19 pages, 2057 KB  
Article
Comparative Analysis of Feature Extraction Methods for ECG Arrhythmia Classification Using Ensemble Learning
by Victor Adeleye and Mahmoud Elbattah
BioMedInformatics 2026, 6(3), 33; https://doi.org/10.3390/biomedinformatics6030033 - 27 May 2026
Viewed by 109
Abstract
Electrocardiogram (ECG) arrhythmia classification remains critical for automated cardiac diagnosis, yet feature extraction methods are frequently adopted without systematic comparative evaluation. This study presents a controlled comparative analysis of four signal processing techniques—Mel-Frequency Cepstral Coefficients (MFCC), Discrete Wavelet Transform (DWT), Hilbert–Huang Transform (HHT), and Synchrosqueezing Wavelet Transform (SSWT)—for ECG feature extraction. Using the MIT-BIH Arrhythmia Database with ANSI/AAMI EC57:1998 standard mapping, we trained Cascade Forest classifiers on each feature set under identical preprocessing and SMOTE-based class balancing conditions to ensure a fair comparison. DWT features achieved superior performance (accuracy: 98.79%, macro-F1: 92.93%, precision: 94.39%) compared to MFCC (88.30% macro-F1), SSWT (84.54% macro-F1), and HHT (83.59% macro-F1), particularly for clinically challenging minority arrhythmia classes. However, DWT’s performance advantage incurred substantial computational cost (10,050 s), while MFCC provided competitive results with a 62% lower computational burden. These findings provide evidence-based guidance for feature extraction method selection in interpretable ECG classification systems, demonstrating critical performance-efficiency trade-offs relevant to clinical deployment contexts. Full article
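The gap between the reported accuracy and macro-F1 values reflects how the two metrics treat minority classes. The sketch below is only an illustration of the standard definitions, not the authors' evaluation code; the toy labels `N` and `V` and the function names are assumptions made for the example.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so every class counts equally."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        # A class that is never predicted and never present scores 0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced toy data: 8 majority beats, 2 minority beats,
# and a classifier that always predicts the majority class.
y_true = ["N"] * 8 + ["V"] * 2
y_pred = ["N"] * 10
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred))
```

In this toy case accuracy stays high while macro-F1 drops sharply, which is why macro-averaged scores are the more informative figure for minority arrhythmia classes.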

25 pages, 1430 KB  
Article
Acoustic Signatures of Hive: Detecting Queen Bee Absence Through Machine Learning of Short Audio Segments
by Pablo Ormeño-Arriagada, Cristopher Jiménez, Ramón Arias Gilart, Daniel Ramírez and Karen Yañez
Insects 2026, 17(6), 547; https://doi.org/10.3390/insects17060547 - 25 May 2026
Viewed by 218
Abstract
Honeybee population decline poses a serious threat to global biodiversity and agricultural productivity, underscoring the need for continuous and non-invasive hive monitoring solutions. In particular, early detection of queen absence is critical for maintaining colony viability. This study investigates the effectiveness of machine learning and deep learning models for acoustic-based queen-presence detection using short-duration hive audio recordings. Audio data collected from multiple sources were processed to extract spectrogram, Mel-spectrogram, and Mel-frequency cepstral coefficient features, which were evaluated using classical ML classifiers and convolutional neural networks. Experimental results indicate that MFCC-based representations consistently outperform spectrogram-based features across segment lengths, achieving higher accuracy and greater stability. The best performance was obtained with Mel features using convolutional neural networks for short segments and gradient-boosted models for longer windows. These findings demonstrate that brief acoustic segments are sufficient for reliable classification, supporting real-time monitoring under realistic urban recording conditions with moderate environmental noise. The proposed approach offers a scalable and low-cost framework for precision beekeeping and contributes to sustainable beekeeping through early, automated anomaly detection. The proposed framework supports real-time, low-cost deployment scenarios, enabling scalable precision apiculture solutions. Full article
(This article belongs to the Special Issue Biology and Conservation of Honey Bees)

24 pages, 3864 KB  
Article
Machine Learning Approaches to Early Detection of Parkinson’s Disease Using Speech Analysis Technique
by Mohammad Amran Hossain, Enea Traini and Francesco Amenta
Neurol. Int. 2026, 18(5), 88; https://doi.org/10.3390/neurolint18050088 - 10 May 2026
Viewed by 221
Abstract
Background: Parkinson’s disease (PD) is a progressive neurodegenerative disorder that affects millions globally, particularly those in the elderly population. Several occupational exposures typical of maritime environments are recognized or suspected risk factors for PD, warranting attention within occupational health frameworks. The disease is characterized by motor symptoms such as tremor, rigidity, and bradykinesia, as well as non-motor impairments including speech abnormalities. Objective: Early diagnosis is crucial for effective disease management but remains challenging due to symptoms overlapping with normal aging and other neurological conditions. This study presents a machine learning (ML)-based approach for the early diagnosis of PD using speech signal analysis. Methods: We employed six supervised ML classifiers to differentiate between PD patients and healthy controls based on vocal features. The experimental dataset, MDVR-KCL, consists of speech recordings from both reading tasks and spontaneous dialogs, collected via mobile devices. From these recordings, we extracted Mel-Frequency Cepstral Coefficients (MFCCs), Gammatone Frequency Cepstral Coefficients (GTCCs), and acoustic features such as jitter, shimmer, and harmonic-to-noise ratio. These features capture a broad range of prosodic, spectral, and articulatory characteristics associated with PD-related speech impairments. Speaker diarization was applied in spontaneous dialog recordings to separate participant speech. Hyperparameter tuning was performed using GridSearchCV with 10-fold cross-validation, while final model evaluation was conducted using Leave-One-Subject-Out Cross-Validation (LOSOCV) to ensure subject-independent performance assessment. Results: In the read-text task, the SVM model performed exceptionally, yielding 95.45% accuracy, 94.62% sensitivity, 95.97% specificity, an F1-score of 94.12%, and an AUC of 0.98 with an MCC value of 0.90, for GTCCs with the acoustic features. 
In the spontaneous dialog task, the XGB model demonstrated the highest overall performance across all metrics, with a test accuracy of 83.7%, a sensitivity of 76.3.9%, a specificity of 88.9%, an F1-score of 79.5%, an AUC value of 0.88, and an MCC value of 0.66. Conclusions: Comparable results were obtained on both spontaneous dialog and reading speech subsets, demonstrating the robustness of the approach across different speaking contexts. These results demonstrate the effectiveness of integrating cepstral and acoustic features with machine learning models for non-invasive PD classification. The findings support the use of speech-based digital biomarkers in early PD detection and highlight the potential for developing scalable tools. This work highlights the potential of speech-based digital diagnostics to support clinical decision-making and improve patient outcomes. Full article
(This article belongs to the Collection Advances in Neurodegenerative Diseases)
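Leave-One-Subject-Out Cross-Validation, as named in the abstract above, holds out every recording from one speaker per fold so that no subject appears in both training and test data. The following is a minimal sketch of the splitting logic only; the subject identifiers and the function name are hypothetical and are not taken from the MDVR-KCL dataset or the authors' code.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) per subject.

    subject_ids[i] is the subject that recording i belongs to.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical recordings: two from patient p01, one from p02, three from control c01.
subjects = ["p01", "p01", "p02", "c01", "c01", "c01"]
for held_out, train_idx, test_idx in leave_one_subject_out(subjects):
    print(held_out, train_idx, test_idx)
```

Splitting by subject rather than by recording is what makes the resulting scores subject-independent: a random per-recording split could place the same voice on both sides of the split.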

23 pages, 5016 KB  
Article
Audio-Based Characterization of Gait Parameters in Mangalarga Marchador, Campolina, and Piquira Horses Using Deep Learning
by Alan Freire, Alisson Vitor da Silva, Laura Patterson Rosa, Paulo Henrique Sales Guimarães, Brennda Paula Gonçalves Araujo, Carlos Augusto Freitas Silva, Larissa Raffaela Trindade Borges, Antônio Gilberto Bertechini and Sarah Laguna Conceição Meirelles
Animals 2026, 16(9), 1283; https://doi.org/10.3390/ani16091283 - 22 Apr 2026
Viewed by 430
Abstract
The evaluation of biomechanical parameters in four-beat gaited horses remains limited by the subjectiveness and complexity of current standard methods. Through a deep learning approach, we aimed to infer dissociation % using only acoustic signals. A total of 268 audio samples were extracted from publicly available videos featuring three Brazilian horse breeds (Mangalarga Marchador, Campolina, and Piquira) performing marcha batida and marcha picada. Acoustic features, including root mean square energy (RMS), zero-crossing rate (ZCR), and 13 Mel-frequency cepstral coefficients (MFCCs), were extracted and used to train a long short-term memory (LSTM) neural network. The model accurately predicted the time intervals between successive hoof–ground contacts (R² = 0.98; MAE = 0.0071), enabling the calculation of the dissociation %. While no significant differences were found between gait types and dissociation %, breed-related differences in both mean hoof–ground contact interval and dissociation were observed, with 8 acoustic features demonstrating discriminative power. Our results suggest that hoof–ground contact patterns can be quantified objectively from audio alone, offering a practical and non-invasive method for gait analysis. The approach holds potential for applications in breed standardization, selection, and digital locomotion phenotyping of horse populations. Full article
(This article belongs to the Section Equids)
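Two of the features named in the abstract above, RMS energy and zero-crossing rate, are simple per-frame statistics. The sketch below shows one common convention for each; the frame length, hop size, and function names are assumptions chosen for illustration, and the study's actual settings are not stated in the abstract.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a sample list into overlapping frames, dropping any partial tail."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def rms(frame):
    """Root mean square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


# Hypothetical signal: a loud alternating burst followed by a quiet constant tail.
signal = [1.0, -1.0, 1.0, -1.0, 0.1, 0.1, 0.1, 0.1]
for frame in frame_signal(signal, frame_len=4, hop=4):
    print(round(rms(frame), 3), round(zero_crossing_rate(frame), 3))
```

An impulsive event such as a hoof strike would be expected to raise the RMS of the frames it falls in, which is why such features pair naturally with a sequence model; some libraries normalise the zero-crossing count by the frame length instead.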

24 pages, 12239 KB  
Article
Measurement Method for Mold Slag Thickness in Continuous Casting Mold Using Millimeter-Wave Radar and Eddy Current Sensors
by Yi An, Zhichun Wang and Junsheng Xiao
Sensors 2026, 26(7), 2141; https://doi.org/10.3390/s26072141 - 31 Mar 2026
Viewed by 544
Abstract
To address the existing challenges in mold slag thickness measurement—such as the susceptibility of contact sensors to high-temperature degradation and the limitation of non-contact methods to detecting only the upper slag surface—this study proposes an integrated approach that fuses millimeter-wave radar and eddy current sensors for measuring mold slag thickness in a continuous casting mold. The method innovatively combines two sensing principles: the millimeter-wave radar employs an improved FFT-CZT2 high-precision ranging algorithm to perform high-resolution scanning of the solid slag upper surface, reconstructing its topography (error: ±1 mm), while Mel-frequency cepstral coefficients (MFCC) are applied to extract features from the radar intermediate-frequency signals, combined with an enhanced PSO-BP neural network algorithm to predict the thickness of the solid slag layer (error: ±5 mm). Concurrently, an eddy current sensor monitors the liquid slag–molten steel interface position (error: ±1 mm). Through dual-sensor data fusion, the upper surface topography data and solid slag thickness obtained from the radar are spatially registered in three dimensions with the molten steel level information derived from the eddy current sensor. This integration ultimately enables the non-contact synchronous measurement of three key parameters within the mold: solid slag layer thickness, liquid slag layer thickness inversion, and molten steel level. Furthermore, by reconstructing the upper slag surface morphology, the method successfully resolves practical issues such as uneven material distribution, local material deficiency, or excessive feeding. Preliminary experimental verification confirms that the proposed method maintains stable performance even under high-temperature and complex environmental conditions. 
It thus provides a real-time, accurate, and full-cross-section monitoring solution for mold slag in continuous casting, offering significant practical value for the development of smart steel plants. Full article
(This article belongs to the Section Electronic Sensors)

24 pages, 1020 KB  
Article
Research on the Diagnosis of Abnormal Sound Defects in Automobile Engines Based on Fusion of Multi-Modal Images and Audio
by Yi Xu, Wenbo Chen and Xuedong Jing
Electronics 2026, 15(7), 1406; https://doi.org/10.3390/electronics15071406 - 27 Mar 2026
Viewed by 485
Abstract
Against the global carbon neutrality target, predictive maintenance (PdM) of automotive engines represents a core technical strategy to advance the sustainable development of the automotive industry. Conventional single-modal diagnostic approaches for engine abnormal sound defects suffer from low accuracy and weak anti-interference capability. Existing multi-modal fusion methods fail to deeply mine the physical coupling between cross-modal features and often entail excessive model complexity, hindering deployment on resource-constrained on-board edge devices. To resolve these limitations, this study proposes a Physical Prior-Embedded Cross-Modal Attention (PPE-CMA) mechanism for lightweight multi-modal fusion diagnosis of engine abnormal sound defects. First, wavelet packet decomposition (WPD) and mel-frequency cepstral coefficients (MFCC) are integrated to extract time-frequency features from engine audio signals, while a channel-pruned ResNet18 is employed to extract spatial features from engine thermal imaging and vibration visualization images. Second, the PPE-CMA module is designed to adaptively assign attention weights to audio and image features by exploiting the physical coupling between engine fault acoustic and visual characteristics, enabling efficient cross-modal feature fusion with redundant information suppression. A rigorous theoretical derivation is provided to link cosine similarity with the physical correlation of engine fault acoustic-visual features, justifying the attention weight constraint (β = 1 − α) from the perspective of fault feature physical coupling. Third, an improved lightweight XGBoost classifier is constructed for fault classification, and a hybrid data augmentation strategy customized for engine multi-modal data is proposed to address the small-sample challenge in industrial applications. 
Ablation experiments on ResNet18 pruning ratios verify the optimal trade-off between diagnostic performance and computational efficiency, while feature distribution analysis validates the authenticity and effectiveness of the hybrid augmentation strategy. Experimental results on a self-constructed multi-modal dataset show that the proposed method achieves 98.7% diagnostic accuracy and a 98.2% F1-score, retaining 96.5% accuracy under 90 dB high-level environmental noise, with an end-to-end inference speed of 0.8 ms per sample (including preprocessing, feature extraction, and classification). Cross-engine and cross-domain validation on a 2.0T diesel engine small-sample dataset and the open-source SEMFault-2024 dataset yield average accuracies of 94.8% and 95.2%, respectively, demonstrating strong generalization. This method effectively enhances the accuracy and robustness of engine abnormal sound defect diagnosis, offering a lightweight technical solution for on-board real-time fault diagnosis and in-plant online quality inspection. By reducing engine fault-induced energy loss and spare parts waste, it further promotes energy conservation and emission reduction in the automotive industry. Quantified experimental data on fuel efficiency improvement and carbon emission reduction are provided to substantiate the ecological benefits of the proposed framework. Full article

21 pages, 4345 KB  
Article
Real-World Airborne Sound Analysis for Health Monitoring of Bearings in Railway Vehicles
by Matthias Kreuzer, David Schmidt, Simon Wokusch and Walter Kellermann
Sensors 2026, 26(6), 1947; https://doi.org/10.3390/s26061947 - 20 Mar 2026
Viewed by 407
Abstract
In this paper, the task of detecting bearing faults in railway vehicles during regular operation by analyzing acoustic (airborne sound) data is addressed. To that end, various features are studied, among which the Mel Frequency Cepstral Coefficients (MFCCs) are best suited for detecting bearing faults by analyzing airborne sound. The MFCCs are used to train a Multi-Layer Perceptron (MLP) classifier. The proposed method is evaluated with real-world data for a state-of-the-art commuter railway vehicle in a dedicated measurement campaign. Classification results demonstrate that the chosen MFCC features allow for reliable detection of bearing damages, even for damages that were not included in training. Full article
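For readers unfamiliar with the feature that recurs throughout these results, the sketch below computes MFCCs for a single frame in plain Python: windowing, power spectrum, triangular mel filterbank, log, and an unnormalised DCT-II. It is a textbook-style illustration under assumed settings (Hamming window, the 2595·log10(1 + f/700) mel formula, 26 filters, 13 coefficients) and is not the configuration used in any paper listed here.

```python
import cmath
import math


def hz_to_mel(f):
    """Map frequency in Hz to the mel scale (a widely used log-based formula)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame):
    """Hamming-window the frame and return |DFT|^2 / N for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / n_fft for k in range(n_fft // 2 + 1)]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        row = []
        for f in bin_hz:
            if lo < f <= mid:
                row.append((f - lo) / (mid - lo))   # rising slope
            elif mid < f < hi:
                row.append((hi - f) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc(frame, sample_rate, n_filters=26, n_coeffs=13):
    """Log mel-band energies followed by an unnormalised DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # The small floor avoids log(0) for empty or silent bands.
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]


# Hypothetical input: one 256-sample frame of a 1 kHz tone sampled at 8 kHz.
tone = [math.sin(2 * math.pi * 1000 * i / 8000) for i in range(256)]
coeffs = mfcc(tone, sample_rate=8000)
print(len(coeffs))
```

In practice a vector like this is computed for every frame and the resulting sequence (or its summary statistics) is what gets passed to a classifier such as an MLP; production code would normally use an FFT-based audio library rather than this naive DFT.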

26 pages, 3000 KB  
Article
Material Classification from Non-Line-of-Sight Acoustic Echoes Using Wavelet-Acoustic Hybrid Feature Fusion
by Dilan Onat Alakuş and İbrahim Türkoğlu
Sensors 2026, 26(5), 1577; https://doi.org/10.3390/s26051577 - 3 Mar 2026
Viewed by 590
Abstract
Acoustic material classification under non-line-of-sight (NLOS) conditions—where direct sound paths are obstructed—is a challenging task due to echo attenuation, complex reflections, and noise effects. This study aims to improve NLOS material recognition by introducing a novel wavelet–acoustic hybrid feature fusion method integrated with deep recurrent neural network architectures. Echo signals from nine different materials were collected using the newly developed ANLOS-R (Acoustic Non-Line-of-Sight Recognition) dataset, which was specifically designed to simulate realistic NLOS propagation environments. From these recordings, time-domain acoustic features and multi-scale wavelet-based energy and entropy statistics were extracted using ten wavelet families. The resulting 70-dimensional hybrid feature set was used to train several deep learning architectures, including Long Short-Term Memory (LSTM), Bidirectional LSTM (BiLSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network–LSTM (CNN–LSTM). Among these, the CNN–LSTM achieved the highest balanced accuracy and macro-F1 score of 0.99, showing strong generalization and convergence performance. SHapley Additive exPlanations (SHAP) analysis indicated that Mel-Frequency Cepstral Coefficients (MFCCs) and wavelet entropy–energy features play complementary roles in material discrimination. The proposed approach provides a robust and interpretable framework for real-time NLOS acoustic sensing, bridging data-driven deep learning with the physical understanding of acoustic material behavior. Full article
(This article belongs to the Section Sensor Materials)

24 pages, 2038 KB  
Article
Evaluating the Managerial Feasibility of an AI-Based Tooth-Percussion Signal Screening Concept for Dental Caries: An In Silico Study
by Stefan Lucian Burlea, Călin Gheorghe Buzea, Irina Nica, Florin Nedeff, Diana Mirila, Valentin Nedeff, Lacramioara Ochiuz, Lucian Dobreci, Maricel Agop and Ioana Rudnic
Diagnostics 2026, 16(4), 638; https://doi.org/10.3390/diagnostics16040638 - 22 Feb 2026
Viewed by 663
Abstract
Background: Early detection of dental caries is essential for effective oral health management. Current diagnostic workflows rely heavily on radiographic imaging, which involves infrastructure requirements, workflow coordination, and resource considerations that may limit frequent use in high-throughput or resource-constrained settings. These contextual factors motivate exploration of adjunct screening concepts that could support front-end triage decisions within existing care pathways. This study evaluates, in simulation, whether modeled tooth-percussion response signals contain sufficient discriminative information to justify further translational and managerial investigation. Implementation costs, workflow optimization, and economic outcomes are not evaluated directly; rather, the objective is to assess whether the technical preconditions for a potentially scalable screening concept are satisfied under controlled in silico conditions. Methods: An in silico model of tooth percussion was developed in which enamel, dentin, and pulp/root structures were represented as a simplified layered mechanical system. Impulse responses generated from simulated tapping were used to compute the modeled surface-vibration response (enamel-layer displacement), which served as a proxy for a measurable percussion-related signal (e.g., contact vibration), rather than a recorded acoustic waveform. Carious conditions were simulated through depth-dependent reductions in stiffness and effective mass and increases in damping to represent enamel and dentin demineralization. A synthetic dataset of labeled simulated signals was generated under varying structural parameters and measurement-noise assumptions. Machine-learning models using Mel-frequency cepstral coefficient (MFCC) features were trained to classify healthy teeth, enamel caries, and dentin caries at a screening (triage) level. 
Results: Under baseline simulation conditions, the classifier achieved an overall accuracy of 0.97 with balanced macro-averaged F1-score (0.97). Misclassifications occurred primarily between healthy and enamel-caries categories, whereas dentin-caries cases were most consistently identified. When measurement noise and structural variability were increased, performance declined gradually, reaching approximately 0.90 accuracy under the most challenging simulated scenario. These results indicate that discriminative information is present within the modeled signals at a screening (triage) level, meaning that higher-risk categories can be distinguished probabilistically rather than with definitive diagnostic certainty. Sensitivity and specificity trade-offs were not optimized in this study, as the objective was to assess separability rather than to define clinical decision thresholds. Conclusions: Within the constraints of the in silico model, simulated tooth-percussion response signals demonstrated discriminative patterns between healthy, enamel caries, and dentin caries categories at a screening (triage) level. These findings establish technical plausibility under controlled simulation conditions and support further investigation of percussion-based screening as a potential adjunct to clinical assessment. From a healthcare management perspective, the present results address a prerequisite question—whether such signals contain sufficient information to justify translational research, rather than demonstrating workflow optimization, cost reduction, or system-level impact. Clinical validation, threshold optimization, and implementation studies are required before managerial or operational benefits can be evaluated. Full article
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

36 pages, 14856 KB  
Article
Multi-Source Fusion CNN-RF Framework for Intelligent Fault Diagnosis of Head Sheave Devices in Mining Hoists
by Chi Ma, Jian Fei, Zhiyuan Shi, Md Abdur Rob, Md Ashraful Islam and Md Habibullah
Machines 2026, 14(2), 244; https://doi.org/10.3390/machines14020244 - 21 Feb 2026
Cited by 1 | Viewed by 574
Abstract
Accurate fault diagnosis of mining hoisting head sheave systems is critical for ensuring operational safety in harsh underground environments. This study proposes a multi-source fault diagnosis framework that fuses vibration and acoustic information using a Convolutional Neural Network and Random Forest (CNN-RF). To support mechanism understanding and validate the experimental platform, finite element and multi-body dynamics simulations (ANSYS/ADAMS) are employed for physical verification and fault signature analysis, while the CNN-RF model is trained and tested exclusively using experimentally acquired vibration and acoustic data. For feature construction, vibration signals are transformed into time–frequency representations (including STFT, CWT, and generalized S-Transform (GST)), and acoustic signals are characterized using Mel-Frequency Cepstral Coefficients (MFCCs). Experimental results demonstrate that vibration–acoustic fusion improves diagnostic performance compared with single-modality baselines; the best performance is achieved by GST+MFCC with the proposed CNN-RF classifier, reaching an accuracy of 98.96%. Future work will conduct cross-condition validation under varying speeds and loads and investigate missing-modality robustness to further assess generalization and deployment reliability. Full article

21 pages, 1582 KB  
Article
Tile Debonding Detection Based on Acoustic Signal Features and a Dual-Branch Convolutional Neural Network
by Dejiang Wang and Bo Kang
Buildings 2026, 16(4), 870; https://doi.org/10.3390/buildings16040870 - 21 Feb 2026
Viewed by 566
Abstract
Tiles are commonly used as architectural finishing materials, but are prone to debonding defects due to construction and environmental factors in engineering applications. Therefore, effective detection of tile debonding holds significant engineering relevance. This study proposes a tile debonding detection method based on impact sound signal features and a dual-branch convolutional neural network. The sound signals collected through tapping are transformed into two types of two-dimensional feature maps using Mel-frequency cepstral coefficients (MFCCs) and continuous wavelet transform (CWT), which are then fed in parallel into the dual-branch convolutional neural network for feature extraction and fusion. Finally, tile debonding classification is performed in the classifier module. Experimental results show that the proposed model achieves a classification accuracy of 98.5% under laboratory conditions. Moreover, it demonstrates strong robustness under varying noise levels and sound pressure conditions, maintaining an accuracy of 82% in a 75 dB human voice noise environment. Field validation in real-world engineering environments yields an accuracy of 91.5%. These findings indicate that the proposed method, which combines MFCC and CWT features with a dual-branch convolutional neural network architecture, enables high-precision identification of tile debonding defects. Full article

41 pages, 2850 KB  
Article
Automated Classification of Humpback Whale Calls Using Deep Learning: A Comparative Study of Neural Architectures and Acoustic Feature Representations
by Jack C. Johnson and Yue Rong
Sensors 2026, 26(2), 715; https://doi.org/10.3390/s26020715 - 21 Jan 2026
Cited by 1 | Viewed by 891
Abstract
Passive acoustic monitoring (PAM) using hydrophones enables acoustic data to be collected in large and diverse quantities, necessitating a reliable automated classification system. This paper presents a data-processing pipeline and a set of neural networks designed for a humpback-whale-detection system. A collection of audio segments is compiled using publicly available audio repositories and extensively curated via manual methods, undertaking thorough examination, editing and clipping to produce a dataset minimizing bias or categorization errors. An array of standard data-augmentation techniques is applied to the collected audio, diversifying and expanding the original dataset. Multiple neural networks are designed and trained using TensorFlow 2.20.0 and Keras 3.13.1 frameworks, resulting in a custom curated architecture layout based on research and iterative improvements. The pre-trained model MobileNetV2 is also included for further analysis. Model performance demonstrates a strong dependence on both feature representation and network architecture. Mel spectrogram inputs consistently outperformed MFCC (Mel-Frequency Cepstral Coefficients) features across all model types. The highest performance was achieved by the pretrained MobileNetV2 using mel spectrograms without augmentation, reaching a test accuracy of 99.01% with balanced precision and recall of 99% and a Matthews correlation coefficient of 0.98. The custom CNN with mel spectrograms also achieved strong performance, with 98.92% accuracy and a false negative rate of only 0.75%. In contrast, models trained with MFCC representations exhibited consistently lower robustness and higher false negative rates. These results highlight the comparative strengths of the evaluated feature representations and network architectures for humpback whale detection. Full article
(This article belongs to the Section Sensor Networks)

29 pages, 808 KB  
Review
Spectrogram Features for Audio and Speech Analysis
by Ian McLoughlin, Lam Pham, Yan Song, Xiaoxiao Miao, Huy Phan, Pengfei Cai, Qing Gu, Jiang Nan, Haoyu Song and Donny Soh
Appl. Sci. 2026, 16(2), 572; https://doi.org/10.3390/app16020572 - 6 Jan 2026
Cited by 1 | Viewed by 4356
Abstract
Spectrogram-based representations have grown to dominate the feature space for deep learning audio analysis systems, and are often adopted for speech analysis also. Initially, the primary motivation behind spectrogram-based representations was their ability to present sound as a two-dimensional signal in the time–frequency plane, which not only provides an interpretable physical basis for analysing sound, but also unlocks the use of a range of machine learning techniques such as convolutional neural networks, which had been developed for image processing. A spectrogram is a matrix characterised by the resolution and span of its dimensions, as well as by the representation and scaling of each element. Many possibilities for these three characteristics have been explored by researchers across numerous application areas, with different settings showing affinity for various tasks. This paper reviews the use of spectrogram-based representations and surveys the state-of-the-art to question how front-end feature representation choice allies with back-end classifier architecture for different tasks. Full article
(This article belongs to the Special Issue AI in Audio Analysis: Spectrogram-Based Recognition)
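As a rough illustration of the abstract's description of a spectrogram as a matrix defined by the resolution and span of its dimensions, the sketch below computes a power spectrogram with NumPy only. The function name `power_spectrogram`, the Hann window, and the default frame length and hop are illustrative assumptions, not settings taken from the review.

```python
import numpy as np

def power_spectrogram(signal, sample_rate, frame_len=400, hop=160):
    """Return (spec, times, freqs); spec has shape (n_freq_bins, n_frames).

    frame_len sets the frequency resolution (sample_rate / frame_len Hz per bin),
    hop sets the time resolution (hop / sample_rate seconds per column).
    Both defaults are assumptions chosen for illustration.
    """
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames: (n_frames, frame_len).
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Squared magnitude of the one-sided FFT of each frame.
    spec = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return spec.T, times, freqs

# Usage: one second of a 1 kHz tone sampled at 16 kHz.
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000.0 * t)
spec, times, freqs = power_spectrogram(tone, sample_rate)
log_spec = 10.0 * np.log10(spec + 1e-10)  # decibel scaling of each element
print(spec.shape, freqs[1] - freqs[0], times[1] - times[0])
```

Changing `frame_len` trades frequency resolution against time resolution, and the choice of element scaling (linear power versus decibels here) is the third characteristic the abstract mentions.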

11 pages, 1725 KB  
Article
Tool Wear Detection in Milling Using Convolutional Neural Networks and Audible Sound Signals
by Halil Ibrahim Turan and Ali Mamedov
Machines 2026, 14(1), 59; https://doi.org/10.3390/machines14010059 - 2 Jan 2026
Cited by 3 | Viewed by 1066
Abstract
Timely tool wear detection has been an important target for the metal cutting industry for decades because of its significance for part quality and production cost control. With the shift toward intelligent and sustainable manufacturing, reliable tool-condition monitoring has become even more critical. One of the main challenges in sound-based tool wear monitoring is the presence of noise interference, instability and the highly volatile nature of machining acoustics, which complicates the extraction of meaningful features. In this study, a Convolutional Neural Network (CNN) model is proposed to classify tool wear conditions in milling operations using acoustic signals. Sound recordings were collected from tools at different wear stages under two cutting speeds, and Mel-Frequency Cepstral Coefficients (MFCCs) were extracted to obtain a compact representation of the short-term power spectrum. These MFCC matrices enabled the CNN to learn discriminative spectral patterns associated with wear. To evaluate model stability and reduce the effects of algorithmic randomness, training was repeated three times for each cutting speed. For the 520 rpm dataset, the model achieved an average validation accuracy of 96.85 ± 2.07%, while for the 635 rpm dataset it achieved 93.69 ± 2.07%. The results demonstrate the feasibility of using acoustic signals, despite inherent noise challenges, as a complementary approach for identifying suitable tool replacement intervals in milling. Full article
(This article belongs to the Special Issue Intelligent Tool Wear Monitoring)
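The abstract describes MFCCs as a compact representation of the short-term power spectrum. A minimal sketch of one common way to compute them (framing, mel filterbank, logarithm, DCT-II) is given below using NumPy only. The function names, the 26-filter bank, the 13 retained coefficients, and the frame settings are illustrative assumptions; the study's actual extraction parameters are not stated in the abstract.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale: (n_mels, n_fft // 2 + 1)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    fb = np.zeros((n_mels, len(freqs)))
    for m in range(n_mels):
        lo, ctr, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[m] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def mfcc(signal, sample_rate, n_mels=26, n_mfcc=13, frame_len=400, hop=160):
    """Return an (n_mfcc, n_frames) matrix of unnormalized DCT-II cepstral coefficients."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    # Short-term power spectrum, one column per frame.
    power = (np.abs(np.fft.rfft(frames, axis=1)) ** 2).T
    # Pool the spectrum into mel bands, then compress with a logarithm.
    log_mel = np.log(mel_filterbank(n_mels, frame_len, sample_rate) @ power + 1e-10)
    # DCT-II across the mel bands decorrelates them; keep the first n_mfcc rows.
    k = np.arange(n_mfcc)[:, None]
    m = np.arange(n_mels)[None, :]
    basis = np.cos(np.pi * k * (m + 0.5) / n_mels)
    return basis @ log_mel

# Usage: one second of synthetic noise standing in for a machining recording.
rng = np.random.default_rng(0)
coeffs = mfcc(rng.standard_normal(16000), sample_rate=16000)
print(coeffs.shape)
```

The resulting matrix (coefficients by frames) is the kind of two-dimensional input a CNN can consume directly, which is consistent with the "MFCC matrices" wording in the abstract.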

20 pages, 2548 KB  
Article
Fault Diagnosis of Motor Bearing Transmission System Based on Acoustic Characteristics
by Long Ma, Yan Zhang and Zhongqiu Wang
Sensors 2026, 26(1), 259; https://doi.org/10.3390/s26010259 - 31 Dec 2025
Cited by 1 | Viewed by 1238
Abstract
Traditional vibration-based methods for bearing fault diagnosis, while prevalent, often require contact measurement, whereas sound is a broadband signal relative to vibration. To overcome this limitation, this paper exploits the advantages of acoustic signals, namely non-contact sensing and rich broadband information, and proposes a fault diagnosis framework based on acoustic features and deep learning. The core of our method is a CNN–attention mechanism–LSTM model, specifically designed to process one-dimensional sequential features: the 1D-CNN extracts local features from Mel frequency cepstral coefficient (MFCC) features, the attention mechanism (with ECA selected as the best-performing option) selectively enhances features, and the LSTM captures temporal dependencies, collectively enabling effective classification of fault types. Furthermore, to enhance model efficiency, a ReliefF-based feature selection algorithm is employed to identify and retain only the most discriminative acoustic features. Experimental results demonstrate that the proposed method achieves an average diagnostic accuracy of 99.90% in distinguishing normal, inner-ring, outer-ring, and mixed-defect bearings. Notably, after feature selection, the number of parameters and the estimated total size are significantly reduced while accuracy remains essentially unchanged. This work validates the effectiveness of non-contact solutions for bearing fault diagnosis using acoustic features and shows considerable potential for industrial applications. Full article
