Search Results (520)

Search Parameters:
Keywords = audio classification

41 pages, 5589 KB  
Review
Advances in Audio-Based Artificial Intelligence for Respiratory Health and Welfare Monitoring in Broiler Chickens
by Md Sharifuzzaman, Hong-Seok Mun, Eddiemar B. Lagua, Md Kamrul Hasan, Jin-Gu Kang, Young-Hwa Kim, Ahsan Mehtab, Hae-Rang Park and Chul-Ju Yang
AI 2026, 7(2), 58; https://doi.org/10.3390/ai7020058 - 4 Feb 2026
Respiratory diseases and welfare impairments impose substantial economic and ethical burdens on modern broiler production, driven by high stocking density, rapid pathogen transmission, and limited sensitivity of conventional monitoring methods. Because respiratory pathology and stress directly alter vocal behavior, acoustic monitoring has emerged as a promising non-invasive approach for continuous flock-level surveillance. This review synthesizes recent advances in audio classification and artificial intelligence for monitoring respiratory health and welfare in broiler chickens. We review the anatomical basis of sound production, characterize key vocal categories relevant to health and welfare, and summarize recording strategies, datasets, acoustic features, machine-learning and deep-learning models, and evaluation metrics used in poultry sound analysis. Evidence from experimental and commercial settings demonstrates that AI-based acoustic systems can detect respiratory sounds, stress, and welfare changes with high accuracy, often enabling earlier intervention than traditional methods. Finally, we discuss current limitations, including background noise, data imbalance, limited multi-farm validation, and challenges in interpretability and deployment, and outline future directions for scalable, robust, and practical sound-based monitoring systems in broiler production.
(This article belongs to the Section AI Systems: Theory and Applications)
5 pages, 1305 KB  
Proceeding Paper
Audiovisual Fusion Technique for Detecting Sensitive Content in Videos
by Daniel Povedano Álvarez, Ana Lucila Sandoval Orozco and Luis Javier García Villalba
Eng. Proc. 2026, 123(1), 11; https://doi.org/10.3390/engproc2026123011 - 2 Feb 2026
The detection of sensitive content in online videos is a key challenge for ensuring digital safety and effective content moderation. This work proposes Multimodal Audiovisual Attention (MAV-Att), a multimodal deep learning framework that jointly exploits audio and visual cues to improve detection accuracy. The model was evaluated on the LSPD dataset, comprising 52,427 video segments of 20 s each, with optimized keyframe extraction. MAV-Att consists of dual audio and image branches enhanced by attention mechanisms to capture both temporal and cross-modal dependencies. Trained using a joint optimization loss, the system achieved F1-scores of 94.9% on segments and 94.5% on entire videos, surpassing previous state-of-the-art models by 6.75%.
(This article belongs to the Proceedings of First Summer School on Artificial Intelligence in Cybersecurity)
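As a rough illustration of the dual-branch idea described in this abstract, the sketch below builds two modality encoders whose outputs are fused with multi-head cross-attention before a joint classification head. It is a minimal PyTorch sketch under assumed embedding sizes and layer choices, not the published MAV-Att architecture.

```python
# Minimal sketch of a dual-branch audio/visual fusion classifier with
# cross-modal attention. Branch architectures, embedding sizes, and the
# fusion head are illustrative assumptions, not the published MAV-Att model.
import torch
import torch.nn as nn

class AudioVisualFusion(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=512, d_model=256, n_heads=4, n_classes=2):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, d_model)    # per-frame audio embeddings
        self.visual_proj = nn.Linear(visual_dim, d_model)  # per-keyframe visual embeddings
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.classifier = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.ReLU(), nn.Linear(d_model, n_classes)
        )

    def forward(self, audio_seq, visual_seq):
        a = self.audio_proj(audio_seq)    # (B, Ta, d_model)
        v = self.visual_proj(visual_seq)  # (B, Tv, d_model)
        # Visual tokens attend to the audio sequence (cross-modal dependency).
        fused, _ = self.cross_attn(query=v, key=a, value=a)
        # Pool each stream over time and classify the concatenated summary.
        pooled = torch.cat([fused.mean(dim=1), a.mean(dim=1)], dim=-1)
        return self.classifier(pooled)

logits = AudioVisualFusion()(torch.randn(4, 100, 128), torch.randn(4, 16, 512))
print(logits.shape)  # (4, 2)
```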

34 pages, 1776 KB  
Article
Interpretable Acoustic Features from Wakefulness Tracheal Breathing for OSA Severity Assessment
by Ali Mohammad Alqudah, Walid Ashraf, Brian Lithgow and Zahra Moussavi
J. Clin. Med. 2026, 15(3), 1081; https://doi.org/10.3390/jcm15031081 - 29 Jan 2026
Background: Obstructive Sleep Apnea (OSA) is one of the most prevalent sleep disorders associated with cardiovascular complications, cognitive impairments, and reduced quality of life. Early and accurate diagnosis is essential. The present gold standard, polysomnography (PSG), is expensive and resource-intensive. This work develops a non-invasive machine-learning-based framework to classify four OSA severity groups (non, mild, moderate, and severe) using tracheal breathing sounds (TBSs) and anthropometric variables. Methods: A total of 199 participants were recruited, and TBSs were recorded during wakefulness using a suprasternal microphone. The workflow included the following steps: signal preprocessing (segmentation, filtering, and normalization), multi-domain feature extraction representing spectral, temporal, nonlinear, and morphological features, adaptive feature normalization, and a three-stage feature selection that combined univariate filtering, Shapley Additive Explanations (SHAP)-based ranking, and recursive feature elimination (RFE). The classification included training ensemble learning models via bootstrap aggregation and validating them using stratified k-fold cross-validation (CV), while preserving the OSA severity and anthropometric distributions. Results: The proposed framework performed well in discriminating among OSA severity groups. TBS features, combined with anthropometric ones, increased classification performance and reliability across all severity classes, supporting the efficacy of non-invasive audio biomarkers for OSA screening. Conclusions: TBS-based features, coupled with anthropometric information, offer a promising alternative or supplement to PSG for OSA severity detection. The approach provides scalability and accessibility to extend screening and potentially enables earlier detection of OSA in cases that might otherwise remain undiagnosed.
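The three-stage feature selection and bagged-ensemble evaluation described here can be sketched with scikit-learn and the shap package as below. The synthetic data, feature counts, and estimators are placeholders, not the study's configuration.

```python
# Sketch of univariate filtering + SHAP ranking + RFE feature selection
# feeding a bagged ensemble with stratified cross-validation. Data sizes,
# thresholds, and model choices are placeholders.
import numpy as np
import shap
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=60, n_informative=15,
                           n_classes=4, n_clusters_per_class=1, random_state=0)

# Stage 1: univariate filtering.
X_f = SelectKBest(f_classif, k=40).fit_transform(X, y)

# Stage 2: SHAP-based ranking with a tree ensemble; keep the top features.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_f, y)
sv = np.abs(np.asarray(shap.TreeExplainer(rf).shap_values(X_f)))
# Collapse every axis except the feature axis into one score per feature
# (shap's output layout varies by version).
feat_ax = sv.shape.index(X_f.shape[1])
importance = sv.mean(axis=tuple(ax for ax in range(sv.ndim) if ax != feat_ax))
X_s = X_f[:, np.argsort(importance)[::-1][:25]]

# Stage 3: recursive feature elimination, then a bagged ensemble with stratified CV.
X_r = RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=15).fit_transform(X_s, y)
model = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0)
scores = cross_val_score(model, X_r, y, cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0))
print("CV accuracy: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```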

16 pages, 3367 KB  
Article
Utilizing Multimodal Logic Fusion to Identify the Types of Food Waste Sources
by Dong-Ming Gao, Jia-Qi Song, Zong-Qiang Fu, Zhi Liu and Gang Li
Sensors 2026, 26(3), 851; https://doi.org/10.3390/s26030851 - 28 Jan 2026
Identifying food waste sources in all-weather industrial environments is challenging, as variable lighting conditions can compromise the effectiveness of visual recognition models. This study proposes and validates a robust, interpretable, and adaptive multimodal logic fusion method in which sensor dominance is dynamically assigned based on real-time illuminance intensity. The method comprises two foundational components: (1) a lightweight MobileNetV3 + EMA model for image recognition; and (2) an audio model employing Fast Fourier Transform (FFT) for feature extraction and Support Vector Machine (SVM) for classification. The key contribution of this system lies in its environment-aware conditional logic. The image model MobileNetV3 + EMA achieves an accuracy of 99.46% within the optimal brightness range (120–240 cd m−2), significantly outperforming the audio model. However, its performance degrades significantly outside the optimal range, while the audio model maintains an illumination-independent accuracy of 0.80, a recall of 0.78, and an F1 score of 0.80. When light intensity falls below the threshold of 84 cd m−2, the audio recognition results take precedence. This strategy ensures robust classification accuracy under variable environmental conditions, preventing model failure. Validated on an independent test set, the fusion method achieves an overall accuracy of 90.25%, providing an interpretable and resilient solution for real-world industrial deployment.
(This article belongs to the Special Issue Multi-Sensor Data Fusion)
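The environment-aware conditional logic is essentially a threshold rule on measured illuminance. The sketch below illustrates it using the 84 cd m−2 threshold quoted in the abstract, with placeholder callables standing in for the trained image and audio models; the function and parameter names are hypothetical.

```python
# Sketch of the illuminance-gated decision rule described in the abstract:
# trust the image classifier in its optimal brightness range and fall back
# to the audio (FFT + SVM) classifier in low light. The predict() callables
# are placeholders for the trained models.
LOW_LIGHT_THRESHOLD = 84.0  # cd/m^2, below which the audio output takes precedence

def fused_prediction(frame, audio_clip, illuminance,
                     image_predict, audio_predict,
                     threshold=LOW_LIGHT_THRESHOLD):
    """Return (label, source) using the sensor best suited to the lighting."""
    if illuminance >= threshold:
        return image_predict(frame), "image"   # vision model dominates in good light
    return audio_predict(audio_clip), "audio"  # audio model is illumination-independent

# Example with dummy models:
label, source = fused_prediction(
    frame=None, audio_clip=None, illuminance=40.0,
    image_predict=lambda f: "kitchen_waste",
    audio_predict=lambda a: "kitchen_waste",
)
print(label, source)
```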

13 pages, 1699 KB  
Article
Applying Multiple Machine Learning Models to Classify Mild Cognitive Impairment from Speech in Community-Dwelling Older Adults
by Renqing Zhao, Zhiyuan Zhu and Zihui Huang
J. Intell. 2026, 14(2), 17; https://doi.org/10.3390/jintelligence14020017 - 26 Jan 2026
This study aims to develop effective screening tools for cognitive impairment by integrating optimised speech classification features with various machine learning models. A total of 65 patients diagnosed with early-stage Mild Cognitive Impairment (MCI) and 55 healthy controls (HCs) were included. Audio data were collected through a picture description task and processed using the Python-based Librosa library for speech feature extraction. Three machine learning models were constructed: the Random Forest (RF) and Support Vector Machine (SVM) models utilised speech classification features optimised via the Sequential Forward Selection (SFS) algorithm, while the Extreme Gradient Boosting (XGBoost) model was trained on preprocessed speech data. After parameter tuning, the Librosa library successfully extracted 41 speech classification features from all participants. The application of the SFS optimisation strategy and the use of preprocessed data significantly improved identification accuracy. The SVM model achieved an accuracy of 0.825 (AUC: 0.91), the RF model reached 0.88 (AUC: 0.86), and the XGBoost model attained 0.92 (AUC: 0.91). These results suggest that speech-based machine learning models markedly improve the accuracy of distinguishing MCI patients from healthy older adults, providing reliable support for early cognitive deficit identification.
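A minimal sketch of the described workflow, Librosa feature extraction followed by Sequential Forward Selection and an SVM, is shown below. The file handling, feature set, and hyperparameters are illustrative assumptions rather than the study's implementation.

```python
# Sketch of Librosa feature extraction + Sequential Forward Selection + SVM.
# Feature choices, counts, and hyperparameters are illustrative only.
import numpy as np
import librosa
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def extract_features(path):
    y, sr = librosa.load(path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y)
    # Summarize frame-level features into one fixed-length vector per recording.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           [centroid.mean()], [zcr.mean()]])

# X: one row per participant recording, y: 1 = MCI, 0 = healthy control.
# X = np.vstack([extract_features(p) for p in audio_paths]); y = labels
X, y = np.random.randn(120, 28), np.random.randint(0, 2, 120)  # stand-in data

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
sfs = SequentialFeatureSelector(svm, n_features_to_select=10, direction="forward", cv=5)
X_sel = sfs.fit_transform(X, y)
print("CV accuracy:", cross_val_score(svm, X_sel, y, cv=5).mean())
```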

41 pages, 2850 KB  
Article
Automated Classification of Humpback Whale Calls Using Deep Learning: A Comparative Study of Neural Architectures and Acoustic Feature Representations
by Jack C. Johnson and Yue Rong
Sensors 2026, 26(2), 715; https://doi.org/10.3390/s26020715 - 21 Jan 2026
Passive acoustic monitoring (PAM) using hydrophones enables acoustic data to be collected in large and diverse quantities, necessitating a reliable automated classification system. This paper presents a data-processing pipeline and a set of neural networks designed for a humpback-whale-detection system. A collection of audio segments is compiled using publicly available audio repositories and extensively curated via manual methods, undertaking thorough examination, editing and clipping to produce a dataset minimizing bias or categorization errors. An array of standard data-augmentation techniques is applied to the collected audio, diversifying and expanding the original dataset. Multiple neural networks are designed and trained using the TensorFlow 2.20.0 and Keras 3.13.1 frameworks, resulting in a custom architecture developed through research and iterative refinement. The pre-trained model MobileNetV2 is also included for further analysis. Model performance demonstrates a strong dependence on both feature representation and network architecture. Mel spectrogram inputs consistently outperformed MFCC (Mel-Frequency Cepstral Coefficients) features across all model types. The highest performance was achieved by the pretrained MobileNetV2 using mel spectrograms without augmentation, reaching a test accuracy of 99.01% with balanced precision and recall of 99% and a Matthews correlation coefficient of 0.98. The custom CNN with mel spectrograms also achieved strong performance, with 98.92% accuracy and a false negative rate of only 0.75%. In contrast, models trained with MFCC representations exhibited consistently lower robustness and higher false negative rates. These results highlight the comparative strengths of the evaluated feature representations and network architectures for humpback whale detection.
(This article belongs to the Section Sensor Networks)
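A minimal Keras sketch of the best-performing configuration reported here, log-mel spectrogram inputs fed to a pretrained MobileNetV2, is given below. The input shape, classification head, and training settings are assumptions, not the authors' exact setup.

```python
# Sketch of a transfer-learning classifier that feeds log-mel spectrograms
# to a pretrained MobileNetV2. Input shape, head layers, and training
# settings are assumptions.
import numpy as np
import librosa
import tensorflow as tf

def log_mel_image(path, sr=16000, n_mels=128, frames=128):
    y, _ = librosa.load(path, sr=sr)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel, ref=np.max)[:, :frames]
    logmel = (logmel - logmel.min()) / (logmel.max() - logmel.min() + 1e-8)
    return np.repeat(logmel[..., np.newaxis], 3, axis=-1)  # 3 channels for MobileNetV2

base = tf.keras.applications.MobileNetV2(include_top=False, weights="imagenet",
                                         input_shape=(128, 128, 3))
base.trainable = False  # freeze the pretrained backbone
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # whale call vs. background
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
# model.fit(train_spectrograms, train_labels, validation_split=0.2, epochs=20)
# (train_spectrograms / train_labels are hypothetical arrays built with log_mel_image)
```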

28 pages, 435 KB  
Review
Advances in Audio Classification and Artificial Intelligence for Respiratory Health and Welfare Monitoring in Swine
by Md Sharifuzzaman, Hong-Seok Mun, Eddiemar B. Lagua, Md Kamrul Hasan, Jin-Gu Kang, Young-Hwa Kim, Ahsan Mehtab, Hae-Rang Park and Chul-Ju Yang
Biology 2026, 15(2), 177; https://doi.org/10.3390/biology15020177 - 18 Jan 2026
Respiratory diseases remain one of the most significant health challenges in modern swine production, leading to substantial economic losses, compromised animal welfare, and increased antimicrobial use. In recent years, advances in artificial intelligence (AI), particularly machine learning and deep learning, have enabled the development of non-invasive, continuous monitoring systems based on pig vocalizations. Among these, audio-based technologies have emerged as especially promising tools for early detection and monitoring of respiratory disorders under real farm conditions. This review provides a comprehensive synthesis of AI-driven audio classification approaches applied to pig farming, with a focus on respiratory health and welfare monitoring. First, the biological and acoustic foundations of pig vocalizations and their relevance to health and welfare assessment are outlined. The review then systematically examines sound acquisition technologies, feature engineering strategies, machine learning and deep learning models, and evaluation methodologies reported in the literature. Commercially available systems and recent advances in real-time, edge, and on-farm deployment are also discussed. Finally, key challenges related to data scarcity, generalization, environmental noise, and practical deployment are identified, and emerging opportunities for future research, including multimodal sensing, standardized datasets, and explainable AI, are highlighted. This review aims to provide researchers, engineers, and industry stakeholders with a consolidated reference to guide the development and adoption of robust AI-based acoustic monitoring systems for respiratory health management in swine.
(This article belongs to the Section Zoology)
24 pages, 5019 KB  
Article
A Dual Stream Deep Learning Framework for Alzheimer’s Disease Detection Using MRI Sonification
by Nadia A. Mohsin and Mohammed H. Abdul Ameer
J. Imaging 2026, 12(1), 46; https://doi.org/10.3390/jimaging12010046 - 15 Jan 2026
Alzheimer’s Disease (AD) is a progressive brain disorder that affects millions of individuals worldwide. It causes gradual damage to brain cells, leading to memory loss and cognitive dysfunction. Although Magnetic Resonance Imaging (MRI) is widely used in AD diagnosis, existing studies rely solely on visual representations, leaving alternative features unexplored. The objective of this study is to explore whether MRI sonification can provide complementary diagnostic information when combined with conventional image-based methods. In this study, we propose a novel dual-stream multimodal framework that integrates 2D MRI slices with their corresponding audio representations. MRI images are transformed into audio signals using multi-scale, multi-orientation Gabor filtering, followed by a Hilbert space-filling curve to preserve spatial locality. The image and sound modalities are processed using a lightweight CNN and YAMNet, respectively, then fused via logistic regression. The multimodal framework achieved its highest accuracy, 98.2%, in distinguishing AD from Cognitively Normal (CN) subjects, with 94% for AD vs. Mild Cognitive Impairment (MCI) and 93.2% for MCI vs. CN. This work provides a new perspective and highlights the potential of audio transformation of imaging data for feature extraction and classification.
(This article belongs to the Section AI in Imaging)
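The sonification step, reading a Gabor-filtered slice along a Hilbert space-filling curve so that nearby pixels stay nearby in the resulting 1-D signal, can be sketched as follows. The filter settings, curve order, and normalization are assumptions; the standard d2xy recursion is used for the curve, and this is not the paper's exact procedure.

```python
# Sketch of turning a Gabor-filtered 2-D slice into a 1-D signal by reading
# pixels along a Hilbert space-filling curve, which keeps spatially close
# pixels close in time. Filter settings and curve order are assumptions.
import numpy as np
from skimage.filters import gabor

def hilbert_d2xy(order, d):
    """Map a distance d along a Hilbert curve of side 2**order to (x, y)."""
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate/flip the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def sonify(slice_2d, order=7, frequency=0.2):
    """Gabor-filter a (128, 128) slice and unroll it along a Hilbert curve."""
    real, _ = gabor(slice_2d, frequency=frequency)
    n = 1 << order
    assert real.shape == (n, n), "slice must match the curve side length"
    coords = [hilbert_d2xy(order, d) for d in range(n * n)]
    signal = np.array([real[y, x] for x, y in coords], dtype=np.float32)
    return signal / (np.abs(signal).max() + 1e-8)  # normalized 1-D "audio"

audio = sonify(np.random.rand(128, 128))
print(audio.shape)  # (16384,)
```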

19 pages, 1607 KB  
Article
Real-Time Bird Audio Detection with a CNN-RNN Model on a SoC-FPGA
by Rodrigo Lopes da Silva, Gustavo Jacinto, Mário Véstias and Rui Policarpo Duarte
Electronics 2026, 15(2), 354; https://doi.org/10.3390/electronics15020354 - 13 Jan 2026
Monitoring wildlife has become increasingly important for understanding the evolution of species and ecosystem health. Acoustic monitoring offers several advantages over video-based approaches, enabling continuous 24/7 observation and robust detection under challenging environmental conditions. Deep learning models have demonstrated strong performance in audio classification. However, their computational complexity poses significant challenges for deployment on low-power embedded platforms. This paper presents a low-power embedded system for real-time bird audio detection. A hybrid CNN–RNN architecture is adopted, redesigned, and quantized to significantly reduce model complexity while preserving classification accuracy. To support efficient execution, a custom hardware accelerator was developed and integrated into a Zynq UltraScale+ ZU3CG FPGA. The proposed system achieves an accuracy of 87.4%, processes up to 5 audio samples per second, and operates at only 1.4 W, demonstrating its suitability for autonomous, energy-efficient wildlife monitoring applications.
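For orientation, the sketch below shows a compact CNN-GRU audio classifier of the general kind described, prior to any quantization or FPGA mapping. Layer widths and the input spectrogram shape are assumptions rather than the deployed architecture.

```python
# Sketch of a compact hybrid CNN-RNN binary bird-audio detector of the kind
# that is later quantized for embedded deployment. Layer widths and the
# input spectrogram shape are assumptions.
import tensorflow as tf

def build_cnn_rnn(n_mels=40, n_frames=100):
    inputs = tf.keras.Input(shape=(n_frames, n_mels, 1))     # (time, mel, channel)
    x = tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    x = tf.keras.layers.MaxPooling2D((1, 2))(x)              # pool the frequency axis only
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    x = tf.keras.layers.MaxPooling2D((1, 2))(x)
    # Collapse the frequency axis so the GRU sees one feature vector per frame.
    x = tf.keras.layers.Reshape((n_frames, (n_mels // 4) * 32))(x)
    x = tf.keras.layers.GRU(64)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # bird / no bird
    return tf.keras.Model(inputs, outputs)

model = build_cnn_rnn()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```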

19 pages, 8336 KB  
Article
Dendritic Spiking Neural Networks with Combined Membrane Potential Decay and Dynamic Threshold for Sequential Recognition
by Qian Zhou, Wenjie Wang and Mengting Qiao
Appl. Sci. 2026, 16(2), 748; https://doi.org/10.3390/app16020748 - 11 Jan 2026
Spiking neural networks (SNNs) aim to simulate human neural networks with biologically plausible neurons. However, conventional SNNs based on point neurons ignore the inherent dendritic computation of biological neurons. Additionally, these point neurons usually employ a single membrane potential decay and a fixed firing threshold, which contrasts with the heterogeneity of real neural networks and limits the neuronal dynamic diversity needed for multi-scale sequential tasks. In this work, we propose a dendritic spiking neuron model with combined membrane potential decay and a dynamic firing threshold. We then extend the neuron model to the feedforward network level, termed the dendritic spiking neural network with combined membrane potential decay and dynamic threshold (CD-DT-DSNN). By learning heterogeneous neuronal decay factors, which combine two different membrane potential decay mechanisms, and learning adaptive factors, our networks can rapidly respond to input signals and dynamically regulate neuronal firing rates, which helps extract multi-scale spatio-temporal features. Experiments on four spike-based audio and image sequential datasets demonstrate that our CD-DT-DSNN outperforms state-of-the-art heterogeneous SNNs and dendritic compartment SNNs, achieving higher classification accuracy with fewer parameters. This work suggests that heterogeneity in neuronal membrane potential decay and neural firing thresholds is a critical component in learning multi-timescale temporal dynamics and maintaining long-term memory, providing a novel perspective for constructing highly biologically plausible neuromorphic computing models. It also offers a solution for multi-timescale sequential tasks such as speech recognition, EEG signal recognition, and robot place recognition.
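The two neuronal mechanisms combined here, membrane-potential decay and a dynamic firing threshold, can be illustrated with a short NumPy sketch. Dendritic compartments and the learned heterogeneous factors are omitted, and all constants are illustrative, so this is not the CD-DT-DSNN neuron itself.

```python
# Minimal NumPy sketch of a spiking neuron update with a membrane-potential
# decay factor and an adaptive (dynamic) firing threshold. Dendritic
# compartments are omitted and all constants are illustrative.
import numpy as np

def simulate(inputs, decay=0.9, base_threshold=1.0, theta_decay=0.95, theta_jump=0.3):
    """inputs: (T, N) input currents; returns (T, N) binary spike trains."""
    T, N = inputs.shape
    v = np.zeros(N)          # membrane potential
    theta = np.zeros(N)      # adaptive component of the firing threshold
    spikes = np.zeros((T, N))
    for t in range(T):
        v = decay * v + inputs[t]               # leaky integration
        fired = v >= base_threshold + theta     # dynamic threshold
        spikes[t] = fired
        v = np.where(fired, 0.0, v)             # hard reset after a spike
        theta = theta_decay * theta + theta_jump * fired  # threshold adapts upward
    return spikes

rng = np.random.default_rng(0)
out = simulate(rng.normal(0.3, 0.5, size=(200, 8)))
print("firing rate per neuron:", out.mean(axis=0))
```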

27 pages, 3371 KB  
Article
An Airflow-Orchestrated AI Pipeline for Podcast Transcription, Topic Modeling, and Recommendation System
by Ioannis Kazlaris, Georgios Papadopoulos, Konstantinos Diamantaras, Marina Delianidi, Eftychia Touliou and Anagnostis Yenitzes
Multimedia 2026, 2(1), 1; https://doi.org/10.3390/multimedia2010001 - 9 Jan 2026
This study presents a production-ready AI pipeline for audio content processing, implemented within the Youth Radio platform, which serves as an extension of the European School Radio initiative. The system uses a multi-server architecture: an AI Server that runs batch/offline jobs, orchestrated by Apache Airflow, and two Web Servers that deliver both the backend and frontend applications, configured with load balancing and redundancy to ensure high availability and fault tolerance. The implemented AI pipeline includes tasks such as preprocessing, transcription, audio classification, and topic modeling. Processed podcasts are indexed in a Qdrant vector database to support both dense and sparse retrieval, while a recommendation system enriches the user experience. We summarize design choices and report system-level metrics and task-level indicators (ASR quality after correction, retrieval effectiveness) to guide similar deployments.
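The described orchestration can be sketched as an Airflow DAG whose tasks follow the stated order (preprocessing, transcription, audio classification, topic modeling, indexing). The operator choices, schedule, and task bodies below are placeholders assuming Airflow 2.x, not the deployed pipeline.

```python
# Skeleton of an Airflow 2.x DAG mirroring the task order described above
# (preprocess -> transcribe -> classify -> topic-model -> index). Task bodies
# are placeholders; scheduling and operator choices are assumptions.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def preprocess(**_):    print("normalize audio, split segments")
def transcribe(**_):    print("run ASR on each segment")
def classify(**_):      print("audio classification (e.g. music / speech)")
def topic_model(**_):   print("extract topics from transcripts")
def index_podcast(**_): print("upsert embeddings into the vector database")

with DAG(
    dag_id="podcast_ai_pipeline",
    start_date=datetime(2025, 1, 1),
    schedule=None,          # triggered per uploaded episode (Airflow 2.4+ syntax)
    catchup=False,
) as dag:
    tasks = [
        PythonOperator(task_id=name, python_callable=fn)
        for name, fn in [("preprocess", preprocess), ("transcribe", transcribe),
                         ("classify", classify), ("topic_model", topic_model),
                         ("index", index_podcast)]
    ]
    # Chain the tasks sequentially: preprocess >> transcribe >> ... >> index
    for upstream, downstream in zip(tasks, tasks[1:]):
        upstream >> downstream
```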

29 pages, 808 KB  
Review
Spectrogram Features for Audio and Speech Analysis
by Ian McLoughlin, Lam Pham, Yan Song, Xiaoxiao Miao, Huy Phan, Pengfei Cai, Qing Gu, Jiang Nan, Haoyu Song and Donny Soh
Appl. Sci. 2026, 16(2), 572; https://doi.org/10.3390/app16020572 - 6 Jan 2026
Spectrogram-based representations have grown to dominate the feature space for deep learning audio analysis systems, and are often adopted for speech analysis as well. Initially, the primary motivation behind spectrogram-based representations was their ability to present sound as a two-dimensional signal in the time–frequency plane, which not only provides an interpretable physical basis for analysing sound, but also unlocks the use of a range of machine learning techniques, such as convolutional neural networks, that had been developed for image processing. A spectrogram is a matrix characterised by the resolution and span of its dimensions, as well as by the representation and scaling of each element. Many possibilities for these three characteristics have been explored by researchers across numerous application areas, with different settings showing affinity for various tasks. This paper reviews the use of spectrogram-based representations and surveys the state of the art to examine how the choice of front-end feature representation pairs with back-end classifier architecture for different tasks.
(This article belongs to the Special Issue AI in Audio Analysis: Spectrogram-Based Recognition)
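The three characteristics discussed here (resolution and span of the axes, element representation, and scaling) map directly onto concrete STFT and mel parameters; the short sketch below makes that mapping explicit with illustrative values.

```python
# Short illustration of how the three spectrogram characteristics map onto
# concrete parameters: n_fft and hop_length set the time-frequency resolution
# and span, while the element scaling (power, dB, mel) changes the
# representation. All values are illustrative.
import numpy as np
import librosa

y, sr = np.sin(2 * np.pi * 440 * np.arange(0, 1.0, 1 / 16000)), 16000  # 1 s test tone

# Resolution / span: n_fft sets frequency resolution, hop_length sets time resolution.
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256)) ** 2   # linear power spectrogram
print("linear spectrogram:", S.shape)   # (513 frequency bins, ~63 frames)

# Element scaling: compress the dynamic range with a log (dB) mapping.
S_db = librosa.power_to_db(S, ref=np.max)

# Representation: warp the frequency axis onto a mel scale with 64 bands.
S_mel = librosa.feature.melspectrogram(S=S, sr=sr, n_mels=64)
print("mel spectrogram:", S_mel.shape)
```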

24 pages, 2626 KB  
Article
Markov Chain Wave Generative Adversarial Network for Bee Bioacoustic Signal Synthesis
by Kumudu Samarappuli, Iman Ardekani, Mahsa Mohaghegh and Abdolhossein Sarrafzadeh
Sensors 2026, 26(2), 371; https://doi.org/10.3390/s26020371 - 6 Jan 2026
This paper presents a framework for synthesizing bee bioacoustic signals associated with hive events. While existing approaches like WaveGAN have shown promise in audio generation, they often fail to preserve the subtle temporal and spectral features of bioacoustic signals critical for event-specific classification. The proposed method, MCWaveGAN, extends WaveGAN with a Markov Chain refinement stage, producing synthetic signals that more closely match the distribution of real bioacoustic data. Experimental results show that this method captures signal characteristics more effectively than WaveGAN alone. Furthermore, when integrated into a classifier, synthesized signals improved hive status prediction accuracy. These results highlight the potential of the proposed method to alleviate data scarcity in bioacoustics and support intelligent monitoring in smart beekeeping, with broader applicability to other ecological and agricultural domains.
(This article belongs to the Special Issue AI, Sensors and Algorithms for Bioacoustic Applications)

26 pages, 2345 KB  
Article
NeuroStrainSense: A Transformer-Generative AI Framework for Stress Detection Using Heterogeneous Multimodal Datasets
by Dalel Ben Ismail, Wyssem Fathallah, Mourad Mars and Hedi Sakli
Technologies 2026, 14(1), 35; https://doi.org/10.3390/technologies14010035 - 5 Jan 2026
Stress is a pervasive global health concern that contributes to morbidity and reduced productivity, yet it often remains unquantified due to its subjective and variable presentation. Although artificial intelligence offers an encouraging path toward automated monitoring of mental states, current state-of-the-art approaches are challenged by the reliance on single-source data, sparsity of labeled samples, and significant class imbalance. This paper proposes NeuroStrainSense, a novel deep multimodal stress detection model that integrates three complementary datasets—WESAD, SWELL-KW, and TILES—through a Transformer-based feature fusion architecture combined with a Variational Autoencoder for generative data augmentation. The Transformer architecture employs four encoder layers with eight multi-head attention heads and a hidden dimension of 512 to capture complex inter-modal dependencies across physiological, audio, and behavioral modalities. Our experiments demonstrate that NeuroStrainSense achieves state-of-the-art performance with accuracies of 87.1%, 88.5%, and 89.8% on the respective datasets, with F1-scores exceeding 0.85 and AUCs greater than 0.89, representing improvements of 2.6–6.6 percentage points over existing baselines. We propose a robust evaluation framework that quantifies discrimination among stress types through clustering validity metrics, achieving a Silhouette Score of 0.75 and an Intraclass Correlation Coefficient of 0.76. Comprehensive ablation experiments confirm the utility of each modality and the VAE augmentation module, with physiological features contributing most significantly (average performance decrease of 5.8% when removed), followed by audio (2.8%) and behavioral features (2.1%). Statistical validation confirms all findings at the p < 0.01 significance level. Beyond binary classification, the model identifies five clinically relevant stress profiles—Cognitive Overload, Burnout, Acute Stress, Psychosomatic, and Low-Grade Chronic—with an expert concordance of Cohen’s κ = 0.71 (p < 0.001), demonstrating strong ecological validity for personalized well-being and occupational health applications. External validation on the MIT Reality Mining dataset confirms generalizability with minimal performance degradation (accuracy: 0.785, F1-score: 0.752, AUC: 0.849). This work underlines the potential of integrated multimodal learning and demographically aware generative AI for continuous, precise, and fair stress monitoring across diverse populations and environmental contexts.
(This article belongs to the Section Information and Communication Technologies)
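Using the stated fusion dimensions (four encoder layers, eight attention heads, hidden size 512), a minimal PyTorch sketch of Transformer-based fusion over three modality streams is given below. The modality encoders, VAE augmentation, and training procedure are omitted, and all other sizes are assumptions, so this is not the published implementation.

```python
# Sketch of Transformer-based fusion over physiological, audio, and behavioral
# tokens using the stated dimensions (4 encoder layers, 8 heads, hidden 512).
# Per-modality feature sizes and the pooling/head are assumptions.
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    def __init__(self, phys_dim=32, audio_dim=40, behav_dim=16,
                 d_model=512, n_heads=8, n_layers=4, n_classes=2):
        super().__init__()
        # Project each modality's per-timestep features into a shared space.
        self.proj = nn.ModuleDict({
            "phys": nn.Linear(phys_dim, d_model),
            "audio": nn.Linear(audio_dim, d_model),
            "behav": nn.Linear(behav_dim, d_model),
        })
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, phys, audio, behav):
        tokens = torch.cat([self.proj["phys"](phys),
                            self.proj["audio"](audio),
                            self.proj["behav"](behav)], dim=1)  # concatenate token streams
        encoded = self.encoder(tokens)           # cross-modal self-attention
        return self.head(encoded.mean(dim=1))    # pooled stress logits

model = MultimodalFusion()
logits = model(torch.randn(2, 30, 32), torch.randn(2, 50, 40), torch.randn(2, 10, 16))
print(logits.shape)  # (2, 2)
```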

16 pages, 1334 KB  
Article
UAV Classification Based on Statistics of RF Transmissions
by Jarosław Magiera
Appl. Sci. 2026, 16(1), 499; https://doi.org/10.3390/app16010499 - 4 Jan 2026
The malicious use of small unmanned aerial vehicles (UAVs) necessitates the development of effective countermeasures against such threats. Counter-UAV systems encompass detection, classification, and neutralization. Detection and classification can be performed using visual, audio, radar, or radio frequency (RF) sensors. This paper proposes a straightforward UAV classification scheme based on analyzing RF transmissions according to fundamental parameters such as bandwidth, duration, and center frequency. The statistics of these parameters are expected to be unique, enabling differentiation between various UAV models. The paper outlines the methodology for analyzing received waveforms to estimate the aforementioned parameters and their distributions. Computer vision tools are employed for spectrogram processing. The proposed approach is validated on a large dataset containing waveforms from eight UAV models. Three types of statistics are evaluated, demonstrating that each analyzed UAV exhibits distinct transmission-related features.
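The parameter-estimation step, measuring bandwidth, duration, and center frequency of each transmission from a spectrogram, can be sketched by thresholding the time-frequency plane and labelling connected regions, in the spirit of the spectrogram processing described above. The synthetic signal, energy threshold, and region-size filter below are placeholders, not the paper's method.

```python
# Sketch of estimating per-burst duration, bandwidth, and center frequency by
# thresholding a spectrogram and labelling connected regions. The synthetic
# signal, threshold, and minimum region size are placeholders.
import numpy as np
from scipy.signal import spectrogram
from scipy.ndimage import label, find_objects

fs = 1_000_000  # 1 MHz sample rate
t = np.arange(0, 0.05, 1 / fs)
x = 0.01 * np.random.randn(t.size)
x[10_000:20_000] += np.sin(2 * np.pi * 150_000 * t[10_000:20_000])  # one synthetic burst

f, ts, Sxx = spectrogram(x, fs=fs, nperseg=256, noverlap=128)
mask = Sxx > 10 * np.median(Sxx)          # crude energy threshold
labels, n_regions = label(mask)           # connected time-frequency regions

for idx, region in enumerate(find_objects(labels), start=1):
    if (labels == idx).sum() < 20:        # skip tiny noise blobs
        continue
    f_slice, t_slice = region
    bw = f[f_slice.stop - 1] - f[f_slice.start]           # bandwidth (Hz)
    dur = ts[t_slice.stop - 1] - ts[t_slice.start]        # duration (s)
    fc = 0.5 * (f[f_slice.start] + f[f_slice.stop - 1])   # center frequency (Hz)
    print(f"burst: fc={fc/1e3:.0f} kHz, bw={bw/1e3:.0f} kHz, duration={dur*1e3:.1f} ms")
```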
