Search Results (590)

Search Parameters:
Keywords = speech-in-noise

16 pages, 1591 KB  
Article
Developmental and School-Related Risk Factors in Auditory Processing Disorder: A Pilot Study in Polish Children
by Natalia Moćko, Arkadiusz Badziński and Michał Kręcichwost
Appl. Sci. 2025, 15(21), 11687; https://doi.org/10.3390/app152111687 (registering DOI) - 31 Oct 2025
Abstract
The paper addresses acquired and secondary auditory processing disorder (APD) in children and adolescents in the Polish population. The authors analyzed a group of individuals diagnosed with APD and a group of younger children at risk of APD, based on detailed interviews with parents. A comparison of developmental factors showed several similarities between the at-risk and diagnosed groups, including abnormal muscle tone (64.29% vs. 33.33%), ear diseases (42.86% vs. 57.58%), and complicated delivery (32.14% vs. 39.39%). Among school-related factors, the most significant difficulties were associated with poor concentration (78.57% vs. 54.55%), irregularities in mastering phonology related to writing (67.86% vs. 75.76%) and reading (64.29% vs. 78.79%), as well as problems with speech-in-noise perception (60.71% vs. 57.58%). Overall, the comparison of children at risk of APD and those with a confirmed diagnosis revealed multiple similarities. The results were visualized using Pareto charts to highlight the most influential factors. The findings indicate the need for wider screening to identify children at risk of APD, so that diagnosis can proceed more quickly in these individuals. Based on recurring developmental factors, the Risk Assessment Questionnaire (RAQ) was developed as a non-clinical screening tool to identify children potentially at risk of APD. The RAQ demonstrated moderate discriminative potential (AUC = 0.68; sensitivity = 75%; specificity = 68%) and may support early referral for diagnostic evaluation. The results highlight the value of systematic screening to accelerate diagnosis and intervention, especially in populations where access to formal assessment is limited. Full article
(This article belongs to the Section Biomedical Engineering)
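The reported RAQ performance figures (AUC = 0.68, sensitivity = 75%, specificity = 68%) are the standard outputs of an ROC analysis of questionnaire scores against diagnostic labels. The sketch below shows how such figures are computed with scikit-learn on synthetic scores; the data, score distributions, and cutoff rule are illustrative assumptions, not the study's.

```python
# Minimal ROC/AUC sketch for a screening questionnaire (illustrative data,
# not the RAQ itself): higher scores are assumed to indicate higher APD risk.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
# Hypothetical questionnaire totals: diagnosed/at-risk group scores somewhat higher.
scores_controls = rng.normal(loc=10, scale=4, size=60)
scores_apd = rng.normal(loc=14, scale=4, size=40)

y_true = np.concatenate([np.zeros(60), np.ones(40)])   # 1 = APD / at risk
y_score = np.concatenate([scores_controls, scores_apd])

auc = roc_auc_score(y_true, y_score)
fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Pick the cutoff that maximizes Youden's J = sensitivity + specificity - 1.
j = tpr - fpr
best = np.argmax(j)
print(f"AUC = {auc:.2f}")
print(f"cutoff = {thresholds[best]:.1f}, "
      f"sensitivity = {tpr[best]:.2f}, specificity = {1 - fpr[best]:.2f}")
```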
20 pages, 3036 KB  
Article
Enhancing the MUSE Speech Enhancement Framework with Mamba-Based Architecture and Extended Loss Functions
by Tsung-Jung Li and Jeih-Weih Hung
Mathematics 2025, 13(21), 3481; https://doi.org/10.3390/math13213481 (registering DOI) - 31 Oct 2025
Abstract
We propose MUSE++, an advanced and lightweight speech enhancement (SE) framework that builds upon the original MUSE architecture by introducing three key improvements: a Mamba-based state space model, dynamic SNR-driven data augmentation, and an augmented multi-objective loss function. First, we replace the original multi-path enhanced Taylor (MET) transformer block with the Mamba architecture, enabling substantial reductions in model complexity and parameter count while maintaining robust enhancement capability. Second, we adopt a dynamic training strategy that varies the signal-to-noise ratios (SNRs) across diverse speech samples, promoting improved generalization to real-world acoustic scenarios. Third, we expand the model’s loss framework with additional objective measures, allowing the model to be empirically tuned towards both perceptual and objective SE metrics. Comprehensive experiments conducted on the VoiceBank-DEMAND dataset demonstrate that MUSE++ delivers consistently superior performance across standard evaluation metrics, including PESQ, CSIG, CBAK, COVL, SSNR, and STOI, while reducing the number of model parameters by over 65% compared to the baseline. These results highlight MUSE++ as a highly efficient and effective solution for speech enhancement, particularly in resource-constrained and real-time deployment scenarios. Full article
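The dynamic SNR-driven augmentation described above amounts to re-mixing each clean training utterance with noise at a randomly drawn signal-to-noise ratio. A minimal numpy sketch of that idea follows; the SNR range and dummy signals are assumptions for illustration, not the authors' training configuration.

```python
# Sketch of SNR-driven mixing: scale a noise segment so the mixture hits a
# randomly drawn target SNR (in dB) relative to the clean speech.
import numpy as np

def mix_at_random_snr(speech: np.ndarray, noise: np.ndarray,
                      snr_range_db=(-5.0, 15.0), rng=None) -> np.ndarray:
    rng = rng or np.random.default_rng()
    snr_db = rng.uniform(*snr_range_db)
    # Match lengths by tiling/cropping the noise.
    reps = int(np.ceil(len(speech) / len(noise)))
    noise = np.tile(noise, reps)[: len(speech)]
    p_speech = np.mean(speech ** 2) + 1e-12
    p_noise = np.mean(noise ** 2) + 1e-12
    # Gain that makes 10*log10(p_speech / (gain**2 * p_noise)) equal snr_db.
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

# Example: 1 s of dummy speech and 0.5 s of dummy noise at 16 kHz.
sr = 16000
speech = np.random.default_rng(1).standard_normal(sr) * 0.1
noise = np.random.default_rng(2).standard_normal(sr // 2) * 0.05
noisy = mix_at_random_snr(speech, noise)
```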
18 pages, 1819 KB  
Article
Speech Markers of Parkinson’s Disease: Phonological Features and Acoustic Measures
by Ratree Wayland, Rachel Meyer and Kevin Tang
Brain Sci. 2025, 15(11), 1162; https://doi.org/10.3390/brainsci15111162 - 29 Oct 2025
Abstract
Background/Objectives: Parkinson’s disease (PD) affects both articulatory and phonatory subsystems, leading to characteristic speech changes known as hypokinetic dysarthria. However, few studies have jointly analyzed these subsystems within the same participants using interpretable deep-learning-based measures. Methods: Speech data from the PC-GITA corpus, including 50 Colombian Spanish speakers with PD and 50 age- and sex-matched healthy controls were analyzed. We combined phonological feature posteriors—probabilistic indices of articulatory constriction derived from the Phonet deep neural network—with harmonics-to-noise ratio (HNR) as a laryngeal measure. Linear mixed-effects models tested how these measures related to disease severity (UPDRS, UPDRS-speech, and Hoehn and Yahr), age, and sex. Results: PD participants showed significantly higher [continuant] posteriors, especially for dental stops, reflecting increased spirantization and articulatory weakening. In contrast, [sonorant] posteriors did not differ from controls, indicating reduced oral constriction without a shift toward more open, approximant-like articulations. HNR was predicted by vowel height and sex but did not distinguish PD from controls, likely reflecting ON-medication recordings. Conclusions: These findings demonstrate that deep-learning-derived articulatory features can capture early, subphonemic weakening in PD speech—particularly for coronal consonants—while single-parameter laryngeal indices such as HNR are less sensitive under medicated conditions. By linking spectral energy patterns to interpretable phonological categories, this approach provides a transparent framework for detecting subtle articulatory deficits and developing feature-level biomarkers of PD progression. Full article
(This article belongs to the Section Behavioral Neuroscience)
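The linear mixed-effects analysis described here, relating a phonological posterior to group, age, and sex with speaker as a random effect, can be sketched with statsmodels; the column names, synthetic data, and random-intercept structure below are illustrative assumptions rather than the paper's exact model.

```python
# Illustrative mixed-effects sketch: a [continuant] posterior modeled as a
# function of group (PD vs. control), age, and sex, with a random intercept
# per speaker. Synthetic data stands in for the PC-GITA measurements.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_speakers, n_tokens = 20, 30
rows = []
for spk in range(n_speakers):
    group = "PD" if spk < n_speakers // 2 else "HC"
    age = int(rng.integers(55, 75))
    sex = rng.choice(["F", "M"])
    spk_effect = rng.normal(0, 0.05)          # speaker-level random intercept
    for _ in range(n_tokens):
        base = 0.55 if group == "PD" else 0.45   # assumed group difference
        rows.append(dict(speaker=spk, group=group, age=age, sex=sex,
                         continuant=float(np.clip(base + spk_effect +
                                                  rng.normal(0, 0.1), 0, 1))))
df = pd.DataFrame(rows)

model = smf.mixedlm("continuant ~ group + age + sex", df, groups=df["speaker"])
result = model.fit()
print(result.summary())
```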
11 pages, 243 KB  
Article
Association Between Shift Work and Auditory–Cognitive Processing in Middle-Aged Healthcare Workers
by Margarida Roque, Tatiana Marques and Margarida Serrano
Audiol. Res. 2025, 15(6), 145; https://doi.org/10.3390/audiolres15060145 - 25 Oct 2025
Viewed by 147
Abstract
Background/Objectives: Shift work in healthcare professionals affects performance on tasks involving high cognitive processing, especially in complex environments. However, the beneficial effects that working in complex environments may have on auditory–cognitive processing remain unknown. These professionals face increased challenges in decision-making due to factors such as noise exposure and sleep disturbances, which may lead to the development of enhanced auditory–cognitive resources. This study aims to investigate the associations between shift work and auditory–cognitive processing in middle-aged healthcare workers. Methods: Thirty middle-aged healthcare workers were equally allocated to a shift worker (SW) or a fixed-schedule worker (FSW) group. Performance on a cognitive test, pure-tone audiometry, speech perception in quiet and in noise, and listening effort was used to explore whether correlations were specific to shift work. Results: Exploratory analyses indicated that shift workers tended to perform better in the visuospatial/executive function, memory recall, memory index, orientation, and total MoCA score domains compared to fixed-schedule workers. In the SW group, hearing thresholds correlated with memory recall and memory index. In the FSW group, hearing thresholds correlated with orientation, memory index, and total MoCA score, while listening effort correlated with naming, and speech intelligibility in quiet correlated with total MoCA scores. Conclusions: These exploratory findings suggest that shift work may be linked to distinct auditory–cognitive patterns, with potential compensatory mechanisms in visuospatial/executive functions and memory among middle-aged healthcare workers. Larger, longitudinal studies are warranted to confirm whether these patterns reflect true adaptive mechanisms. Full article
(This article belongs to the Special Issue The Aging Ear)
11 pages, 578 KB  
Communication
Precision Audiometry and Ecological Validity: Exploring the Link Between Patient-Reported Outcome Measures and Speech Testing in CI Users
by Matthias Hey and Thomas Hocke
Audiol. Res. 2025, 15(5), 142; https://doi.org/10.3390/audiolres15050142 - 21 Oct 2025
Viewed by 177
Abstract
Background/Objectives: Audiometric methods for hearing-impaired patients are constantly evolving as new therapeutic interventions and improved clinical standards are established. This study aimed to explore the relationship between patient-reported outcome measures in cochlear implant users and scores from audiometric test procedures in quiet and in noise. Methods: Twenty postlingually deafened CI users were included in a prospective study. Speech comprehension was measured in quiet (Freiburg words) and in noise (Oldenburg sentence test), while stationary speech-simulating or temporally fluctuating noise was applied and the noise sources were varied. Subjective feedback from the patients was obtained using the HISQUI19 questionnaire. Results: Word scores in quiet showed a significant positive correlation with the users' subjective assessment of hearing ability on the questionnaire (Spearman's R = 0.57). The subjective evaluation correlated more strongly with comprehension in fluctuating background noise than in stationary background noise. On the other hand, test–retest accuracy was substantially reduced in the transition from stationary to fluctuating background noise. Conclusions: Introducing temporal fluctuations into the background noise can improve ecological validity, but at the cost of a parallel decrease in the accuracy of the test procedure. Especially in the context of studies, this knowledge may help improve the choice of test method when weighing ecological validity against audiometric precision. Full article
(This article belongs to the Section Hearing)
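The headline association (Spearman's R = 0.57 between word scores in quiet and the HISQUI19 total) is a rank correlation over paired per-listener scores. The snippet below shows the computation with scipy on made-up values; the numbers are purely illustrative.

```python
# Rank-correlation sketch: word-recognition scores in quiet vs. HISQUI19
# questionnaire totals for the same listeners (synthetic example values).
from scipy.stats import spearmanr

word_scores_quiet = [45, 60, 70, 80, 55, 65, 90, 75, 50, 85]   # percent correct
hisqui19_total =    [62, 78, 80, 95, 70, 77, 104, 88, 60, 99]  # questionnaire sum

rho, p_value = spearmanr(word_scores_quiet, hisqui19_total)
print(f"Spearman's R = {rho:.2f}, p = {p_value:.3f}")
```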
29 pages, 2790 KB  
Article
A New Hybrid Adaptive Self-Loading Filter and GRU-Net for Active Noise Control in a Right-Angle Bending Pipe of an Air Conditioner
by Wenzhao Zhu, Zezheng Gu, Xiaoling Chen, Ping Xie, Lei Luo and Zonglong Bai
Sensors 2025, 25(20), 6293; https://doi.org/10.3390/s25206293 - 10 Oct 2025
Viewed by 412
Abstract
Air-conditioner noise in a rehabilitation room can seriously affect the mental state of patients. However, existing single-layer active noise control (ANC) filters may fail to attenuate complicated harmonic noise, and deep recursive ANC methods may fail to work in real time. To address this, a new hybrid of an adaptive self-loading filtered-x least-mean-square (ASL-FxLMS) filter and a convolutional neural network–gated recurrent unit (CNN-GRU) network is proposed for a right-angle bending-pipe model. First, based on a recursive GRU translation core, an improved CNN-GRU network with multi-head attention layers is proposed; it improves attenuation performance especially for complicated harmonic noises containing more or fewer frequencies than the harmonic model assumes. Its structure is also optimized to reduce the computing load, and an improved time-delay estimator is applied to improve the real-time ANC performance of the CNN-GRU. Meanwhile, an adaptive self-loading FxLMS algorithm is developed to deal with the uncertain components of complicated harmonic noise. To balance attenuation, robustness, and tracking performance, the ASL-FxLMS and CNN-GRU are connected by a convex combination structure. Theoretical analysis and simulations are conducted to show the effectiveness of the proposed method. Full article
(This article belongs to the Section Sensor Networks)
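The adaptive half of the proposed hybrid builds on the filtered-x LMS recursion. The numpy sketch below implements the textbook FxLMS update with an assumed, perfectly known secondary-path model, not the authors' adaptive self-loading variant.

```python
# Minimal FxLMS sketch (textbook form): adapt control filter w so the
# anti-noise cancels the primary disturbance at the error microphone.
import numpy as np

rng = np.random.default_rng(0)
n = 20000
x = rng.standard_normal(n)               # reference noise picked up upstream
P = np.array([0.0, 0.3, 0.5, 0.2])       # assumed primary path
S = np.array([0.0, 0.6, 0.3])            # assumed secondary path (known here)
d = np.convolve(x, P)[:n]                # disturbance at the error mic

L, mu = 16, 0.01
w = np.zeros(L)                          # control filter
x_buf = np.zeros(L)                      # reference history
xf_buf = np.zeros(L)                     # filtered-reference history
y_buf = np.zeros(len(S))                 # control-output history through S
e = np.zeros(n)

# Filtered reference: the reference passed through the secondary-path model.
xf = np.convolve(x, S)[:n]

for i in range(n):
    x_buf = np.roll(x_buf, 1); x_buf[0] = x[i]
    xf_buf = np.roll(xf_buf, 1); xf_buf[0] = xf[i]
    y = w @ x_buf                        # anti-noise sent to the loudspeaker
    y_buf = np.roll(y_buf, 1); y_buf[0] = y
    e[i] = d[i] - S @ y_buf              # residual at the error microphone
    w += mu * e[i] * xf_buf              # FxLMS update

print("mean |e|, first 1000 samples:", np.abs(e[:1000]).mean())
print("mean |e|, last 1000 samples: ", np.abs(e[-1000:]).mean())
```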
19 pages, 1648 KB  
Article
Modality-Enhanced Multimodal Integrated Fusion Attention Model for Sentiment Analysis
by Zhenwei Zhang, Wenyan Wu, Tao Yuan and Guang Feng
Appl. Sci. 2025, 15(19), 10825; https://doi.org/10.3390/app151910825 - 9 Oct 2025
Viewed by 1041
Abstract
Multimodal sentiment analysis aims to utilize multisource information such as text, speech, and vision to identify an individual's emotional state more comprehensively and accurately. However, existing methods still face challenges in practical applications, including modality heterogeneity, insufficient expressive power of non-verbal modalities, and low fusion efficiency. To address these issues, this paper proposes a Modality Enhanced Multimodal Integration Model (MEMMI). First, a modality enhancement module is designed to leverage the semantic guidance capability of the text modality, enhancing the feature representation of non-verbal modalities through a multihead attention mechanism and a dynamic routing strategy. Second, a gated fusion mechanism is introduced to selectively inject speech and visual information into the dominant text modality, enabling robust information completion and noise suppression. Finally, a combined attention fusion module is constructed to synchronously fuse information from all three modalities within a unified architecture, while a multiscale encoder captures feature representations at different semantic levels. Experimental results on three benchmark datasets—CMU-MOSEI, CMU-MOSI, and CH-SIMS—demonstrate the superiority of the proposed model. On CMU-MOSI, it achieves an Acc-7 of 45.91, with binary accuracy/F1 of 82.86/84.60, MAE of 0.734, and Corr of 0.790, outperforming TFN and MulT by a large margin. On CMU-MOSEI, the model reaches an Acc-7 of 54.17, Acc-2/F1 of 83.69/86.02, MAE of 0.526, and Corr of 0.779, surpassing all baselines, including ALMT. On CH-SIMS, it further achieves 41.88, 66.52, and 77.68 in Acc-5/Acc-3/Acc-2, with an F1 of 77.85, MAE of 0.450, and Corr of 0.594, establishing new state-of-the-art performance across datasets. Ablation studies further validate the effectiveness of each module in enhancing modality representation and fusion efficiency. Full article
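The gated fusion step, which selectively injects speech and visual information into the dominant text stream, can be pictured as a learned sigmoid gate applied to projected non-verbal features before a residual addition to the text representation. The PyTorch sketch below follows that reading; the dimensions and exact gating form are assumptions, not the MEMMI specification.

```python
# Hedged sketch of a gated fusion block: text features stay dominant, and a
# learned gate decides how much audio/visual information to inject per frame.
import torch
import torch.nn as nn

class GatedInjection(nn.Module):
    def __init__(self, d_text: int, d_other: int):
        super().__init__()
        self.proj = nn.Linear(d_other, d_text)          # align dimensions
        self.gate = nn.Linear(2 * d_text, d_text)       # gate from both streams

    def forward(self, text: torch.Tensor, other: torch.Tensor) -> torch.Tensor:
        # text: (batch, seq, d_text); other: (batch, seq, d_other)
        other = self.proj(other)
        g = torch.sigmoid(self.gate(torch.cat([text, other], dim=-1)))
        return text + g * other                         # gated residual injection

fuse_audio = GatedInjection(d_text=256, d_other=128)
fuse_visual = GatedInjection(d_text=256, d_other=64)
text = torch.randn(2, 50, 256)
audio = torch.randn(2, 50, 128)
visual = torch.randn(2, 50, 64)
fused = fuse_visual(fuse_audio(text, audio), visual)
print(fused.shape)   # torch.Size([2, 50, 256])
```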
46 pages, 7346 KB  
Review
Integrating Speech Recognition into Intelligent Information Systems: From Statistical Models to Deep Learning
by Chaoji Wu, Yi Pan, Haipan Wu and Lei Ning
Informatics 2025, 12(4), 107; https://doi.org/10.3390/informatics12040107 - 4 Oct 2025
Viewed by 962
Abstract
Automatic speech recognition (ASR) has advanced rapidly, evolving from early template-matching systems to modern deep learning frameworks. This review systematically traces ASR’s technological evolution across four phases: the template-based era, statistical modeling approaches, the deep learning revolution, and the emergence of large-scale models under diverse learning paradigms. We analyze core technologies such as hidden Markov models (HMMs), Gaussian mixture models (GMMs), recurrent neural networks (RNNs), and recent architectures including Transformer-based models and Wav2Vec 2.0. Beyond algorithmic development, we examine how ASR integrates into intelligent information systems, analyzing real-world applications in healthcare, education, smart homes, enterprise systems, and automotive domains with attention to deployment considerations and system design. We also address persistent challenges—noise robustness, low-resource adaptation, and deployment efficiency—while exploring emerging solutions such as multimodal fusion, privacy-preserving modeling, and lightweight architectures. Finally, we outline future research directions to guide the development of robust, scalable, and intelligent ASR systems for complex, evolving environments. Full article
(This article belongs to the Section Machine Learning)
10 pages, 294 KB  
Article
Performance Differences Between Spanish AzBio and Latin American HINT: Implications for Test Selection
by Chrisanda Marie Sanchez, Jennifer Coto, Sandra Velandia, Ivette Cejas and Meredith A. Holcomb
Audiol. Res. 2025, 15(5), 129; https://doi.org/10.3390/audiolres15050129 - 2 Oct 2025
Viewed by 236
Abstract
Background/Objectives: Spanish-speaking patients face persistent barriers in accessing equitable audiological care, particularly when standardized language-appropriate tools are lacking. Two Spanish-language sentence recognition tests, the Spanish AzBio Sentence (SAzB) and the Latin American Hearing in Noise Test (LAH), are commonly used to evaluate speech perception in adults with hearing loss. However, performance differences between these measures may influence referral decisions for hearing intervention, such as cochlear implantation. This study compared test performance under varying noise and spatial conditions to guide appropriate test selection and reduce the risk of misclassification that may contribute to healthcare disparities. Methods: Twenty-one bilingual Spanish/English speaking adults with normal bilateral hearing completed speech perception testing using both the SAzB and LAH. Testing was conducted under two spatial configurations: (1) speech and noise presented from the front (0° azimuth) and (2) speech to the simulated poorer ear and noise to the better ear (90°/270° azimuth). Conditions included quiet and three signal-to-noise ratios (+10, +5, and 0 dB). Analyses included paired t-tests and one-way ANOVAs. Results: Participants scored significantly higher on the LAH than on the SAzB across all SNR conditions and configurations, with ceiling effects observed for the LAH. SAzB scores varied by language dominance, while LAH scores did not. No other differences were observed based on any further demographic information. Conclusions: The SAzB provides a more challenging and informative assessment of speech perception in noise. Relying on easier tests like the LAH may obscure real-world difficulties and delay appropriate referrals for hearing loss intervention, including cochlear implant evaluation. Selecting the most appropriate test is critical to avoiding under-referral and ensuring Spanish-speaking patients receive equitable and accurate care. Full article
(This article belongs to the Section Speech and Language)
14 pages, 839 KB  
Article
MMFA: Masked Multi-Layer Feature Aggregation for Speaker Verification Using WavLM
by Uijong Lee and Seok-Pil Lee
Electronics 2025, 14(19), 3857; https://doi.org/10.3390/electronics14193857 - 29 Sep 2025
Viewed by 543
Abstract
Speaker verification (SV) is a core technology for security and personalized services, and its importance has been growing with the spread of wearables such as smartwatches, earbuds, and AR/VR headsets, where privacy-preserving on-device operation under limited compute and power budgets is required. Recently, self-supervised learning (SSL) models such as WavLM and wav2vec 2.0 have been widely adopted as front ends that provide multi-layer speech representations without labeled data. Lower layers contain fine-grained acoustic information, whereas higher layers capture phonetic and contextual features. However, conventional SV systems typically use only the final layer or a single-step temporal attention over a simple weighted sum of layers, implicitly assuming that frame importance is shared across layers and thus failing to fully exploit the hierarchical diversity of SSL embeddings. We argue that frame relevance is layer dependent, as the frames most critical for speaker identity differ across layers. To address this, we propose Masked Multi-layer Feature Aggregation (MMFA), which first applies independent frame-wise attention within each layer, then performs learnable layer-wise weighting to suppress irrelevant frames such as silence and noise while effectively combining complementary information across layers. On VoxCeleb1, MMFA achieves consistent improvements over strong baselines in both EER and minDCF, and attention-map analysis confirms distinct selection patterns across layers, validating MMFA as a robust SV approach even in short-utterance and noisy conditions. Full article
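The aggregation idea, independent frame-wise attention within each SSL layer followed by learnable layer weighting, can be sketched in a few lines of PyTorch over a stack of hidden states. The masking of silent or padded frames is omitted here, and the shapes and attention form are illustrative assumptions rather than the published implementation.

```python
# Sketch: per-layer frame attention + learnable layer weights over a stack of
# SSL (e.g., WavLM) hidden states of shape (num_layers, batch, frames, dim).
import torch
import torch.nn as nn

class MultiLayerFrameAggregation(nn.Module):
    def __init__(self, num_layers: int, dim: int):
        super().__init__()
        # One frame scorer per layer: frame relevance is layer-dependent.
        self.frame_scorers = nn.ModuleList(
            [nn.Linear(dim, 1) for _ in range(num_layers)])
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_states: torch.Tensor) -> torch.Tensor:
        pooled = []
        for scorer, h in zip(self.frame_scorers, layer_states):
            alpha = torch.softmax(scorer(h), dim=1)     # (batch, frames, 1)
            pooled.append((alpha * h).sum(dim=1))       # attentive pooling per layer
        pooled = torch.stack(pooled, dim=0)             # (layers, batch, dim)
        w = torch.softmax(self.layer_logits, dim=0)     # learnable layer weights
        return (w[:, None, None] * pooled).sum(dim=0)   # (batch, dim) embedding

agg = MultiLayerFrameAggregation(num_layers=13, dim=768)
states = torch.randn(13, 4, 200, 768)                   # dummy WavLM-base stack
embedding = agg(states)
print(embedding.shape)                                   # torch.Size([4, 768])
```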
21 pages, 3434 KB  
Article
Deep Learning-Based Compliance Assessment for Chinese Rail Transit Dispatch Speech
by Qiuzhan Zhao, Jinbai Zou and Lingxiao Chen
Appl. Sci. 2025, 15(19), 10498; https://doi.org/10.3390/app151910498 - 28 Sep 2025
Viewed by 274
Abstract
Rail transit dispatch speech plays a critical role in ensuring the safety of urban rail operations. To enable automated and accurate compliance assessment of dispatch speech, this study proposes an improved deep learning model to address the limitations of conventional approaches in terms of accuracy and robustness. Building upon the baseline Whisper model, two key enhancements are introduced: (1) low-rank adaptation (LoRA) fine-tuning to better adapt the model to the specific acoustic and linguistic characteristics of rail transit dispatch speech, and (2) a novel entity-aware attention mechanism that incorporates named entity recognition (NER) embeddings into the decoder. This mechanism enables attention computation between words belonging to the same entity category across different commands and recitations, which helps highlight keywords critical for compliance assessment and achieve precise inter-sentence element alignment. Experimental results on real-world test sets demonstrate that the proposed model improves recognition accuracy by 30.5% compared to the baseline model. In terms of robustness, we evaluate the relative performance retention under severe noise conditions. While Zero-shot, Full Fine-tuning, and LoRA-only models achieve robustness scores of 72.2%, 72.4%, and 72.1%, respectively, and the NER-only variant reaches 88.1%, our proposed approach further improves to 89.6%. These results validate the model’s significant robustness and its potential to provide efficient and reliable technical support for ensuring the normative use of dispatch speech in urban rail transit operations. Full article
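The first enhancement, LoRA fine-tuning, adds small trainable low-rank matrices beside frozen pretrained weights. The generic PyTorch sketch below illustrates the mechanism on a single linear layer; it is not tied to Whisper or to the rank and scaling used in the paper.

```python
# Generic LoRA sketch: wrap a frozen linear layer with a trainable low-rank
# update, W x + (alpha / r) * B A x. Not the paper's Whisper-specific setup.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                     # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T) @ self.B.T

layer = LoRALinear(nn.Linear(512, 512))
out = layer(torch.randn(2, 10, 512))
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(out.shape, trainable)   # torch.Size([2, 10, 512]) 8192
```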
14 pages, 1437 KB  
Article
Increased Listening Effort: Is Hearing Training a Solution?—Results of a Pilot Study on Individualized Computer-Based Auditory Training in Subjects Not (Yet) Fitted with Hearing Aids
by Dominik Péus, Jan-Patric Schmid, Andreas Koj, Andreas Radeloff and Michael Schulte
Audiol. Res. 2025, 15(5), 124; https://doi.org/10.3390/audiolres15050124 - 27 Sep 2025
Viewed by 444
Abstract
Background: Hearing and cognition decline with age, and hearing loss is now considered an independent risk factor for later cognitive impairment. Computerized cognitive auditory training is being discussed as a possible adjunctive therapy approach. Objectives: The aim of this exploratory study was to investigate how the success of a computer-based cognitive auditory training (CCAT) can be measured. For this purpose, the influence of a CCAT on different dimensions of hearing and cognition was determined. Materials and Methods: Twenty-three subjects between 52 and 77 years of age, with hearing ranging from normal (normacusis) to moderate hearing loss, were recruited. They underwent 40 digital training lessons at home. Before, during, and after completion of the training, concentration ability (d2-R), memory (VLMT), subjective hearing impairment (HHI), hearing quality (SSQ12), listening effort in noise (ACALES), and speech understanding in noise (GÖSA) were measured. Results and Discussion: In this uncontrolled, non-randomized study, one of the main findings was that cognitive dimensions not directly trained in the CCAT improved: processing speed by 12.11 ± 16.40 points (p = 0.006) and concentration performance by 12.56 ± 13.50 points (p = 0.001). Learning performance also improved slightly, by 4.00 ± 7.00 (p = 0.019). Subjective hearing handicap was significantly reduced, by 10.70 ± 12.38 (p = 0.001). There were no significant changes in the SSQ12 (p = 0.979). Listening effort improved by 1.79 ± 2.13 dB SPL (p = 0.001), 1.75 ± 2.09 (p = 0.001), and 3.32 ± 3.27 dB (p < 0.001), respectively. Speech understanding in noise did not improve significantly. CCAT is likely to improve several dimensions of hearing and cognition; controlled future studies are needed to investigate its efficacy. Full article
15 pages, 2966 KB  
Article
Time Delay and Frequency Analysis of Remote Microphones
by Elena Andreatta, Igor Caregnato, Antonio Selmo, Andrea Gulli, Marius George Onofrei and Eva Orzan
Audiol. Res. 2025, 15(5), 123; https://doi.org/10.3390/audiolres15050123 - 25 Sep 2025
Viewed by 404
Abstract
Background/Objectives: A.BA.CO. is a speech-to-text captioning system developed for school classrooms. The system uses remote microphones (RMs) to capture the teacher's speech without background noise. Under this setup, an issue of signal latency arises for students wearing hearing aids (HAs) or cochlear implants (CIs), whose latency differs from that of the remote microphones and may require the development of a temporal coupling solution. This study establishes the foundation for such a solution by determining the latency of two RMs compatible with both HA and CI systems. The frequency response of the systems was analyzed separately and in combination. Methods: The RMs, combined with two behind-the-ear HAs for which transparency was verified, were tested at two different compression ratios in a laboratory specializing in electroacoustic measurements, using the comparison method to assess performance. Results: The time measurements revealed that the RMs differ by 10–12 ms (23–24 ms and 33–35 ms) and that the two HAs have time delays that differ by 1–2 ms (6–7 ms and 5–7 ms). The frequency responses showed that when the HA and RM have similar gains, they exhibit comb-filter distortions. This effect could alter the acoustic output of the devices in the ear canal and vary according to the mix ratio and the mutual positions of HA and RM, potentially requiring greater effort from the wearer. Conclusions: The communication system will have to account for different delays depending on the model and brand of RM, because similar transmission systems do not have the same time delays. RMs were originally designed for HAs and are most effective when they provide the only or the dominant acoustic stimulation reaching the eardrum. These limits must be considered when estimating the effectiveness of A.BA.CO. with RMs. Full article
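The comb-filter distortion reported when the HA and RM paths have similar gains follows directly from summing a signal with a delayed, equal-gain copy: the magnitude response develops nulls at odd multiples of 1/(2·delay). The numpy sketch below illustrates this with an assumed 20 ms offset, roughly the scale of the measured RM-versus-HA latency difference.

```python
# Comb-filter sketch: summing equal-gain direct and delayed paths produces
# magnitude nulls at odd multiples of 1/(2*delay). Delay value is illustrative.
import numpy as np

fs = 16000                     # sample rate (Hz)
delay_ms = 20.0                # assumed RM-vs-HA path offset
d = int(fs * delay_ms / 1000)  # delay in samples

# Impulse response of "direct + delayed copy", equal gains.
h = np.zeros(d + 1)
h[0] = 1.0
h[d] = 1.0

freqs = np.fft.rfftfreq(8192, 1 / fs)
H = np.fft.rfft(h, 8192)
mag_db = 20 * np.log10(np.abs(H) + 1e-12)

first_null = 1000.0 / (2 * delay_ms)      # Hz; 25 Hz for a 20 ms offset
print(f"first spectral null ~ {first_null:.0f} Hz, "
      f"notch spacing ~ {2 * first_null:.0f} Hz")
print("deepest notch (dB):", round(float(mag_db.min()), 1))
```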
18 pages, 615 KB  
Article
Auditory Processing and Speech Sound Disorders: Behavioral and Electrophysiological Findings
by Konstantinos Drosos, Paris Vogazianos, Dionysios Tafiadis, Louiza Voniati, Alexandra Papanicolaou, Klea Panayidou and Chryssoula Thodi
Audiol. Res. 2025, 15(5), 119; https://doi.org/10.3390/audiolres15050119 - 19 Sep 2025
Viewed by 490
Abstract
Background: Children diagnosed with Speech Sound Disorders (SSDs) encounter difficulties in speech perception, especially when listening in the presence of background noise. Recommended protocols for auditory processing evaluation include behavioral linguistic and speech processing tests, as well as objective electrophysiological measures. The present study compared the auditory processing profiles of children with SSD and typically developing (TD) children using a battery of behavioral language and auditory tests combined with auditory evoked responses. Methods: Forty (40) parents of 7- to 10-year-old Greek Cypriot children completed questionnaires related to their children's listening; the children completed an assessment comprising language, phonology, auditory processing, and auditory evoked response measures. The experimental group included 24 children with a history of SSDs; the control group consisted of 16 TD children. Results: Three factors significantly differentiated SSD from TD children: Factor 1 (auditory processing screening), Factor 5 (phonological awareness), and Factor 13 (Auditory Brainstem Response—ABR wave V latency). Among these, Factor 1 consistently predicted SSD classification both independently and in combined models, indicating strong ecological and diagnostic relevance. This predictive power suggests that real-world listening behaviors are central to SSD differentiation. The significant correlation between Factor 5 and Factor 13 may suggest an interaction between auditory processing at the brainstem level and higher-order phonological manipulation. Conclusions: This research underscores the diagnostic significance of integrating behavioral and physiological metrics through dimensional and predictive methodologies. Factor 1, which focuses on authentic listening environments, was identified as the strongest predictor. These results advocate for the inclusion of ecologically valid listening items in screening for APD. Poor discrimination of speech in noise creates discrepancies between incoming auditory information and stored phonological representations, disrupting the implicit mechanisms that align auditory input with those representations in memory. Speech and language pathologists can incorporate pertinent auditory processing assessment findings to identify potential language-processing challenges and formulate more effective therapeutic intervention strategies. Full article
(This article belongs to the Section Speech and Language)
17 pages, 8430 KB  
Article
Robust Audio–Visual Speaker Localization in Noisy Aircraft Cabins for Inflight Medical Assistance
by Qiwu Qin and Yian Zhu
Sensors 2025, 25(18), 5827; https://doi.org/10.3390/s25185827 - 18 Sep 2025
Viewed by 533
Abstract
Active Speaker Localization (ASL) involves identifying both who is speaking and where they are speaking from within audiovisual content. This capability is crucial in constrained and acoustically challenging environments, such as aircraft cabins during in-flight medical emergencies. In this paper, we propose a novel end-to-end Cross-Modal Audio–Visual Fusion Network (CMAVFN) designed specifically for ASL under real-world aviation conditions, which are characterized by engine noise, dynamic lighting, occlusions from seats or oxygen masks, and frequent speaker turnover. Our model directly processes raw video frames and multi-channel ambient audio, eliminating the need for intermediate face detection pipelines. It anchors spatially resolved visual features with directional audio cues using a cross-modal attention mechanism. To enhance spatiotemporal reasoning, we introduce a dual-branch localization decoder and a cross-modal auxiliary supervision loss. Extensive experiments on public datasets (AVA-ActiveSpeaker, EasyCom) and our domain-specific AirCabin-ASL benchmark demonstrate that CMAVFN achieves robust speaker localization in noisy, occluded, and multi-speaker aviation scenarios. This framework offers a practical foundation for speech-driven interaction systems in aircraft cabins, enabling applications such as real-time crew assistance, voice-based medical documentation, and intelligent in-flight health monitoring. Full article
(This article belongs to the Special Issue Advanced Biomedical Imaging and Signal Processing)
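The cross-modal attention that anchors spatially resolved visual features with directional audio cues can be sketched with a standard multi-head attention layer in which flattened visual tokens query audio-derived features. The dimensions and the query/key assignment below are assumptions for illustration, not the CMAVFN architecture.

```python
# Sketch of cross-modal attention: visual tokens (one per spatial location)
# query features derived from the multi-channel audio. Shapes are illustrative.
import torch
import torch.nn as nn

d_model, n_heads = 256, 4
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

batch = 2
visual_tokens = torch.randn(batch, 14 * 14, d_model)   # flattened spatial feature map
audio_tokens = torch.randn(batch, 8, d_model)           # e.g., per-channel/direction cues

fused, weights = attn(query=visual_tokens, key=audio_tokens, value=audio_tokens)
print(fused.shape)      # torch.Size([2, 196, 256]) -- audio-anchored visual features
print(weights.shape)    # torch.Size([2, 196, 8])   -- attention over audio cues
```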