Search Results (4)

Search Parameters:
Keywords = cochleogram

19 pages, 6983 KB  
Article
Cochleogram-Based Speech Emotion Recognition with the Cascade of Asymmetric Resonators with Fast-Acting Compression Using Time-Distributed Convolutional Long Short-Term Memory and Support Vector Machines
by Cevahir Parlak
Biomimetics 2025, 10(3), 167; https://doi.org/10.3390/biomimetics10030167 - 10 Mar 2025
Viewed by 840
Abstract
Feature extraction is a crucial stage in speech emotion recognition applications, and filter banks with their related statistical functions are widely used for this purpose. Although Mel filters and MFCCs achieve outstanding results, they do not perfectly model the structure of the human ear, as they use a simplified mechanism to simulate the functioning of human cochlear structures. The Mel filter system is not a perfect representation of human hearing, but merely an engineering shortcut to suppress the pitch and low-frequency components, which have little use in traditional speech recognition applications. Speech emotion recognition, however, relies heavily on pitch and low-frequency features. The newly tailored CARFAC 24 model is a sophisticated system for analyzing human speech, designed to closely simulate the functionality of the human cochlea. In this study, we use the CARFAC 24 system for speech emotion recognition and compare it with state-of-the-art systems in speaker-independent experiments with Time-Distributed Convolutional LSTM networks and Support Vector Machines on the ASED and NEMO emotional speech datasets. The results demonstrate that CARFAC 24 is a valuable alternative to Mel and MFCC features in speech emotion recognition applications.
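The pipeline above extracts cochleogram features and feeds them to a Time-Distributed Convolutional LSTM. As a rough illustration of what a cochleogram is, here is a plain gammatone-filterbank sketch (not the CARFAC 24 model used in the paper; the function names, channel count, and window sizes are assumptions for illustration only):

```python
# Minimal cochleogram sketch: a bank of 4th-order gammatone filters with
# ERB-spaced center frequencies, half-wave rectified and framed into a
# log-energy time-frequency image. Illustrative only -- not CARFAC 24.
import numpy as np

def erb_space(f_low, f_high, n):
    """ERB-rate-spaced center frequencies (Glasberg & Moore)."""
    ear_q, min_bw = 9.26449, 24.7
    i = np.arange(1, n + 1)
    return -(ear_q * min_bw) + np.exp(
        i * (-np.log(f_high + ear_q * min_bw)
             + np.log(f_low + ear_q * min_bw)) / n) * (f_high + ear_q * min_bw)

def cochleogram(x, fs, n_channels=64, win=0.025, hop=0.010):
    t = np.arange(int(0.04 * fs)) / fs               # 40 ms impulse responses
    win_n, hop_n = int(win * fs), int(hop * fs)
    n_frames = 1 + (len(x) - win_n) // hop_n
    out = np.empty((n_channels, n_frames))
    for ch, fc in enumerate(erb_space(50.0, 0.9 * fs / 2, n_channels)):
        erb = 24.7 * (4.37 * fc / 1000 + 1)          # ERB bandwidth at fc
        g = t**3 * np.exp(-2 * np.pi * 1.019 * erb * t) * np.cos(2 * np.pi * fc * t)
        y = np.convolve(x, g, mode="same")           # filter the waveform
        env = np.maximum(y, 0.0)                     # half-wave rectification
        for f in range(n_frames):
            out[ch, f] = np.log(env[f*hop_n : f*hop_n+win_n].mean() + 1e-10)
    return out  # shape: (n_channels, n_frames)
```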
23 pages, 1282 KB  
Article
Classification of Adventitious Sounds Combining Cochleogram and Vision Transformers
by Loredana Daria Mang, Francisco David González Martínez, Damian Martinez Muñoz, Sebastián García Galán and Raquel Cortina
Sensors 2024, 24(2), 682; https://doi.org/10.3390/s24020682 - 21 Jan 2024
Cited by 10 | Viewed by 3359
Abstract
Early identification of respiratory irregularities is critical for improving lung health and reducing global mortality rates. The analysis of respiratory sounds plays a significant role in characterizing the respiratory system’s condition and identifying abnormalities. The main contribution of this study is to investigate the performance obtained when input data represented as a cochleogram are fed to the Vision Transformer (ViT) architecture; to our knowledge, this is the first time this input–classifier combination has been applied to adventitious sound classification. Although ViT has shown promising results in audio classification tasks by applying self-attention to spectrogram patches, we extend this approach with the cochleogram, which captures specific spectro-temporal features of adventitious sounds. The proposed methodology is evaluated on the ICBHI dataset. We compare the classification performance of ViT with other state-of-the-art CNN approaches using the spectrogram, Mel frequency cepstral coefficients, the constant-Q transform, and the cochleogram as input data. Our results confirm the superior classification performance of combining the cochleogram and ViT, highlighting the potential of ViT for reliable respiratory sound classification. This study contributes to ongoing efforts to develop automatic intelligent techniques that significantly increase the speed and effectiveness of respiratory disease detection, addressing a critical need in the medical field.
(This article belongs to the Special Issue Advanced Machine Intelligence for Biomedical Signal Processing)
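The core idea above is to treat the cochleogram as an image and apply self-attention to its patches. A minimal ViT-style sketch of that idea (not the authors' implementation; the class name, patch size, embedding dimension, depth, and class count are arbitrary assumptions):

```python
# Sketch of feeding a cochleogram "image" to a ViT-style encoder.
import torch
import torch.nn as nn

class CochleogramViT(nn.Module):
    def __init__(self, n_classes=4, in_hw=(64, 128), patch=16, dim=192,
                 depth=4, heads=3):
        super().__init__()
        n_patches = (in_hw[0] // patch) * (in_hw[1] // patch)
        # Non-overlapping patch embedding: conv with kernel = stride = patch
        self.embed = nn.Conv2d(1, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(dim, heads, dim * 4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):                            # x: (B, 1, freq, time)
        z = self.embed(x).flatten(2).transpose(1, 2) # (B, n_patches, dim)
        z = torch.cat([self.cls.expand(len(z), -1, -1), z], dim=1) + self.pos
        z = self.encoder(z)                          # self-attention over patches
        return self.head(z[:, 0])                    # classify from CLS token

# e.g. a batch of eight 64x128 cochleograms:
logits = CochleogramViT()(torch.randn(8, 1, 64, 128))
```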
16 pages, 2143 KB  
Article
Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition
by Wondimu Lambamo, Ramasamy Srinivasagan and Worku Jifara
Appl. Sci. 2023, 13(1), 569; https://doi.org/10.3390/app13010569 - 31 Dec 2022
Cited by 8 | Viewed by 3871
Abstract
Speaker recognition systems perform very well on datasets without noise or mismatch. However, performance degrades with environmental noise, channel variation, and physical or behavioral changes in the speaker. The type of speaker-related feature plays a crucial role in improving the performance of speaker recognition systems. Gammatone Frequency Cepstral Coefficient (GFCC) features have been widely used to develop robust speaker recognition systems with conventional machine learning, achieving better performance than Mel Frequency Cepstral Coefficient (MFCC) features in noisy conditions. Recently, deep learning models have shown better performance in speaker recognition than conventional machine learning. Most previous deep learning-based speaker recognition models have used the Mel spectrogram and similar inputs rather than handcrafted features such as MFCC and GFCC. However, the performance of Mel spectrogram features degrades at high noise levels and under mismatch in the utterances. Similar to the Mel spectrogram, the cochleogram is another important feature for deep learning speaker recognition models. Like GFCC features, the cochleogram represents utterances on the Equivalent Rectangular Bandwidth (ERB) scale, which is important in noisy conditions. However, no study has analyzed the noise robustness of the cochleogram and Mel spectrogram in speaker recognition, and only a few studies have used the cochleogram to develop speech-based deep learning models under noisy and mismatched conditions. In this study, the noise robustness of cochleogram and Mel spectrogram features in deep learning-based speaker recognition is analyzed at Signal-to-Noise Ratio (SNR) levels from −5 dB to 20 dB. Experiments are conducted on the VoxCeleb1 dataset and a noise-added VoxCeleb1 dataset using basic 2D CNN, ResNet-50, VGG-16, ECAPA-TDNN, and TitaNet architectures. The speaker identification and verification performance of both the cochleogram and the Mel spectrogram is evaluated. The results show that the cochleogram performs better than the Mel spectrogram in both speaker identification and verification under noisy and mismatched conditions.
(This article belongs to the Special Issue AI, Machine Learning and Deep Learning in Signal Processing)
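The evaluation above adds noise to VoxCeleb1 at SNRs from −5 dB to 20 dB. A minimal sketch of the noise-mixing step such a protocol implies (the helper name and the 5 dB step in the usage comment are assumptions, not taken from the paper):

```python
# Scale a noise signal so the speech-to-noise power ratio hits a target SNR,
# then mix. This is the standard definition: SNR_dB = 10*log10(P_s / P_n).
import numpy as np

def mix_at_snr(speech, noise, snr_db):
    """Return speech + noise, with noise scaled to the requested SNR (dB)."""
    noise = np.resize(noise, speech.shape)      # loop/crop noise to length
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + scale * noise

# e.g. sweep the SNR range reported in the abstract:
# for snr in range(-5, 25, 5):
#     noisy = mix_at_snr(clean, babble, snr)
```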
14 pages, 2535 KB  
Article
Anti-PD-1 Therapy Does Not Influence Hearing Ability in the Most Sensitive Frequency Range, but Mitigates Outer Hair Cell Loss in the Basal Cochlear Region
by Judit Szepesy, Gabriella Miklós, János Farkas, Dániel Kucsera, Zoltán Giricz, Anita Gáborján, Gábor Polony, Ágnes Szirmai, László Tamás, László Köles, Zoltán V. Varga and Tibor Zelles
Int. J. Mol. Sci. 2020, 21(18), 6701; https://doi.org/10.3390/ijms21186701 - 13 Sep 2020
Cited by 5 | Viewed by 3663
Abstract
The administration of immune checkpoint inhibitors (ICIs) often leads to immune-related adverse events. However, their effect on auditory function is largely unexplored. Thorough preclinical studies have not yet been published; only sporadic cases and pharmacovigilance reports suggest their significance. Here we investigated the effect of anti-PD-1 antibody treatment (4 weeks, intraperitoneally, 200 μg/mouse, 3 times/week) on hearing function and cochlear morphology in C57BL/6J mice. ICI treatment did not influence hearing thresholds for click or tone burst stimuli at 4–32 kHz frequencies, as measured by auditory brainstem response. The number and morphology of spiral ganglion neurons were unaltered in all cochlear turns. The apical-middle turns (<32 kHz) showed preservation of the inner and outer hair cells (OHCs), whilst ICI treatment mitigated the age-related loss of OHCs in the basal turn (>32 kHz). The number of Iba1-positive macrophages also increased moderately in this high-frequency region. We conclude that 4 weeks of ICI treatment does not affect the functional and morphological integrity of the inner ear in the most relevant hearing range (4–32 kHz; apical-middle turns), but a noticeable preservation of OHCs and an increase in macrophage activity appeared in the >32 kHz basal part of the cochlea.
(This article belongs to the Special Issue Immunoglobulins in Inflammation)