Search Results (38)

Search Parameters:
Keywords = sound event recognition

28 pages, 13595 KiB  
Article
Open-Set Recognition of Environmental Sound Based on KDE-GAN and Attractor–Reciprocal Point Learning
by Jiakuan Wu, Nan Wang, Huajie Hong, Wei Wang, Kunsheng Xing and Yujie Jiang
Acoustics 2025, 7(2), 33; https://doi.org/10.3390/acoustics7020033 - 28 May 2025
Viewed by 25
Abstract
While open-set recognition algorithms have been extensively explored in computer vision, their application to environmental sound analysis remains understudied. To address this gap, this study investigates how to effectively recognize unknown sound categories in real-world environments by proposing a novel Kernel Density Estimation-based Generative Adversarial Network (KDE-GAN) for data augmentation combined with Attractor–Reciprocal Point Learning for open-set classification. Specifically, our approach addresses three key challenges: (1) How to generate boundary-aware synthetic samples for robust open-set training: A closed-set classifier’s pre-logit layer outputs are fed into the KDE-GAN, which synthesizes samples mapped to the logit layer using the classifier’s original weights. Kernel Density Estimation then enforces Density Loss and Offset Loss to ensure these samples align with class boundaries. (2) How to optimize feature space organization: The closed-set classifier is constrained by an Attractor–Reciprocal Point joint loss, maintaining intra-class compactness while pushing unknown samples toward low-density regions. (3) How to evaluate performance in highly open scenarios: We validate the method using UrbanSound8K, AudioEventDataset, and TUT Acoustic Scenes 2017 as closed sets, with ESC-50 categories as open-set samples, achieving AUROC/OSCR scores of 0.9251/0.8743, 0.7921/0.7135, and 0.8209/0.6262, respectively. The findings demonstrate the potential of this framework to enhance environmental sound monitoring systems, particularly in applications requiring adaptability to unseen acoustic events (e.g., urban noise surveillance or wildlife monitoring).
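
The density-based sample selection described above can be pictured with a small stand-in: fit one kernel density estimate per known class on closed-set embeddings and rank candidate (e.g., GAN-generated) samples by their highest per-class density, so that low-density, boundary-like samples can be favored. This is a hedged illustration using SciPy's gaussian_kde on synthetic arrays, not the authors' KDE-GAN implementation; the feature dimensions and sample counts are invented.

```python
# Hypothetical sketch: score generated embeddings with per-class Gaussian KDEs,
# so samples lying in low-density regions for every known class (boundary-like)
# can be selected for open-set training. Not the authors' implementation.
import numpy as np
from scipy.stats import gaussian_kde

def fit_class_kdes(embeddings, labels):
    """Fit one Gaussian KDE per known class on closed-set embeddings (N, D)."""
    return {c: gaussian_kde(embeddings[labels == c].T) for c in np.unique(labels)}

def boundary_scores(kdes, candidates):
    """Highest per-class density for each candidate; low values = boundary-like."""
    densities = np.stack([kde(candidates.T) for kde in kdes.values()], axis=0)
    return densities.max(axis=0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(200, 8))              # stand-in for pre-logit features
labels = rng.integers(0, 4, size=200)          # four known classes
kdes = fit_class_kdes(feats, labels)

fake = rng.normal(scale=2.0, size=(32, 8))     # stand-in for GAN-generated samples
scores = boundary_scores(kdes, fake)
boundary_like = fake[np.argsort(scores)[:16]]  # keep the 16 lowest-density samples
print(boundary_like.shape)                     # (16, 8)
```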

18 pages, 2345 KiB  
Article
SGM-EMA: Speech Enhancement Method Using a Score-Based Diffusion Model and EMA Mechanism
by Yuezhou Wu, Zhiri Li and Hua Huang
Appl. Sci. 2025, 15(10), 5243; https://doi.org/10.3390/app15105243 - 8 May 2025
Viewed by 415
Abstract
The score-based diffusion model has made significant progress in the field of computer vision, surpassing the performance of generative models such as variational autoencoders, and has been extended to applications such as speech enhancement and recognition. This paper proposes a U-Net architecture using a score-based diffusion model and an efficient multi-scale attention mechanism (EMA) for the speech enhancement task. The model leverages the symmetric structure of U-Net to extract speech features and captures contextual information and local details across different scales using the EMA mechanism, improving speech quality in noisy environments. We evaluate the method on the VoiceBank-DEMAND (VB-DMD) dataset and the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus–TUT Sound Events 2017 (TIMIT-TUT) dataset. The experimental results show that the proposed model performed well in terms of perceptual evaluation of speech quality (PESQ), extended short-time objective intelligibility (ESTOI), and scale-invariant signal-to-distortion ratio (SI-SDR). Especially when processing out-of-dataset noisy speech, the proposed method achieved excellent speech enhancement results compared to other methods, demonstrating the model’s strong generalization capability. We also conducted an ablation study on the SDE solver and the EMA mechanism; the results show that the reverse diffusion method outperformed the Euler–Maruyama method and that the EMA strategy improved model performance, demonstrating the effectiveness of these two techniques in our system. Nevertheless, since the model is specifically designed for Gaussian noise, its performance under non-Gaussian or complex noise conditions may be limited.
(This article belongs to the Special Issue Application of Deep Learning in Speech Enhancement Technology)
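
For reference, the scale-invariant signal-to-distortion ratio (SI-SDR) reported above can be computed from its standard definition; the snippet below is a generic NumPy sketch of that metric, not the authors' evaluation code.

```python
# Minimal sketch of the SI-SDR metric (standard definition): project the
# estimate onto the clean reference to obtain the scaled target, then compare
# target energy with residual distortion energy in decibels.
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """SI-SDR in dB between a 1-D enhanced signal and the clean reference."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(distortion ** 2) + eps))

rng = np.random.default_rng(0)
clean = rng.standard_normal(16000)
noisy = clean + 0.1 * rng.standard_normal(16000)
print(f"SI-SDR of the noisy input: {si_sdr(noisy, clean):.2f} dB")
```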

27 pages, 7054 KiB  
Article
An Ensemble of Convolutional Neural Networks for Sound Event Detection
by Abdinabi Mukhamadiyev, Ilyos Khujayarov, Dilorom Nabieva and Jinsoo Cho
Mathematics 2025, 13(9), 1502; https://doi.org/10.3390/math13091502 - 1 May 2025
Viewed by 346
Abstract
Sound event detection tasks are rapidly advancing in the field of pattern recognition, and deep learning methods are particularly well suited for such tasks. One of the important directions in this field is to detect the sounds of emotional events around residential buildings in smart cities and quickly assess the situation for security purposes. This research presents a comprehensive study of an ensemble convolutional recurrent neural network (CRNN) model designed for sound event detection (SED) in residential and public safety contexts. The work focuses on extracting meaningful features from audio signals using image-based representations, such as Discrete Cosine Transform (DCT) spectrograms, cochleagrams, and Mel spectrograms, to enhance robustness against noise and improve feature extraction. In collaboration with police officers, a two-hour dataset consisting of 112 clips related to four classes of emotional sounds, such as harassment, quarrels, screams, and breaking sounds, was prepared. In addition to the crowdsourced dataset, publicly available datasets were used to broaden the study’s applicability. Our dataset contains 5055 strongly labeled audio files of different lengths, totaling 14.14 h, across 13 separate sound categories. The proposed CRNN model integrates spatial and temporal feature extraction by processing these spectrograms through convolution and bi-directional gated recurrent unit (GRU) layers. An ensemble approach combines predictions from three models, achieving F1 scores of 71.5% for segment-based metrics and 46% for event-based metrics. The results demonstrate the model’s effectiveness in detecting sound events under noisy conditions, even with a small, unbalanced dataset. This research highlights the potential of the model for real-time audio surveillance systems using mini-computers, offering cost-effective and accurate solutions for maintaining public order.
(This article belongs to the Special Issue Advanced Machine Vision with Mathematics)
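
A minimal sketch of the CRNN pattern the abstract describes (convolutional feature extraction followed by a bi-directional GRU and frame-wise predictions) is shown below; the layer sizes, pooling scheme, and 13-class output are illustrative assumptions rather than the paper's exact configuration.

```python
# Illustrative CRNN for frame-wise sound event detection: the CNN blocks pool
# only along the frequency axis so time resolution is preserved for the BiGRU,
# and a sigmoid head emits per-frame, per-class activity. Sizes are assumed.
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, n_mels=64, n_classes=13, hidden=128):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d((1, 4)),                      # pool frequency only
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d((1, 4)),
        )
        self.gru = nn.GRU(64 * (n_mels // 16), hidden,
                          batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, spec):                           # spec: (batch, time, mels)
        x = spec.unsqueeze(1)                          # (batch, 1, time, mels)
        x = self.cnn(x)                                # (batch, 64, time, mels/16)
        x = x.permute(0, 2, 1, 3).flatten(2)           # (batch, time, features)
        x, _ = self.gru(x)
        return torch.sigmoid(self.head(x))             # per-frame class probabilities

model = CRNN()
probs = model(torch.randn(2, 500, 64))                 # 2 clips, 500 frames, 64 mel bins
print(probs.shape)                                     # torch.Size([2, 500, 13])
```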

12 pages, 2289 KiB  
Article
Local Time-Frequency Feature Fusion Using Cross-Attention for Acoustic Scene Classification
by Rong Huang, Yue Xie and Pengxu Jiang
Symmetry 2025, 17(1), 49; https://doi.org/10.3390/sym17010049 - 30 Dec 2024
Viewed by 739
Abstract
To address the interdependence of local time-frequency information in audio scene recognition, a segment-based time-frequency feature fusion method based on cross-attention is proposed. Since audio scene recognition is highly sensitive to individual sound events within a scene, the input features are divided into multiple segments along the time dimension to obtain local features, allowing the subsequent attention mechanism to focus on the time slices of key sound events. Furthermore, to leverage the advantages of both convolutional neural networks (CNNs) and recurrent neural networks (RNNs), which are mainstream structures in audio scene recognition tasks, this paper employs a symmetric structure to separately obtain the time-frequency features output by CNNs and RNNs and then fuses the two sets of features using cross-attention. Experiments on the TUT2018, TAU2019, and TAU2020 datasets demonstrate that the proposed algorithm improves on the official baseline results by 17.78%, 15.95%, and 20.13%, respectively.
(This article belongs to the Section Computer)
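
A rough PyTorch sketch of cross-attention fusion between two time-frequency feature streams (one standing in for the CNN branch, one for the RNN branch) is given below; the dimensions, time pooling, and classifier head are assumptions, not the paper's architecture.

```python
# Sketch of cross-attention fusion: each feature stream queries the other, and
# the two attended views are concatenated before classification.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=128, heads=4, n_classes=10):
        super().__init__()
        self.attn_ab = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.attn_ba = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.classifier = nn.Linear(2 * dim, n_classes)

    def forward(self, feat_cnn, feat_rnn):                   # both: (batch, time, dim)
        a, _ = self.attn_ab(feat_cnn, feat_rnn, feat_rnn)    # CNN stream queries RNN
        b, _ = self.attn_ba(feat_rnn, feat_cnn, feat_cnn)    # RNN stream queries CNN
        fused = torch.cat([a, b], dim=-1).mean(dim=1)        # average over time
        return self.classifier(fused)

fusion = CrossAttentionFusion()
logits = fusion(torch.randn(4, 100, 128), torch.randn(4, 100, 128))
print(logits.shape)                                          # torch.Size([4, 10])
```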

17 pages, 2093 KiB  
Article
Investigation of Data Augmentation Techniques in Environmental Sound Recognition
by Anastasios Loukas Sarris, Nikolaos Vryzas, Lazaros Vrysis and Charalampos Dimoulas
Electronics 2024, 13(23), 4719; https://doi.org/10.3390/electronics13234719 - 28 Nov 2024
Viewed by 903
Abstract
The majority of sound events that occur in everyday life, like those caused by animals or household devices, can be included in the environmental sound family. This audio category has not been researched as much as music or speech recognition. One main bottleneck in the design of environmental data-driven monitoring automation is the lack of sufficient data representing each of a wide range of categories. In the context of audio data, an important method to increase the available data is the augmentation of existing datasets. In this study, some of the most widespread time-domain data augmentation techniques are studied, along with their effects on the recognition of environmental sounds, through the UrbanSound8K dataset, which consists of ten classes. The confusion matrix, and the metrics that can be calculated from it, were used to examine the effect of the augmentation. Also, to address the difficulty that arises when large datasets are augmented, a web-based data augmentation application was created. To evaluate the performance of the data augmentation techniques, a convolutional neural network architecture trained on the original set was used. Moreover, four time-domain augmentation techniques were used. Although the parameters of the techniques applied were chosen conservatively, they helped the model to better cluster the data, especially in the four classes in which confusion was high in the initial classification. Furthermore, a web application is presented in which the user can upload their own data and apply these data augmentation techniques to both the audio excerpt and its time-frequency representation, the spectrogram.
(This article belongs to the Special Issue Recent Advances in Audio, Speech and Music Processing and Analysis)
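
A few representative time-domain augmentations of the kind studied here are sketched below in NumPy (random time shift, random gain, and additive noise at a target SNR); the specific techniques and parameter ranges used in the paper are not reproduced.

```python
# Common time-domain waveform augmentations; parameter ranges are illustrative.
import numpy as np

rng = np.random.default_rng(42)

def time_shift(y, max_fraction=0.1):
    """Circularly shift the waveform by up to max_fraction of its length."""
    limit = int(len(y) * max_fraction)
    return np.roll(y, rng.integers(-limit, limit + 1))

def random_gain(y, low_db=-6.0, high_db=6.0):
    """Scale the waveform by a random gain drawn in decibels."""
    gain_db = rng.uniform(low_db, high_db)
    return y * (10.0 ** (gain_db / 20.0))

def add_noise(y, snr_db=20.0):
    """Add white noise so that the signal-to-noise ratio equals snr_db."""
    noise = rng.standard_normal(len(y))
    signal_power = np.mean(y ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return y + noise * np.sqrt(noise_power / (np.mean(noise ** 2) + 1e-12))

y = rng.standard_normal(4 * 22050)          # stand-in for a 4 s audio clip
augmented = add_noise(random_gain(time_shift(y)), snr_db=15.0)
print(augmented.shape)
```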

12 pages, 1513 KiB  
Article
Emotion-Recognition System for Smart Environments Using Acoustic Information (ERSSE)
by Gabriela Santiago, Jose Aguilar and Rodrigo García
Information 2024, 15(11), 677; https://doi.org/10.3390/info15110677 - 30 Oct 2024
Viewed by 1337
Abstract
Acoustic management is very important for detecting possible events in the context of a smart environment (SE). In previous works, we proposed a reflective middleware for acoustic management (ReM-AM) and its autonomic cycles of data analysis tasks, along with its ontology-driven architecture. In this work, we aim to develop an emotion-recognition system for ReM-AM that uses sound events, rather than speech, as its main focus. The system is based on a sound pattern for emotion recognition and the autonomic cycle of intelligent sound analysis (ISA), defined by three tasks: variable extraction, sound data analysis, and emotion recommendation. We include a case study to test our emotion-recognition system in a simulation of a smart movie theater, with different situations taking place. The implementation and verification of the tasks show a promising performance in the case study, with 80% accuracy in sound recognition, and its general behavior shows that it can contribute to improving the well-being of the people present in the environment.
(This article belongs to the Section Artificial Intelligence)

21 pages, 12963 KiB  
Article
A Multi-Task Network: Improving Unmanned Underwater Vehicle Self-Noise Separation via Sound Event Recognition
by Wentao Shi, Dong Chen, Fenghua Tian, Shuxun Liu and Lianyou Jing
J. Mar. Sci. Eng. 2024, 12(9), 1563; https://doi.org/10.3390/jmse12091563 - 5 Sep 2024
Viewed by 973
Abstract
The performance of an Unmanned Underwater Vehicle (UUV) is significantly influenced by the magnitude of self-generated noise, making it a crucial factor in advancing acoustic load technologies. Effective noise management, through the identification and separation of various self-noise types, is essential for enhancing a UUV’s reception capabilities. This paper concentrates on the development of UUV self-noise separation techniques, with a particular emphasis on feature extraction and separation in multi-task learning environments. We introduce an enhancement module designed to leverage noise categorization for improved network efficiency. Furthermore, we propose a neural network-based multi-task framework for the identification and separation of self-noise, the efficacy of which is substantiated by experimental trials conducted in a lake setting. The results demonstrate that our network outperforms the Conv-TasNet baseline, achieving a 0.99 dB increase in Signal-to-Interference-plus-Noise Ratio (SINR) and a 0.05 enhancement in the recognized energy ratio.
(This article belongs to the Section Ocean Engineering)

20 pages, 3915 KiB  
Article
A Study of Improved Two-Stage Dual-Conv Coordinate Attention Model for Sound Event Detection and Localization
by Guorong Chen, Yuan Yu, Yuan Qiao, Junliang Yang, Chongling Du, Zhang Qian and Xiao Huang
Sensors 2024, 24(16), 5336; https://doi.org/10.3390/s24165336 - 18 Aug 2024
Viewed by 1176
Abstract
Sound Event Detection and Localization (SELD) is a comprehensive task that aims to solve the subtasks of Sound Event Detection (SED) and Sound Source Localization (SSL) simultaneously. SELD requires solving both sound recognition and spatial localization problems, and different categories of sound events may overlap in time and space, making it more difficult for a model to distinguish between events occurring at the same time and to locate their sources. In this study, the Dual-conv Coordinate Attention Module (DCAM) combines dual convolutional blocks with Coordinate Attention; building on this, the two-stage network architecture is improved to form the SELD-oriented Two-Stage Dual-conv Coordinate Attention Model (TDCAM). TDCAM draws on the concepts of Visual Geometry Group (VGG) networks and Coordinate Attention to effectively capture critical local information by focusing on the coordinate space information of the feature map and modeling the relationships between feature map channels, enhancing the feature selection capability of the model. To address the limited temporal modeling of the single-layer Bi-directional Gated Recurrent Unit (Bi-GRU) in the two-stage network, we extend the structure to a two-layer Bi-GRU and introduce frequency-mask and time-mask data augmentation to improve the model’s temporal modeling and generalization ability. Through experimental validation on the TAU Spatial Sound Events 2019 development dataset, our approach significantly improves the performance of SELD compared to the two-stage network baseline model. Furthermore, the effectiveness of DCAM and the two-layer Bi-GRU structure is confirmed by ablation experiments.
(This article belongs to the Special Issue Sensors and Techniques for Indoor Positioning and Localization)
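
The frequency-mask and time-mask augmentation mentioned above can be illustrated with a short SpecAugment-style NumPy sketch; the mask widths below are illustrative defaults rather than the settings used in the study.

```python
# SpecAugment-style masking on a (mels, frames) spectrogram: zero out a random
# band of mel bins and a random span of frames.
import numpy as np

rng = np.random.default_rng(0)

def freq_mask(spec, max_width=8):
    """Zero out a random band of mel bins."""
    width = rng.integers(1, max_width + 1)
    start = rng.integers(0, spec.shape[0] - width)
    spec = spec.copy()
    spec[start:start + width, :] = 0.0
    return spec

def time_mask(spec, max_width=20):
    """Zero out a random span of frames."""
    width = rng.integers(1, max_width + 1)
    start = rng.integers(0, spec.shape[1] - width)
    spec = spec.copy()
    spec[:, start:start + width] = 0.0
    return spec

mel = rng.standard_normal((64, 500))        # stand-in log-Mel spectrogram
augmented = time_mask(freq_mask(mel))
print(np.count_nonzero(augmented == 0.0))   # number of masked cells
```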

24 pages, 3882 KiB  
Article
Open-Set Recognition of Pansori Rhythm Patterns Based on Audio Segmentation
by Jie You and Joonwhoan Lee
Appl. Sci. 2024, 14(16), 6893; https://doi.org/10.3390/app14166893 - 6 Aug 2024
Viewed by 1058
Abstract
Pansori, a traditional Korean form of musical storytelling, is characterized by performances involving a vocalist and a drummer. It is well known for the singer’s expressive narrative (aniri) and delicate gestures with a fan in hand. The classical Pansori repertoires mostly tell stories of love, satire, and humor, as well as some social lessons. These performances, which can extend from three to five hours, necessitate that the vocalist adheres to precise rhythmic structures. The distinctive rhythms of Pansori are crucial for conveying both the narrative and musical expression effectively. This paper explores the challenge of open-set recognition, aiming to efficiently identify unknown Pansori rhythm patterns while applying the methodology to diverse acoustic datasets, such as sound events and genres. We propose a lightweight deep learning-based encoder–decoder segmentation model, which employs a 2-D log-Mel spectrogram as input for the encoder and produces a frame-based 1-D decision along the temporal axis. This segmentation approach, processing 2-D inputs to classify frame-wise rhythm patterns, proves effective in detecting unknown patterns within time-varying sound streams encountered in daily life. Throughout the training phase, both center and supervised contrastive losses, along with cross-entropy loss, are minimized. This strategy aims to create a compact cluster structure within the feature space for known classes, thereby facilitating the recognition of unknown rhythm patterns by allocating ample space for their placement within the embedded feature space. Comprehensive experiments utilizing various datasets—including Pansori rhythm patterns (91.8%), synthetic datasets of instrument sounds (95.1%), music genres (76.9%), and sound datasets from DCASE challenges (73.0%)—demonstrate the efficacy of our proposed method in detecting unknown events, as evidenced by the AUROC metrics.
(This article belongs to the Special Issue Algorithmic Music and Sound Computing)
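
The open-set decision implied by a compact per-class cluster structure can be sketched as follows: embed an input, assign it to the nearest known-class center, and reject it as unknown when the distance exceeds a threshold calibrated on known-class data. This is a simplified stand-in for the paper's method; the data, threshold rule, and dimensions are invented.

```python
# Nearest-center open-set decision over an embedding space with compact
# per-class clusters; returns -1 when the sample looks unknown.
import numpy as np

def class_centers(embeddings, labels):
    """Mean embedding per known class; embeddings shape (N, D)."""
    return np.stack([embeddings[labels == c].mean(axis=0)
                     for c in np.unique(labels)])

def open_set_predict(x, centers, threshold):
    """Return the index of the nearest center, or -1 for unknown."""
    dists = np.linalg.norm(centers - x, axis=1)
    nearest = int(np.argmin(dists))
    return nearest if dists[nearest] <= threshold else -1

rng = np.random.default_rng(1)
train = rng.normal(size=(300, 16))
labels = rng.integers(0, 5, size=300)
centers = class_centers(train, labels)
# Calibrate the threshold, e.g. as the 95th percentile of known-class distances.
known_d = np.linalg.norm(train - centers[labels], axis=1)
tau = np.percentile(known_d, 95)
print(open_set_predict(rng.normal(scale=4.0, size=16), centers, tau))
```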

16 pages, 4193 KiB  
Article
Combined Data Augmentation on EANN to Identify Indoor Anomalous Sound Event
by Xiyu Song, Junhan Xiong, Mei Wang, Qingshan Mei and Xiaodong Lin
Appl. Sci. 2024, 14(4), 1327; https://doi.org/10.3390/app14041327 - 6 Feb 2024
Cited by 2 | Viewed by 1492
Abstract
Indoor abnormal sound event identification refers to the automatic detection and recognition of abnormal sounds in an indoor environment using computer auditory technology. However, the process of model training usually requires a large amount of high-quality data, which can be time-consuming and costly to collect. Utilizing limited data has become another preferred approach for such research, but it introduces overfitting issues for machine learning models on small datasets. To overcome this issue, we proposed and validated a framework combining offline augmentation of raw audio with online augmentation of spectral features, making the application of small datasets in indoor anomalous sound event identification more feasible. Along with this, an improved two-dimensional audio convolutional neural network (EANN) was also proposed to evaluate and compare the impact of the different data augmentation methods under this framework on the sensitivity of sound event identification. Moreover, we further investigated the performance of four combinations of data augmentation techniques. Our research shows that the proposed combined data augmentation method achieves an accuracy of 97.4% on the test dataset, which is 10.6% higher than the baseline method, demonstrating the method’s potential for the identification of indoor abnormal sound events.

15 pages, 11940 KiB  
Article
An Investigation of ECAPA-TDNN Audio Type Recognition Method Based on Mel Acoustic Spectrograms
by Jian Wang, Zhongzheng Wang, Xingcheng Han and Yan Han
Electronics 2023, 12(21), 4421; https://doi.org/10.3390/electronics12214421 - 27 Oct 2023
Cited by 3 | Viewed by 2787
Abstract
Audio signals play a crucial role in our perception of our surroundings. People rely on sound to assess motion, distance, direction, and environmental conditions, aiding in danger avoidance and decision making. However, in real-world environments, during the acquisition and transmission of audio signals, we often encounter various types of noise that interfere with the intended signals. As a result, the essential features of audio signals become significantly obscured. Under strong noise interference, identifying noise or sound segments and distinguishing audio types becomes pivotal for detecting specific events and sound patterns or isolating abnormal sounds. This study analyzes the characteristics of the Mel acoustic spectrogram, explores the application of the deep learning ECAPA-TDNN method for audio type recognition, and substantiates its effectiveness through experiments. Ultimately, the experimental results demonstrate that the deep learning ECAPA-TDNN method for audio type recognition, utilizing Mel acoustic spectrograms as features, achieves a notably high recognition accuracy.
(This article belongs to the Special Issue Emerging Trends in Advanced Video and Sequence Technology)
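
For illustration, a log-Mel acoustic spectrogram of the kind used as input features can be computed as below; the FFT size, hop length, and number of mel bands are common defaults, not necessarily those used in this study.

```python
# Compute a log-Mel spectrogram from a short test tone with librosa.
import numpy as np
import librosa

sr = 16000
t = np.linspace(0.0, 1.0, sr, endpoint=False)
y = 0.5 * np.sin(2 * np.pi * 440.0 * t)                 # stand-in 440 Hz test tone

mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=512,
                                     hop_length=160, n_mels=64)
log_mel = librosa.power_to_db(mel, ref=np.max)          # features fed to the classifier
print(log_mel.shape)                                     # (64, n_frames)
```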

16 pages, 2703 KiB  
Article
Automatic Robust Crackle Detection and Localization Approach Using AR-Based Spectral Estimation and Support Vector Machine
by Loredana Daria Mang, Julio José Carabias-Orti, Francisco Jesús Canadas-Quesada, Juan de la Torre-Cruz, Antonio Muñoz-Montoro, Pablo Revuelta-Sanz and Eilas Fernandez Combarro
Appl. Sci. 2023, 13(19), 10683; https://doi.org/10.3390/app131910683 - 26 Sep 2023
Cited by 2 | Viewed by 1470
Abstract
Auscultation primarily relies upon the acoustic expertise of individual doctors in identifying, through the use of a stethoscope, the presence of abnormal sounds such as crackles, because the recognition of these sound patterns is critically important for the early detection and diagnosis of respiratory pathologies. In this paper, we propose a novel method combining autoregressive (AR)-based spectral features and a support vector machine (SVM) classifier to detect the presence of crackle events and their temporal location within the input signal. A preprocessing stage is performed to discard information outside the band of interest and to define the segments for short-time signal analysis. The AR parameters are estimated for each segment and classified by the SVM into crackles and normal lung sounds; a set of modeled synthetic crackle waveforms is used to train the classifier. A dataset composed of simulated and real coarse and fine crackle sound signals was created at several signal-to-noise ratios (SNRs) to evaluate the robustness of the proposed method. Each simulated and real signal was mixed with noise that shows the same spectral energy distribution as typically found in breath noise from a healthy subject. This study achieves competitive results: the proposed method yields values ranging from 80% in the lowest signal-to-noise ratio scenario to a perfect 100% in the highest signal-to-noise ratio scenario. Notably, these results surpass those of the other methods presented by a margin of at least 15%. The combination of an autoregressive (AR) model with a support vector machine (SVM) classifier offers an effective solution for detecting crackle events, and the approach exhibits enhanced robustness against variations in the signal-to-noise ratio that the input signals may encounter.
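
A hedged sketch of the overall pipeline (per-segment autoregressive coefficients estimated via the Yule–Walker equations, then classified with an SVM) is shown below; the segment length, AR order, and toy training signals are placeholders rather than the paper's settings.

```python
# Per-segment AR (Yule-Walker) coefficients as features for an SVM classifier.
import numpy as np
from scipy.linalg import solve_toeplitz
from sklearn.svm import SVC

def ar_features(segment, order=8):
    """Yule-Walker AR coefficients of a 1-D audio segment."""
    seg = segment - segment.mean()
    r = np.correlate(seg, seg, mode="full")[len(seg) - 1:][: order + 1]
    r /= r[0] + 1e-12
    return solve_toeplitz(r[:order], r[1:order + 1])

rng = np.random.default_rng(0)

def toy_segment(crackle):
    """Toy stand-in: a damped burst ("crackle") on top of breath-like noise."""
    n = 512
    base = rng.standard_normal(n) * 0.1
    if crackle:
        k = rng.integers(0, n - 64)
        base[k:k + 64] += np.exp(-np.arange(64) / 8.0) * np.sin(np.arange(64))
    return base

X = np.stack([ar_features(toy_segment(i % 2 == 0)) for i in range(200)])
y = np.array([1 if i % 2 == 0 else 0 for i in range(200)])
clf = SVC(kernel="rbf").fit(X, y)
print(clf.predict(ar_features(toy_segment(True)).reshape(1, -1)))
```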

18 pages, 1423 KiB  
Article
Simplicial Homology Global Optimization of EEG Signal Extraction for Emotion Recognition
by Ahmed Roshdy, Samer Al Kork, Taha Beyrouthy and Amine Nait-ali
Robotics 2023, 12(4), 99; https://doi.org/10.3390/robotics12040099 - 11 Jul 2023
Cited by 4 | Viewed by 2481
Abstract
Emotion recognition is a vital part of human functioning. It enables individuals to respond suitably to environmental events and develop self-awareness. The fast-paced developments in brain–computer interfacing (BCI) technology necessitate that intelligent machines of the future be able to digitize and recognize human emotions. To achieve this, both humans and machines have relied on facial expressions, in addition to other visual cues. While facial expressions are effective in recognizing emotions, they can be artificially replicated and require constant monitoring. In recent years, the use of Electroencephalography (EEG) signals has become a popular method for emotion recognition, thanks to advances in deep learning and machine learning techniques. EEG-based systems for recognizing emotions involve measuring electrical activity in the brain of a subject who is exposed to emotional stimuli such as images, sounds, or videos. Machine learning algorithms are then used to extract features from the electrical activity data that correspond to specific emotional states. The quality of the extracted EEG signal is crucial, as it affects the overall complexity of the system and the accuracy of the machine learning algorithm. This article presents an approach to improve the accuracy of EEG-based emotion recognition systems while reducing their complexity. The approach involves optimizing the number of EEG channels, their placement on the human scalp, and the target frequency band of the measured signal to maximize the difference between high and low arousal levels. The optimization method, called simplicial homology global optimization (SHGO), is used for this purpose. Experimental results demonstrate that an optimally placed six-electrode configuration can achieve a better level of accuracy than a 14-electrode configuration, resulting in an over 60% reduction in complexity in terms of the number of electrodes. This method demonstrates promising results in improving the efficiency and accuracy of EEG-based emotion recognition systems, which could have implications for various fields, including healthcare, psychology, and human–computer interfacing.
(This article belongs to the Section Sensors and Control in Robotics)
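
Simplicial homology global optimization is available in SciPy as scipy.optimize.shgo; the sketch below uses it on a toy objective that searches for a frequency band maximizing the separation between two synthetic arousal conditions. The objective, data, and bounds are stand-ins for the EEG pipeline described in the abstract.

```python
# Toy SHGO run: pick band edges (lo, hi) that maximize the mean band-power
# difference between synthetic "high arousal" and "low arousal" spectra.
import numpy as np
from scipy.optimize import shgo

rng = np.random.default_rng(0)
freqs = np.linspace(1.0, 45.0, 90)
low_arousal = rng.random((50, 90))
high_arousal = rng.random((50, 90)) + 2.0 * np.exp(-0.5 * ((freqs - 12.0) / 2.0) ** 2)

def objective(band):
    lo, hi = band
    if hi - lo < 1.0:                      # degenerate or inverted band: no reward
        return 0.0
    mask = (freqs >= lo) & (freqs <= hi)
    sep = high_arousal[:, mask].mean() - low_arousal[:, mask].mean()
    return -sep                            # SHGO minimizes, so negate the separation

result = shgo(objective, bounds=[(1.0, 30.0), (5.0, 45.0)])
print(result.x, -result.fun)               # selected band edges and achieved separation
```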

16 pages, 1984 KiB  
Article
Neural Indicators of Visual and Auditory Recognition of Imitative Words on Different De-Iconization Stages
by Liubov Tkacheva, Maria Flaksman, Yulia Sedelkina, Yulia Lavitskaya, Andrey Nasledov and Elizaveta Korotaevskaya
Brain Sci. 2023, 13(4), 681; https://doi.org/10.3390/brainsci13040681 - 19 Apr 2023
Viewed by 1600
Abstract
The research aims to reveal neural indicators of recognition for iconic words and the possible cross-modal multisensory integration behind this process. The goals of this research are twofold: (1) to register event-related potentials (ERP) in the brain in the process of visual and auditory recognition of Russian imitative words on different de-iconization stages; and (2) to establish whether differences in brain activity arise while processing visual and auditory stimuli of different nature. Sound imitative (onomatopoeic, mimetic, and ideophonic) words are words with an iconic correlation between form and meaning (iconicity being a relationship of resemblance). Russian adult participants (n = 110) were presented with 15 stimuli both visually and auditorily. The stimulus material was equally distributed into three groups according to the criterion of (historical) iconicity loss: five explicit sound imitative (SI) words, five implicit SI words, and five non-SI words. It was established that there was no statistically significant difference between visually presented explicit or implicit SI words and non-SI words, respectively. However, statistically significant differences were registered for auditorily presented explicit SI words in contrast to implicit SI words in the N400 ERP component, as well as implicit SI words in contrast to non-SI words in the P300 ERP component. We thoroughly analyzed the integrative brain activity in response to explicit SI words and compared it to that in response to implicit SI and non-SI words presented auditorily. The data yielded by this analysis showed that the N400 ERP component was more prominent during the recognition of the explicit SI words, as recorded at the central channels (specifically Cz). We assume that these results indicate a specific brain response associated with directed attention in the process of performing cognitive decision-making tasks regarding explicit and implicit SI words presented auditorily. This may reflect a higher level of cognitive complexity in identifying this type of stimulus, given experimental task challenges that may involve a cross-modal integration process.
(This article belongs to the Special Issue The Neural Basis of Multisensory Plasticity)

18 pages, 1013 KiB  
Article
Sound of Daily Living Identification Based on Hierarchical Situation Audition
by Jiaxuan Wu, Yunfei Feng and Carl K. Chang
Sensors 2023, 23(7), 3726; https://doi.org/10.3390/s23073726 - 4 Apr 2023
Viewed by 1731
Abstract
One of the key objectives in developing IoT applications is to automatically detect and identify human activities of daily living (ADLs). Mobile phone users are becoming more accepting of sharing data captured by various built-in sensors. Sounds detected by smartphones are processed in this work. We present a hierarchical identification system to recognize ADLs by detecting and identifying certain sounds taking place in a complex audio situation (AS). Three major categories of sound are discriminated in terms of signal duration: persistent background noise (PBN), non-impulsive long sounds (NILS), and impulsive sounds (IS). We first analyze audio signals in a situation-aware manner and then map the sounds of daily living (SDLs) to ADLs. A new hierarchical audible event (AE) recognition approach is proposed that classifies atomic audible actions (AAs), then computes the energy portions of the pre-classified atomic AAs within one AE session, and finally marks the maximum-likelihood ADL label as the outcome. Our experiments demonstrate that the proposed hierarchical methodology is effective in recognizing SDLs and, thus, also in detecting ADLs, with remarkable performance compared with other known baseline systems.
(This article belongs to the Special Issue Sensors for Non-intrusive Human Activity Monitoring)
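
The final step of the hierarchy (accumulating the energy of pre-classified atomic audible actions within one audible-event session and emitting the maximum-likelihood ADL label) can be sketched as below; the class list and the AA-to-ADL mapping are invented for illustration.

```python
# Accumulate per-frame energy by atomic audible action (AA) class within one
# session and map the dominant class to an ADL label. Classes/mapping are toy.
import numpy as np

AA_CLASSES = ["water_running", "chopping", "microwave_beep", "tv_speech"]
AA_TO_ADL = {"water_running": "washing", "chopping": "cooking",
             "microwave_beep": "cooking", "tv_speech": "watching_tv"}

def session_adl(frame_labels, frame_energy):
    """Pick the ADL whose atomic actions carry the most energy in the session."""
    energy_per_adl = {}
    for label, energy in zip(frame_labels, frame_energy):
        adl = AA_TO_ADL[AA_CLASSES[label]]
        energy_per_adl[adl] = energy_per_adl.get(adl, 0.0) + float(energy)
    return max(energy_per_adl, key=energy_per_adl.get)

rng = np.random.default_rng(0)
labels = rng.integers(0, len(AA_CLASSES), size=200)     # per-frame AA predictions
energy = rng.random(200)                                # per-frame signal energy
print(session_adl(labels, energy))
```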