Search Results (170)

Search Parameters:
Keywords = vocal signals

20 pages, 2732 KB  
Article
Redesigning Multimodal Interaction: Adaptive Signal Processing and Cross-Modal Interaction for Hands-Free Computer Interaction
by Bui Hong Quan, Nguyen Dinh Tuan Anh, Hoang Van Phi and Bui Trung Thanh
Sensors 2025, 25(17), 5411; https://doi.org/10.3390/s25175411 - 2 Sep 2025
Viewed by 362
Abstract
Hands-free computer interaction is a key topic in assistive technology, with camera-based and voice-based systems being the most common methods. Recent camera-based solutions leverage facial expressions or head movements to simulate mouse clicks or key presses, while voice-based systems enable control via speech commands, wake-word detection, and vocal gestures. However, existing systems often suffer from limitations in responsiveness and accuracy, especially under real-world conditions. In this paper, we present 3-Modal Human-Computer Interaction (3M-HCI), a novel interaction system that dynamically integrates facial, vocal, and eye-based inputs through a new signal processing pipeline and a cross-modal coordination mechanism. This approach not only enhances recognition accuracy but also reduces interaction latency. Experimental results demonstrate that 3M-HCI outperforms several recent hands-free interaction solutions in both speed and precision, highlighting its potential as a robust assistive interface. Full article
(This article belongs to the Section Sensing and Imaging)
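
The abstract describes cross-modal coordination of facial, vocal, and eye-based inputs but the listing gives no implementation details. The sketch below is a hypothetical illustration of one way such coordination could work, fusing per-modality confidences inside a short time window; the weights, window, and threshold are assumptions, not values from the 3M-HCI paper.

```python
# Hypothetical sketch of cross-modal coordination for hands-free clicking:
# each modality (face, voice, eye) reports a confidence in [0, 1]; a click
# fires only when the weighted evidence inside a short window exceeds a
# threshold. Weights, window, and threshold are illustrative assumptions.
import time
from collections import deque

WEIGHTS = {"face": 0.4, "voice": 0.35, "eye": 0.25}  # assumed modality weights
WINDOW_S = 0.5      # fuse events no more than 0.5 s apart (assumption)
THRESHOLD = 0.6     # fused score needed to emit a click (assumption)

events = deque()    # (timestamp, modality, confidence)

def report(modality, confidence, now=None):
    """Register a detection from one modality and return True if a click fires."""
    now = time.monotonic() if now is None else now
    events.append((now, modality, confidence))
    # Drop events that fell out of the fusion window.
    while events and now - events[0][0] > WINDOW_S:
        events.popleft()
    # Keep only the strongest recent evidence per modality.
    best = {}
    for _, mod, conf in events:
        best[mod] = max(best.get(mod, 0.0), conf)
    score = sum(WEIGHTS[m] * c for m, c in best.items())
    if score >= THRESHOLD:
        events.clear()          # debounce: avoid double clicks
        return True
    return False

# Example: a voice cue followed quickly by a facial gesture triggers a click.
print(report("voice", 0.8, now=0.00))   # False: voice alone scores 0.28
print(report("face", 0.9, now=0.15))    # True: 0.35*0.8 + 0.4*0.9 = 0.64
```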

18 pages, 3373 KB  
Article
Framework for Classification of Fattening Pig Vocalizations in a Conventional Farm with High Relevance for Practical Application
by Thies J. Nicolaisen, Katharina E. Bollmann, Isabel Hennig-Pauka and Sarah C. L. Fischer
Animals 2025, 15(17), 2572; https://doi.org/10.3390/ani15172572 - 1 Sep 2025
Viewed by 329
Abstract
This study examined the vocal repertoire of the domestic pig (Sus scrofa domesticus) under conventional housing conditions. To this end, behavior-associated vocalizations of fattening pigs were recorded directly and assigned to behavioral categories. A mathematical analysis of the recordings was then conducted using three frequency-based parameters (the 25%, 50%, and 75% quantiles of the frequency spectrum) and three time-based parameters (variance of the time signal, mean level of the individual amplitude modulation, and cumulative amplitude modulation). Most vocalizations (59.7%) were assessed as positive/neutral, of which grunting was by far the most frequent; negatively assessed vocalizations accounted for 37.8% of all vocalizations. Analysis based on the six parameters made it possible to distinguish vocalizations related to negatively valenced behavior from those related to positively/neutrally valenced behavior. The study illustrates the relationship between auditory perception and the underlying mathematical signal properties: vocalizations judged by observers as positive or negative can be separated using mathematical parameters, yet ambiguities arise when objective features overlap widely. In this way, the study encourages the use of more complex algorithms in the future to solve this challenging, multidimensional problem, forming the basis for future automatic detection of negative pig vocalizations. Full article
(This article belongs to the Special Issue Animal Health and Welfare Assessment of Pigs)
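
The six descriptors named in the abstract (spectral quantiles plus three time-domain parameters) can be made concrete with a short sketch. This is not the authors' code; the amplitude-modulation definitions below are plausible stand-ins, since the paper's exact formulas are not reproduced in the listing.

```python
# Minimal sketch of six call descriptors: spectral 25/50/75 % quantiles plus
# three time-domain parameters (variance, mean and cumulative amplitude
# modulation). Definitions are illustrative assumptions, not the paper's.
import numpy as np

def vocalization_features(x, sr):
    """Return a dict of frequency- and time-based features for one call."""
    x = np.asarray(x, dtype=float)
    # --- frequency-based: quantiles of the magnitude spectrum ---
    spec = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / sr)
    cdf = np.cumsum(spec) / np.sum(spec)          # normalized spectral energy
    q25, q50, q75 = (freqs[np.searchsorted(cdf, q)] for q in (0.25, 0.50, 0.75))
    # --- time-based ---
    var_t = np.var(x)                             # variance of the time signal
    envelope = np.abs(x)                          # crude amplitude envelope
    am = np.abs(np.diff(envelope))                # sample-to-sample modulation
    return {
        "f_q25": q25, "f_q50": q50, "f_q75": q75,
        "time_variance": var_t,
        "mean_amplitude_modulation": am.mean(),
        "cumulative_amplitude_modulation": am.sum(),
    }

# Example with a synthetic 0.5 s grunt-like burst at 16 kHz.
sr = 16000
t = np.linspace(0, 0.5, sr // 2, endpoint=False)
call = np.sin(2 * np.pi * 300 * t) * np.exp(-5 * t)
print(vocalization_features(call, sr))
```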

13 pages, 1305 KB  
Article
Fine-Tuning BirdNET for the Automatic Ecoacoustic Monitoring of Bird Species in the Italian Alpine Forests
by Giacomo Schiavo, Alessia Portaccio and Alberto Testolin
Information 2025, 16(8), 628; https://doi.org/10.3390/info16080628 - 23 Jul 2025
Viewed by 742
Abstract
The ongoing decline in global biodiversity constitutes a critical challenge for environmental science, necessitating the prompt development of effective monitoring frameworks and conservation protocols to safeguard the structure and function of natural ecosystems. Recent progress in ecoacoustic monitoring, supported by advances in artificial intelligence, might finally offer scalable tools for systematic biodiversity assessment. In this study, we evaluate the performance of BirdNET, a state-of-the-art deep learning model for avian sound recognition, in the context of selected bird species characteristic of the Italian Alpine region. To this end, we assemble a comprehensive, manually annotated audio dataset targeting key regional species, and we investigate a variety of strategies for model adaptation, including fine-tuning with data augmentation techniques to enhance recognition under challenging recording conditions. As a baseline, we also develop and evaluate a simple Convolutional Neural Network (CNN) trained exclusively on our domain-specific dataset. Our findings indicate that BirdNET performance can be greatly improved by fine-tuning the pre-trained network with data collected within the specific regional soundscape, outperforming both the original BirdNET and the baseline CNN by a significant margin. These findings underscore the importance of environmental adaptation and data variability for the development of automated ecoacoustic monitoring devices while highlighting the potential of deep learning methods in supporting conservation efforts and informing soundscape management in protected areas. Full article
(This article belongs to the Special Issue Signal Processing Based on Machine Learning Techniques)
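
The abstract mentions fine-tuning with data augmentation to cope with challenging recording conditions. As a hedged illustration only, the sketch below shows generic waveform augmentations of that kind (noise, gain, time shift); it is not the authors' pipeline and does not use the BirdNET-Analyzer training interface.

```python
# Generic waveform augmentations (additive noise, random gain, time shift)
# used to expand a small annotated fine-tuning set. Parameter values are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def augment(x, sr, snr_db=15.0, max_shift_s=0.5, gain_db_range=3.0):
    """Return one randomly perturbed copy of a mono recording."""
    x = np.asarray(x, dtype=float)
    # 1) additive Gaussian noise at a target SNR
    sig_power = np.mean(x ** 2) + 1e-12
    noise_power = sig_power / (10 ** (snr_db / 10))
    y = x + rng.normal(scale=np.sqrt(noise_power), size=x.shape)
    # 2) random circular time shift (keeps duration unchanged)
    shift = rng.integers(-int(max_shift_s * sr), int(max_shift_s * sr) + 1)
    y = np.roll(y, shift)
    # 3) random gain
    gain_db = rng.uniform(-gain_db_range, gain_db_range)
    return y * 10 ** (gain_db / 20)

# Expand a small annotated set: several augmented copies per labelled clip.
clips = [np.sin(2 * np.pi * 4000 * np.linspace(0, 3, 3 * 48000))]  # toy 3 s clip
augmented = [augment(c, 48000) for c in clips for _ in range(5)]
print(len(augmented), augmented[0].shape)
```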

16 pages, 815 KB  
Review
Microvascularization of the Vocal Folds: Molecular Architecture, Functional Insights, and Personalized Research Perspectives
by Roxana-Andreea Popa, Cosmin-Gabriel Popa, Delia Hînganu and Marius Valeriu Hînganu
J. Pers. Med. 2025, 15(7), 293; https://doi.org/10.3390/jpm15070293 - 7 Jul 2025
Viewed by 599
Abstract
Introduction: The vascular architecture of the vocal folds plays a critical role in sustaining the dynamic demands of phonation. Disruptions in this microvascular system are linked to various pathological conditions, including Reinke’s edema, hemorrhage, and laryngeal carcinoma. This review explores the structural and functional components of vocal fold microvascularization, with emphasis on pericytes, endothelial interactions, and neurovascular regulation. Materials and Methods: A systematic review of the literature was conducted using databases such as PubMed, Scopus, Web of Science, and Embase. Keywords included “pericytes”, “Reinke’s edema”, and “vocal fold microvascularization”. Selected studies were peer-reviewed and met criteria for methodological quality and relevance to laryngeal microvascular physiology and pathology. Results: The vocal fold vasculature is organized in a parallel, tree-like pattern with distinct arterioles, capillaries, and venules. Capillaries dominate the superficial lamina propria, while transitional vessels connect to deeper arterioles surrounded by smooth muscle. Pericytes, present from birth, form tight associations with endothelial cells and contribute to capillary stability, vessel remodeling, and mechanical protection during vibration. Their thick cytoplasmic processes suggest a unique adaptation to the biomechanical stress of phonation. Arteriovenous anastomoses regulate perfusion by shunting blood according to functional demand. Furthermore, neurovascular control is mediated by noradrenergic fibers and neuropeptides such as VIP and CGRP, modulating vascular tone and glandular secretion. The limited lymphatic presence in the vocal fold mucosa contributes to edema accumulation while also restricting carcinoma spread, offering both therapeutic challenges and advantages. Conclusions: A deeper understanding of vocal fold microvascularization enhances clinical approaches to voice disorders and laryngeal disease, offering new perspectives for targeted therapies and regenerative strategies. Full article
(This article belongs to the Special Issue Clinical Diagnosis and Treatment in Otorhinolaryngology)

24 pages, 5287 KB  
Article
A Tourette Syndrome/ADHD-like Phenotype Results from Postnatal Disruption of CB1 and CB2 Receptor Signalling
by Victoria Gorberg, Tamar Harpaz, Emilya Natali Shamir, Orit Diana Karminsky, Ester Fride, Roger G. Pertwee, Iain R. Greig, Peter McCaffery and Sharon Anavi-Goffer
Int. J. Mol. Sci. 2025, 26(13), 6052; https://doi.org/10.3390/ijms26136052 - 24 Jun 2025
Viewed by 773
Abstract
Cannabinoid receptor 1 (CB1) signalling is critical for weight gain and for milk intake in newborn pups. This is important because, in humans, low birth weight increases the risk for attention-deficit hyperactivity disorder (ADHD). Moreover, some children with ADHD also have Tourette syndrome (TS). However, it remains unclear if insufficient CB1 receptor signalling may promote ADHD/TS-like behaviours. Here, ADHD/TS-like behaviours were studied from the postnatal period to adulthood by exposing postnatal wild-type, CB1 knockout, and cannabinoid receptor 2 (CB2) knockout mouse pups to SR141716A (rimonabant), a CB1 receptor antagonist/inverse agonist. Postnatal disruption of the cannabinoid system by SR141716A induced vocal-like tics and learning deficits in male mice, accompanied by excessive vocalisation, hyperactivity, motor-like tics and/or high-risk behaviour in adults. In CB1 knockouts, rearing and risky behaviours increased in females. In CB2 knockouts, vocal-like tics did not develop, and males were hyperactive with learning deficits. Importantly, females were hyperactive but showed no vocal-like tics. The appearance of vocal-like tics depends on disrupted CB1 receptor signalling and on functional CB2 receptors after birth. Inhibition of CB1 receptor signalling together with CB2 receptor stimulation underlies ADHD/TS-like behaviours in males. This study suggests that the ADHD/TS phenotype may be a single clinical entity resulting from incorrect cannabinoid signalling after birth. Full article

28 pages, 11527 KB  
Article
Tracking of Fin Whales Using a Power Detector, Source Wavelet Extraction, and Cross-Correlation on Recordings Close to Triplets of Hydrophones
by Ronan Le Bras, Peter Nielsen and Paulina Bittner
J. Mar. Sci. Eng. 2025, 13(6), 1138; https://doi.org/10.3390/jmse13061138 - 7 Jun 2025
Viewed by 1133
Abstract
Whale signals originating in the vicinity of a triplet of underwater hydrophones, spaced 2 km apart, are recorded at the three sensors. They offer the opportunity to test simple models of propagation applied in the immediate neighborhood of the triplet, by comparing the arrival times and amplitudes of direct and reflected paths between the whale and the three hydrophones. Examples of recordings of individual fin whales passing by hydrophone triplets, based on the characteristics of their vocalizations around 20 Hz, are presented. Two types of calls are observed and their source wavelets extracted. Time segments are delimited around each call using a power detector. The time of arrival of the direct wave at the sensor and the Time Differences of Arrival (TDOA) between sensors are obtained by correlation of the extracted source wavelets within the time segments. In addition to the direct arrival, multiple reflections are automatically picked, together with the delays between each reflection and the direct arrival. A grid-search method of tracking the calls is presented based on the TDOA between three hydrophones and reflection delay times. Estimates of the depth of vocalization of the whale are made assuming a simple straight ray propagation model. The amplitude ratios between two hydrophones follow the spherical amplitude decay law of one over distance when the cetacean is in the immediate vicinity of the triplet, in a circle of radius 1.5 km sharing its center with the triplet’s center. Full article
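
The core TDOA step described in this abstract, taking the delay between two sensors from the peak of their cross-correlation, can be sketched in a few lines. The geometry, source-wavelet extraction, and reflection picking from the paper are omitted; the sample rate and synthetic pulse below are illustrative assumptions.

```python
# Sketch of TDOA estimation by cross-correlation between two hydrophones.
import numpy as np
from scipy.signal import correlate, correlation_lags

def tdoa(x1, x2, sr):
    """Time difference of arrival (s) of the same call on two sensors."""
    c = correlate(x1, x2, mode="full")
    lags = correlation_lags(len(x1), len(x2), mode="full")
    return lags[np.argmax(c)] / sr

# Example: a 20 Hz fin-whale-like pulse arriving 0.3 s later on sensor 2.
sr = 250                                   # illustrative sample rate for 20 Hz calls
t = np.arange(0, 4, 1 / sr)
pulse = np.sin(2 * np.pi * 20 * t) * np.exp(-((t - 1.0) ** 2) / 0.05)
delayed = np.roll(pulse, int(0.3 * sr))    # sensor 2 hears it 0.3 s later
print(round(tdoa(delayed, pulse, sr), 3))  # ≈ 0.3
```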

44 pages, 12058 KB  
Article
Harmonizer: A Universal Signal Tokenization Framework for Multimodal Large Language Models
by Amin Amiri, Alireza Ghaffarnia, Nafiseh Ghaffar Nia, Dalei Wu and Yu Liang
Mathematics 2025, 13(11), 1819; https://doi.org/10.3390/math13111819 - 29 May 2025
Viewed by 1628
Abstract
This paper introduces Harmonizer, a universal framework designed for tokenizing heterogeneous input signals, including text, audio, and video, to enable seamless integration into multimodal large language models (LLMs). Harmonizer employs a unified approach to convert diverse, non-linguistic signals into discrete tokens via its FusionQuantizer architecture, built on FluxFormer, to efficiently capture essential signal features while minimizing complexity. We enhance features through STFT-based spectral decomposition, Hilbert transform analytic signal extraction, and SCLAHE spectrogram contrast optimization, and train using a composite loss function to produce reliable embeddings and construct a robust vector vocabulary. Experimental validation on music datasets such as E-GMD v1.0.0, Maestro v3.0.0, and GTZAN demonstrates high fidelity across 288 s of vocal signals (MSE = 0.0037, CC = 0.9282, Cosine Sim. = 0.9278, DTW = 12.12, MFCC Sim. = 0.9997, Spectral Conv. = 0.2485). Preliminary tests on text reconstruction and UCF-101 video clips further confirm Harmonizer’s applicability across discrete and spatiotemporal modalities. Rooted in the universality of wave phenomena and Fourier theory, Harmonizer offers a physics-inspired, modality-agnostic fusion mechanism via wave superposition and interference principles. In summary, Harmonizer integrates natural language processing and signal processing into a coherent tokenization paradigm for efficient, interpretable multimodal learning. Full article
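
Two of the signal-analysis steps the abstract names, STFT-based spectral decomposition and Hilbert analytic-signal extraction, are standard and easy to illustrate. The sketch below applies them to a synthetic test tone; Harmonizer's FusionQuantizer/FluxFormer stages and SCLAHE contrast step are not reproduced here.

```python
# STFT spectral decomposition plus Hilbert analytic-signal features on a
# 1 s amplitude-modulated 440 Hz tone (illustrative signal, not the paper's data).
import numpy as np
from scipy.signal import stft, hilbert

sr = 16000
t = np.arange(0, 1.0, 1 / sr)
x = np.sin(2 * np.pi * 440 * t) * (1 + 0.3 * np.sin(2 * np.pi * 3 * t))

# STFT-based spectral decomposition: frequency bins x time frames.
f, frames, Z = stft(x, fs=sr, nperseg=512, noverlap=384)
log_mag = 20 * np.log10(np.abs(Z) + 1e-10)

# Hilbert transform: instantaneous envelope and frequency of the signal.
analytic = hilbert(x)
envelope = np.abs(analytic)
inst_freq = np.diff(np.unwrap(np.angle(analytic))) * sr / (2 * np.pi)

print(log_mag.shape)                          # (257 frequency bins, n frames)
print(round(float(np.median(inst_freq)), 1))  # ≈ 440 Hz carrier
```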

24 pages, 552 KB  
Review
Ethical Considerations in Emotion Recognition Research
by Darlene Barker, Mukesh Kumar Reddy Tippireddy, Ali Farhan and Bilal Ahmed
Psychol. Int. 2025, 7(2), 43; https://doi.org/10.3390/psycholint7020043 - 29 May 2025
Viewed by 4041
Abstract
The deployment of emotion-recognition technologies is expanding across the healthcare, education, and gaming sectors to improve human–computer interaction. These systems examine facial expressions together with vocal tone and physiological signals, which include pupil size and electroencephalogram (EEG), to detect emotional states and deliver customized responses. The technology provides benefits through accessibility, responsiveness, and adaptability but generates multiple complex ethical issues. The combination of emotional profiling with biased algorithmic interpretations of culturally diverse expressions and affective data collection without meaningful consent presents major ethical concerns. The increased presence of these systems in classrooms, therapy sessions, and personal devices makes the potential for misuse or misinterpretation more critical. The paper integrates findings from the literature and initial emotion-recognition studies into a conceptual framework that prioritizes data dignity, algorithmic accountability, and user agency, addresses these risks, and includes safeguards for participants’ emotional well-being. The framework introduces structural safeguards, including data minimization, adaptive consent mechanisms, and transparent model logic, as a more complete solution than privacy- or fairness-only approaches. The authors present functional recommendations that guide developers to create ethically robust systems that match user principles and regulatory requirements. The development of real-time feedback loops for user awareness should be combined with clear disclosures about data use and participatory design practices. The successful oversight of these systems requires interdisciplinary work between researchers, policymakers, designers, and ethicists. The paper provides practical ethical recommendations for developing affective computing systems that advance the field while maintaining responsible deployment and governance in academic research and industry settings. The findings hold particular importance for high-stakes applications including healthcare, education, and workplace monitoring systems that use emotion-recognition technology. Full article
(This article belongs to the Section Neuropsychology, Clinical Psychology, and Mental Health)

18 pages, 4885 KB  
Article
Decoding Poultry Welfare from Sound—A Machine Learning Framework for Non-Invasive Acoustic Monitoring
by Venkatraman Manikandan and Suresh Neethirajan
Sensors 2025, 25(9), 2912; https://doi.org/10.3390/s25092912 - 5 May 2025
Cited by 3 | Viewed by 1995
Abstract
Acoustic monitoring presents a promising, non-invasive modality for assessing animal welfare in precision livestock farming. In poultry, vocalizations encode biologically relevant cues linked to health status, behavioral states, and environmental stress. This study proposes an integrated analytical framework that combines signal-level statistical analysis with machine learning and deep learning classifiers to interpret chicken vocalizations in a welfare assessment context. The framework was evaluated using three complementary datasets encompassing health-related vocalizations, behavioral call types, and stress-induced acoustic responses. The pipeline employs a multistage process comprising high-fidelity signal acquisition, feature extraction (e.g., mel-frequency cepstral coefficients, spectral contrast, zero-crossing rate), and classification using models including Random Forest, HistGradientBoosting, CatBoost, TabNet, and LSTM. Feature importance analysis and statistical tests (e.g., t-tests, correlation metrics) confirmed that specific MFCC bands and spectral descriptors were significantly associated with welfare indicators. LSTM-based temporal modeling revealed distinct acoustic trajectories under visual and auditory stress, supporting the presence of habituation and stressor-specific vocal adaptations over time. Model performance, validated through stratified cross-validation and multiple statistical metrics (e.g., F1-score, Matthews correlation coefficient), demonstrated high classification accuracy and generalizability. Importantly, the approach emphasizes model interpretability, facilitating alignment with known physiological and behavioral processes in poultry. The findings underscore the potential of acoustic sensing and interpretable AI as scalable, biologically grounded tools for real-time poultry welfare monitoring, contributing to the advancement of sustainable and ethical livestock production systems. Full article
(This article belongs to the Special Issue Sensors in 2025)
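
The feature-and-classifier stage the abstract describes (MFCCs, spectral contrast, and zero-crossing rate fed to a Random Forest) is shown below as a hedged sketch. The datasets, welfare labels, and model tuning from the study are assumptions; the toy signals stand in for real poultry recordings.

```python
# Per-clip acoustic features (MFCC, spectral contrast, zero-crossing rate)
# summarized by their means and classified with a Random Forest.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def clip_features(y, sr):
    """Summarize one vocalization clip as a fixed-length feature vector."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y)
    return np.concatenate([mfcc.mean(axis=1), contrast.mean(axis=1), zcr.mean(axis=1)])

# Toy stand-in data: two hypothetical "welfare classes" of synthetic calls.
sr = 22050
t = np.linspace(0, 1, sr, endpoint=False)
calls = [np.sin(2 * np.pi * f * t) + 0.05 * np.random.randn(sr)
         for f in (400, 420, 450, 900, 950, 1000)]
labels = [0, 0, 0, 1, 1, 1]                        # 0 = calm, 1 = stressed (toy labels)

X = np.stack([clip_features(c, sr) for c in calls])
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, labels)
print(clf.predict(X[:1]))                          # sanity check on training data
```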

20 pages, 2817 KB  
Article
Escalate Prognosis of Parkinson’s Disease Employing Wavelet Features and Artificial Intelligence from Vowel Phonation
by Rumana Islam and Mohammed Tarique
BioMedInformatics 2025, 5(2), 23; https://doi.org/10.3390/biomedinformatics5020023 - 30 Apr 2025
Viewed by 1582
Abstract
Background: This work presents an artificial intelligence-based algorithm for detecting Parkinson’s disease (PD) from voice signals. The detection of PD at pre-symptomatic stages is imperative to slow disease progression. Speech signal processing-based PD detection can play a crucial role here, as it has been reported in the literature that PD affects the voice quality of patients at an early stage. Hence, speech samples can be used as biomarkers of PD, provided that suitable voice features and artificial intelligence algorithms are employed. Methods: Advanced signal-processing techniques are used to extract audio features from the sustained vowel ‘/a/’ sound. The extracted audio features include baseline features, intensities, formant frequencies, bandwidths, vocal fold parameters, and Mel-frequency cepstral coefficients (MFCCs) to form a feature vector. Then, this feature vector is further enriched by including wavelet-based features to form the second feature vector. For classification purposes, two popular machine learning models, namely, support vector machine (SVM) and k-nearest neighbors (kNNs), are trained to distinguish patients with PD. Results: The results demonstrate that the inclusion of wavelet-based voice features enhances the performance of both the SVM and kNN models for PD detection. However, kNN provides better accuracy, detection speed, training time, and misclassification cost than SVM. Conclusions: This work concludes that wavelet-based voice features are important for detecting neurodegenerative diseases like PD. These wavelet features can enhance the classification performance of machine learning models. This work also concludes that kNN is recommendable over SVM for the investigated voice features, despite the inclusion and exclusion of the wavelet features. Full article
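
To make the wavelet-feature idea in this abstract concrete, the sketch below computes simple sub-band energy and entropy features from a sustained /a/-like signal and classifies them with kNN. The clinical feature set, data, and model settings of the study are not reproduced; everything here is an illustrative assumption.

```python
# Wavelet sub-band features (log energy, entropy) from a sustained vowel,
# classified with k-nearest neighbors.
import numpy as np
import pywt
from sklearn.neighbors import KNeighborsClassifier

def wavelet_features(x, wavelet="db4", level=5):
    """Per-sub-band log energy and a simple entropy measure."""
    coeffs = pywt.wavedec(np.asarray(x, dtype=float), wavelet, level=level)
    feats = []
    for c in coeffs:
        energy = np.sum(c ** 2) + 1e-12
        p = c ** 2 / energy
        entropy = -np.sum(p * np.log(p + 1e-12))
        feats.extend([np.log(energy), entropy])
    return np.array(feats)

# Toy /a/-like signals: "controls" steady, "patients" with jittered pitch.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
controls = [np.sin(2 * np.pi * 120 * t) + 0.01 * np.random.randn(sr) for _ in range(5)]
patients = [np.sin(2 * np.pi * (120 + 8 * np.sin(2 * np.pi * 5 * t)) * t)
            + 0.01 * np.random.randn(sr) for _ in range(5)]
X = np.stack([wavelet_features(x) for x in controls + patients])
y = [0] * 5 + [1] * 5
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.score(X, y))                    # training accuracy of the toy model
```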

19 pages, 2225 KB  
Article
A Bird Vocalization Classification Method Based on Bidirectional FBank with Enhanced Robustness
by Chizhou Peng, Yan Zhang, Jing Lu, Danjv Lv and Yanjiao Xiong
Appl. Sci. 2025, 15(9), 4913; https://doi.org/10.3390/app15094913 - 28 Apr 2025
Viewed by 479
Abstract
Recent advances in audio signal processing and pattern recognition have made the classification of bird vocalizations a focus of bioacoustic research. However, the accurate classification of birdsongs is challenged by environmental noise and the limitations of traditional feature extraction methods. This study proposes the iWAVE-BiFBank method, an innovative approach combining improved wavelet adaptive denoising (iWAVE) and a bidirectional Mel-filter bank (BiFBank) for effective birdsong classification with enhanced robustness. The iWAVE method achieves adaptive optimization using the autocorrelation coefficient and peak-sum-ratio (PSR), overcoming the manual parameter adjustment and incompleteness of traditional methods. BiFBank combines FBank and inverse FBank (iFBank) to enhance feature representation. This fusion addresses the shortcomings of FBank and introduces novel transformation methods and filter designs to iFBank, with a focus on high-frequency components. The iWAVE-BiFBank method creates a robust feature set that effectively reduces audio noise and captures both low- and high-frequency information. Experiments were conducted on a dataset of 16 bird species, and the proposed method was verified with a random forest (RF) classifier. The results show that iWAVE-BiFBank achieves an accuracy of 94.00%, with other indicators, including the F1 score, exceeding 93.00%, outperforming all other tested methods. Overall, the proposed method effectively reduces audio noise, comprehensively captures the characteristics of bird vocalization, and provides improved classification performance. Full article
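
For readers unfamiliar with wavelet denoising, the sketch below shows a generic soft-thresholding scheme on a noisy birdsong-like chirp. It is shown only to make the idea behind iWAVE concrete; the paper's adaptive autocorrelation/PSR criterion and the BiFBank features are not implemented here.

```python
# Generic wavelet soft-threshold denoising with the universal threshold
# (not the paper's iWAVE criterion).
import numpy as np
import pywt

def wavelet_denoise(x, wavelet="db8", level=4):
    """Soft-threshold detail coefficients using the universal threshold."""
    coeffs = pywt.wavedec(x, wavelet, level=level)
    # Noise scale estimated from the finest detail band (median absolute deviation).
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(x)))
    denoised = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(denoised, wavelet)[: len(x)]

# Example: denoise a noisy synthetic birdsong-like chirp and compare errors.
sr = 32000
t = np.linspace(0, 1, sr, endpoint=False)
clean = np.sin(2 * np.pi * (2000 + 1500 * t) * t)
noisy = clean + 0.4 * np.random.randn(sr)
print(np.std(noisy - clean), np.std(wavelet_denoise(noisy) - clean))
```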

16 pages, 551 KB  
Article
Dual-Channel Spoofed Speech Detection Based on Graph Attention Networks
by Yun Tan, Xiaoqian Weng and Jiangzhang Zhu
Symmetry 2025, 17(5), 641; https://doi.org/10.3390/sym17050641 - 24 Apr 2025
Viewed by 616
Abstract
In the field of voice cryptography, detecting forged speech is crucial for secure communication and identity authentication. While most existing spoof detection methods rely on monaural audio, the characteristics of dual-channel signals remain underexplored. To address this, we propose a symmetrical dual-branch detection framework that integrates Res2Net with coordinate attention (Res2NetCA) and a dual-channel heterogeneous graph fusion module (DHGFM). The proposed architecture encodes left and right vocal tract signals into spectrogram and time-domain graphs, and it models both intra- and inter-channel time–frequency dependencies through graph attention mechanisms and fusion strategies. Experimental results on the ASVspoof2019 and ASVspoof2021 LA datasets demonstrate the superior detection performance of our method. Specifically, it achieved an EER of 1.64% and a Min-tDCF of 0.051 on ASVspoof2019, and an EER of 6.76% with a Min-tDCF of 0.3638 on ASVspoof2021, validating the effectiveness and potential of dual-channel modeling in spoofed speech detection. Full article
(This article belongs to the Special Issue Applications Based on Symmetry in Applied Cryptography)

18 pages, 3228 KB  
Article
Automatic Detection and Unsupervised Clustering-Based Classification of Cetacean Vocal Signals
by Yinian Liang, Yan Wang, Fangjiong Chen, Hua Yu, Fei Ji and Yankun Chen
Appl. Sci. 2025, 15(7), 3585; https://doi.org/10.3390/app15073585 - 25 Mar 2025
Cited by 1 | Viewed by 686
Abstract
In the ocean environment, passive acoustic monitoring (PAM) is an important technique for the surveillance of cetacean species. Manual detection over large amounts of PAM data is inefficient and time-consuming. To extract useful features from large PAM datasets for classifying different cetacean species, we propose an automatic detection and unsupervised clustering-based classification method for cetacean vocal signals. The method overcomes the limitations of traditional fixed-threshold detection by setting the threshold adaptively according to the mean signal energy in each frame, and it avoids the high training and labeling cost of deep-learning-based methods by using unsupervised clustering for classification. First, the automatic detection stage extracts vocal signals from PAM data while removing clutter. The extracted signals are then analyzed and classified with a clustering algorithm. The method captures the acoustic characteristics of vocal signals and distinguishes them from environmental noise. We processed 194 audio files totaling 25.3 h of recordings from two public marine mammal databases; five kinds of vocal signals from different cetaceans were extracted and assembled into eight datasets for classification. Verification experiments with four clustering algorithms and two performance metrics confirm the effectiveness of the proposed method: it automatically removes about 75% of the clutter from 1581.3 MB of audio data, extracting 75.75 MB of detected features, and the four classical unsupervised clustering algorithms achieve an average accuracy of 84.83% on the assembled datasets. Full article
(This article belongs to the Special Issue Machine Learning in Acoustic Signal Processing)
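
The detection idea described in this abstract, comparing per-frame energy against a threshold tied to the mean frame energy and then clustering the retained segments without labels, is sketched below. Frame length, the threshold factor, and the toy recording are illustrative assumptions, not the paper's settings.

```python
# Adaptive energy-threshold detection of call segments, followed by
# unsupervised clustering of a crude per-segment feature.
import numpy as np
from sklearn.cluster import KMeans

def detect_calls(x, sr, frame_s=0.05, k=2.0):
    """Return (start, end) sample indices of frames whose energy exceeds k * mean."""
    frame = int(frame_s * sr)
    n = len(x) // frame
    energies = np.array([np.sum(x[i * frame:(i + 1) * frame] ** 2) for i in range(n)])
    mask = energies > k * energies.mean()          # adaptive, data-driven threshold
    return [(i * frame, (i + 1) * frame) for i in np.flatnonzero(mask)]

# Toy recording: background noise with two louder tonal "calls".
sr = 8000
x = 0.05 * np.random.randn(10 * sr)
t = np.arange(sr) / sr
x[2 * sr:3 * sr] += np.sin(2 * np.pi * 300 * t)
x[6 * sr:7 * sr] += np.sin(2 * np.pi * 1200 * t)

segments = detect_calls(x, sr)
# Crude per-segment feature (dominant frequency bin) for unsupervised clustering.
feats = [[np.argmax(np.abs(np.fft.rfft(x[a:b])))] for a, b in segments]
print(KMeans(n_clusters=2, n_init=10).fit_predict(feats))
```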

18 pages, 2018 KB  
Article
Adapting a Large-Scale Transformer Model to Decode Chicken Vocalizations: A Non-Invasive AI Approach to Poultry Welfare
by Suresh Neethirajan
AI 2025, 6(4), 65; https://doi.org/10.3390/ai6040065 - 25 Mar 2025
Cited by 3 | Viewed by 1698
Abstract
Natural Language Processing (NLP) and advanced acoustic analysis have opened new avenues in animal welfare research by decoding the vocal signals of farm animals. This study explored the feasibility of adapting a large-scale Transformer-based model, OpenAI’s Whisper, originally developed for human speech recognition, to decode chicken vocalizations. Our primary objective was to determine whether Whisper could effectively identify acoustic patterns associated with emotional and physiological states in poultry, thereby enabling real-time, non-invasive welfare assessments. To achieve this, chicken vocal data were recorded under diverse experimental conditions, including healthy versus unhealthy birds, pre-stress versus post-stress scenarios, and quiet versus noisy environments. The audio recordings were processed through Whisper, producing text-like outputs. Although these outputs did not represent literal translations of chicken vocalizations into human language, they exhibited consistent patterns in token sequences and sentiment indicators strongly correlated with recognized poultry stressors and welfare conditions. Sentiment analysis using standard NLP tools (e.g., polarity scoring) identified notable shifts in “negative” and “positive” scores that corresponded closely with documented changes in vocal intensity associated with stress events and altered physiological states. Despite the inherent domain mismatch—given Whisper’s original training on human speech—the findings clearly demonstrate the model’s capability to reliably capture acoustic features significant to poultry welfare. Recognizing the limitations associated with applying English-oriented sentiment tools, this study proposes future multimodal validation frameworks incorporating physiological sensors and behavioral observations to further strengthen biological interpretability. To our knowledge, this work provides the first demonstration that Transformer-based architectures, even without species-specific fine-tuning, can effectively encode meaningful acoustic patterns from animal vocalizations, highlighting their transformative potential for advancing productivity, sustainability, and welfare practices in precision poultry farming. Full article
(This article belongs to the Special Issue Artificial Intelligence in Agriculture)
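
The pipeline this abstract outlines, running an off-the-shelf Whisper checkpoint over recordings and scoring its text-like output with a generic polarity tool, can be sketched in a few lines. The file name, checkpoint choice, and polarity tool below are assumptions; this is not the study's code or data.

```python
# Off-the-shelf Whisper transcription of an audio file, followed by a generic
# polarity score over the resulting token sequence.
import whisper                      # pip install openai-whisper
from textblob import TextBlob       # pip install textblob

model = whisper.load_model("base")  # small general-purpose checkpoint (assumption)
result = model.transcribe("barn_recording.wav", fp16=False)  # placeholder file name
text = result["text"]               # token sequence Whisper assigns to the audio

# The text is not a translation of the vocalizations; only its polarity trend
# is used as a rough stress-related indicator, in the spirit of the abstract.
polarity = TextBlob(text).sentiment.polarity   # -1 (negative) .. +1 (positive)
print(f"tokens: {text[:80]!r}...  polarity: {polarity:+.2f}")
```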

19 pages, 4948 KB  
Article
Five-Cavity Resonance Inspired, rGO Nano-Sheet Reinforced, Multi-Site Voice Synergetic Detection Hydrogel Sensors with Diverse Self-Adhesion and Robust Wireless Transmissibility
by Yue Wu, Kewei Zhao, Jingliu Wang, Chunhui Li, Xubao Jiang, Yudong Wang and Xiangling Gu
Gels 2025, 11(4), 233; https://doi.org/10.3390/gels11040233 - 23 Mar 2025
Cited by 1 | Viewed by 655
Abstract
The practical application of flexible sensors in sound detection is significantly hindered by challenges such as information isolation, fragmentation, and low fidelity. To address these challenges, this work developed a composite hydrogel via a one-pot method, employing polyvinyl alcohol (PVA) as the first network, polyacrylic acid (PAA) as the second network, and two-dimensional nanomaterials—reduced graphene oxide (rGO)—generated through the redox reaction of polydopamine (PDA) and graphene oxide (GO) as conductive fillers. The uniformly distributed rGO within the hydrogel forms an efficient conductive network, endowing the material with high sensitivity (GF = 0.64), excellent conductivity (8.15 S m−1), rapid response time (350 ms), and outstanding stability. The synergistic interaction between PDA and PAA modulates the hydrogel’s adhesion (0.89 kPa), enabling conformal attachment to skin surfaces. The designed rGO@PVA-PAA hydrogel-based flexible sensor effectively monitors vibrations across diverse frequencies originating from five vocal cavities (head, nasal, oral, laryngeal, and thoracic cavities) during singing. Integrated with multi-position synchronization and Bluetooth wireless sensing technologies, the system achieves coordinated and efficient monitoring of multiple vocal cavities. Furthermore, the hydrogel sensor demonstrates versatility in detecting physiological signals, including electrocardiograms, subtle vibrations, and multi-scale body movements, highlighting its broad applicability in biomedical and motion-sensing applications. Full article
(This article belongs to the Special Issue Advanced Hydrogels for Biomedical Applications)
