Search Results (62)

Search Parameters:
Keywords = MEL-spectrum

16 pages, 17061 KB  
Article
Numerical Analysis of Cavitation Suppression on a NACA 0018 Hydrofoil Using a Surface Cavity
by Pankaj Kumar, Ebrahim Kadivar and Ould el Moctar
J. Mar. Sci. Eng. 2025, 13(8), 1517; https://doi.org/10.3390/jmse13081517 - 6 Aug 2025
Viewed by 567
Abstract
This study examines the hydrodynamic and acoustic performance of a plain NACA0018 hydrofoil and a modified NACA0018 hydrofoil (a foil with a cavity on the suction surface) at a Reynolds number (Re) of 40,000, which is representative of small-scale turbines and marine applications. A cavity was created on the suction-side surface at 40–50% of the chord length, a location chosen for its efficacy in cavitation control. The present analysis examines the impact of the cavity on the lift-to-drag ratio (L/D) and cavity length at three cavitation numbers (1.7, 1.2, and 0.93) for the plain and modified hydrofoils. Simulations demonstrate a significant enhancement of 7% in the lift-to-drag ratio relative to the conventionally designed foil. Contrary to earlier observations, the cavity length increases rather than decreases for the modified hydrofoil. Both periodic steady and turbulent inflow conditions are simulated, capturing the complex cavity dynamics and flow–acoustic interactions. A reduction in RMS velocity with the modified blade suggests flow stabilization. Spectral analysis using Mel-frequency techniques confirms the cavity's potential to reduce low-frequency flow-induced noise. These findings offer new insights for designing quieter and more efficient hydrofoils and turbine blades.
(This article belongs to the Section Ocean Engineering)
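The Mel-frequency analysis mentioned in this abstract rests on the mel scale's mapping between physical frequency and perceived pitch. As a quick illustration (not the authors' code), the standard HTK-style conversion is:

```python
import math

def hz_to_mel(f_hz: float) -> float:
    """Map frequency in Hz to the perceptual mel scale (HTK formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m: float) -> float:
    """Inverse mapping: mels back to Hz."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

By construction 1000 Hz falls near 1000 mel, and the scale compresses high frequencies, which is why mel-based spectra emphasize the low-frequency content relevant to flow-induced noise.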

25 pages, 1822 KB  
Article
Emotion Recognition from Speech in a Subject-Independent Approach
by Andrzej Majkowski and Marcin Kołodziej
Appl. Sci. 2025, 15(13), 6958; https://doi.org/10.3390/app15136958 - 20 Jun 2025
Cited by 1 | Viewed by 2147
Abstract
The aim of this article is to critically and reliably assess the potential of current emotion recognition technologies for practical applications in human–computer interaction (HCI) systems. The study made use of two databases: one in English (RAVDESS) and another in Polish (EMO-BAJKA), both containing speech recordings expressing various emotions. The effectiveness of recognizing seven and eight different emotions was analyzed. A range of acoustic features, including energy features, mel-cepstral features, zero-crossing rate, fundamental frequency, and spectral features, were utilized to analyze the emotions in speech. Machine learning techniques such as convolutional neural networks (CNNs), long short-term memory (LSTM) networks, and support vector machines with a cubic kernel (cubic SVMs) were employed in the emotion classification task. The research findings indicated that the effective recognition of a broad spectrum of emotions in a subject-independent approach is limited. However, significantly better results were obtained in the classification of paired emotions, suggesting that emotion recognition technologies could be effectively used in specific applications where distinguishing between two particular emotional states is essential. To ensure a reliable and accurate assessment of the emotion recognition system, care was taken to divide the dataset in such a way that the training and testing data contained recordings of completely different individuals. The highest classification accuracies for pairs of emotions were achieved for Angry–Fearful (0.8), Angry–Happy (0.86), Angry–Neutral (1.0), Angry–Sad (1.0), Angry–Surprise (0.89), Disgust–Neutral (0.91), and Disgust–Sad (0.96) in the RAVDESS. In the EMO-BAJKA database, the highest classification accuracies for pairs of emotions were for Joy–Neutral (0.91), Surprise–Neutral (0.80), Surprise–Fear (0.91), and Neutral–Fear (0.91).
(This article belongs to the Special Issue New Advances in Applied Machine Learning)
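The subject-independent protocol this abstract emphasizes (no speaker appears in both training and test data) can be sketched as a split over speaker IDs; the 20% test fraction here is an illustrative assumption, not a detail from the paper:

```python
import random

def subject_independent_split(recordings, test_fraction=0.2, seed=0):
    """Split (speaker_id, clip) pairs so train and test speakers are disjoint."""
    speakers = sorted({spk for spk, _ in recordings})
    rng = random.Random(seed)
    rng.shuffle(speakers)
    n_test = max(1, int(len(speakers) * test_fraction))
    test_spk = set(speakers[:n_test])
    train = [r for r in recordings if r[0] not in test_spk]
    test = [r for r in recordings if r[0] in test_spk]
    return train, test
```

Splitting by speaker rather than by clip prevents the classifier from exploiting speaker identity, which is what makes the reported pairwise accuracies meaningful for unseen users.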

13 pages, 1695 KB  
Article
Deepfake Voice Detection: An Approach Using End-to-End Transformer with Acoustic Feature Fusion by Cross-Attention
by Liang Yu Gong and Xue Jun Li
Electronics 2025, 14(10), 2040; https://doi.org/10.3390/electronics14102040 - 16 May 2025
Viewed by 2151
Abstract
Deepfake technology uses artificial intelligence to create highly realistic but fake audio, video, or images, often making it difficult to distinguish from real content. Due to its potential use for misinformation, fraud, and identity theft, deepfake technology has gained a bad reputation in the digital world. Recently, many works have reported on the detection of deepfake videos/images. However, few studies have concentrated on developing robust deepfake voice detection systems. In most existing studies in this field, a deepfake voice detection system requires a large amount of training data and a robust backbone to distinguish real audio from logical access (LA) attack audio. For acoustic feature extraction, Mel-frequency Filter Bank (MFB)-based approaches are better suited to extracting speech signals than applying the raw spectrum as input. Recurrent Neural Networks (RNNs) have been successfully applied to Natural Language Processing (NLP), but these backbones suffer from vanishing or exploding gradients when processing long sequences. In addition, most deepfake voice recognition systems perform poorly in cross-dataset evaluation, indicating limited robustness. To address these issues, we propose an acoustic feature-fusion method that combines Mel-spectrum and pitch representations based on cross-attention mechanisms. We then combine a Transformer encoder with a convolutional neural network block to extract global and local features as a front end, and connect the back end to a single linear layer for classification. We summarize several deepfake voice detectors' performances on the silence-segment-processed ASVspoof 2019 dataset. Our proposed method achieves an Equal Error Rate (EER) of 26.41%, while most existing methods result in an EER higher than 30%. We also tested our proposed method on the ASVspoof 2021 dataset and found that it achieves an EER as low as 28.52%, while the EER values for existing methods are all higher than 28.9%.
(This article belongs to the Section Artificial Intelligence)
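The cross-attention fusion described here can be illustrated with a minimal single-head sketch in NumPy, where mel-spectrum frames act as queries over pitch features; the dimensions and random projections below are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(mel, pitch, d_k=16, seed=0):
    """Fuse mel frames (queries) with pitch features (keys/values).

    mel:   (T_mel, d_mel) mel-spectrum frame features
    pitch: (T_p, d_p) pitch-track features
    Returns (T_mel, d_k): each mel frame attends over all pitch frames.
    """
    rng = np.random.default_rng(seed)
    w_q = rng.standard_normal((mel.shape[1], d_k)) / np.sqrt(mel.shape[1])
    w_k = rng.standard_normal((pitch.shape[1], d_k)) / np.sqrt(pitch.shape[1])
    w_v = rng.standard_normal((pitch.shape[1], d_k)) / np.sqrt(pitch.shape[1])
    q, k, v = mel @ w_q, pitch @ w_k, pitch @ w_v
    attn = softmax(q @ k.T / np.sqrt(d_k))  # (T_mel, T_p) attention weights
    return attn @ v
```

In a trained model the projection matrices are learned; the point of the sketch is only the shape of the fusion, where one feature stream queries the other.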

29 pages, 4394 KB  
Article
Analysis of Voice, Speech, and Language Biomarkers of Parkinson’s Disease Collected in a Mixed Reality Setting
by Milosz Dudek, Daria Hemmerling, Marta Kaczmarska, Joanna Stepien, Mateusz Daniol, Marek Wodzinski and Magdalena Wojcik-Pedziwiatr
Sensors 2025, 25(8), 2405; https://doi.org/10.3390/s25082405 - 10 Apr 2025
Cited by 5 | Viewed by 3716
Abstract
This study explores an innovative approach to early Parkinson's disease (PD) detection by analyzing speech data collected using a mixed reality (MR) system. A total of 57 Polish participants, including PD patients and healthy controls, performed five speech tasks while using an MR head-mounted display (HMD). Speech data were recorded and analyzed to extract acoustic and linguistic features, which were then evaluated using machine learning models, including logistic regression, support vector machines (SVMs), random forests, AdaBoost, and XGBoost. The XGBoost model achieved the best performance, with an F1-score of 0.90 ± 0.05 in the story-retelling task. Key features such as MFCCs (mel-frequency cepstral coefficients), spectral characteristics, RASTA-filtered auditory spectrum, and local shimmer were identified as significant in detecting PD-related speech alterations. Additionally, state-of-the-art deep learning models (wav2vec2, HuBERT, and WavLM) were fine-tuned for PD detection. HuBERT achieved the highest performance, with an F1-score of 0.94 ± 0.04 in the diadochokinetic task, demonstrating the potential of deep learning to capture complex speech patterns linked to neurodegenerative diseases. This study highlights the effectiveness of combining MR technology for speech data collection with advanced machine learning (ML) and deep learning (DL) techniques, offering a non-invasive and high-precision approach to PD diagnosis. The findings hold promise for broader clinical applications, advancing the diagnostic landscape for neurodegenerative disorders.
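The F1-scores reported above are the harmonic mean of precision and recall; for reference, the binary-case computation (labeling PD as the positive class) is:

```python
def f1_score(y_true, y_pred):
    """F1 = harmonic mean of precision and recall for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

F1 is preferred over plain accuracy here because patient/control datasets are often imbalanced, and F1 penalizes both missed patients (low recall) and false alarms (low precision).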

17 pages, 1463 KB  
Article
Interpretable Probabilistic Identification of Depression in Speech
by Stavros Ntalampiras
Sensors 2025, 25(4), 1270; https://doi.org/10.3390/s25041270 - 19 Feb 2025
Cited by 2 | Viewed by 986
Abstract
Mental health assessment is typically carried out via a series of conversation sessions with medical professionals, where the overall aim is the diagnosis of mental illnesses and well-being evaluation. Despite its arguable socioeconomic significance, national health systems fail to meet the increased demand for such services that has been observed in recent years. To assist and accelerate the diagnosis process, this work proposes an AI-based tool able to provide interpretable predictions by automatically processing the recorded speech signals. An explainability-by-design approach is followed, where audio descriptors related to the problem at hand form the feature vector (Mel-scaled spectrum summarization, Teager operator and periodicity description), while modeling is based on Hidden Markov Models adapted from an ergodic universal one following a suitably designed data selection scheme. After extensive and thorough experiments adopting a standardized protocol on a publicly available dataset, we report significantly higher results with respect to the state of the art. In addition, an ablation study was carried out, providing a comprehensive analysis of the relevance of each system component. Last but not least, the proposed solution not only provides excellent performance, but its operation and predictions are transparent and interpretable, laying out the path to close the usability gap existing between such systems and medical personnel.
(This article belongs to the Special Issue Advances in Acoustic Sensors and Deep Audio Pattern Recognition)

21 pages, 12814 KB  
Article
Multi-Scale Deep Feature Fusion with Machine Learning Classifier for Birdsong Classification
by Wei Li, Danju Lv, Yueyun Yu, Yan Zhang, Lianglian Gu, Ziqian Wang and Zhicheng Zhu
Appl. Sci. 2025, 15(4), 1885; https://doi.org/10.3390/app15041885 - 12 Feb 2025
Cited by 2 | Viewed by 1966
Abstract
Birds are significant bioindicators in the assessment of habitat biodiversity, ecological impacts and ecosystem health. As bird vocalization data become easier to acquire, and with deep learning and machine learning technologies as technical support, exploring recognition and classification networks suitable for bird calls has become a focus of bioacoustics research. Because the spectral differences among various bird calls are much greater than the differences between human languages, constructing birdsong classification networks based on human speech recognition networks does not yield satisfactory results. Effectively capturing the differences in birdsong across species is a crucial factor in improving recognition accuracy. To address these feature differences, this study proposes multi-scale deep features. At the same time, we separate the classification stage from the deep network, using machine learning to adapt to classification with distinct feature differences in birdsong. We validate the effectiveness of multi-scale deep features on a publicly available dataset of 20 bird species. The experimental results show that the accuracy of the multi-scale deep features on a log-wavelet spectrum, log-Mel spectrum and log-power spectrum reaches 94.04%, 97.81% and 95.89%, respectively, an improvement over single-scale deep features on all three spectrograms. Comparative experiments show that the proposed multi-scale deep feature method is superior to five state-of-the-art birdsong identification methods, providing new perspectives and tools for birdsong identification research, which is of great significance for ecological monitoring, biodiversity conservation and forest research.

16 pages, 29747 KB  
Article
Identification of Elephant Rumbles in Seismic Infrasonic Signals Using Spectrogram-Based Machine Learning
by Janitha Vidunath, Chamath Shamal, Ravindu Hiroshan, Udani Gamlath, Chamira U. S. Edussooriya and Sudath R. Munasinghe
Appl. Syst. Innov. 2024, 7(6), 117; https://doi.org/10.3390/asi7060117 - 29 Nov 2024
Cited by 3 | Viewed by 2597
Abstract
This paper presents several machine learning methods and highlights the most effective one for detecting elephant rumbles in infrasonic seismic signals. The design and implementation of electronic circuitry to amplify, filter, and digitize the seismic signals captured through geophones are presented. The process converts seismic rumbles to a spectrogram, and the existing methods of spectrogram feature extraction and appropriate machine learning algorithms are compared on their merit for automatic seismic rumble identification. A novel method of denoising the spectrum that leads to enhanced accuracy in identifying seismic rumbles is presented. It is experimentally found that the combination of the Mel-frequency cepstral coefficient (MFCC) feature extraction method and the ridge classifier machine learning algorithm gives the highest accuracy of 97% in detecting infrasonic elephant rumbles hidden in seismic signals. The trained machine learning algorithm runs efficiently on general-purpose embedded hardware such as a Raspberry Pi, so the method provides a cost-effective and scalable platform for developing a tool to remotely localize elephants, which would help mitigate the human–elephant conflict.
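The paper's denoising method is novel and not detailed in this abstract; as a generic illustration of the underlying idea of suppressing a stationary noise floor in a spectrogram, per-bin spectral subtraction looks like this (the percentile-based floor estimate is an assumption of this sketch):

```python
import numpy as np

def denoise_spectrogram(spec, noise_percentile=20.0):
    """Subtract an estimated per-bin noise floor and clip at zero.

    spec: (n_bins, n_frames) magnitude spectrogram.
    The noise floor of each frequency bin is estimated as a low percentile
    of that bin's values over time (assumes the rumble is intermittent
    while the background noise is roughly stationary).
    """
    floor = np.percentile(spec, noise_percentile, axis=1, keepdims=True)
    return np.clip(spec - floor, 0.0, None)
```

Anything at or below the estimated floor is zeroed, so only transient energy, such as a rumble, survives into the feature-extraction stage.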

17 pages, 6603 KB  
Article
Impact of Full-Spectrum and Infrared Lighting on Growth, Oxidative Stress, and Cecal Microbiota in Broilers
by Khawar Hayat, Rongjin Zheng, Li Zeng, Zunzhong Ye and Jinming Pan
Antioxidants 2024, 13(12), 1442; https://doi.org/10.3390/antiox13121442 - 23 Nov 2024
Viewed by 1107
Abstract
Lighting is crucial for the development of broilers as it affects their growth performance, oxidative stress, and overall health. This study investigates the impact of full-spectrum light, infrared light, and LED white light exposure on the growth performance, oxidative stress markers, and cecal microbiota of medium-growth yellow-feathered broilers. A total of 216 medium-growth yellow-feathered chicks (Yuhuang No. 5), five days old, were randomly divided into three groups of 72 chicks each, with three replicates of 24 chicks. The birds were raised under different lighting conditions, LED infrared light (II), full-spectrum therapy light (FB), and LED white light (CG), until day 87. The experiment covered the early growth phase and measured critical hormones such as melatonin (Mel), growth hormone (GH), and growth-hormone-releasing hormone (GHRH), as well as malondialdehyde (MDA), superoxide dismutase (SOD), and catalase (CAT). Additionally, this study examined the differences in microbiota diversity and composition. The results demonstrated that LED infrared and full-spectrum light exposure significantly (p < 0.05) increased broiler body weight. In particular, full-spectrum light was effective in enhancing comb redness and reducing final comb length and oxidative stress. Furthermore, full-spectrum light improved microbial richness and diversity compared with the other lighting conditions. Overall, the findings suggest that full-spectrum lighting is more beneficial than LED infrared lighting for broiler growth, reducing oxidative stress, and promoting gut health. These insights can be applied to optimizing broiler farming practices, thereby improving productivity and animal welfare.
(This article belongs to the Special Issue Oxidative Stress in Poultry Reproduction and Nutrition)

20 pages, 5794 KB  
Article
Advanced Bearing-Fault Diagnosis and Classification Using Mel-Scalograms and FOX-Optimized ANN
by Muhammad Farooq Siddique, Wasim Zaman, Saif Ullah, Muhammad Umar, Faisal Saleem, Dongkoo Shon, Tae Hyun Yoon, Dae-Seung Yoo and Jong-Myon Kim
Sensors 2024, 24(22), 7303; https://doi.org/10.3390/s24227303 - 15 Nov 2024
Cited by 20 | Viewed by 2391
Abstract
Accurate and reliable bearing-fault diagnosis is important for ensuring the efficiency and safety of industrial machinery. This paper presents a novel method for bearing-fault diagnosis using Mel-transformed scalograms obtained from vibration signals (VS). The signals are windowed and passed through a Mel filter bank, converting them into a Mel spectrum. These scalograms are subsequently fed into an autoencoder comprising convolutional and pooling layers to extract robust features. Classification is performed using an artificial neural network (ANN) optimized with the FOX optimizer, which replaces traditional backpropagation. The FOX optimizer enhances synaptic weight adjustments, leading to superior classification accuracy, minimal loss, improved generalization, and increased interpretability. The proposed model was validated on a laboratory dataset obtained from a bearing testbed with multiple fault conditions. Experimental results demonstrate that the model achieves perfect precision, recall, and F1-scores, and an AUC of 1.00 across all fault categories, significantly outperforming comparison models. The t-SNE plots illustrate clear separability between the fault classes, confirming the model's robustness and reliability. This approach offers an efficient and highly accurate solution for real-time predictive maintenance in industrial applications.
(This article belongs to the Special Issue Feature Papers in Fault Diagnosis & Sensors 2024)
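The "windowed signal through a Mel filter bank" step this abstract describes is a common construction: triangular filters spaced evenly on the mel scale and applied to FFT bins. A generic sketch (not the authors' exact pipeline; parameters illustrative):

```python
import numpy as np

def hz_to_mel(f):
    """HTK mel-scale mapping."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_bank(n_filters, n_fft, sr, fmin=0.0, fmax=None):
    """Build (n_filters, n_fft//2 + 1) triangular filters, evenly spaced in mel."""
    fmax = fmax or sr / 2.0
    # n_filters + 2 mel points define the left edge, peak, right edge of each filter
    mel_pts = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb
```

Multiplying a frame's power spectrum by this matrix yields the mel-spectrum vector that the scalogram/autoencoder stages then consume.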

20 pages, 4970 KB  
Article
Revealing the Next Word and Character in Arabic: An Effective Blend of Long Short-Term Memory Networks and ARABERT
by Fawaz S. Al-Anzi and S. T. Bibin Shalini
Appl. Sci. 2024, 14(22), 10498; https://doi.org/10.3390/app142210498 - 14 Nov 2024
Cited by 3 | Viewed by 1795
Abstract
Arabic raw audio datasets were initially gathered to produce a corresponding signal spectrum, which was further used to extract the Mel-Frequency Cepstral Coefficients (MFCCs). The pronunciation dictionary, language model, and acoustic model were then derived from the MFCC features. These output data were fed into Baidu's Deep Speech model (an ASR system) to obtain the text corpus. Baidu's Deep Speech model rapidly identifies the global optimum while maintaining low word and character error rates, achieving excellent performance in both isolated and end-to-end speech recognition. The goal of this work is to forecast the next word and character in sequential order, a core task in natural language processing (NLP). This work combines the trained Arabic language model ARABERT with the potential of Long Short-Term Memory (LSTM) networks to predict the next word and character in an Arabic text. We used the pre-trained ARABERT embedding to improve the model's capacity to capture semantic relationships within the language, and we trained LSTM + CNN and Markov models on Arabic text data to assess the efficacy of this approach. Python libraries such as TensorFlow, Pickle, Keras, and NumPy were used to design the development model. We extensively assessed the model's performance on new Arabic text, focusing on evaluation metrics such as accuracy, word error rate, character error rate, BLEU score, and perplexity. The results show that the combined LSTM + ARABERT and Markov models outperformed the baseline models in predicting the next word or character in Arabic text. Accuracy rates of 64.9% for LSTM, 74.6% for ARABERT + LSTM, and 78% for Markov chain models were achieved in predicting the next word, and accuracy rates of 72% for LSTM, 72.22% for LSTM + CNN, and 73% for ARABERT + LSTM were achieved for next-character prediction. This work presents a novel approach to Arabic natural language processing tasks, pointing toward precise next-word and next-character forecasting that can serve as an efficient utility for text generation and machine translation applications.
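The Markov-chain baseline for next-word prediction mentioned above can be sketched in a few lines (first-order model; the toy English corpus in the test is an illustrative stand-in for the paper's Arabic data):

```python
from collections import Counter, defaultdict

def train_markov(tokens):
    """First-order Markov model: count successor frequencies per word."""
    model = defaultdict(Counter)
    for word, nxt in zip(tokens, tokens[1:]):
        model[word][nxt] += 1
    return model

def predict_next(model, word):
    """Return the most frequent successor of `word`, or None if unseen."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]
```

The LSTM + ARABERT models generalize beyond this by conditioning on longer context and on learned embeddings, which is where the reported accuracy gains over the per-word counts come from.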

21 pages, 7017 KB  
Article
Multi-Scale Frequency-Adaptive-Network-Based Underwater Target Recognition
by Lixu Zhuang, Afeng Yang, Yanxin Ma and David Day-Uei Li
J. Mar. Sci. Eng. 2024, 12(10), 1766; https://doi.org/10.3390/jmse12101766 - 5 Oct 2024
Cited by 2 | Viewed by 1161
Abstract
Due to the complexity of underwater environments, underwater target recognition based on radiated noise has always been challenging. This paper proposes a multi-scale frequency-adaptive network for underwater target recognition. Based on the different distribution densities of Mel filters in the low-frequency band, a three-channel improved Mel energy spectrum feature is designed first. Second, by combining a frequency-adaptive module, an attention mechanism, and a multi-scale fusion module, a multi-scale frequency-adaptive network is proposed to enhance the model's learning ability. Then, the model training is optimized by introducing a time–frequency mask, a data augmentation strategy involving data confounding, and a focal loss function. Finally, systematic experiments were conducted based on the ShipsEar dataset. The results showed that the recognition accuracy for five categories reached 98.4%, and the accuracy for nine categories in fine-grained recognition was 88.6%. Compared with existing methods, the proposed multi-scale frequency-adaptive network for underwater target recognition has achieved significant performance improvement.
(This article belongs to the Section Ocean Engineering)
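The focal loss used to optimize training down-weights easy, well-classified examples so that hard ones dominate the gradient. A per-sample sketch (the α and γ values are illustrative defaults, not the paper's settings):

```python
import math

def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss for the probability assigned to the true class.

    FL = -alpha * (1 - p)^gamma * log(p); gamma = 0 recovers cross-entropy.
    """
    p = min(max(p_true, 1e-12), 1.0 - 1e-12)  # clip to avoid log(0)
    return -alpha * (1.0 - p) ** gamma * math.log(p)
```

With γ > 0, a confidently correct prediction (p close to 1) contributes almost nothing, which helps on imbalanced fine-grained classes like the nine-category ShipsEar setting.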

23 pages, 10692 KB  
Article
Intelligent Fault Diagnosis Method for Constant Pressure Variable Pump Based on Mel-MobileViT Lightweight Network
by Yonghui Zhao, Anqi Jiang, Wanlu Jiang, Xukang Yang, Xudong Xia and Xiaoyang Gu
J. Mar. Sci. Eng. 2024, 12(9), 1677; https://doi.org/10.3390/jmse12091677 - 19 Sep 2024
Cited by 1 | Viewed by 1483
Abstract
The sound signals of hydraulic pumps contain abundant key information reflecting their internal mechanical states. In environments characterized by high temperatures or high-speed rotation, or where sensor deployment is challenging, acoustic sensors offer non-contact and flexible arrangement features. Therefore, this study aims to develop an intelligent fault diagnosis method for hydraulic pumps based on acoustic signals. Initially, the Adaptive Chirp Mode Decomposition (ACMD) method is employed to remove environmental noise from the acoustic signals, enhancing the feature signals. Subsequently, the Mel spectrum is extracted as the acoustic fingerprint features of various fault states of the hydraulic pump, and these features are used to train the MobileViT network, achieving accurate identification of the different fault modes. The results indicate that the proposed Mel-MobileViT model effectively identifies and classifies various faults in constant pressure variable pumps, outperforming other models. This study not only provides an efficient and reliable intelligent method for the fault diagnosis of critical industrial equipment such as hydraulic pumps, but also offers new perspectives on the application of deep learning in acoustic pattern analysis.
(This article belongs to the Section Ocean Engineering)

18 pages, 3164 KB  
Article
Cough Detection Using Acceleration Signals and Deep Learning Techniques
by Daniel Sanchez-Morillo, Diego Sales-Lerida, Blanca Priego-Torres and Antonio León-Jiménez
Electronics 2024, 13(12), 2410; https://doi.org/10.3390/electronics13122410 - 20 Jun 2024
Cited by 3 | Viewed by 3808
Abstract
Cough is a frequent symptom in many common respiratory diseases and is considered a predictor of early exacerbation or even disease progression. Continuous cough monitoring offers valuable insights into treatment effectiveness, aiding healthcare providers in timely intervention to prevent exacerbations and hospitalizations. Objective cough monitoring methods have emerged as superior alternatives to subjective methods like questionnaires. In recent years, cough has been monitored using wearable devices equipped with microphones. However, discriminating cough sounds from background noise has been shown to be a particular challenge. This study aimed to demonstrate the effectiveness of single-axis acceleration signals combined with state-of-the-art deep learning (DL) algorithms to distinguish intentional coughing from sounds like speech, laughter, or throat noises. Various DL methods (recurrent, convolutional, and deep convolutional neural networks) combined with one- and two-dimensional time and time–frequency representations, such as the signal envelope, kurtogram, wavelet scalogram, and mel, Bark, and equivalent rectangular bandwidth (ERB) spectrograms, were employed to identify the most effective approach. The optimal strategy, which involved the SqueezeNet model in conjunction with wavelet scalograms, yielded an accuracy and precision of 92.21% and 95.59%, respectively. The proposed method demonstrated its potential for cough monitoring. Future research will focus on validating the system on spontaneous coughing of subjects with respiratory diseases under natural ambulatory conditions.
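Among the one-dimensional representations listed, the signal envelope is the simplest; a crude rectified moving-average version can be sketched as follows (the window length is an illustrative assumption, not the paper's choice):

```python
import numpy as np

def envelope(x, window=16):
    """Crude amplitude envelope: moving average of |x|, same length as x."""
    kernel = np.ones(window) / window
    return np.convolve(np.abs(x), kernel, mode="same")
```

A cough burst shows up as a short plateau in the envelope while speech modulates more slowly, which is part of why even this 1D representation carries discriminative information.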

13 pages, 1846 KB  
Article
Enhancing Selective Antimicrobial and Antibiofilm Activities of Melittin through 6-Aminohexanoic Acid Substitution
by Naveenkumar Radhakrishnan, Sukumar Dinesh Kumar, Song-Yub Shin and Sungtae Yang
Biomolecules 2024, 14(6), 699; https://doi.org/10.3390/biom14060699 - 14 Jun 2024
Cited by 8 | Viewed by 2305
Abstract
Leucine residues are commonly found in the hydrophobic face of antimicrobial peptides (AMPs) and are crucial for membrane permeabilization, leading to the cell death of invading pathogens. Melittin, which contains four leucine residues, demonstrates broad-spectrum antimicrobial properties but also significant cytotoxicity against mammalian cells. To enhance the cell selectivity of melittin, this study synthesized five analogs by replacing leucine with its structural isomer, 6-aminohexanoic acid. Among these analogs, Mel-LX3 exhibited potent antibacterial activity against both Gram-positive and Gram-negative bacteria. Importantly, Mel-LX3 displayed significantly reduced hemolytic and cytotoxic effects compared to melittin. Mechanistic studies, including membrane depolarization, SYTOX green uptake, FACScan analysis, and inner/outer membrane permeation assays, demonstrated that Mel-LX3 effectively permeabilized bacterial membranes similar to melittin. Notably, Mel-LX3 showed robust antibacterial activity against methicillin-resistant Staphylococcus aureus (MRSA) and multidrug-resistant Pseudomonas aeruginosa (MDRPA). Furthermore, Mel-LX3 effectively inhibited biofilm formation and eradicated existing biofilms of MDRPA. With its improved selective antimicrobial and antibiofilm activities, Mel-LX3 emerges as a promising candidate for the development of novel antimicrobial agents. We propose that the substitution of leucine with 6-aminohexanoic acid in AMPs represents a significant strategy for combating resistant bacteria.

14 pages, 2494 KB  
Article
BERTIVITS: The Posterior Encoder Fusion of Pre-Trained Models and Residual Skip Connections for End-to-End Speech Synthesis
by Zirui Wang, Minqi Song and Dongbo Zhou
Appl. Sci. 2024, 14(12), 5060; https://doi.org/10.3390/app14125060 - 10 Jun 2024
Cited by 1 | Viewed by 2492
Abstract
Enhancing the naturalness and rhythmicity of generated audio in end-to-end speech synthesis is crucial. The current state-of-the-art (SOTA) model, VITS, utilizes a conditional variational autoencoder architecture. However, it faces challenges, such as limited robustness, due to training solely on text and spectrum data from the training set. Particularly, the posterior encoder struggles with mid- and high-frequency feature extraction, impacting waveform reconstruction. Existing efforts mainly focus on prior encoder enhancements or alignment algorithms, neglecting improvements to spectrum feature extraction. In response, we propose BERTIVITS, a novel model integrating BERT into VITS. Our model features a redesigned posterior encoder with residual connections and utilizes pre-trained models to enhance spectrum feature extraction. Compared to VITS, BERTIVITS shows significant subjective MOS score improvements (0.16 in English, 0.36 in Chinese) and objective Mel-Cepstral coefficient reductions (0.52 in English, 0.49 in Chinese). BERTIVITS is tailored for single-speaker scenarios, improving speech synthesis technology for applications like post-class tutoring or telephone customer service.
(This article belongs to the Section Computing and Artificial Intelligence)
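The objective improvement above is reported as a mel-cepstral reduction; assuming the standard mel-cepstral distortion (MCD) metric is meant, its frame-level computation is:

```python
import math

def mel_cepstral_distortion(c_ref, c_syn):
    """MCD in dB between two mel-cepstral frames (0th coefficient excluded).

    MCD = (10 / ln 10) * sqrt(2 * sum_d (c_ref[d] - c_syn[d])^2)
    """
    sq = sum((a - b) ** 2 for a, b in zip(c_ref[1:], c_syn[1:]))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * sq)
```

The per-frame values are typically averaged over time-aligned reference and synthesized utterances, so lower is better and a drop of ~0.5 dB is a meaningful spectral-accuracy gain.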
