
Search Results (9)

Search Parameters:
Keywords = voiceprint characteristics

24 pages, 11690 KB  
Article
Research on Vibration and Noise of Oil Immersed Transformer Considering Influence of Transformer Oil
by Xueyan Hao, Sheng Ma, Xuefeng Zhu, Yubo Zhang, Ruge Liu and Bo Zhang
Energies 2025, 18(23), 6155; https://doi.org/10.3390/en18236155 - 24 Nov 2025
Viewed by 927
Abstract
This study investigates the vibration and noise characteristics of oil-immersed power transformers, with a particular focus on the influence of transformer oil on structural dynamics and acoustic emission. The research integrates multi-physics modelling, finite-element simulation, and field measurements to analyze the vibration transmission paths from the core and windings to the tank wall. A fluid–structure interaction (FSI) model is developed to account for the damping effect of insulating oil, and a correction factor is introduced to adjust modal parameters. Simulation results reveal that oil significantly enhances vibration propagation, especially in the vertical direction, while structural ribs and clamping configurations affect local vibration intensity. Noise simulations show that magnetostriction is the dominant source of audible sound, with harmonic components sensitive to load and voltage variations. Experimental validation using a portable sound level meter confirms the simulation trends and highlights the spatial variability of acoustic pressure. The findings provide a theoretical and practical basis for optimizing sensor placement and developing voiceprint-based diagnostic tools for transformer condition monitoring.
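The added-mass effect of insulating oil on modal frequencies, which the abstract's correction factor addresses, can be sketched in a few lines. The formula and the mass ratio below are generic illustrations of the physics, not parameters taken from the paper.

```python
import math

def oil_corrected_frequency(f_dry_hz, added_mass_ratio):
    """Shift an in-air modal frequency for the added mass of surrounding
    oil: f_wet = f_dry / sqrt(1 + beta), where beta is the ratio of the
    oil's effective added mass to the structure's modal mass.
    Both inputs here are hypothetical, not values from the paper."""
    return f_dry_hz / math.sqrt(1.0 + added_mass_ratio)

# A 100 Hz tank-wall mode with beta = 0.5625 shifts down to 80 Hz.
print(oil_corrected_frequency(100.0, 0.5625))  # 80.0
```

A shift of this kind is why in-oil (wet) modal parameters differ from dry ones, which motivates applying a correction factor before comparing simulation and measurement.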

23 pages, 1520 KB  
Article
Data Augmentation for Voiceprint Recognition Using Generative Adversarial Networks
by Yao-San Lin, Hung-Yu Chen, Mei-Ling Huang and Tsung-Yu Hsieh
Algorithms 2024, 17(12), 583; https://doi.org/10.3390/a17120583 - 18 Dec 2024
Cited by 2 | Viewed by 2410
Abstract
Voiceprint recognition systems often face challenges related to limited and diverse datasets, which hinder their performance and generalization capabilities. This study proposes a novel approach that integrates generative adversarial networks (GANs) for data augmentation and convolutional neural networks (CNNs) with mel-frequency cepstral coefficients (MFCCs) for voiceprint classification. Experimental results demonstrate that the proposed methodology improves recognition accuracy by up to 15% in low-resource scenarios. The optimal ratio of real-to-GAN-generated samples was determined to be 3:2, which balanced dataset diversity and model performance. In specific cases, the model achieved an accuracy of 96.6%, showcasing its effectiveness in capturing unique voice characteristics while mitigating overfitting. These results highlight the potential of combining GAN-augmented data and CNN-based classification to enhance voiceprint recognition in diverse and resource-constrained environments.
(This article belongs to the Special Issue Machine Learning for Pattern Recognition (2nd Edition))
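The 3:2 real-to-generated ratio reported above can be illustrated with a small dataset-mixing sketch. The GAN generator itself is out of scope, and `mix_real_and_synthetic` and its arguments are hypothetical names, not the paper's code:

```python
import numpy as np

def mix_real_and_synthetic(real, synthetic, ratio=(3, 2), seed=0):
    """Build a training set whose real:synthetic sample ratio matches
    the 3:2 proportion the paper reports as optimal.
    `real` and `synthetic` are (n_samples, n_features) arrays."""
    rng = np.random.default_rng(seed)
    n_syn = len(real) * ratio[1] // ratio[0]        # 2 synthetic per 3 real
    picked = rng.choice(len(synthetic), size=n_syn, replace=False)
    mixed = np.vstack([real, synthetic[picked]])
    rng.shuffle(mixed)                              # shuffle rows in place
    return mixed

real = np.ones((30, 13))     # placeholder real MFCC vectors
fake = np.zeros((40, 13))    # placeholder GAN outputs
print(mix_real_and_synthetic(real, fake).shape)  # (50, 13)
```

With 30 real samples, the 3:2 rule draws 20 GAN-generated ones, giving a 50-sample training set.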

15 pages, 5545 KB  
Essay
Feature Extraction of Flow Sediment Content of Hydropower Unit Based on Voiceprint Signal
by Boyi Xiao, Yun Zeng, Wenqing Hu and Yuesong Cheng
Energies 2024, 17(5), 1041; https://doi.org/10.3390/en17051041 - 22 Feb 2024
Cited by 2 | Viewed by 1532
Abstract
The hydropower turbine parts running in the sand-bearing flow will experience surface wear, leading to a decline in the hydropower unit’s stability, mechanical performance, and efficiency. A voiceprint signal-based method is proposed for extracting the flow sediment content feature of the hydropower unit. Firstly, the operating voiceprint information of the hydropower unit is obtained, and the signal is decomposed by the Ensemble Empirical Mode Decomposition (EEMD) algorithm, and a series of intrinsic mode functions (IMFs) are obtained. Combined with correlation analysis, more sensitive IMF components are extracted and input into a convolutional neural network (CNN) for training, and the multi-dimensional output of the fully connected layer of CNN is used as the feature vector. The k-means clustering algorithm is used to calculate the eigenvector clustering center of the hydropower unit with a clean flow state and a high sediment content state, and the characteristic index of the hydropower unit sediment content is constructed based on the Euclidean distance method. We define this characteristic index as SI, and the change in the SI value can reflect the degree of sediment content in the flow of the unit. A higher SI value indicates a lower sediment content, while a lower SI value suggests a higher sediment content. Combined with the sediment voiceprint data of the test bench, when the water flow changed from clear water to high sediment flow (1.492 × 105 mg/L), the SI value decreased from 1 to 0.06, and when the water flow with high sediment content returned to clear water, the SI value returned to 1. The experiment proves the effectiveness of the method. The extracted feature index can be used to detect the flow sediment content of the hydropower unit and give early warning in time, so as to improve the maintenance level of the hydropower unit.
(This article belongs to the Special Issue Fault Diagnosis and Control in Renewable Power Systems)
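The IMF screening and the distance-based sediment index SI described above can be sketched as follows. The EEMD decomposition is assumed to have been done upstream, and this normalised-distance form of SI is a plausible reading of the Euclidean-distance construction, not the paper's exact definition:

```python
import numpy as np

def select_imfs(signal, imfs, min_corr=0.3):
    """Keep only IMFs whose Pearson correlation with the raw signal
    exceeds a threshold (the EEMD step itself is assumed done upstream)."""
    keep = [imf for imf in imfs
            if abs(np.corrcoef(signal, imf)[0, 1]) >= min_corr]
    return np.array(keep)

def sediment_index(feature, clean_center, dirty_center):
    """A normalised index in [0, 1]: near 1 at the clean-water cluster
    centre, approaching 0 near the high-sediment centre."""
    d_clean = np.linalg.norm(feature - clean_center)
    d_dirty = np.linalg.norm(feature - dirty_center)
    return d_dirty / (d_clean + d_dirty + 1e-12)

clean = np.zeros(8)           # placeholder clean-state cluster centre
dirty = np.ones(8) * 5.0      # placeholder high-sediment cluster centre
print(round(sediment_index(clean, clean, dirty), 2))  # 1.0
```

A feature vector drifting from the clean centre toward the high-sediment centre drives SI from 1 toward 0, matching the reported drop from 1 to 0.06 under high sediment flow.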

24 pages, 6594 KB  
Article
Voiceprint Fault Diagnosis of Converter Transformer under Load Influence Based on Multi-Strategy Improved Mel-Frequency Spectrum Coefficient and Temporal Convolutional Network
by Hui Li, Qi Yao and Xin Li
Sensors 2024, 24(3), 757; https://doi.org/10.3390/s24030757 - 24 Jan 2024
Cited by 10 | Viewed by 2534
Abstract
To address the challenges of low recognition accuracy and ineffective diagnosis in traditional converter transformer voiceprint fault diagnosis, a novel method is proposed in this article. The approach takes into account the impact of load factors, utilizes a multi-strategy improved Mel-Frequency Spectrum Coefficient (MFCC) for voiceprint signal feature extraction, and combines it with a temporal convolutional network for fault diagnosis. Firstly, the hunter–prey optimizer (HPO) is improved as a parameter optimization algorithm, and the resulting IHPO is combined with variational mode decomposition (VMD) to denoise the voiceprint signals. Secondly, the preprocessed voiceprint signal is combined with Mel filters through the Stockwell transform. To adapt to the stationary characteristics of the voiceprint signal, the processed features undergo further mid-temporal processing, ultimately yielding the multi-strategy improved MFCC features. Simultaneously, load signal segmentation is introduced for the diagnostic intervals, forming a joint feature vector. Finally, the Mish activation function is used to improve the temporal convolutional network, and the IHPO-ITCN is proposed to adaptively optimize the convolutional kernel sizes and the number of hidden layers, constructing a transformer fault diagnosis model. Multiple sets of comparison tests on specific examples show that the proposed model achieves a fault recognition accuracy as high as 99%, a significant improvement over traditional voiceprint diagnostic models, along with superior training speed. The method can be effectively applied to multi-fault diagnosis of converter transformers.
(This article belongs to the Special Issue Sensors and Fault Diagnostics in Power System)
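The Mel-filter-plus-cepstrum pipeline underlying the improved MFCC can be sketched in plain NumPy. This is the textbook MFCC front end only; the paper's Stockwell-transform and mid-temporal modifications are not reproduced:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular Mel filters over an n_fft-point power spectrum."""
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):          # rising slope of the triangle
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):          # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(power_spectrum, fb, n_coeffs=13):
    """Log filter-bank energies followed by a DCT-II, the classic MFCC step."""
    log_e = np.log(fb @ power_spectrum + 1e-10)
    n = len(log_e)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None]
                   * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    return basis @ log_e

fb = mel_filterbank(26, 512, 16000)
coeffs = mfcc(np.ones(257), fb)
print(coeffs.shape)  # (13,)
```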

20 pages, 5974 KB  
Article
Voiceprint Recognition under Cross-Scenario Conditions Using Perceptual Wavelet Packet Entropy-Guided Efficient-Channel-Attention–Res2Net–Time-Delay-Neural-Network Model
by Shuqi Wang, Huajun Zhang, Xuetao Zhang, Yixin Su and Zhenghua Wang
Mathematics 2023, 11(19), 4205; https://doi.org/10.3390/math11194205 - 9 Oct 2023
Cited by 5 | Viewed by 2951
Abstract
(1) Background: Voiceprint recognition technology uses individual vocal characteristics for identity authentication and faces many challenges in cross-scenario applications. The sound environment, device characteristics, and recording conditions in different scenarios cause changes in sound features, which, in turn, affect the accuracy of voiceprint recognition. (2) Methods: Based on the latest trends in deep learning, this paper uses the perceptual wavelet packet entropy (PWPE) method to extract the basic voiceprint features of the speaker before using the efficient channel attention (ECA) block and the Res2Net block to extract deep features. The PWPE block removes the effect of environmental noise on voiceprint features, so the perceptual wavelet packet entropy-guided ECA–Res2Net–Time-Delay-Neural-Network (PWPE-ECA-Res2Net-TDNN) model shows excellent robustness. The ECA-Res2Net-TDNN block uses temporal statistical pooling with a multi-head attention mechanism to weight frame-level audio features, resulting in a weighted average as the final representation of the speech-level feature vectors. The sub-center ArcFace loss function is used to enhance intra-class compactness and inter-class differences, avoiding classification via the output value alone as in the softmax loss function. Based on these elements, the PWPE-ECA-Res2Net-TDNN model for speaker recognition is designed to extract speaker feature embeddings more efficiently in cross-scenario applications. (3) Conclusions: The experimental results demonstrate that, compared to the ECAPA-TDNN model using MFCC features, the PWPE-based ECAPA-TDNN model performs better in cross-scene recognition accuracy, exhibiting stronger robustness and better noise resistance. The model also maintains a relatively short recognition time even under the highest recognition rate conditions. Finally, a set of ablation experiments targeting each module of the proposed model indicates that each module contributes to an improvement in recognition performance.
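The entropy feature at the heart of the PWPE front end can be approximated with a short sketch. Uniform sub-bands stand in for the wavelet packet tree here, so this is a simplified illustration of the idea, not the paper's implementation:

```python
import numpy as np

def band_energy_entropy(spectrum, n_bands=8):
    """Shannon entropy of the energy distribution across sub-bands: flat
    spectra (noise-like) score high, spectra with concentrated energy
    (formant-like) score low."""
    bands = np.array_split(np.abs(spectrum) ** 2, n_bands)
    energies = np.array([b.sum() for b in bands])
    p = energies / (energies.sum() + 1e-12)
    return -np.sum(p * np.log2(p + 1e-12))

flat = np.ones(256)          # energy spread evenly across all bands
peaky = np.zeros(256)
peaky[0] = 1.0               # energy concentrated in a single bin
print(round(band_energy_entropy(flat), 2))   # 3.0 (maximum for 8 bands)
print(band_energy_entropy(peaky) < 0.01)     # True
```

The noise-removal role of the PWPE block builds on this contrast between flat (noisy) and concentrated (voiced) energy distributions.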

27 pages, 22568 KB  
Article
Deep Learning with LPC and Wavelet Algorithms for Driving Fault Diagnosis
by Cihun-Siyong Alex Gong, Chih-Hui Simon Su, Yuan-En Liu, De-Yu Guu and Yu-Hua Chen
Sensors 2022, 22(18), 7072; https://doi.org/10.3390/s22187072 - 19 Sep 2022
Cited by 11 | Viewed by 4923
Abstract
Vehicle fault detection and diagnosis (VFDD) along with predictive maintenance (PdM) are indispensable for early diagnosis in order to prevent severe accidents due to mechanical malfunction in urban environments. This paper proposes an early voiceprint driving fault identification system using machine learning algorithms for classification. Previous studies have examined driving fault identification, but less attention has focused on using voiceprint features to locate corresponding faults. This research uses voiceprint signals of 43 different common vehicle mechanical malfunction conditions to construct the dataset. The datasets were filtered by linear predictive coefficients (LPC) and the wavelet transform (WT). After the original voiceprint fault sounds were filtered to obtain the main fault characteristics, deep neural network (DNN), convolutional neural network (CNN), and long short-term memory (LSTM) architectures were used for identification. The experimental results show that the CNN algorithm achieves the best accuracy on the LPC dataset. For the wavelet dataset, the DNN performs best in terms of identification accuracy and training time. Cross-comparison of the experimental results shows that the wavelet algorithm combined with the DNN improves identification accuracy by up to 16.57% and reduces model training time by up to 21.5% compared with the other deep learning algorithms. By cross-comparing recognition results across these machine learning methods, the vehicle can proactively remind the driver of real-time potential hazards from vehicle machinery failure.
(This article belongs to the Special Issue Feature Papers in Fault Diagnosis & Sensors Section 2022)
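The LPC filtering step can be sketched with the classic autocorrelation method and Levinson-Durbin recursion. This is a standard LPC implementation; the paper's model order, framing, and exact configuration are not reproduced:

```python
import numpy as np

def lpc(signal, order):
    """Linear predictive coefficients via the autocorrelation method
    and the Levinson-Durbin recursion. Returns the prediction-error
    filter [1, a1, ..., a_order]."""
    n = len(signal)
    r = np.array([signal[:n - k] @ signal[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k = -acc / err                      # reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k * a[i - 1::-1]
        err *= (1.0 - k * k)
    return a

# An AR(1) process x[t] = 0.9 x[t-1] + noise should yield a1 close to -0.9.
rng = np.random.default_rng(0)
x = np.zeros(4000)
for t in range(1, 4000):
    x[t] = 0.9 * x[t - 1] + rng.standard_normal()
print(lpc(x, 1).round(2))
```

For this first-order autoregressive signal the recursion recovers a prediction-error filter close to [1, -0.9].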

15 pages, 12136 KB  
Article
Target Speaker Extraction by Fusing Voiceprint Features
by Shidan Cheng, Ying Shen and Dongqing Wang
Appl. Sci. 2022, 12(16), 8152; https://doi.org/10.3390/app12168152 - 15 Aug 2022
Cited by 6 | Viewed by 5139
Abstract
It is a critical problem to accurately separate clean speech in the multispeaker scenario for different speakers. However, in most cases, smart devices such as smart phones interact with only one specific user. As a consequence, the speech separation models adopted by these devices only have to extract the target speaker’s speech. A voiceprint, which reflects the speaker’s voice characteristics, provides prior knowledge for the target speech separation. Therefore, how to efficiently integrate voiceprint features into the existing speech separation models to improve their performance for the target speech separation is an interesting problem not fully explored. This paper attempts to solve this issue to some extent and our contributions are as follows. First, two different voiceprint features (i.e., MFCCs and d-vector) are explored in the performance enhancement for three speech separation models. Second, three different feature fusion methods are proposed to efficiently fuse the voiceprint features with the magnitude spectrograms originally used in the speech separation models. Third, a target speech extraction method which utilizes the fused features is proposed for two speaker-independent models. Experiments demonstrate that the speech separation models integrated with voiceprint features using three feature fusion methods can effectively extract the target speaker’s speech.
(This article belongs to the Special Issue Advances in Speech and Language Processing)
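One of the simplest fusion schemes consistent with the description above is to tile a fixed voiceprint embedding onto every spectrogram frame and concatenate along the feature axis. The function below is a hypothetical sketch, not one of the paper's three methods verbatim:

```python
import numpy as np

def fuse_concat(magnitude_spec, voiceprint):
    """Concatenation-style fusion: repeat a fixed speaker embedding
    (e.g. a d-vector) for every frame of a (freq_bins, frames)
    magnitude spectrogram, then stack along the feature axis."""
    n_frames = magnitude_spec.shape[1]
    tiled = np.repeat(voiceprint[:, None], n_frames, axis=1)
    return np.vstack([magnitude_spec, tiled])

spec = np.random.rand(257, 100)   # freq bins x frames
dvec = np.random.rand(256)        # speaker embedding
print(fuse_concat(spec, dvec).shape)  # (513, 100)
```

The separation network then sees, at every frame, both the mixture spectrum and a constant reminder of whose voice to extract.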

15 pages, 6146 KB  
Article
Study on a Fault Identification Method of the Hydraulic Pump Based on a Combination of Voiceprint Characteristics and Extreme Learning Machine
by Wanlu Jiang, Zhenbao Li, Jingjing Li, Yong Zhu and Peiyao Zhang
Processes 2019, 7(12), 894; https://doi.org/10.3390/pr7120894 - 1 Dec 2019
Cited by 31 | Viewed by 4129
Abstract
To address the problem that faults in axial piston pumps are complex and difficult to diagnose effectively, an axial piston pump fault diagnosis method based on the combination of Mel-frequency cepstrum coefficients (MFCC) and the extreme learning machine (ELM) is proposed. Firstly, a sound sensor is used for contactless sound signal acquisition from the axial piston pump. The original sound signals are denoised with the wavelet packet default threshold, and windowing and framing are then applied to the de-noised signals. The MFCC voiceprint characteristics of the processed sound signals are extracted and divided into a training sample set and a test sample set. ELM models with different numbers of hidden-layer neurons are trained and tested to obtain the relationship between the number of hidden-layer neurons and the recognition accuracy rate. The ELM model with the optimal number of hidden-layer neurons is then trained on the training sample set and applied to the test sample set for fault diagnosis. The fault diagnosis results of the ELM model are compared with those of the back propagation (BP) neural network and the support vector machine. The results show that the fault diagnosis method proposed in this paper has a higher recognition accuracy rate, shorter training and diagnosis times, and better application prospects.
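The ELM training scheme described above, random hidden weights with output weights solved in closed form via a pseudo-inverse, fits in a few lines of NumPy. This is a generic ELM sketch on assumed toy data, not the paper's hydraulic-pump model:

```python
import numpy as np

class ELM:
    """Minimal extreme learning machine: a random, untrained hidden
    layer, with output weights solved in one shot by least squares."""

    def __init__(self, n_hidden, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y_onehot):
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        # Closed-form output weights: pseudo-inverse of hidden activations.
        self.beta = np.linalg.pinv(self._hidden(X)) @ y_onehot
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

# Toy two-class problem: points around (0, 0) vs points around (3, 3).
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
y = np.eye(2)[np.array([0] * 50 + [1] * 50)]
model = ELM(n_hidden=20).fit(X, y)
print((model.predict(X) == np.array([0] * 50 + [1] * 50)).mean())
```

Because only the output weights are fitted, training reduces to a single pseudo-inverse, which is the source of the short training times the abstract reports.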

18 pages, 817 KB  
Article
A Novel Voice Sensor for the Detection of Speech Signals
by Kun-Ching Wang
Sensors 2013, 13(12), 16533-16550; https://doi.org/10.3390/s131216533 - 2 Dec 2013
Viewed by 6354
Abstract
To develop a novel voice sensor that detects human voices, the use of features that are more robust to noise is an important issue. Such a voice sensor is also called a voice activity detector (VAD). Because the inherent formant structure occurs only in the speech spectrogram (well known as a voiceprint), Wu et al. were the first to use band-spectral entropy (BSE) to describe the characteristics of voiceprints. However, the performance of BSE-based VAD degrades in colored-noise (or voiceprint-like noise) environments. To solve this problem, we propose the two-dimensional part-band energy entropy (TD-PBEE) parameter based on two variables, the part-band partition number along the frequency index and the long-term window size along the time index, to further improve the BSE-based VAD algorithm. The two variables can efficiently represent the characteristics of voiceprints in each critical frequency band and exploit long-term information in noisy speech spectrograms, respectively. The TD-PBEE parameter can be regarded as a PBEE parameter over time. First, the strength of voiceprints can be partly enhanced by applying four entropies to four part-bands, which describe the voiceprints in detail. Because speech and various noises are non-stationary, long-term information processing is then used to refine the PBEE, so voice-like noise can be distinguished from noisy speech through the concept of PBEE with long-term information. Our experiments show that the proposed feature extraction with the TD-PBEE parameter is quite insensitive to background noise. The proposed TD-PBEE-based VAD algorithm is evaluated for four types of noise and five signal-to-noise ratio (SNR) levels. The accuracy of the proposed TD-PBEE-based VAD algorithm, averaged over all noises and all SNR levels, is better than that of the other VAD algorithms considered.
(This article belongs to the Section Physical Sensors)
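The part-band energy entropy (PBEE) idea, entropy computed per sub-band so that formant peaks register as low entropy, can be sketched as follows. The four-band split follows the abstract; the long-term (TD) temporal refinement is omitted, so this is only the frame-level half of the feature:

```python
import numpy as np

def part_band_entropy(frame_spectrum, n_parts=4):
    """Energy entropy computed separately on each of `n_parts` sub-bands
    and summed: flat (noise-like) spectra score high, spectra with
    concentrated formant energy score low."""
    total = 0.0
    for band in np.array_split(np.abs(frame_spectrum) ** 2, n_parts):
        p = band / (band.sum() + 1e-12)
        total += -np.sum(p * np.log2(p + 1e-12))
    return total

# Voiced speech has formant peaks (low entropy); white noise is flat (high).
noise = np.ones(128)
speechy = np.ones(128) * 0.01
speechy[[10, 40, 90]] = 1.0          # crude stand-ins for formant peaks
print(part_band_entropy(noise) > part_band_entropy(speechy))  # True
```

Thresholding this score per frame gives a basic entropy-based VAD decision; the TD extension averages it over a long-term window to reject voice-like noise.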
