MDPI - Publisher of Open Access Journals

13 pages, 366 KiB

Open Access Article

Speaker Recognition Based on the Joint Loss Function

by Tengteng Feng, Houbin Fan, Fengpei Ge, Shuxin Cao and Chunyan Liang

Electronics 2023, 12(16), 3447; https://doi.org/10.3390/electronics12163447 - 15 Aug 2023

Cited by 2 | Viewed by 1593

The statistical pyramid dense time-delay neural network (SPD-TDNN) model has difficulty handling imbalanced training data, is prone to overfitting, and generalizes poorly. To address these problems, we propose a method based on a joint loss function and an improved statistical pyramid dense time-delay neural network (JLF-ISPD-TDNN). The method improves on the SPD-TDNN model and uses a joint loss function that combines the advantages of the cross-entropy loss and a contrastive learning loss. By minimizing the distance between speech embeddings from the same speaker and maximizing the distance between speech embeddings from different speakers, the model achieves better generalization and a more robust speaker feature representation. We evaluated the proposed method using the equal error rate (EER) and the minimum detection cost function (minDCF). The experimental results show that the EER and minDCF on the Aishell-1 dataset reached 1.02% and 0.1221%, respectively. Therefore, using the joint loss function in the improved SPD-TDNN model can significantly enhance the model’s speaker recognition performance.
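The idea of a joint loss described in the abstract can be illustrated with a small sketch. The abstract does not give the exact formulation, so everything specific here is an assumption made for illustration: the pairwise contrastive form, the cosine distance, the `margin` and `weight` parameters, and all function names are hypothetical and are not taken from the paper.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one utterance (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]

def cosine_distance(a, b):
    """1 - cosine similarity; an assumed choice of embedding distance."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

def contrastive_loss(embeddings, labels, margin=0.5):
    """Assumed pairwise contrastive loss averaged over all pairs.

    Same-speaker pairs are penalized by their squared distance;
    different-speaker pairs are penalized only if closer than `margin`.
    """
    total, pairs = 0.0, 0
    n = len(embeddings)
    for i in range(n):
        for j in range(i + 1, n):
            d = cosine_distance(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                total += d * d
            else:
                total += max(0.0, margin - d) ** 2
            pairs += 1
    return total / pairs if pairs else 0.0

def joint_loss(logits_batch, embeddings, labels, weight=0.1, margin=0.5):
    """Mean cross-entropy plus a weighted contrastive term (assumed weighting)."""
    ce = sum(cross_entropy(l, y) for l, y in zip(logits_batch, labels)) / len(labels)
    return ce + weight * contrastive_loss(embeddings, labels, margin)
```

In this sketch the cross-entropy term drives speaker classification, while the contrastive term pulls same-speaker embeddings together and pushes different-speaker embeddings at least `margin` apart. How the two terms are actually balanced in JLF-ISPD-TDNN is specified in the full article, not here.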

(This article belongs to the Special Issue Machine Learning in Music/Audio Signal Processing)
