Tongan Speech Recognition Based on Layer-Wise Fine-Tuning Transfer Learning and Lexicon Parameter Enhancement

Geng, Junhao; Jia, Dongyao; Li, Ziqi; He, Zihao; Wu, Nengkai; Zhang, Weijia; Cui, Rongtao

doi:10.3390/app152111412

This is an early access version; the complete PDF, HTML, and XML versions will be available soon.


by Junhao Geng ¹, Dongyao Jia ²,*, Ziqi Li ², Zihao He ², Nengkai Wu ², Weijia Zhang ² and Rongtao Cui ²

¹ Beijing Research Institute of Automation for Machinery Industry Co., Ltd., No. 1 Jiaochangkou Deshengmenwai, Xicheng District, Beijing 100120, China

² School of Automation and Intelligence, Beijing Jiaotong University, No. 3 Shangyuancun, Haidian District, Beijing 100044, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(21), 11412; https://doi.org/10.3390/app152111412 (registering DOI)

Submission received: 2 October 2025 / Revised: 21 October 2025 / Accepted: 21 October 2025 / Published: 24 October 2025

(This article belongs to the Special Issue Techniques and Applications of Natural Language Processing)


Abstract

Speech recognition, a key driver of artificial intelligence and global communication, has advanced rapidly for major languages, while research on low-resource languages remains limited. Tongan, a representative Polynesian language, carries significant cultural value. However, Tongan speech recognition faces three main challenges: data scarcity, limited adaptability of transfer learning, and weak dictionary modeling. To address these issues, this study proposes improvements in adaptive transfer learning and NBPE-based dictionary modeling. First, an adaptive transfer learning strategy with layer-wise unfreezing and dynamic learning rate adjustment is introduced, enabling pretrained models to adapt effectively to the target language while improving accuracy and efficiency. Second, the MEA-AGA, which combines the Mind Evolutionary Algorithm (MEA) with the Adaptive Genetic Algorithm (AGA), is developed to optimize the number of byte-pair encoding units (NBPE), thereby enhancing recognition accuracy and speed. The collected Tongan speech data were expanded and preprocessed, and the experiments were conducted on an NVIDIA RTX 4070 GPU (16 GB) using CUDA 11.8 under Ubuntu 18.04. The proposed method achieved a word error rate (WER) of 26.18% and a throughput of 68 words per second (WPS), outperforming the baseline methods and confirming its effectiveness for low-resource language applications. Despite this promising performance, the study is limited by the relatively small corpus size and the early stage of the research. Future work will focus on expanding the dataset, refining adaptive transfer strategies, and enhancing cross-lingual generalization to further improve the robustness and scalability of the model.
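To make the idea of layer-wise unfreezing with a per-layer learning rate concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not give the schedule, so every name and value here is an assumption for illustration: the function `unfreeze_schedule`, the layer names, the rule of unfreezing one additional layer every `epochs_per_stage` epochs starting from the output side, and the geometric learning-rate `decay` toward the input layers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayerState:
    name: str
    trainable: bool
    lr: float


def unfreeze_schedule(
    layer_names: List[str],
    epoch: int,
    epochs_per_stage: int = 2,  # assumed: unfreeze one more layer every 2 epochs
    base_lr: float = 1e-3,      # assumed learning rate for the top layer
    decay: float = 0.5,         # assumed per-layer learning-rate decay
) -> List[LayerState]:
    """Return the training state of each layer at a given epoch.

    Layers are ordered from input (index 0) to output (last index).
    The output-side layer is trainable from epoch 0, and one further
    layer (moving toward the input) is unfrozen per stage. Unfrozen
    layers closer to the input receive a smaller learning rate.
    """
    n = len(layer_names)
    n_unfrozen = min(n, 1 + epoch // epochs_per_stage)
    states = []
    for i, name in enumerate(layer_names):
        depth_from_top = n - 1 - i
        trainable = depth_from_top < n_unfrozen
        lr = base_lr * (decay ** depth_from_top) if trainable else 0.0
        states.append(LayerState(name, trainable, lr))
    return states


if __name__ == "__main__":
    # Hypothetical layer names for a pretrained encoder-decoder model.
    layers = ["encoder.0", "encoder.1", "encoder.2", "decoder"]
    for epoch in (0, 2, 6):
        print(epoch, [(s.name, s.trainable, s.lr) for s in unfreeze_schedule(layers, epoch)])
```

In a real training loop, the returned states would typically be mapped onto the framework's own mechanism for freezing parameters and setting per-group learning rates; that mapping is left out here because the paper's toolkit is not named in this section.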

Keywords: Tongan; low-resource speech recognition; transfer learning; layer-wise fine-tuning; dictionary optimization
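For readers unfamiliar with the reported metric, word error rate is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the recognizer output and the reference transcript, divided by the number of reference words. The sketch below shows this standard definition; the function name `word_error_rate` and the placeholder tokens are illustrative, and the text normalization used in the paper's evaluation is not stated in this section.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens: one substitution (b -> x) and one deletion (d)
    # against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions are counted, the ratio can exceed 1.0 when the hypothesis is much longer than the reference.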

