MDPI - Publisher of Open Access Journals

19 pages, 827 KB

Open Access Article

MLSL-Spell: Chinese Spelling Check Based on Multi-Label Annotation

by Liming Jiang, Xingfa Shen, Qingbiao Zhao and Jian Yao

Appl. Sci. 2024, 14(6), 2541; https://doi.org/10.3390/app14062541 - 18 Mar 2024

Cited by 1 | Viewed by 1716

Chinese spelling errors are commonplace in daily life and may be caused by input methods, optical character recognition, or speech recognition. Because many Chinese characters are phonetically or visually similar, Chinese spelling check (CSC) is a very challenging task. Existing CSC solutions often perform poorly because they fail to fully extract contextual and Pinyin information. In this paper, we propose a novel CSC framework based on multi-label annotation (MLSL-Spell), consisting of two basic phases: spelling detection and spelling correction. In the spelling detection phase, MLSL-Spell fuses character-based pre-trained context vectors with Pinyin vectors and adopts a sequence labeling method to explicitly label the error type of each misspelled character. In the spelling correction phase, MLSL-Spell uses a Masked Language Model (MLM) to generate candidate characters, screens the candidates according to the error type, and finally selects the correct character with an XGBoost classifier. Experiments show that MLSL-Spell outperforms the benchmark models. On the SIGHAN 2013 dataset, the spelling detection F1 score of MLSL-Spell is 18.3% higher than that of the pointer network (PN) model, and the spelling correction F1 score is 10.9% higher. On the SIGHAN 2015 dataset, the spelling detection F1 score of MLSL-Spell is 11% higher than that of BERT and 15.7% higher than that of the PN model, and its spelling correction F1 score is 6.8% higher than that of the PN model.

(This article belongs to the Special Issue Applications, Challenges and Future Direction of Natural Language Processing Based on Deep Learning)
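
The detect-then-correct flow described in the MLSL-Spell abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the label names (`"O"`, `"P"`, `"V"`), the toy `PINYIN` and `SIMILAR_SHAPE` tables, and the `propose` and `score` callables (standing in for the MLM candidate generator and the XGBoost ranker) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set

# Hypothetical toy resources; a real system would load a full Pinyin
# lexicon and a glyph-similarity table.
PINYIN: Dict[str, str] = {"在": "zai", "再": "zai", "见": "jian", "件": "jian"}
SIMILAR_SHAPE: Dict[str, Set[str]] = {"己": {"已"}, "已": {"己"}}


def screen(original: str, candidates: List[str], label: str) -> List[str]:
    """Keep only candidates consistent with the labeled error type."""
    pool = [c for c in candidates if c != original]
    if label == "P":  # assumed label: phonetic error, keep same-Pinyin candidates
        target = PINYIN.get(original)
        return [c for c in pool if target is not None and PINYIN.get(c) == target]
    if label == "V":  # assumed label: visual error, keep similar-shape candidates
        return [c for c in pool if c in SIMILAR_SHAPE.get(original, set())]
    return pool  # any other error label: no type-specific filter


def correct(
    sentence: str,
    labels: List[str],
    propose: Callable[[str, int], List[str]],
    score: Callable[[str, int, str], float],
) -> str:
    """Replace each flagged character with its best-scoring screened candidate."""
    out = list(sentence)
    for i, (ch, label) in enumerate(zip(sentence, labels)):
        if label == "O":  # assumed label: character judged correct
            continue
        kept = screen(ch, propose(sentence, i), label)
        if kept:
            out[i] = max(kept, key=lambda c: score(sentence, i, c))
    return "".join(out)
```

In this sketch, `propose` would wrap a masked language model that returns candidate characters for position `i`, and `score` would wrap a trained classifier; both are passed in as plain functions so the screening logic can be exercised without any model.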


17 pages, 2936 KB

Open Access Article

Visual and Phonological Feature Enhanced Siamese BERT for Chinese Spelling Error Correction

by Yujia Liu, Hongliang Guo, Shuai Wang and Tiejun Wang

Appl. Sci. 2022, 12(9), 4578; https://doi.org/10.3390/app12094578 - 30 Apr 2022

Viewed by 2746

Abstract

Chinese Spelling Check (CSC) aims to detect and correct spelling errors in Chinese. Most CSC models rely on human-defined confusion sets to narrow the search space and therefore fail to resolve errors outside the confusion set. Moreover, most spelling errors in current benchmark datasets are character pairs with similar pronunciations; errors between similarly shaped characters and errors that are neither visually nor phonologically related are not considered. Furthermore, the automatically generated training data widely used in CSC tasks leads to label leakage and unfair comparisons between methods. In this work, we propose a feature (visual and phonological) enhanced siamese BERT to (1) correct spelling errors without using confusion sets; (2) integrate phonological and visual features for CSC through a glyph graph; and (3) improve performance on unseen spelling errors. To evaluate CSC methods fairly and comprehensively, we build a large-scale CSC dataset in which every error type has the same number of samples. The experimental results show that the proposed approach outperforms previous state-of-the-art methods on three benchmark datasets and on the new error-type-balanced dataset.

(This article belongs to the Topic Machine and Deep Learning)


17 pages, 1742 KB

Open Access Article

Post Text Processing of Chinese Speech Recognition Based on Bidirectional LSTM Networks and CRF

by Li Yang, Ying Li, Jin Wang and Zhuo Tang

Electronics 2019, 8(11), 1248; https://doi.org/10.3390/electronics8111248 - 31 Oct 2019

Cited by 20 | Viewed by 4832

Abstract

With the rapid development of Internet of Things technology, speech recognition is being applied more and more widely. Chinese speech recognition is a complex process. During speech-to-text conversion, dialect, environmental noise, and context all affect the result, so accuracy in multi-round dialogues and specific contexts is still not high. Detecting and correcting errors in the recognized text within its specific context helps improve the robustness of text comprehension and is a useful supplement to speech recognition technology. In this paper, a post-processing model for the text output of Chinese speech recognition is proposed, which combines a bidirectional long short-term memory (Bi-LSTM) network with a conditional random field (CRF) model. The task is divided into two stages, text error detection and text error correction, and the Bi-LSTM network and the CRF are used in these two stages respectively. Verification and system testing on the SIGHAN 2013 Chinese Spelling Check (CSC) dataset show that the model can effectively improve the accuracy of text after speech recognition.

(This article belongs to the Special Issue AI Enabled Communication on IoT Edge Computing)


Search Results (3)
