
Dynamic Assessment-Based Curriculum Learning Method for Chinese Grammatical Error Correction

Institute of Intelligent Information Processing, Beijing Information Science and Technology University, Beijing 100101, China
China Mobile Group Design Institute Co., Ltd., Beijing 100080, China
Computer School, Beijing Information Science and Technology University, Beijing 100101, China
Author to whom correspondence should be addressed.
Electronics 2024, 13(20), 4079;
Submission received: 6 September 2024 / Revised: 14 October 2024 / Accepted: 14 October 2024 / Published: 17 October 2024
(This article belongs to the Special Issue Data Mining Applied in Natural Language Processing)


Current mainstream Chinese grammatical error correction methods rely on deep neural network models, which require a large amount of high-quality data for training. However, existing Chinese grammatical error correction corpora have low annotation quality and high noise levels, which limits the models' generalization ability and makes complex sentences difficult to handle. To address this issue, this paper proposes a dynamic assessment-based curriculum learning method for Chinese grammatical error correction. The proposed approach focuses on two key components: defining the difficulty of training samples and devising an effective training strategy. In the difficulty assessment phase, we enhance the accuracy of the curriculum sequence by dynamically updating the evaluation model. During the training strategy phase, a multi-stage dynamic progressive approach is employed to select training samples of varying difficulty levels, which helps prevent the model from prematurely converging to local optima and enhances the overall training effectiveness. Experimental results on the MuCGEC and NLPCC 2018 Chinese grammatical error correction datasets show that the proposed curriculum learning method significantly improves the model’s error correction performance, with F0.5 scores increasing by 0.9 and 1.05, respectively, validating the method’s effectiveness.

1. Introduction

The goal of grammatical error correction is to automatically detect and correct potential grammatical errors in the text and to output an error-free text. Compared to English and other languages, Chinese is characterized by polysemy, flexible word order, a lack of morphological changes, and ambiguous syntactic structures. Additionally, homophones are widely prevalent in Chinese, where the same word can have different pronunciations and meanings depending on the context. As a result, Chinese grammatical error correction has always been a highly challenging task in the field of natural language processing [1].
In recent years, the advent of pre-trained language models has led to a shift from traditional rule-based methods to approaches based on deep neural networks for Chinese grammatical error correction, making it a prominent area of research. Methods such as the discriminative model-based Seq2Edit and the generative model-based Seq2Seq approach [2] learn grammatical rules in text by training large-scale “error-correct” parallel sentence pairs, achieving automatic identification and correction of grammatical errors. These methods often require a large amount of training data. However, annotating such data requires high expertise from annotators, and the cost of constructing high-quality Chinese grammatical error correction corpora is extremely high. Currently, open-source Chinese grammatical error correction training corpora mainly come from foreign language learning websites like Lang-8 and the Chinese Proficiency Test HSK [3]. These corpora rely on user-generated annotations, which vary in quality and contain a significant amount of noise.
Curriculum learning [4] is a staged, ordered learning method that divides tasks into multiple phases and gradually increases difficulty and complexity. This approach leverages limited high-quality corpora to maximize the efficiency and effectiveness of model learning, thereby improving correction performance. To this end, we propose a Dynamic Assessment of Curriculum Learning (DACL) method for Chinese grammatical error correction. Specifically, in the difficulty assessment part of curriculum learning, a pre-trained language model is used to calculate the perplexity of source sentences in the training samples, which serves as the complexity score for each sentence. After each training stage, model weights are dynamically updated, and the updated model is used as the new evaluation model. This dynamically enhances the accuracy of the model’s assessment, optimizing the difficulty ranking of the training samples. In the training strategy part of curriculum learning, multi-stage progressive training is conducted: training samples are dynamically selected and trained on in order from easy to difficult.
Our contributions are summarized as follows:
  • To the best of our knowledge, our work is the first to introduce curriculum learning in the task of Chinese grammatical error correction.
  • We propose a novel method for dynamically assessing sentence difficulty in curriculum learning and dynamically adjusting the training sample order based on changes in sentence difficulty. This phased and systematic learning process improves the error correction performance of Chinese grammatical error correction models.
  • The automatic evaluation on two Chinese grammatical error correction datasets shows that our proposed model is superior to the strong baselines, significantly improving the effectiveness of Chinese grammatical error correction tasks.

2. Related Work

2.1. Chinese Grammatical Error Correction

The Chinese grammatical error correction task can be seen as a machine translation task [5], translating grammatically incorrect sentences into grammatically correct and fluent sentences using a sequence-to-sequence (Seq2Seq) method. In recent years, many Seq2Seq-based methods have emerged. For example, Junczys-Dowmunt et al. [6] treated the grammatical error correction task as a low-resource machine translation task and adapted several low-resource machine translation methods for grammatical error correction. Zhao et al. [7] used a copy-augmented framework that allows for copying unchanged tokens from the source sentence to the target sentence. Kaneko et al. [8] explored how to effectively incorporate pre-trained knowledge into the encoder–decoder framework. Yang et al. [9] proposed adding a grammatical error detection module to guide the corrector model’s decoder to generate more accurate sentences, thus improving model performance. Li et al. [10] proposed a novel sequence-to-action module, combining the advantages of Seq2Edit and Seq2Seq while mitigating their shortcomings, resulting in a good error correction performance. Zhang et al. [11] incorporated syntactic information into the grammatical error correction model using a syntactic parser and a graph neural network. These methods improve the model’s error correction performance by fine-tuning the encoder and decoder structures or by incorporating external knowledge into the model.
Notably, several data augmentation methods have shown good results in grammatical error correction tasks. For example, Wang et al. [12] proposed a rule-based editing method to construct noisy texts, while Wan et al. [13] introduced noise into sentence representations and applied a Seq2Seq model to generate sentences with various types of errors. Yue et al. [14] proposed a novel conditional non-autoregressive error generation model for generating Chinese grammatical errors. These methods improve model performance by constructing pseudo-data through noise injection into sentences. However, the constructed pseudo-data often contains a significant amount of data that differs from human-made errors or includes correct sentences. To address this issue, Cao et al. [15] introduced a contrastive learning method that encourages grammatical error correction models to assign higher probabilities to correct sentences while reducing the model’s tendency to generate incorrect sentences, thereby improving accuracy. Building on Cao’s work, He et al. [16] proposed a dynamic negative example construction method for grammatical error correction using contrastive learning. However, these methods do not consider the difficulty of sentences, resulting in a lack of focus during model learning and making it difficult to correct more complex sentences. This paper addresses this issue by designing a dynamic difficulty assessment and introducing a curriculum learning method to optimize the learning process.

2.2. Curriculum Learning

Curriculum learning is a machine learning strategy designed to simulate human learning by sequentially learning data samples from simple to complex, helping the model better understand and generalize the data. Its core idea is to begin with simpler samples and gradually move to more complex ones, allowing the model to adapt progressively to new tasks and enhance its understanding of the data.
Most of the prior work on curriculum learning has focused on machine translation and natural language understanding. Mohiuddin et al. [17] proposed a data-selection-based curriculum learning framework for neural machine translation. Lu et al. [18] introduced a curriculum learning approach for unsupervised neural machine translation. Grammatical error correction can also be considered a machine translation task, so the way curriculum learning is applied in machine translation can be adapted to grammatical error correction. Introducing curriculum learning to a new task involves the challenge of evaluating data difficulty. Previous studies by Platanios et al. and Zhang et al. [19,20] have used criteria such as sentence length or word rarity to evaluate difficulty. However, relying solely on these features does not fully capture the complexity of the task. In Chinese grammatical error correction, data difficulty is influenced by many factors, such as sentence length, sentence structure complexity, and error types.
The curriculum learning difficulty assessment method proposed in this paper focuses on continuously iterating and optimizing the “teacher” model. This model leverages its deep understanding and predictive ability to calculate the perplexity of each sentence, which serves as the basis for evaluating sentence difficulty. Notably, the perplexity calculation process naturally incorporates the factor of sentence length, making the assessment results more comprehensive. Our method not only leverages the advantages of pre-trained models in semantic and grammatical knowledge, but also considers multiple dimensions, including sentence length, thereby addressing the limitations of previous approaches that relied solely on sentence features for difficulty evaluation. This approach allows for more precise sample selection from simple to complex during curriculum training, thereby enhancing the effectiveness of curriculum learning. The experimental results validate the reasonableness and effectiveness of this assessment method.

3. Methods

3.1. Problem Definition

Grammatical error correction. Let $X = x_1, x_2, \cdots, x_n$ be an input text sequence containing $n$ words, where $x_i$ represents the $i$-th word in the text. The goal of the Seq2Seq generative model is to transform this input sequence $X$ into a corrected output sequence $Y = y_1, y_2, \cdots, y_m$, where $m$ may be greater than, equal to, or less than $n$, as the correction process may involve the addition, replacement, or deletion of words or punctuation marks. In the Seq2Seq model, the grammatical error correction task can be viewed as an encoder–decoder problem. The encoder maps the input sequence $X$ to a fixed-dimensional vector $c$, which contains the semantic information of the input sequence. The decoder then generates the output sequence $Y$ based on this vector $c$.
Formally, the encoder and decoder can be represented as follows:
$$c = f(x_1, x_2, \cdots, x_n)$$
$$P(Y \mid X) = \prod_{t=1}^{m} P(y_t \mid y_1, y_2, \cdots, y_{t-1}, c)$$
Here, $f$ is the encoder function, $P(Y \mid X)$ is the conditional probability of the output sequence $Y$ given the input sequence $X$, and $P(y_t \mid y_1, y_2, \cdots, y_{t-1}, c)$ is the conditional probability of generating the word $y_t$ at time $t$.

3.2. Method Framework

The overall framework of the DACL method is shown in Figure 1, mainly including the curriculum learning module and the Seq2Seq correction module. The curriculum learning module is primarily used for curriculum design and model training, while the Seq2Seq correction module is mainly used to optimize model parameters during the fine-tuning phase and output the probability distribution of predicted sentences during the inference phase. Both the teacher model and the student model adopt the Seq2Seq architecture. Consider a parallel corpus containing erroneous–correct sentence pairs, $D = \{(e_1, c_1), (e_2, c_2), \cdots, (e_i, c_i)\}$, where $e_i$ and $c_i$ denote the $i$-th sentence potentially containing grammatical errors and its corresponding correct sentence, respectively. First, the initial training set is input into the initial “teacher” model. Using the difficulty assessment method, each data point in $D$ is evaluated for difficulty. At the $r$-th stage of curriculum training, $D$ is divided into $k$ subsets $\hat{D}_r = (\hat{D}_{r1}, \hat{D}_{r2}, \cdots, \hat{D}_{rk})$ based on the difficulty distribution of the data, where $r \le k$ and $k$ are hyperparameters, with each subset representing a difficulty level. Finally, courses are assigned to the “student” model in order of increasing difficulty. The first subset $\hat{D}_{11}$, obtained from the first assessment, serves as the training data for the first stage. The second subset $\hat{D}_{22}$, obtained from the second assessment, serves as the training data for the second stage, and so on. After completing each stage, the “student” model replaces the old “teacher” model, becoming the new “teacher” model. Based on learning performance, the difficulty of the next subset is evaluated, and the subsequent courses are chosen for the next stage of learning, ultimately completing the model training.

3.3. Seq2Seq Grammar Correction

Chinese grammatical error correction is modeled as a sequence-to-sequence task, with the generative Transformer architecture BART [21] model chosen as the baseline. This model comprises an encoder and a decoder. The source and target sentences are first input into the embedding layer, where they are converted into word vector representations, after which the encoder encodes the word vectors of the source sentences. The decoder takes the encoder’s final hidden state and the word vectors of the target sentences as input. After decoding, it outputs the final hidden state of the decoder. This is passed through a linear layer and normalization to produce the probability distribution of the corrected sentence. The training objective is to minimize the teacher-forced negative log-likelihood loss, expressed as:
$$\mathcal{L}(\theta) = -\log \prod_{t=1}^{n} P(y_t \mid Y_{<t}; X; \theta)$$
Here, $\theta$ represents the trainable model parameters, $X$ is the source sentence, $Y = y_1, y_2, \cdots, y_n$ is the target sentence consisting of $n$ tokens, and $Y_{<t} = y_1, y_2, \cdots, y_{t-1}$ represents the tokens visible at the $t$-th training time step. During inference, the most probable output sequence $Y^*$ is found by maximizing the conditional probability $P(Y^* \mid X; \theta)$.
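As a small illustration of the loss above, the following sketch computes the teacher-forced negative log-likelihood from per-token probabilities. The function name `teacher_forced_nll` and the example probabilities are hypothetical; in practice the probabilities would come from the decoder's softmax output for the gold tokens.

```python
import math

def teacher_forced_nll(token_probs):
    """Negative log-likelihood of one target sentence.

    token_probs[t] is assumed to be P(y_t | Y_<t; X; theta): the probability
    the model assigns to the gold token at step t when the gold prefix is fed
    to the decoder (teacher forcing).
    """
    if not token_probs:
        raise ValueError("target sentence must contain at least one token")
    # -log of a product equals the sum of -log terms; summing logs avoids
    # underflow from multiplying many small probabilities.
    return -sum(math.log(p) for p in token_probs)

# Hypothetical probabilities for a 4-token target sentence.
probs = [0.9, 0.8, 0.95, 0.7]
loss = teacher_forced_nll(probs)
```

Summing log-probabilities rather than taking the log of the product is the usual choice, since the product underflows for long sentences.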
Previous studies have shown that pre-trained models like T5 [22] and BART exhibit strong capabilities in generative tasks and have been widely applied in grammatical error correction. In this work, the BART model is selected to establish the baseline, and a curriculum learning method based on dynamic assessment (DACL) is designed to train the model, thereby improving the performance of Chinese grammatical error correction.

3.4. Curriculum Learning with Dynamic Assessment

3.4.1. Difficulty Assessment

In Chinese grammatical error correction, the distribution of ungrammatical sentences is highly variable. It is difficult for humans to establish reasonable difficulty assessment standards based on prior knowledge, and there is inconsistency between human judgment and pre-trained models regarding sentence difficulty. As the effectiveness of curriculum learning depends on accurate difficulty ranking, we decided to let the model itself assess the difficulty. The difficulty assessment process is shown in Figure 2. The teacher model first evaluates the difficulty of the input source sentences. Then, by calculating the perplexity and performing K-means clustering, the sentences are divided into subsets of varying difficulty levels. Specifically, we use the perplexity of each source sentence as a metric to measure the data difficulty. The perplexity that a language model assigns to a sentence indicates the model’s uncertainty in predicting that sentence. Intuitively, the lower the perplexity, the more confident the model is in its prediction, and the easier the model finds the sentence to understand.
For a dataset $D$ with $i$ pairs of correct and erroneous sentences, represented as $D = \{(e_1, c_1), (e_2, c_2), \cdots, (e_i, c_i)\}$, the difficulty of the data is measured by calculating the perplexity of the sentences using the model. The formula for calculating perplexity is:
$$PPL_i = e^{L(e_i)}$$
Here, $L(e_i)$ is the sum of the cross-entropy losses of all tokens in the $i$-th source sentence.
After calculating the perplexity of all source sentences in the dataset D, the cumulative distribution function (CDF) is used to map the distribution of perplexity to the range (0, 1]:
$$p_i = \mathrm{CDF}(PPL_i)$$
In this mapping, more difficult data tend to score closer to 1, while simpler data are often closer to 0. Therefore, this data distribution can be regarded as the difficulty scores assigned by the model to the sentences.
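The perplexity and CDF mapping described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say which CDF is used, so an empirical CDF (the fraction of sentences with perplexity less than or equal to the given one) is assumed here, and the function names and token losses are hypothetical.

```python
import math
from bisect import bisect_right

def sentence_perplexity(token_losses):
    """exp of the SUMMED token cross-entropy losses, as defined above.
    (Conventional perplexity exponentiates the mean loss instead.)"""
    return math.exp(sum(token_losses))

def cdf_scores(perplexities):
    """Empirical CDF (an assumption): fraction of sentences whose perplexity
    is <= this one. Scores fall in (0, 1]; the hardest sentence scores 1."""
    ordered = sorted(perplexities)
    n = len(ordered)
    # bisect_right counts how many entries are <= p.
    return [bisect_right(ordered, p) / n for p in perplexities]

# Hypothetical per-token losses for three source sentences.
losses = [[0.2, 0.1], [1.5, 2.0, 1.0], [0.5, 0.4]]
ppl = [sentence_perplexity(l) for l in losses]
scores = cdf_scores(ppl)
```

Because the exponential is monotone, ranking sentences by the summed loss directly would give the same scores and avoids overflow for very long sentences.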
Next, the difficulty scores are used to perform K-means clustering in the normalized score space:
$$\hat{D}_1, \hat{D}_2, \cdots, \hat{D}_k = \text{K-means}\left(\{p_i\}_{i=1}^{n}, k\right)$$
Here, $\hat{D}_j$ ($1 \le j \le k$) represents the $j$-th cluster subset, containing all sentences with difficulty scores belonging to that cluster:
$$\hat{D}_j = \{(e_{j1}, c_{j1}), (e_{j2}, c_{j2}), \cdots, (e_{j|\hat{D}_j|}, c_{j|\hat{D}_j|})\}$$
To ensure that the “student” model learns in increasing order of difficulty, the difficulty scores of sentences within each subset are also arranged in ascending order. For each subset: $p_{j1} \le p_{j2} \le \cdots \le p_{j|\hat{D}_j|}$.
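A minimal sketch of the clustering and within-subset sorting step is given below. It uses a plain one-dimensional Lloyd's algorithm written from scratch; the quantile-based initialization, the iteration cap, and the helper names `kmeans_1d` and `split_by_difficulty` are assumptions made for illustration, since the paper does not specify its K-means configuration.

```python
def kmeans_1d(scores, k, iters=100):
    """Lloyd's algorithm on scalar scores. Returns one label per score,
    relabelled so that 0 is the easiest (lowest-centroid) cluster."""
    if not scores:
        raise ValueError("scores must be non-empty")
    ordered = sorted(scores)
    n = len(ordered)
    # Assumed initialization: centroids at evenly spaced quantiles.
    centroids = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each score goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: abs(s - centroids[j]))
                      for s in scores]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: each non-empty cluster's centroid becomes its mean.
        for j in range(k):
            members = [s for s, l in zip(scores, labels) if l == j]
            if members:
                centroids[j] = sum(members) / len(members)
    # Relabel clusters by ascending centroid so labels encode difficulty.
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {j: r for r, j in enumerate(order)}
    return [rank[l] for l in labels]

def split_by_difficulty(pairs, scores, k):
    """Group (erroneous, correct) pairs into k subsets, easy to hard,
    with each subset sorted by ascending difficulty score."""
    labels = kmeans_1d(scores, k)
    subsets = [[] for _ in range(k)]
    for pair, score, label in zip(pairs, scores, labels):
        subsets[label].append((score, pair))
    return [[pair for _, pair in sorted(sub, key=lambda sp: sp[0])]
            for sub in subsets]

# Hypothetical data: 8 sentence pairs with CDF-normalized scores.
pairs = [(f"e{i}", f"c{i}") for i in range(8)]
scores = [0.95, 0.1, 0.55, 0.2, 1.0, 0.15, 0.5, 0.9]
subsets = split_by_difficulty(pairs, scores, k=3)
```

With these toy scores the three subsets contain 3, 2, and 3 pairs respectively, and the easiest subset starts with the pair whose score is 0.1.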
Initial Assessment. In the DACL method, GPT-2 [23] is used as the initial “teacher” model. This model is trained on a vast number of sentences, far more than the parallel sentences used to train grammatical error correction models, and GPT-2 has a deep understanding of language structure, including grammar, semantics, and contextual relationships. Therefore, it can be used to assess the difficulty of source sentences purely from a linguistic perspective, ensuring the quality of the initial assessment. This approach helps the “student” model to better learn how to correct relatively simple sentences in the early stages of learning. Additionally, this approach benefits the “student” model in making more accurate judgments in subsequent difficulty assessments after it replaces the “teacher” model. Specifically, we input the initial training set $D$ into the pre-trained GPT-2 model, calculate the perplexity of each source sentence, and perform CDF normalization to obtain the difficulty score for each data point. Then, using the K-means clustering algorithm, we preliminarily divide the training set into $k$ subsets based on difficulty distribution, with each subset representing an approximately stepped difficulty level.
Iterative Assessment. The subsets obtained from the initial assessment are directly used for training the “student” model. As shown in Figure 1, at the end of each training phase, the parameter-updated “student” model dynamically replaces the “teacher” model to reassess the initial training set, reordering and subdividing it into new subsets for the next round of training. As illustrated in Figure 2, this process iteratively enhances the “student” model’s error correction ability and the “teacher” model’s assessment ability in each subsequent round until the “student” model completes all lessons.
Compared with previous methods, the advantages of DACL are as follows: (1) it does not require manual design of data difficulty assessment metrics, and (2) the quality of data difficulty assessment can improve as the model trains.

3.4.2. Training Strategy

In the curriculum learning process, a “self-learning” mode is adopted, where the “student” model plays both the role of the learner and the instructor. After the initial model completes the training of the first difficulty subset $\hat{D}_{11}$, the updated model immediately becomes the new “teacher” model, which reassesses the difficulty of the initial training set and subdivides it into new subsets. Each time the model learns a new subset, it reapplies the assessment method described in Section 3.4.1. Overall, the “student” model’s learning sequence progresses from easy to difficult. As the model’s capabilities continuously improve, it becomes increasingly reliable at assessing higher difficulty subsets. This process continues until the “student” model completes all the courses. To alleviate the forgetting of early learned knowledge in curriculum learning, an additional training phase is added after the curriculum training ends. In this phase, the final “student” model reassesses and reorders the initial training set by difficulty, generating a complete training set for retraining. This method ensures that the model not only gradually enhances its ability to handle complex data, but also effectively reviews and consolidates previously learned knowledge in the final stage, thereby maximizing the model’s overall performance.
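The training strategy described above can be sketched as a short control loop. This is an illustrative outline only, not the authors' implementation: `run_dacl`, `assess`, and `train` are hypothetical names, and the toy stand-ins below replace the real GPT-2 and BART models with simple placeholders so that the schedule itself can be run and inspected.

```python
def run_dacl(dataset, k, initial_teacher, student, assess, train):
    """Multi-stage curriculum with dynamic reassessment.

    assess(model, dataset, k) -> list of k subsets ordered easy to hard
    train(model, subset)      -> updated model
    """
    teacher = initial_teacher
    for r in range(k):
        # Reassess the FULL training set with the current teacher.
        subsets = assess(teacher, dataset, k)
        # Stage r+1 trains on the (r+1)-th difficulty level of that split.
        student = train(student, subsets[r])
        # The updated student replaces the teacher for the next stage.
        teacher = student
    # Final review phase: reassess, re-sort, and retrain on everything.
    subsets = assess(teacher, dataset, k)
    full_sorted = [pair for subset in subsets for pair in subset]
    return train(student, full_sorted)

# Toy stand-ins (hypothetical): a "model" is just a counter of training
# phases, and difficulty is the numeric value of each sample.
log = []

def toy_assess(model, data, k):
    ordered = sorted(data)
    size = -(-len(ordered) // k)  # ceiling division
    return [ordered[i * size:(i + 1) * size] for i in range(k)]

def toy_train(model, subset):
    log.append(list(subset))
    return model + 1

final_model = run_dacl([5, 3, 1, 4, 2, 6], 3, 0, 0, toy_assess, toy_train)
```

In the toy run, `log` records three curriculum stages of increasing difficulty followed by one pass over the full sorted set, matching the additional review phase described above.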

4. Results and Analysis

4.1. Experimental Setup

4.1.1. Dataset

The training set used in this paper consists of the open-source Chinese corpora lang8 and HSK, totaling 1,568,885 sentence pairs. Both training sets are annotated by Chinese learners, which results in low quality and a lot of noise; therefore, data preprocessing is required before training. During data cleaning, invalid data such as overly short sentences, sentences containing emojis, and garbled text were removed. The gpt2-chinese-cluecorpussmall model was then used as the initial assessment model. Utilizing its powerful language understanding capabilities, data with abnormal difficulty levels were filtered out. These abnormal data often had extremely ambiguous semantics or were interspersed with multiple punctuation marks, making them difficult for both the model and humans to understand. After completing the aforementioned preprocessing, the final data used for training consisted of 1,389,672 sentence pairs.
The validation set was sourced from MuCGEC, and the models were evaluated on both the MuCGEC and NLPCC 2018 test sets [24,25]. Details are shown in Table 1.

4.1.2. Comparison Models

This paper compares several methods that have made significant progress in Chinese grammatical error correction tasks in recent years. These methods include the generative models BART and TransGEC [26], the large language model GPT-3.5, the sequence labeling models MaskGEC [27], PGKG [28], and GECToR [29], and the S2A [10] model, which combines generative and sequence labeling approaches. Specifically,
  • BART combines the advantages of BERT and GPT, allowing it to use a bidirectional encoder to capture global information and an autoregressive decoder to generate coherent output. It has proven to be a strong baseline in grammatical error correction tasks.
  • MaskGEC generates new training samples by randomly masking some words in the source sentences during training. These masked words are replaced by target words, enhancing the model’s robustness to different error patterns.
  • PGKG enhances the grammatical error correction capability of pre-trained language models by incorporating a grammatical knowledge graph. Treating grammatical error correction as a sequence labeling task, this model has demonstrated strong performance in both English and Chinese grammatical error correction.
  • GECToR is fine-tuned on the StructBERT-base-Chinese pre-trained model and applied to Chinese grammatical error correction.
  • S2A combines the strengths of Seq2Seq and sequence labeling models, effectively reducing overcorrection and thereby improving error correction performance.
  • TransGEC uses human-translated texts as data augmentation input for grammar error correction. Compared to texts by native speakers, translated texts are closer in style to non-native texts while maintaining higher quality. Experiments show that TransGEC performs excellently across multiple grammar correction test sets in English, Chinese, German, and Russian, with significant improvements in vocabulary replacement, missing words, and common word correction.
  • GPT-3.5 has demonstrated remarkable capabilities across various tasks in natural language processing. The experimental data used in this paper is sourced from Fang et al. [30], who designed zero-shot chain-of-thought (CoT) and few-shot CoT prompting methods to comprehensively evaluate ChatGPT’s capabilities in grammatical error correction.

4.1.3. Experimental Parameters

We use the Chinese BART-initialized Transformer model proposed by Shao et al. [31]. The specific parameter and hyperparameter settings are as follows: both the encoder and decoder consist of 12 identical layers, with 16 attention heads in the multi-head self-attention layer, and the feed-forward network dimension is 4096. The word embedding dimensions for both the source and target are 512. Dropout is applied on both the encoder and decoder with a probability of 0.1. The model uses the Adam optimizer with an initial learning rate of $3 \times 10^{-6}$, $\beta$ set to (0.9, 0.999), and a polynomial learning rate decay scheme. The batch size is set to 32. As shown in Table 2, after multiple experiments and validations, it was found that setting the number of clusters for K-means clustering to 3, i.e., $k = 3$, yielded the best results.
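For readers unfamiliar with polynomial learning rate decay, the sketch below shows the common form of the schedule. Only the initial learning rate of $3 \times 10^{-6}$ comes from the settings above; the total number of steps, the final learning rate, and the power are not reported in the paper and are assumed values here, as is the function name.

```python
def polynomial_decay_lr(step, total_steps, initial_lr=3e-6,
                        end_lr=0.0, power=1.0):
    """Learning rate after `step` updates under polynomial decay.

    end_lr and power are assumptions (power=1.0 gives linear decay);
    total_steps must be supplied by the caller.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp so the rate stays at end_lr once training passes total_steps.
    progress = min(step, total_steps) / total_steps
    return (initial_lr - end_lr) * (1.0 - progress) ** power + end_lr

# Hypothetical 1000-step run: the rate halves by the midpoint when power=1.
lr_start = polynomial_decay_lr(0, 1000)
lr_mid = polynomial_decay_lr(500, 1000)
lr_end = polynomial_decay_lr(1000, 1000)
```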

4.2. Experimental Results and Analysis

Table 3 presents the experimental results of our DACL method and other baseline methods on the NLPCC 2018 shared task evaluation dataset, with the best results highlighted in bold. DACL achieved a recall rate of 34.20 and an F0.5 score of 40.02, the highest F0.5 score among all models in the table. The DACL method, through progressive training from easy to difficult, gradually adapts the model to understand data characteristics, enhances generalization ability, reduces overfitting, improves model stability, and strengthens the ability to handle complex samples. Compared to the baseline, DACL can improve the recall rate without lowering precision, thus enhancing overall model performance and supporting the effectiveness of the DACL method. GPT-3.5 achieved a recall rate of 39.40, demonstrating its strong error detection capability. However, its F0.5 score of only 28.70 also reflects that current large language models still lag behind traditional correction models in the field of Chinese error correction. The PGKG method, using a masking strategy based on word entity-related subgraphs, significantly improved precision to 47.02, but its recall rate of 22.76 was the lowest among all of the methods. Therefore, its final F0.5 score was lower than that of DACL.
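The F0.5 metric weights precision more heavily than recall, which explains why a high-precision, low-recall system can still score well. The sketch below uses the standard F-beta formula; the function name is hypothetical, and the example plugs in the PGKG precision and recall quoted above.

```python
def f_beta(precision, recall, beta=0.5):
    """Standard F-beta score; beta < 1 weights precision more than recall.
    Works on either the 0-1 or the 0-100 scale, since the formula is
    homogeneous in precision and recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# PGKG on NLPCC 2018, using the precision and recall quoted above.
pgkg_f05 = f_beta(47.02, 22.76)
print(round(pgkg_f05, 2))  # 38.76
```

The computed value of about 38.76 is below DACL's 40.02, consistent with the comparison in the text.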
Additionally, to verify the necessity of the model learning in an easy-to-difficult course order, we set up two comparison experiments with a reverse curriculum and random curriculum. This involves arranging the subsets in reverse order and random order based on difficulty, and training the model according to these sequences.
The experimental results are shown in Table 4, where RC (Reverse Curriculum) represents the reverse curriculum method and DOC (Disorder Curriculum) represents the random curriculum method.
As shown in Table 4, when the model learns in reverse order, from difficult to easy, both recall and F0.5 scores decrease to varying degrees. When the model learns in a random order of subsets, the recall and F0.5 scores also decrease, but the precision slightly increases. A possible reason is that, while the DACL method helps the model better learn the characteristics of data with different difficulty levels, it is also affected by noise in the data, leading to a decrease in precision. Overall, the easy-to-difficult learning order has a significant positive impact on the model’s F0.5 score. This set of experiments verifies the rationality of our curriculum arrangement method.
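The three curriculum orders compared in Table 4 differ only in the sequence in which difficulty levels are visited. A minimal sketch, assuming difficulty levels are indexed from 0 (easiest) to k-1 (hardest); the function name, mode strings, and fixed seed are hypothetical choices for illustration.

```python
import random

def stage_order(k, mode="dacl", seed=0):
    """Sequence of difficulty-level indices visited across the k stages."""
    levels = list(range(k))
    if mode == "dacl":   # easy to difficult
        return levels
    if mode == "rc":     # reverse curriculum: difficult to easy
        return levels[::-1]
    if mode == "doc":    # disorder curriculum: random order
        random.Random(seed).shuffle(levels)
        return levels
    raise ValueError(f"unknown mode: {mode}")

dacl_order = stage_order(3, "dacl")
rc_order = stage_order(3, "rc")
doc_order = stage_order(3, "doc")
```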

4.3. Ablation Study

To demonstrate the effectiveness of DACL, we conducted the following ablation experiments:
  • Baseline Model: The first experiment uses a Transformer model initialized with Chinese BART parameters as a baseline model for error correction of source sentences.
  • DACL Method: The second experiment applies the full DACL method.
  • DACL without Multi-Stage Curriculum: The third experiment is the DACL method without the multi-stage curriculum learning component.
  • DACL without Dynamic Difficulty Assessment: The fourth experiment is the DACL method without the dynamic difficulty assessment component.
The specific experimental results are shown in Table 5:
In Table 5, w/o CL represents the DACL method with the multi-stage curriculum learning part removed. Specifically, the training set is sorted by initial difficulty, and the model is fine-tuned in a single stage. As shown in Table 5, the performance significantly declines after removing the CL component, demonstrating that CL is a key part of our method. w/o DA represents the DACL method with the dynamic difficulty assessment part removed. As shown in Table 5, the F0.5 scores of the DACL method have improved on two different evaluation sets, which verifies the effectiveness and robustness of the DACL method in enhancing the model’s error correction performance. The precision increases when the multi-stage curriculum learning part is removed, possibly because during multi-stage learning, the error correction model learned some low-quality noisy sentences that were evaluated as difficult.

4.4. Case Study

We selected two relatively complex sentences from the MuCGEC test set as follows:
Sentence 1: 如果,在另一方面,你打破自我的壁垒,心变得一个巨大的湖,和那把盐会对它的味道没有影响。
Translation 1: If, on the other hand, you break down the barriers of the self, and your heart turns into a vast lake, and that pinch of salt will have no effect on its taste.
Sentence 2: 不仅如此,前途教育也能起到解决韩国教育问题的一种可行之道。
Translation 2: Not only that, prospect education can also play a role in solving the education problems in South Korea.
As shown in Table 6, BART-Large failed to identify and correct the first error. GECToR identified and corrected two errors but changed the original meaning of the sentence when correcting the second error. In contrast, DACL correctly corrected the sentence.
As shown in Table 7, BART-Large failed to identify and correct the error. Although GECToR identified and corrected the error, it introduced a new error due to overcorrection. As with the first sentence, DACL correctly corrected the error in this sentence. These two examples suggest that DACL can improve the error detection and correction abilities of the error correction model, thereby increasing the success rate of correcting complex sentences.

5. Conclusions

This study introduces a novel approach to Chinese grammatical error correction by incorporating curriculum learning strategies, resulting in a training method called DACL that is based on dynamic sentence difficulty assessment. DACL calculates the perplexity of source sentences using continuously updated models, generating a score that reflects sentence difficulty. Based on this score, sentences are reordered from easy to difficult, and the training set is divided into multiple subsets of different difficulty levels using the K-means clustering algorithm. The model is then trained on these subsets in increasing order of difficulty, and its ability to assess difficulty improves after each training phase, so that data difficulty is evaluated more accurately. This strategy can overcome the issues of data scarcity and noise in Chinese grammatical error correction, significantly improving the model’s performance.
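The scoring and partitioning steps summarized above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: perplexity is taken as the exponential of the mean negative token log-likelihood, the clustering is a plain one-dimensional Lloyd's K-means with a deterministic initialization, and the names `perplexity`, `kmeans_1d`, `curriculum_stages`, and the toy log-probabilities are all hypothetical. In a real setup the log-probabilities would come from the current evaluation model.

```python
import math
from typing import Callable, List, Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """exp(mean negative log-likelihood) over a non-empty list of natural-log token probabilities."""
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def kmeans_1d(scores: Sequence[float], k: int, iters: int = 50) -> List[int]:
    """Lloyd's K-means on scalar scores; label 0 is the lowest-centroid (easiest) cluster."""
    lo, hi = min(scores), max(scores)
    # Assumed initialization: centroids spread evenly over the score range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    assign = None
    for _ in range(iters):
        new_assign = [min(range(k), key=lambda c: abs(s - centroids[c])) for s in scores]
        if new_assign == assign:  # assignments stable: converged
            break
        assign = new_assign
        for c in range(k):
            members = [s for s, a in zip(scores, assign) if a == c]
            if members:
                centroids[c] = sum(members) / len(members)
    # Renumber clusters so that labels increase with centroid value.
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[a] for a in assign]


def curriculum_stages(sentences: Sequence[str],
                      score_fn: Callable[[str], float],
                      k: int) -> List[List[str]]:
    """Score every sentence, cluster the scores, and return subsets from easy to difficult."""
    scores = [score_fn(s) for s in sentences]
    labels = kmeans_1d(scores, k)
    stages: List[List[str]] = [[] for _ in range(k)]
    # Within each subset, keep sentences ordered by increasing difficulty score.
    for sent, _, label in sorted(zip(sentences, scores, labels), key=lambda t: t[1]):
        stages[label].append(sent)
    return [stage for stage in stages if stage]


# Toy token log-probabilities standing in for the evaluation model's output.
toy_logprobs = {
    "s1": [-0.1, -0.2, -0.1],
    "s2": [-0.3, -0.2, -0.4],
    "s3": [-2.0, -2.5, -1.8],
    "s4": [-2.2, -2.1, -2.4],
}
stages = curriculum_stages(list(toy_logprobs), lambda s: perplexity(toy_logprobs[s]), k=2)
print(stages)
```

In the dynamic setting, the same function would be called again after each training phase with a `score_fn` backed by the updated model, so that the remaining data are re-scored before the next subset is chosen. How exactly the re-assessment interleaves with training is not specified in this section and is left open here.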
Experimental results on two public datasets show that, compared with baseline methods, the proposed DACL method achieves significant improvements in precision, recall, and F0.5 score, and demonstrates a clear advantage over existing similar methods. Comparative and ablation experiments confirm the effectiveness of the proposed dynamic assessment-based curriculum learning method for the Chinese grammatical error correction task.
We have observed that many tasks in other fields also require text correction. For example, in the animal movement prediction task [32], a correction step is needed during the data preprocessing phase. In the future, we will further explore how to apply the DACL method more effectively to these downstream tasks.
The DACL method is a model-agnostic framework. However, considering the limitations of generative models in error detection and correction, future research will focus on effectively extending our method framework to multi-model ensemble methods to reduce or suppress under-correction and over-correction, further improving the quality and reliability of error correction.

Author Contributions

Conceptualization, Z.M. and R.D.; methodology, Z.M. and R.D.; validation, Z.M., R.D., Y.Z. and X.L.; data curation, Z.M.; writing—original draft preparation, Z.M.; writing—review and editing, Z.M. and R.D.; supervision, R.D., Z.D. and Z.M. All authors have read and agreed to the published version of the manuscript.


Funding

National Science Foundation of Beijing (L233008). National Natural Science Foundation of China (No. 62176023). National Science Foundation of Beijing (No. 4224090).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available in the article.

Conflicts of Interest

Author Zhigang Ding was employed by the company China Mobile Group Design Institute Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References


  1. Zhang, Y.; Zhang, B.; Jiang, H.; Li, Z.; Li, C.; Huang, F.; Zhang, M. NaSGEC: A Multi-Domain Chinese Grammatical Error Correction Dataset from Native Speaker Texts. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; Association for Computational Linguistics: Toronto, ON, Canada, 2023; pp. 9935–9951.
  2. Zhou, H.; Liu, Y.; Li, Z.; Zhang, M.; Zhang, B.; Li, C.; Zhang, J.; Huang, F. Improving Seq2Seq Grammatical Error Correction via Decoding Interventions. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; Bouamor, H., Pino, J., Bali, K., Eds.; Association for Computational Linguistics: Singapore, 2023; pp. 7393–7405.
  3. Zhang, B. Features and functions of the HSK dynamic composition corpus. Int. Chin. Lang. Educ. 2009, 4, 71–79.
  4. Bengio, Y.; Louradour, J.; Collobert, R.; Weston, J. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, New York, NY, USA, 14–18 June 2009; pp. 41–48.
  5. Brockett, C.; Dolan, B.; Gamon, M. Correcting ESL Errors Using Phrasal SMT Techniques. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, Sydney, Australia, 17–18 July 2006; Association for Computational Linguistics: Stroudsburg, PA, USA, 2006.
  6. Junczys-Dowmunt, M.; Grundkiewicz, R.; Guha, S.; Heafield, K. Approaching Neural Grammatical Error Correction as a Low-Resource Machine Translation Task. arXiv 2018, arXiv:1804.05940.
  7. Zhao, W.; Wang, L.; Shen, K.; Jia, R.; Liu, J. Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data. arXiv 2019, arXiv:1903.00138.
  8. Kaneko, M.; Mita, M.; Kiyono, S.; Suzuki, J.; Inui, K. Encoder-Decoder Models Can Benefit from Pre-trained Masked Language Models in Grammatical Error Correction. arXiv 2020, arXiv:2005.00987.
  9. Yang, L.; Li, H.; Li, L.; Xu, C.; Xia, S.; Yuan, C. LET: Leveraging Error Type Information for Grammatical Error Correction. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; Association for Computational Linguistics: Toronto, ON, Canada, 2023; pp. 5986–5998.
  10. Li, J.; Guo, J.; Zhu, Y.; Sheng, X.; Jiang, D.; Ren, B.; Xu, L. Sequence-to-Action: Grammatical Error Correction with Action Guided Sequence Generation. Proc. AAAI Conf. Artif. Intell. 2022, 36, 10974–10982.
  11. Zhang, Y.; Zhang, B.; Li, Z.; Bao, Z.; Li, C.; Zhang, M. SynGEC: Syntax-Enhanced Grammatical Error Correction with a Tailored GEC-Oriented Parser. arXiv 2022, arXiv:2210.12484.
  12. Wang, L.; Zhao, W.; Jia, R.; Li, S.; Liu, J. Denoising based Sequence-to-Sequence Pre-training for Text Generation. arXiv 2019, arXiv:1908.08206.
  13. Wan, Z.; Wan, X.; Wang, W. Improving Grammatical Error Correction with Data Augmentation by Editing Latent Representation. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; Scott, D., Bel, N., Zong, C., Eds.; International Committee on Computational Linguistics: Barcelona, Spain, 2020; pp. 2202–2212.
  14. Yue, T.; Liu, S.; Cai, H.; Yang, T.; Song, S.; Yu, T. Improving Chinese Grammatical Error Detection via Data augmentation by Conditional Error Generation. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2022, Dublin, Ireland, 22–27 May 2022; Muresan, S., Nakov, P., Villavicencio, A., Eds.; Association for Computational Linguistics: Dublin, Ireland, 2022; pp. 2966–2975.
  15. Cao, H.; Yang, W.; Ng, H.T. Grammatical Error Correction with Contrastive Learning in Low Error Density Domains. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, 16–20 November 2021; Moens, M.F., Huang, X., Specia, L., Yih, S.W.t., Eds.; Association for Computational Linguistics: Punta Cana, Dominican Republic, 2021; pp. 4867–4874.
  16. Junyi, H.; Junbin, Z.; Xia, L. Dynamic Negative Example Construction for Grammatical Error Correction using Contrastive Learning. In Proceedings of the 21st Chinese National Conference on Computational Linguistics, Nanchang, China, 14–16 October 2022; Sun, M., Liu, Y., Che, W., Feng, Y., Qiu, X., Rao, G., Chen, Y., Eds.; Springer: Nanchang, China, 2022; pp. 945–957.
  17. Mohiuddin, T.; Koehn, P.; Chaudhary, V.; Cross, J.; Bhosale, S.; Joty, S. Data Selection Curriculum for Neural Machine Translation. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; Goldberg, Y., Kozareva, Z., Zhang, Y., Eds.; Association for Computational Linguistics: Abu Dhabi, United Arab Emirates, 2022; pp. 1569–1582.
  18. Lu, J.; Zhang, J. Exploiting Curriculum Learning in Unsupervised Neural Machine Translation. arXiv 2021, arXiv:2109.11177.
  19. Platanios, E.A.; Stretcu, O.; Neubig, G.; Póczos, B.; Mitchell, T.M. Competence-based Curriculum Learning for Neural Machine Translation. arXiv 2019, arXiv:1903.09848.
  20. Zhang, X.; Shapiro, P.; Kumar, G.; McNamee, P.; Carpuat, M.; Duh, K. Curriculum Learning for Domain Adaptation in Neural Machine Translation. arXiv 2019, arXiv:1905.05816.
  21. Rothe, S.; Mallinson, J.; Malmi, E.; Krause, S.; Severyn, A. A Simple Recipe for Multilingual Grammatical Error Correction. arXiv 2021, arXiv:2106.03830.
  22. Katsumata, S. Stronger Baselines for Grammatical Error Correction Using a Pretrained Encoder-Decoder Model. J. Nat. Lang. Process. 2021, 28, 276–280.
  23. Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9.
  24. Zhang, Y.; Li, Z.; Bao, Z.; Li, J.; Zhang, B.; Li, C.; Huang, F.; Zhang, M. MuCGEC: A Multi-Reference Multi-Source Evaluation Dataset for Chinese Grammatical Error Correction. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; Carpuat, M., de Marneffe, M.C., Meza Ruiz, I.V., Eds.; Association for Computational Linguistics: Seattle, WA, USA, 2022; pp. 3118–3130.
  25. Zhao, Y.; Jiang, N.; Sun, W.; Wan, X. Overview of the NLPCC 2018 Shared Task: Grammatical Error Correction. In Proceedings of the Natural Language Processing and Chinese Computing, Foshan, China, 26–30 August 2018; Zhang, M., Ng, V., Zhao, D., Li, S., Zan, H., Eds.; Springer: Cham, Switzerland, 2018; pp. 439–445.
  26. Fang, T.; Liu, X.; Wong, D.F.; Zhan, R.; Ding, L.; Chao, L.S.; Tao, D.; Zhang, M. TransGEC: Improving Grammatical Error Correction with Translationese. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; Rogers, A., Boyd-Graber, J., Okazaki, N., Eds.; Association for Computational Linguistics: Toronto, ON, Canada, 2023; pp. 3614–3633.
  27. Zhao, Z.; Wang, H. MaskGEC: Improving Neural Grammatical Error Correction via Dynamic Masking. Proc. AAAI Conf. Artif. Intell. 2020, 34, 1226–1233.
  28. Deng, Q.; Chen, S.; Ye, J. Chinese Grammatical Error Correction Based on Grammatical Knowledge Enhancement. Comput. Eng. 2023, 49, 77–84.
  29. Omelianchuk, K.; Atrasevych, V.; Chernodub, A.N.; Skurzhanskyi, O. GECToR – Grammatical Error Correction: Tag, Not Rewrite. arXiv 2020, arXiv:2005.12592.
  30. Fang, T.; Yang, S.; Lan, K.; Wong, D.F.; Hu, J.; Chao, L.S.; Zhang, Y. Is ChatGPT a Highly Fluent Grammatical Error Correction System? A Comprehensive Evaluation. arXiv 2023, arXiv:2304.01746.
  31. Shao, Y.; Geng, Z.; Liu, Y.; Dai, J.; Yang, F.; Zhe, L.; Bao, H.; Qiu, X. CPT: A Pre-Trained Unbalanced Transformer for Both Chinese Language Understanding and Generation. arXiv 2021, arXiv:2109.05729.
  32. Meyer, P.G.; Cherstvy, A.G.; Seckler, H.; Hering, R.; Blaum, N.; Jeltsch, F.; Metzler, R. Directedeness, correlations, and daily cycles in springbok motion: From data via stochastic models to movement prediction. Phys. Rev. Res. 2023, 5, 043129.
Figure 1. Overall framework of the DACL method.
Figure 2. Difficulty assessment flowchart.
Table 1. Datasets.
Training | Lang8 + HSK | 1,389,672
Testing | NLPCC 2018-test | 2000
Table 2. The impact of the hyperparameter k on the performance of the Chinese grammatical error correction model. Bold represents the best values.
The Value of k | P | R | F0.5
Table 3. Experimental Results of Different Models. Bold represents the best values.
Table 4. Ablation Study Results for Difficulty Assessment Methods. Bold represents the best values.
Table 5. Impact of different improvements on model performance. Bold represents the best values.
Method | MuCGEC P | MuCGEC R | MuCGEC F0.5 | NLPCC 2018 P | NLPCC 2018 R | NLPCC 2018 F0.5
w/o CL | 43.81 | 27.72 | 39.25 | 42.06 | 31.62 | 39.45
w/o DA | 43.65 | 29.05 | 39.66 | 41.93 | 33.02 | 39.78
Table 6. Comparison of Error Correction Results for Sentence 1.
Model | Error Correction Results
BART-Large | If, on the other hand, you break down the barriers of the self, and your heart turns into a vast lake, that pinch of salt will have no effect on its taste.
GECToR | If, on the other hand, you break down the barriers of the self, and your heart becomes a vast lake, that salt will have no effect on its taste.
DACL | If, on the other hand, you break down the barriers of the self, and your heart becomes a vast lake, then that pinch of salt will have no effect on its taste.
Table 7. Comparison of error correction results for sentence 2.
Model | Error Correction Results
BART-Large | Not only that, prospect education can also play a role in solving the education problems in South Korea.
GECToR | Not only that, pre education can also be a feasible way to solve the education problem in South Korea.
DACL | Not only that, prospect education can also become a feasible way to solve the education problems in South Korea.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Duan, R.; Ma, Z.; Zhang, Y.; Ding, Z.; Liu, X. Dynamic Assessment-Based Curriculum Learning Method for Chinese Grammatical Error Correction. Electronics 2024, 13, 4079.
