Article
Peer-Review Record

A Chinese Few-Shot Text Classification Method Utilizing Improved Prompt Learning and Unlabeled Data

Appl. Sci. 2023, 13(5), 3334; https://doi.org/10.3390/app13053334
by Tingkai Hu 1, Zuqin Chen 2,*, Jike Ge 1, Zhaoxu Yang 1 and Jichao Xu 1
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 5 February 2023 / Revised: 25 February 2023 / Accepted: 2 March 2023 / Published: 6 March 2023
(This article belongs to the Special Issue Natural Language Processing: Recent Development and Applications)

Round 1

Reviewer 1 Report

The paper proposes a method for Chinese few-shot text classification called CIPLUD. The method combines a prompt learning method with multiple masks and a One-Class Support Vector Machine-based Unlabeled Data Leveraging (OCSVM-UDL) module. Experimental results on the FewCLUE datasets show that the proposed method outperforms methods such as PET (which uses unlabeled data) as well as P-tuning and EFL (which do not use unlabeled data).

Weaknesses:

1. Some key details of the proposed method are missing. For example,

1) In the MMOPL module, how is the prompt template generated, and what is the template like?

2) In the OCSVM-UDL module, how is the one-class SVM designed and trained? What is the difference between the so-called one-class SVM and SVM?  What are the spherical constraint boundaries and how are they generated? At least there should be a reference about all this.

3) For the text classification task, how are the text classes determined using the prediction of the masks?

4) Figures 1 and 2 are given without enough explanation, and it is difficult to understand their details. For example, what do U1, …, Um mean? Are they the same as h0, …, hi in Line 222?

Without these details, it is difficult to judge the novelty and soundness of the method.

2. The comparison to other methods seems unfair in some way.

Since the P-tuning and EFL are tested without using the unlabeled data, it seems unfair for the two methods. Can the two methods be tested with unlabeled data (perhaps with a similar OCSVM-UDL module)? Or are there results of the proposed method that only performs supervised few-shot training?

3. There are many grammar errors and typos in the paper, so the language needs to be polished. For example,

1) Line 42: seriously -> serious.

2) Line 46: A period is missed after ‘adopted’.

3) Line 68: There should be comma before ‘Recently’ and ‘recently’ should not be capitalized.

4) Caption of 2.1 is not correct.

5) Line 286: ‘We from.’

6) Line 434: ‘has been effectiveness’.

Author Response

We would like to express our gratitude for your valuable feedback and constructive suggestions on our manuscript.

Point 1: Some key details of the proposed method are missing. For example,

1) In the MMOPL module, how is the prompt template generated, and what is the template like?

2) In the OCSVM-UDL module, how is the one-class SVM designed and trained? What is the difference between the so-called one-class SVM and SVM?  What are the spherical constraint boundaries and how are they generated? At least there should be a reference about all this.

3) For the text classification task, how are the text classes determined using the prediction of the masks?

4) Figures 1 and 2 are given without enough explanation, and it is difficult to understand their details. For example, what do U1, …, Um mean? Are they the same as h0, …, hi in Line 222?

Without these details, it is difficult to judge the novelty and soundness of the method.

 

Response 1: Thank you for your insightful comments and suggestions, and we apologize for any confusion caused by the missing details of our proposed method. We have carefully considered your comments and made the necessary revisions to the manuscript. Specifically, we have clarified the prompt template generation process, the design and training of the one-class SVM, and how text classes are determined from the mask predictions, and we have included additional information and references in the modified Section 3.

Regarding your specific questions, we have addressed them as follows:

1) How is the prompt template generated? In prompt learning, there are two primary engineering steps, namely prompt engineering and answer engineering. Prompt engineering involves designing a prompt function, Fprompt(X), that elicits the best performance on downstream tasks. In the past, discrete prompts created through a painstaking manual process have been used. However, this approach is time-consuming and requires considerable expertise, and even experienced prompt designers may not be able to create the best prompt [35]. To overcome this limitation, we use continuous prompts that are automatically learned by the model. These prompts are tensors designed to let the language model execute tasks effectively, and they do not have to be limited to human-understandable natural language. Continuous prompts are advantageous because they eliminate the extensive time and effort spent on manually searching for and adjusting discrete prompts [36].

To further enhance the effectiveness of prompt learning for Chinese text classification tasks, we have introduced the use of multiple masks in the MMOPL module. This approach creates a set of learnable tensors, [U1, …, Um], for each original text X as continuous prompts, where m is the number of learnable tensors. We then concatenate the mask tokens that correspond to the label to be classified. Formalizing this as the prompt template [U1, …, Um, E("[MASK]")] yields a prompt that is effective for the specific task at hand.

To achieve optimal results in prompt learning, we have also improved the mask token and answer engineering. Specifically, we have developed an adaptive method for expanding the mask token to the required number when generating the input sequence. We first calculate the maximum length, n, of the label texts in the label set and use this value as the number of [MASK] tokens. For labels shorter than n, we use the [PAD] placeholder to fill in the gaps. By concatenating the continuous prompts with multiple masks and the original sentence E(X), we create the embedded input sequence [U1, …, Um, E("[MASK]1"), …, E("[MASK]n"), E(X)], where E("[MASK]1"), …, E("[MASK]n") are mask tokens adaptively expanded according to the label set and E(X) is the original text. An illustrative sketch of this construction is given below.
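To make this construction concrete, the following minimal Python sketch builds the multi-mask input sequence and pads short labels to the maximum label length n. It is an illustration only: the function names and the literal [U1]/[MASK]/[PAD] strings are placeholders, since in the actual model the continuous prompts are learnable embedding tensors and the sequence is built with ERNIE's tokenizer.

```python
# Illustrative sketch only: literal placeholder tokens stand in for the
# learnable prompt tensors [U1, ..., Um] and for ERNIE's real tokenizer.

def build_prompt_input(text, label_set, num_prompt_tokens=4):
    # n = maximum label length in the label set; it fixes the number of masks.
    n = max(len(label) for label in label_set)
    prompt_tokens = [f"[U{i + 1}]" for i in range(num_prompt_tokens)]  # continuous prompts
    mask_tokens = ["[MASK]"] * n                                       # adaptively expanded masks
    # Input sequence: [U1, ..., Um, [MASK]_1 ... [MASK]_n, X]
    return prompt_tokens + mask_tokens + list(text)

def pad_label(label, label_set):
    # Labels shorter than n are filled with [PAD] placeholders.
    n = max(len(l) for l in label_set)
    return list(label) + ["[PAD]"] * (n - len(label))

if __name__ == "__main__":
    labels = ["体育", "财经新闻"]                    # hypothetical label set, so n = 4
    print(build_prompt_input("今天股市大涨", labels))
    print(pad_label("体育", labels))                 # ['体', '育', '[PAD]', '[PAD]']
```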

2)We appreciate the reviewer's comments and have added more details to Section 3.2 regarding the OCSVM-UDL module, including the use of the one-class SVM algorithm and the spherical constraint boundaries. We have also provided a reference [40] to a related study that highlights the effectiveness of the spherical constraint boundaries in text classification. Specifically, we explain how the OCSVM algorithm constructs multiple spherical constraint boundaries for different classes and uses these boundaries to filter out anomalous unlabeled data and assign appropriate pseudo-labels. We have added a detailed description of the OCSVM algorithm and the optimization problem that it solves. We have also explained how we adjust the hyperparameter to control the compactness of the constraint boundaries, which is crucial in the process of assigning suitable pseudo-labels to unlabeled data. Finally, we have included an algorithm that describes the iterative training process and explains how the new pseudo-labeled data is mixed with the original labeled data to train the MMOPL text classification model. We believe that these additions will help the reader better understand the OCSVM-UDL module and its effectiveness in few-shot text classification tasks.
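As an illustration of the filtering step described above, the sketch below fits one scikit-learn OneClassSVM boundary per class on stand-in embeddings and keeps only the unlabeled samples that fall inside at least one boundary, assigning each kept sample to the accepting class with the highest decision score. The random vectors, the RBF kernel, the nu value, and the acceptance policy are assumptions made for this example, not necessarily the exact settings used in the paper.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
num_classes, dim = 3, 16

# Labeled embeddings per class (random stand-ins for MMOPL sentence embeddings).
labeled = {c: rng.normal(loc=3.0 * c, scale=1.0, size=(20, dim)) for c in range(num_classes)}
unlabeled = rng.normal(loc=1.5, scale=2.0, size=(200, dim))

# One constraint boundary per class; nu controls how compact the boundary is.
boundaries = {
    c: OneClassSVM(kernel="rbf", gamma="scale", nu=0.1).fit(X)
    for c, X in labeled.items()
}

# Keep only unlabeled samples that fall inside at least one boundary and
# assign them to the accepting class with the highest decision score;
# samples outside every boundary are treated as anomalous and discarded.
pseudo_labeled = []
for x in unlabeled:
    x = x.reshape(1, -1)
    inside = [c for c, m in boundaries.items() if m.predict(x)[0] == 1]
    if inside:
        best = max(inside, key=lambda c: boundaries[c].decision_function(x)[0])
        pseudo_labeled.append((x.ravel(), best))

print(f"kept {len(pseudo_labeled)} of {len(unlabeled)} unlabeled samples")
```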

3)For text classification, the text classes are determined using the prediction of the masks. Specifically, we calculate the joint probability of the labels corresponding to the mask positions. Length constraint processing is performed on all joint probability labels, and an activation function is used to normalize the probability. We then select the label with the highest joint probability value as the final result Y.
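A small numeric sketch of this decision rule is given below, with made-up per-position probabilities and a hypothetical two-label set; in the actual model the probabilities come from the PLM's masked-token predictions, and the normalization uses an activation function rather than the simple division shown here.

```python
import numpy as np

labels = ["体育", "财经新闻"]          # hypothetical candidate labels (padded to n = 4)

# Stand-in per-position probabilities from the masked-LM head:
# p_mask[i][char] = P(char at [MASK]_i). Values are made up for illustration.
p_mask = [
    {"体": 0.6, "财": 0.3},
    {"育": 0.7, "经": 0.2},
    {"新": 0.4},
    {"闻": 0.5},
]

def joint_prob(label):
    # Multiply probabilities over the label's real characters only, skipping
    # the [PAD]-filled positions (the length-constraint processing).
    p = 1.0
    for i, ch in enumerate(label):
        p *= p_mask[i].get(ch, 1e-8)
    return p

scores = np.array([joint_prob(l) for l in labels])
probs = scores / scores.sum()          # simple normalization for illustration
print("prediction:", labels[int(np.argmax(probs))], probs)
```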

4) We have revised our paper to provide more detailed explanations of Figures 1 and 2. The U1, ..., Um represent the continuous prompt tensors. To clarify, we have replaced h0, ..., hi with U1, ..., Um so that the text accurately reflects the content of Figure 1.

We hope that these additional explanations help address the concerns raised by the reviewer. Thank you for your guidance and assistance in improving the quality of our manuscript.

 

[35] Jiang, Z.; Xu, F.F.; Araki, J.; et al. How Can We Know What Language Models Know? TACL 2020, 8, 423-438.

[36] Lester, B.; Al-Rfou, R.; Constant, N. The Power of Scale for Parameter-Efficient Prompt Tuning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, November 2021; pp. 3045-3059.

[40] Fei, G.; Liu, B. Breaking the Closed World Assumption in Text Classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, California, June 2016; pp. 506-514.

 

Point 2: The comparison to other methods seems unfair in some way. Since the P-tuning and EFL are tested without using the unlabeled data, it seems unfair for the two methods. Can the two methods be tested with unlabeled data (perhaps with a similar OCSVM-UDL module)? Or are there results of the proposed method that only performs supervised few-shot training?

 

Response 2: Thank you for your comments and suggestions. We appreciate your concern regarding the comparison of the P-tuning and EFL methods to our proposed method in the study. We understand your point that testing P-tuning and EFL without using the unlabeled data might seem unfair. We would like to clarify that the purpose of our proposed method was to compare the difference between using and not using unlabeled data for prompt learning. However, we do acknowledge the unfairness in the comparison set, and to address your concern, we have added the testing scores of the MMOPL module, which is a prompt learning module that does not utilize unlabeled data. Additionally, we have added an explanation in the discussion section to clarify the difference in the use of unlabeled data between PET, P-tuning, EFL, and our proposed method. We hope that this response addresses your concerns and clarifies the focus of our method and experimental comparison. The modified Table 4 is as follows:

Table 4. The comparison results between our proposed model CIPLUD and the baselines. The 'Human' row refers to the performance of human annotators on the tasks, the underlined values indicate the second-best results among the baseline methods, and the highest score in each column is marked in bold.

Method        EPRSTMT    CSLDCP     TNEWS      IFLYTEK    Avg
              (Acc. %)   (Acc. %)   (Acc. %)   (Acc. %)   (Acc. %)
Human         90.0       68.0       71.0       66.0       73.6
Fine-Tuning   66.5       57.0       51.6       42.1       54.3
PET           84.0       59.9       56.4       50.3       62.7
P-tuning      80.6       56.6       55.9       52.6       61.4
EFL           76.7       47.9       56.3       52.1       58.3
MMOPL         82.1       59.8       56.4       52.2       62.6
CIPLUD        85.4       60.4       57.2       52.8       64.0

 

Point 3: There are many grammar errors and typos in the paper, so the language needs to be polished. For example,

1) Line 42: seriously -> serious.

2) Line 46: A period is missed after ‘adopted’.

3) Line 68: There should be comma before ‘Recently’ and ‘recently’ should not be capitalized.

4) Caption of 2.1 is not correct.

5) Line 286: ‘We from.’

6) Line 434: ‘has been effectiveness’.

 

Response 3: Thank you for your comments on the grammar and typos in the manuscript. We revised the whole manuscript carefully to avoid language errors. In addition, we consulted a professional editing service and asked several colleagues who are native English speakers to check the English. We believe that the language is now acceptable for the review process. Once again, we appreciate your feedback, which is critical in helping us improve the quality and impact of our work.

Author Response File: Author Response.docx

Reviewer 2 Report

The authors suggest how to improve prompt learning and how to utilize unlabeled data together with prompt learning for Chinese few-shot text classification. The proposed methods are properly explained and shown to give performance improvements through appropriate experiments. Only a few points lack explanation and should be clarified for better understanding by readers:

In section 3.1 and Fig 1, explain U1 ... Um in the Prompt part of Fig 1: how they are generated, how they differ from the hand-crafted ones used in the Experiments, and compare them to existing prompt learning works applied to Chinese. Is the order of the prompt and input text important in prompt learning, and why?

In Algorithm 1, M1 would be easier to understand if explained as the text classification model or MMOPL model, not as a pre-trained LM.

In Table 4, explain what is the row 'Human' and how the score was obtained. What is the meaning of the underline? The underlined value is not the highest one. Instead, mark the highest value in each column.

In lines 355-356, the explanation seems to be incorrect according to Table 4. For the CSLDCP task, some prompt learning methods (P-tuning, EFL) show worse scores than the fine-tuning method. Correct the explanation and explain the reason.

 

There are many typos, missing words, or incomplete sentences:

line 46: add a period after 'adopted'

line 77: add 'data' after 'unlabeled'

line 127: modify subsection title into 'Few-shot text classification'

line 199: modify '3.2' into '3.1'

line 201, modify '3.3' into '3.2'

line 227, capitalize the first character of the sentence: 'the' --> 'The'

line 305, add 'to' before 'predict'

line 316, modify '4.1' into '4.2'

line 326, remove the bullet.

line 332, modify the table caption: 'The difference of the baseline methods'

line 339, add comma after '1.8.0'

line 375-376, incomplete sentence; there is no subject or verb.

line 405, remove 'in order to study the effect' due to duplication.

 

Author Response

We would like to express our gratitude for your valuable feedback and constructive suggestions on our manuscript.

Point 1: In section 3.1 and Fig 1, explain U1 ... Um in the Prompt part of Fig 1: how they are generated, how they differ from the hand-crafted ones used in the Experiments, and compare them to existing prompt learning works applied to Chinese. Is the order of the prompt and input text important in prompt learning, and why?

Response 1: Thank you for your valuable feedback on our manuscript. We greatly appreciate your time and effort in reviewing our work. We have made the necessary changes to the manuscript based on your suggestions. In particular, we have updated Section 3.1 and Figure 1 to provide a more detailed explanation of the multiple masks optimization-based prompt learning module (MMOPL). In Section 3.1, we now explain in more detail how the continuous prompts [U1, …, Um] are generated: they are automatically learned by the model and thus differ from the hand-crafted prompts used in previous works. We have compared our approach to existing prompt learning works applied to Chinese in Section 4.2 of our manuscript, specifically in Table 2.

Table 2. The differences between the baseline methods.

Method        Prompt Designing   Prompt Style (Templates)   Mask Number   Use Unlabeled Data
Fine-tuning   --                 --                         --            No
PET           Hand-craft         Discrete                   Single        Yes
P-tuning      Auto               Continuous                 Single        No
EFL           Hand-craft         Discrete                   Single        No
Ours          Auto               Continuous                 Multiple      Yes

In addition, we have emphasized the importance of the order of the prompt and input text in prompt learning. Specifically, we have described how the placement of the prompt template depends on the choice of pre-trained model. In our case, we use ERNIE, a pre-trained model with a maximum input length of 512 characters. To ensure that the prompt is not truncated, we employ the prefix concatenation method, as sketched below. We hope that these revisions have addressed your concerns and provided a more comprehensive explanation of our proposed approach. Thank you for bringing these issues to our attention, and please let us know if you have any further questions or concerns.
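The short sketch below illustrates why prefix concatenation protects the prompt: when the sequence exceeds the 512-length budget, only the tail of the original text is cut. The character-level length handling and the helper name prefix_concat are simplifications for illustration, not the paper's actual tokenization code.

```python
MAX_LEN = 512   # maximum input length of the ERNIE model used in the paper

def prefix_concat(prompt_tokens, mask_tokens, text):
    # The prompt and mask tokens come first, so any truncation needed to fit
    # the length budget only removes the tail of the original text.
    budget = MAX_LEN - len(prompt_tokens) - len(mask_tokens)
    return prompt_tokens + mask_tokens + list(text)[:budget]

sequence = prefix_concat([f"[U{i}]" for i in range(1, 5)], ["[MASK]"] * 4, "很长的新闻文本" * 200)
print(len(sequence))   # 512: the prompt and masks are preserved, only the text is cut
```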

 

Point 2: In Algorithm 1, M1 would be easier to understand if explained as the text classification model or MMOPL model, not as a pre-trained LM.

 

Response 2: Thank you for your comments on Algorithm 1. We agree that it would be better to explain M1 as a text classification model or MMOPL model instead of a pre-trained LM. We will update the manuscript to clarify the nature of M1 and improve its understandability. The modified Algorithm 1 is as follows:

Algorithm 1 The iterative training process of the OCSVM-UDL

Input: Training set D, validation set D', unlabeled set U, mixed training set F, MMOPL model M1, OCSVM model M2.

1:  Initialize F = D  // the mixed training set starts as the labeled training set
2:  repeat
3:    repeat
4:      Load a batch of instances B from F and add the prompt template
5:      Generate input embedding vectors with M1 for each instance in B
6:      Update the parameters of M1 by minimizing the training loss on B
7:      Save the best model M1 according to the average performance over all labels on D'
8:    until no more batches
9:    Load a batch of instances B from F and add the prompt template
10:   Generate input embedding vectors with M1 for each instance in B
11:   Generate a constraint boundary for each label using M2
12:   Filter a batch of instances u from U and obtain pseudo-labeled data P
13:   Update the mixed training set F = D + P and remove duplicates
14: until M1 converges
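For readers who prefer runnable code, the following is a toy, self-contained rendering of Algorithm 1's loop structure. A logistic regression on random feature vectors stands in for the MMOPL model M1, per-class OneClassSVM models stand in for M2, a sample is pseudo-labeled only when exactly one boundary accepts it, and a fixed number of rounds replaces the convergence check; none of these stand-ins are the paper's actual components.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
dim, classes = 8, [0, 1]

# Tiny labeled set D and unlabeled pool U (the inputs of Algorithm 1).
X_D = np.vstack([rng.normal(2.0 * c, 1.0, (10, dim)) for c in classes])
y_D = np.repeat(classes, 10)
U = np.vstack([rng.normal(2.0 * c, 1.0, (100, dim)) for c in classes])

X_F, y_F = X_D.copy(), y_D.copy()                       # step 1: F = D
for round_idx in range(3):                              # outer repeat (toy: fixed rounds)
    m1 = LogisticRegression(max_iter=200).fit(X_F, y_F)              # steps 3-8: train M1 on F
    # steps 9-11: one constraint boundary per label (M2) over M1's training data
    m2 = {c: OneClassSVM(nu=0.1, gamma="scale").fit(X_F[y_F == c]) for c in classes}
    # step 12: filter U and collect pseudo-labeled data P
    P_X, P_y = [], []
    for x in U:
        inside = [c for c, b in m2.items() if b.predict(x.reshape(1, -1))[0] == 1]
        if len(inside) == 1:                            # simple acceptance policy for the toy
            P_X.append(x)
            P_y.append(inside[0])
    # step 13: F = D + P; duplicates are avoided by rebuilding F from D each round
    if P_X:
        X_F = np.vstack([X_D, np.array(P_X)])
        y_F = np.concatenate([y_D, np.array(P_y)])
    print(f"round {round_idx}: kept {len(P_y)} pseudo-labels, |F| = {len(y_F)}")

print("final M1 accuracy on D:", round(m1.score(X_D, y_D), 3))
```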

 

Point 3: In Table 4, explain what is the row 'Human' and how the score was obtained. What is the meaning of the underline? The underlined value is not the highest one. Instead, mark the highest value in each column.

 

Response 3: Thank you for your comments on Table 4. We apologize for the confusion regarding the row labeled 'Human' and the underlined values. We will update the manuscript to explain that the 'Human' row refers to the performance of human annotators on the tasks, and the underlined values indicate the second-best results achieved among the baseline methods. Additionally, we will bold-mark the highest value in each column to make the table easier to read. The modified Table 4 is as follows:

Table 4. The comparison results between our proposed model CIPLUD and the baselines. The 'Human' row refers to the performance of human annotators on the tasks, the underlined values indicate the second-best results among the baseline methods, and the highest score in each column is marked in bold.

Method        EPRSTMT    CSLDCP     TNEWS      IFLYTEK    Avg
              (Acc. %)   (Acc. %)   (Acc. %)   (Acc. %)   (Acc. %)
Human         90.0       68.0       71.0       66.0       73.6
Fine-Tuning   66.5       57.0       51.6       42.1       54.3
PET           84.0       59.9       56.4       50.3       62.7
P-tuning      80.6       56.6       55.9       52.6       61.4
EFL           76.7       47.9       56.3       52.1       58.3
MMOPL         82.1       59.8       56.4       52.2       62.6
CIPLUD        85.4       60.4       57.2       52.8       64.0

 

Point 4: In lines 355-356, the explanation seems to be incorrect according to Table 4. For the CSLDCP task, some prompt learning methods (P-tuning, EFL) show worse scores than the fine-tuning method. Correct the explanation and explain the reason.

 

Response 4: We would like to express our gratitude for your valuable feedback and constructive suggestions on our manuscript. Your careful examination of our work and helpful comments have greatly contributed to improving the clarity and accuracy of our study. We have carefully considered your feedback and have made the necessary revisions to address the issues you raised. In response to your fourth point, we acknowledge that our original statement about the performance of prompt learning methods was too broad. Upon re-examination, we found that, as shown in Table 4, the majority of prompt learning methods demonstrate improved performance compared to PLM fine-tuning methods. However, we also acknowledge your observation that there is an important exception in the case of the CSLDCP dataset, where the P-tuning and EFL methods are comparatively weaker than fine-tuning approaches. The underlying reasons for this discrepancy are rooted in the characteristics of the CSLDCP dataset. Specifically, this dataset contains subject-specific labels that are longer than those of the other datasets, which makes them more difficult to map to shorter answer sequences during answer engineering and thus produces more errors in the predicted labels. Therefore, the P-tuning and EFL methods are comparatively weaker than fine-tuning approaches on this dataset. We have incorporated this additional information in the revised manuscript to provide a more accurate and detailed explanation for the performance differences observed in our study.

 

Point 5: There are many typos, missing words, or incomplete sentences:

line 46: add a period after 'adopted'

line 77: add 'data' after 'unlabeled'

line 127: modify subsection title into 'Few-shot text classification'

line 199: modify '3.2' into '3.1'

line 201, modify '3.3' into '3.2'

line 227, capitalize the first character of the sentence: 'the' --> 'The'

line 305, add 'to' before 'predict'

line 316, modify '4.1' into '4.2'

line 326, remove the bullet.

line 332, modify the table caption: 'The difference of the baseline methods'

line 339, add comma after '1.8.0'

line 375-376, in-complete sentence; there is no subject and verb.

line 405, remove 'in order to study the effect' due to duplication.

 

Response 5: Thank you for your comments on the grammar and typos in the manuscript. We revised the whole manuscript carefully to avoid language errors. In addition, we consulted a professional editing service and asked several colleagues who are native English speakers to check the English. We believe that the language is now acceptable for the review process.

Once again, we appreciate your feedback, which is critical in helping us improve the quality and impact of our work.

Author Response File: Author Response.docx

Reviewer 3 Report

1. Please pay attention to the indentation of each paragraph, such as in line 325;

2. The punctuation needs to be standardized, such as in lines 107 and 318;

3. Please check whether the use of numbering is uniform, such as in line 131;

4. Please pay attention to capitalization, such as the word "if" in line 239;

5. Please check whether the statements read smoothly, such as in line 286;

6. Please pay attention to the consistency of the primary and secondary heading formats, such as whether the first letters of words need to be capitalized;

7. What are the defects of this model, and how can they be improved?

Author Response

We would like to express our gratitude for your valuable feedback and constructive suggestions on our manuscript.

Point 1:

Please pay attention to the indentation of each paragraph, such as in line 325;

Response 1: Thank you for your comments on the manuscript. We have taken note of your feedback regarding paragraph indentation and have made sure that all paragraphs are properly indented for better readability.

 

Point 2: The punctuation needs to be standardized, such as in lines 107 and 318;

Response 2: We appreciate your comments on the punctuation in the manuscript. We have reviewed the manuscript and made the necessary changes to ensure that the punctuation is regulated and consistent throughout the paper.

 

Point 3: Please check whether the use of numbering is uniform, such as in line 131;

Response 3: Thank you for bringing to our attention the issue of numbering consistency in the manuscript. We have reviewed the document and made necessary corrections to ensure uniformity in numbering.

 

Point 4: Please pay attention to capitalization, such as the word "if" in line 239;

Response 4: We appreciate your feedback on the use of the first letter case. We have reviewed the manuscript and made necessary corrections, such as in line 239 where "if" has been capitalized.

 

Point 5: Please check whether the statements read smoothly, such as in line 286;

Response 5: Thank you for your feedback on the smoothness of the statements in the manuscript. We have reviewed the document and made the necessary revisions to ensure that the statements flow smoothly and are easy to understand, such as in line 286.

 

Point 6: Please pay attention to the consistency of the primary and secondary heading formats, such as whether the first letters of words need to be capitalized;

Response 6: We thank you for your comments regarding the unity of the primary and secondary title format. We have reviewed the manuscript and made necessary corrections to ensure that the title format is consistent, including capitalizing the first letters of necessary words.

 

Point 7: What are the defects of this model, and how can they be improved?

Response 7: Thank you for bringing up the question regarding the defects of our proposed model. We acknowledge that there is no perfect model, and our proposed model also has its limitations. One potential limitation is that the effectiveness of the OCSVM algorithm in generating pseudo-labels and filtering out noisy data may be affected by the selection of hyperparameters. In addition, as we noted in the conclusions and future work section 6 of the manuscript, we have already identified that the candidate pseudo-labels obtained through semi-supervised training may be affected by imbalanced label categories when the number of label categories in a task increases. This can potentially degrade the performance of the model. To address these limitations, we plan to conduct further experiments and investigate alternative approaches for generating and utilizing pseudo-labels and explore the potential of incorporating external knowledge and resources to enhance the performance of the model. Furthermore, we plan to investigate the potential of combining prompt learning with semi-supervised learning to further improve the performance of our model.

Lastly, we revised the whole manuscript carefully to avoid language errors. In addition, we consulted a professional editing service and asked several colleagues who are native English speakers to check the English. We believe that the language is now acceptable for the review process. Once again, we thank you for your valuable comments and feedback, and we look forward to implementing these improvements in our future research.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors addressed most of the concerns of the reviewer in the response, and the manuscript was also revised.
