Article
Peer-Review Record

RQ-OSPTrans: A Semantic Classification Method Based on Transformer That Combines Overall Semantic Perception and “Repeated Questioning” Learning Mechanism

Appl. Sci. 2024, 14(10), 4259; https://doi.org/10.3390/app14104259
by Yuanjun Tan 1, Quanling Liu 1, Tingting Liu 2, Hai Liu 1,3,4, Shengming Wang 1 and Zengzhao Chen 1,3,*
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4:
Submission received: 13 April 2024 / Revised: 5 May 2024 / Accepted: 15 May 2024 / Published: 17 May 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The study under review concentrates on improving the capabilities of pre-trained language models, especially with regard to long-text topic detection. The RQ-OSPTrans method introduced here employs two parallel learning modules: a repeated questioning module and an overall semantic perception module. What makes the technique distinctive is its treatment of problems such as label-range control and label ambiguity, along with extensive validation on several datasets, including small-scale domain-specific ones. This suggests a thorough assessment of the method's robustness and applicability.

Quantitative analyses, supported by a large number of figures and tables, provide evidence for the findings.

The reference section is substantial, and the cited sources are current and appropriate to the topic.

Overall, by providing an improved methodology for applying pre-trained language models to topic identification tasks, the study advances the discipline.

Comments on the Quality of English Language

Some grammatical and stylistic errors are present; the article should be proofread and corrected in this regard.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Summary 

The paper presents a deep learning model based on a combination of BERT and MLP, designed to classify long texts into thematic categories such as research topics and newspaper article categories. The presented method, labelled RQ-OSPTrans, consists of two modules trained in parallel that aim to capture the overall semantics of long texts. In several experiments, RQ-OSPTrans has been shown to outperform existing state-of-the-art models.

Overall impression 

The methodology and the results are mainly comprehensible, but the introduction, in particular, requires revision to ensure that the actual aim of the work is clearly articulated. It should be considered whether the term "topic identification" accurately describes the task addressed, as that term is often associated with unsupervised learning; this work instead classifies texts into research topics or thematic focal points in newspapers. Furthermore, "overall semantics" requires a definition, which is essential for understanding the paper.

With regard to the methodology, consideration should be given to carrying out a k-fold cross-validation, as this would be more meaningful than testing based on a single simple split. A significance test should also be carried out to ascertain whether the model's results are significantly better than those of the baseline models; a sketch of such a protocol follows.
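For illustration, a minimal sketch of such a protocol, assuming numpy-array inputs and scikit-learn-style model factories (the functions and names below are placeholders, not the authors' code):

```python
# Hypothetical protocol: stratified k-fold cross-validation with a paired
# t-test over the matched per-fold accuracies of two models.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.model_selection import StratifiedKFold

def fold_accuracy(make_model, X, y, train_idx, test_idx):
    """Train a fresh model on the fold's training split; return test accuracy."""
    model = make_model()
    model.fit(X[train_idx], y[train_idx])
    return np.mean(model.predict(X[test_idx]) == y[test_idx])

def compare_models(make_a, make_b, X, y, k=5, seed=42):
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=seed)
    scores_a, scores_b = [], []
    for train_idx, test_idx in skf.split(X, y):
        scores_a.append(fold_accuracy(make_a, X, y, train_idx, test_idx))
        scores_b.append(fold_accuracy(make_b, X, y, train_idx, test_idx))
    t_stat, p_value = ttest_rel(scores_a, scores_b)  # paired test over folds
    return np.mean(scores_a), np.mean(scores_b), p_value
```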

Abstract 

The abstract clearly describes the methodology employed and the results obtained. The objective is also clearly stated. However, "topic identification" should be replaced (see Overall impression). Furthermore, consideration should be given to whether the difficulty described, namely that the label range cannot be controlled after interactive prompting, should be included in the abstract, as this is hardly addressed in the rest of the paper. Instead, the challenges actually tackled should be emphasised, namely the classification of very long texts, including those containing colloquial expressions.

Introduction 

The introduction should commence with a more precise delineation of the specific objective being addressed, namely the classification of long texts into thematic categories. At present, the impression is given that the method is meant to master any text classification task, whereas the experiments are geared explicitly towards thematic categorisation.

Furthermore, it is also necessary to indicate which challenges were specifically addressed. It would be beneficial to provide a clearer structure for subsection 1.1. The initial sentence appears to convey the impression that the challenge is the classification of lengthy texts. However, this is followed by a discussion of the problems and solutions in multi-label classification and the imbalance of data sets, which can occur not only in the classification of long texts.  

Moreover, Figure 1 would benefit from additional clarity regarding the relationship between the tiny images and the challenges they represent. One potential solution to this issue is to insert explanatory text fields above the smaller images, with the text "Challenge 1" and so forth. 

Additionally, it is somewhat confusing that the challenges of educational assessment are enumerated at the end of section 1.1, given that the paper focuses on thematic classification. If the methodology is particularly advantageous for educational assessment, this should be emphasised more clearly. Furthermore, the utilisation of speech recognition systems is not addressed in the remainder of the paper. Thus, consideration should be given to not listing this challenge. The challenge of noise and colloquial expressions is present in the experiments, not in the educational domain, but in the texts of the THUCNews dataset, i.e. texts from media outlets on the Internet. It is essential to clarify what the challenges were, what caused them, and in which texts they are prevalent.

Furthermore, the previous work on dealing with long texts and the deep learning models listed in Section 1.1 should be moved to the Related Work section.

Finally, defining the "overall semantics of texts" and differentiating it from word-level and sentence-level semantics is necessary. Does this mean the entire text is used as input, and if so, for what purpose?  

Related Work 

The section should incorporate the works currently presented in the introduction. In addition, a short introductory sentence between Sections 2 and 2.1 would be beneficial, explaining why the work presented in the following section is relevant to the authors' work. Although existing gaps in the literature and open problems are pointed out, the extent to which the authors' work addresses these could be made more explicit. To achieve this, a conclusion should be drawn from the literature at the end of the related work section, in which the central problems of the previous approaches are emphasised once again.

Proposed Method 

The methods were described in detail. However, comprehension would be enhanced by integrating Figure 2 more into the text, for example, by numbering the individual components and referring to these numbers in the body of the text.  

Furthermore, the extent to which the global feature state pooler captures the overall semantics of the text could be explained in more detail. Additionally, the methodology's utility for the classification of lengthy texts and its ability to overcome the challenges previously identified should be more explicitly highlighted.  

By contrast, the explanation of the basic principles and existing models, particularly BERT, can be shortened. In the case of BERT, it is also essential for the reader to see more clearly what adjustments the authors have made to the existing model.

Furthermore, repeated questioning learning should be briefly explained, as this is not a common term. Moreover, instead of the terms "LayerNorm" and "BatchNorm," which refer to PyTorch modules, the terms "Layer Normalisation" and "Batch Normalisation" should be employed.
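To illustrate the terminological distinction (a sketch using standard PyTorch modules, unrelated to the authors' implementation):

```python
# Layer normalisation vs. batch normalisation on a BERT-like tensor of shape
# (batch, sequence length, hidden size). nn.LayerNorm and nn.BatchNorm1d are
# the PyTorch module names; the prose should name the techniques themselves.
import torch
import torch.nn as nn

x = torch.randn(8, 128, 768)      # (batch, sequence length, hidden size)

layer_norm = nn.LayerNorm(768)    # normalises each token over the hidden dimension
batch_norm = nn.BatchNorm1d(128)  # treats dim 1 as channels; normalises over batch and hidden dims

print(layer_norm(x).shape, batch_norm(x).shape)  # both torch.Size([8, 128, 768])
```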

Experiment 

Datasets (4.1): Additional information would help the reader gain a more comprehensive understanding of the datasets. For example, details of the class distribution should be provided for the AG's News dataset.

Concerning the self-created CIPCC dataset, information on the language and accessibility of the dataset should be added. Furthermore, it would be advantageous to provide a detailed description of the subset of CIPCC data used, including the number of categories and a description of the class distribution. Table 1 contains examples from the CIPCC dataset; therefore, it would be more appropriate to change the caption of the table to "Information on exemplary nurturing elements from the CIPCC dataset".

Furthermore, given that the study's objective is to classify lengthy texts, it would be advantageous to include the average word count of the documents for each dataset.
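As a sketch, such statistics could be computed as follows, assuming a simple CSV with "text" and "label" columns (the file name and schema are hypothetical, not the actual CIPCC format):

```python
# Hypothetical dataset summary: per-class counts and average document length.
# For the Chinese datasets, character counts are likely more meaningful than
# whitespace-delimited word counts.
import pandas as pd

df = pd.read_csv("cipcc_subset.csv")            # placeholder path

print(df["label"].value_counts())               # class distribution
print(df["text"].str.split().str.len().mean())  # average word count
print(df["text"].str.len().mean())              # average character count
```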

Hyperparameter Settings (4.2): It is recommended that a link to the BERT model used be provided. Furthermore, it should be clarified whether different pre-trained BERT models were employed for English and Chinese or whether multilingual BERT was utilised.
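For example, the distinction could be documented with Hugging Face checkpoint identifiers (these are standard hub names; whether the authors used these exact checkpoints is precisely what should be clarified):

```python
# Language-specific checkpoints versus a single multilingual one.
from transformers import AutoModel, AutoTokenizer

tok_zh = AutoTokenizer.from_pretrained("bert-base-chinese")   # Chinese
bert_zh = AutoModel.from_pretrained("bert-base-chinese")

tok_en = AutoTokenizer.from_pretrained("bert-base-uncased")   # English
bert_en = AutoModel.from_pretrained("bert-base-uncased")

# Or one multilingual checkpoint covering both languages:
# AutoModel.from_pretrained("bert-base-multilingual-cased")
```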

Evaluation Method (4.3): Formulas (25) and (26) are identical; formula (26) should be revised accordingly. Defining accuracy, precision, and recall is unnecessary, as these metrics can be assumed to be known.
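If, as seems likely but cannot be verified from the review alone, one of the duplicated formulas was intended to be the F1 score, its standard definition is:

```latex
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
```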

Comparative Methods (4.4): The baseline models were described in a comprehensible manner. The reasons for selecting these models for comparison should be given at the beginning of the subsection. Furthermore, it is recommended to state briefly whether the corresponding models underwent any form of fine-tuning or retraining on the datasets.

Experimental Results and Analysis (4.5): The experimental results were well discussed. However, using k-fold cross-validation instead of a simple random split would yield more meaningful results. Furthermore, a significance test could be conducted to support the authors' hypothesis that their new model significantly outperforms other state-of-the-art models. Finally, the column in Tables 2-5 that is identical for all models could be omitted, with the number given in the text or in the table caption instead.

Confusion Matrix Analysis (4.6): The confusion matrices are somewhat difficult to interpret due to their small font size. Additionally, it is not readily apparent that they represent different models and datasets. Consequently, it is advisable to display only the confusion matrices of the proposed model and one baseline, limited to a single dataset; a sketch of such a layout follows.
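A possible layout (the labels and predictions below are randomly generated placeholders, not results from the paper):

```python
# Side-by-side confusion matrices for the proposed model and one baseline on
# a single dataset, at a legible size. The label/prediction arrays are dummies.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

rng = np.random.default_rng(0)
y_true = rng.integers(0, 4, 500)                                    # 4 classes
pred_rq = np.where(rng.random(500) < 0.9, y_true, rng.integers(0, 4, 500))
pred_base = np.where(rng.random(500) < 0.8, y_true, rng.integers(0, 4, 500))

fig, axes = plt.subplots(1, 2, figsize=(12, 5))
ConfusionMatrixDisplay.from_predictions(y_true, pred_rq, ax=axes[0], colorbar=False)
axes[0].set_title("RQ-OSPTrans")
ConfusionMatrixDisplay.from_predictions(y_true, pred_base, ax=axes[1], colorbar=False)
axes[1].set_title("Baseline")
plt.tight_layout()
plt.show()
```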

Ablation Study (4.7): The results presented in the text are clearly and accurately described. However, the illustrations should be revised. As is most noticeable in Figure 7, three-dimensional diagrams should be avoided, as they distort the results. It is also unclear why the F1 score on the x-axis is compared with the improved accuracy on the y-axis. In addition, 0% should be selected as the starting point for the y-axis in Figure 5 b), as otherwise no accurate comparison with Figure 5 a) is possible. In general, consideration should be given to depicting the actual accuracy (not just the improvement); a minimal sketch of such a plot follows.
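For instance (the variant names and scores are illustrative placeholders):

```python
# Flat 2-D bar chart of absolute accuracy per ablation variant, with the
# y-axis anchored at 0 so that bar heights are directly comparable.
import matplotlib.pyplot as plt

variants = ["Baseline", "w/o RQ", "w/o OSP", "Full model"]  # hypothetical names
accuracy = [0.89, 0.91, 0.92, 0.94]                         # placeholder values

fig, ax = plt.subplots(figsize=(5, 3))
ax.bar(variants, accuracy)
ax.set_ylim(0, 1.0)          # start the axis at 0, as suggested
ax.set_ylabel("Accuracy")
plt.tight_layout()
plt.show()
```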

Conclusions 

The conclusion presents a comprehensive overview of the essential findings and interesting ideas for future research. However, it should not be claimed that there is a significant improvement over the state-of-the-art baselines, as the significance of this improvement has not been tested.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper proposes a text classification method, RQ-OSPTrans, based on a residual-connected Transformer encoder and parallel networks, to address the challenge of accurately identifying topics in long texts containing colloquial expressions and noise. The presented experimental results demonstrate that the method outperforms the best methods in the literature on topic recognition for texts of 256 tokens or more. However, there are some points to be enhanced.

  1. The experiment definition (Section 4) lacks a definition of the dependent variables; it is unclear what was measured to compare the different methods.
  2. In the results section, the authors claim that the obtained results were significantly better than those of the compared methods. However, no statistical tests or other comparison parameters are provided to ensure that the difference is significant; one suitable test is sketched below.
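As a sketch, one suitable significance test for comparing two classifiers on the same test set is McNemar's exact test on the disagreement counts (the prediction arrays here stand in for the models' actual outputs):

```python
# Exact McNemar test: do two models have significantly different error rates
# on the same test set? Under H0, the cases where exactly one model is
# correct split 50/50 between the two models.
import numpy as np
from scipy.stats import binomtest

def mcnemar_pvalue(y_true, pred_a, pred_b):
    correct_a = pred_a == y_true
    correct_b = pred_b == y_true
    only_a = int(np.sum(correct_a & ~correct_b))   # A right, B wrong
    only_b = int(np.sum(~correct_a & correct_b))   # B right, A wrong
    return binomtest(min(only_a, only_b), only_a + only_b, 0.5).pvalue
```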

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

* Introduction

The authors have taken a slightly different approach to the usual journal submission format, which means there isn't a clear and detailed problem statement in the introduction that explicitly describes the specific problem or gap this research seeks to address in the field of text categorisation. This makes it tricky for the reader to understand the significance of the research and how it advances the current state of knowledge.

The authors should also consider the connection between the background and the aims of the study. There appears to be a disconnect between the two, which hinders the logical transition to the subsequent challenges, observations, and contributions. As it stands, the introduction is somewhat muddled, making it hard for the reader to see how the background information leads to the specific challenges and observations. In particular, the authors mention challenges in text classification but don't explain why these challenges are relevant to the current study or how they specifically impact the field.

I think it's really important that the introduction makes clear what problem or gap this study is trying to address. This helps the reader understand why the study is important and how it addresses the research questions, and it links the research objectives or questions to the study.

 

* "Related work" vs. "Literature review“

I would suggest retitling this section "Literature Review". A related-work section generally focuses on studies that are directly relevant to and directly influence the research being presented, whereas a literature review provides a comprehensive overview of all relevant literature, including its theoretical and methodological underpinnings, and contextualises the work within the broader body of scholarship by covering a wider scope. Furthermore, I would like to see the contribution and distinction of the study within the field of text categorisation stated with clarity, relevance, and academic rigour, and an implications passage included at the end.

 

* Methodology

Firstly, it is difficult to understand why Figure 2 appears so early in the methodology of this study without sufficient textual justification, which prevents the reader from grasping its relevance and detail. I therefore suggest revising the placement of the figure so that it is integrated with the main text and followed by a description; ideally, the text should describe each component sequentially as it appears in the figure.

Second, the authors should describe the function of each component and its integration with the overall model in sufficient detail, including how inputs are processed, how outputs are generated, and how this contributes to achieving the goals of the model. In particular, dividing the methodology into clearly labelled subsections dedicated to specific components of the model (e.g., BERT word-embedding learning, repeated questioning learning, overall semantic perception) would guide the reader through the architecture in a logical way and show why the design is consistent and plausible.

Third, the technical terminology should be simplified and the description refined so that the novel contributions of the RQ-OSPTrans model are easier to understand and the methodology section effectively communicates the innovation and utility of the study. For example, it is important to explain how the repeated questioning module improves feature integrity and how the overall semantic perception module contributes to the final classification accuracy, avoiding unnecessary jargon so that the presentation remains accessible to the intended audience.

 

* Experiment

Firstly, the authors do not provide sufficient rationale as to why the chosen datasets (THUCNews, AG's News, arXiv-10, CIPCC) are suitable for testing the proposed model, especially given their diversity in content and structure. Linking the characteristics of each dataset to the specific challenges or competencies the RQ-OSPTrans model aims to address would strengthen the rationale for the selection and help validate the datasets as effective tools for demonstrating the model's effectiveness. I also believe it is very important to include a more detailed discussion of the comparative results, focusing on the specific features or architectural decisions of RQ-OSPTrans that contribute to its performance relative to other models, and, conversely, to address cases where RQ-OSPTrans performed poorly, providing the authors' insights into potential limitations or areas for improvement.

Secondly, regarding the impact of hyperparameter settings: while hyperparameter tuning is mentioned in the study, the impact of these settings on overall performance is not analysed in depth. I therefore believe it would be very important for the authors to provide a subsection dedicated to hyperparameter tuning, identifying how different settings affect the performance of the model. This could include a sensitivity analysis, which would greatly enhance the depth of the paper and present the authors' insights into the robustness of the model.

Third, the model's performance should be evaluated more rigorously across different contexts and dataset splits, for example through cross-validation. Moreover, although the model is tested on datasets with varying levels of class imbalance and noise, there is limited discussion of how it handles these issues. Please include a more detailed analysis of the model's performance on imbalanced data. I would like the authors to discuss the specific techniques or model tuning used to manage class imbalance and noise, and to consider providing performance metrics that specifically reflect improvements in these areas.

 

* Conclusion

It's really important to be upfront about the limitations of this study. For instance, the model relies on large pre-trained models like BERT, which might make things more complex or limit its use in certain situations. The authors should also consider how scalable and computationally demanding the model is. In addition, it would be great if the authors could discuss potential methodologies for incorporating multimodal capabilities, or discuss in more depth the specific types of domain datasets that could benefit from the proposed improvements. This would help future research be more specific and actionable.

I would also like the conclusions to contextualise the reported performance improvements within the wider field, so that there is a discussion of how the advances in RQ-OSPTrans contribute to natural language processing or text classification more broadly. It would also be really helpful if the authors could relate this approach to ongoing advances in dealing with noisy or colloquial data in text classification. Finally, I would like to see a discussion of potential applications in industries such as social media analytics, sentiment analysis, or automated content moderation, and of how the model's ability to deal with long texts containing noise and colloquial expressions can benefit these areas.

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
