Article
Peer-Review Record

Semantic Similarity Analysis for Examination Questions Classification Using WordNet

Appl. Sci. 2023, 13(14), 8323; https://doi.org/10.3390/app13148323
by Thing Thing Goh 1,2, Nor Azliana Akmal Jamaludin 2,*, Hassan Mohamed 2, Mohd Nazri Ismail 2 and Huangshen Chua 1
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 20 April 2023 / Revised: 19 May 2023 / Accepted: 13 June 2023 / Published: 19 July 2023
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)

Round 1

Reviewer 1 Report

In this work, the authors have extended their own previous work. The proposed work is novel in nature. I have the following suggestions:

1. Don't abbreviate the titles of sections or subsections.

2. Compare the results with the existing techniques.

The language of the paper is OK.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

The manuscript proposes a Question Classification System (QCS) to classify examination questions based on Bloom's Taxonomy (BT). The authors highlight the lack of research on classifying multi-sentence and multi-subject questions. However, the paper does not adequately address these issues. The authors tackle multi-sentence questions by slicing them into their component sentences and analyzing each sentence individually, thus reducing the problem to the single-sentence scenario. The issue of multi-subject questions (which in fact concerns the ability of the QCS to be domain-independent) is not addressed: no error analysis is carried out for questions from different domains, and the Bloom's Taxonomy verbs searched for are the same regardless of the domain.
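For orientation, the slicing step described above can be reproduced with off-the-shelf sentence splitting. A minimal sketch using NLTK's sent_tokenize (an assumption for illustration; the authors' actual slicing logic is not detailed in this record):

```python
import nltk
# Assumes nltk.download('punkt') has been run.

def slice_question(question: str) -> list[str]:
    """Split a multi-sentence exam question into single sentences,
    so each sentence can then be classified on its own."""
    return nltk.sent_tokenize(question)

question = ("Define Ohm's law. Using a worked example, "
            "explain how it applies to a series circuit.")
for sentence in slice_question(question):
    print(sentence)  # each sentence is classified independently
```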


The proposed system combines existing Natural Language Processing tools, such as NLTK and the Stanford POS tagger, with WordNet similarity to identify BT verbs and classify questions accordingly. While the manuscript addresses an important problem in examination question classification, the approach lacks originality and depth. Although the idea of using WordNet is interesting, the methodology is too simplistic and does not reflect the current state of the art in the field. The focus on comparing the performance of different models for the classical NLP pipeline (sentence splitting, tokenization, POS tagging) is unnecessary. Additionally, the authors could have carried out a more extensive evaluation: reporting the results of the human annotators in terms of inter-annotator agreement, discussing why certain sentences should be considered particularly difficult for the model, and performing an in-depth error analysis with possible explanations for erroneous annotations. Furthermore, the POS tag labels are not explained (what do VBP and VBZ stand for in the tagset?).
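For context, the verb-identification step the review refers to could be sketched as follows, using WordNet Wu-Palmer similarity between a question verb and exemplar BT verbs. The verb lists here are hypothetical illustrations, not the paper's actual lists:

```python
from nltk.corpus import wordnet as wn
# Assumes nltk.download('wordnet') has been run.

# Hypothetical exemplar verbs for two Bloom's Taxonomy levels.
BT_VERBS = {
    "Remember":   ["define", "list", "recall"],
    "Understand": ["explain", "describe", "summarize"],
}

def best_bt_level(verb):
    """Map a question verb to the BT level whose exemplar verb is
    most similar under WordNet Wu-Palmer similarity."""
    v_syns = wn.synsets(verb, pos=wn.VERB)
    best_level, best_score = "unknown", 0.0
    for level, exemplars in BT_VERBS.items():
        for ex in exemplars:
            e_syns = wn.synsets(ex, pos=wn.VERB)
            if v_syns and e_syns:
                score = v_syns[0].wup_similarity(e_syns[0]) or 0.0
                if score > best_score:
                    best_level, best_score = level, score
    return best_level, best_score

print(best_bt_level("clarify"))  # e.g. ('Understand', <score>)
```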


In light of these major flaws, I cannot recommend the manuscript for publication. However, I suggest that the authors revise their approach and methodology, possibly by considering the results of more recent deep language models, and increase the sample size to evaluate the effectiveness of their proposed system. Additionally, an evaluation reporting the results of human annotators in terms of inter-annotator agreement would also strengthen the manuscript.
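For concreteness, inter-annotator agreement of the kind requested above is commonly reported as Cohen's kappa. A minimal sketch with scikit-learn and hypothetical annotations:

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical annotations: two annotators label the same five
# questions with Bloom's Taxonomy levels.
annotator_a = ["Remember", "Understand", "Apply", "Understand", "Analyze"]
annotator_b = ["Remember", "Apply",      "Apply", "Understand", "Analyze"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect, ~0 = chance-level
```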

The manuscript needs extensive revision to improve the quality of the English language. Just a few examples from the Introduction section:

- line 18: Most of the research are

- line 43:  caused the inconsistent of labelling

- line 45: can be reduce

- Recent years, various studies

- it is not an easy to teach

- in order to understand by machine

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Please describe all abbreviations in the paper (“VB”, “VBD”, “VBG”, “VBN”, “VBP”, and “VBZ”).
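For the record, these are standard Penn Treebank verb tags, which NLTK's default tagger also emits. A short illustration (the output comment is the expected tagging, which may vary by tagger version):

```python
import nltk
# Assumes nltk.download('punkt') and
# nltk.download('averaged_perceptron_tagger') have been run.

# Penn Treebank verb tags:
#   VB  - base form                  ("define")
#   VBD - past tense                 ("defined")
#   VBG - gerund/present participle  ("defining")
#   VBN - past participle            ("has defined")
#   VBP - present, non-3rd-person singular ("they define")
#   VBZ - present, 3rd-person singular     ("she defines")

tokens = nltk.word_tokenize("They define the term and she defines the scope.")
print(nltk.pos_tag(tokens))
# Expected to include ('define', 'VBP') and ('defines', 'VBZ').
```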

It is unclear how the Similarity Matrix Scores are computed. What kind of algorithm is used? A detailed example should be provided.
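One plausible reading of such a similarity matrix, sketched with WordNet Wu-Palmer similarity (hypothetical verb lists; the paper's actual algorithm is not specified in this record):

```python
from nltk.corpus import wordnet as wn
# Assumes nltk.download('wordnet') has been run.

def similarity_matrix(question_verbs, bt_verbs):
    """Rows: verbs found in the question; columns: BT exemplar verbs;
    cells: Wu-Palmer similarity of the first verb synsets
    (0.0 when a verb has no synset)."""
    matrix = []
    for qv in question_verbs:
        q_syns = wn.synsets(qv, pos=wn.VERB)
        row = []
        for bv in bt_verbs:
            b_syns = wn.synsets(bv, pos=wn.VERB)
            if q_syns and b_syns:
                row.append(q_syns[0].wup_similarity(b_syns[0]) or 0.0)
            else:
                row.append(0.0)
        matrix.append(row)
    return matrix

print(similarity_matrix(["describe", "compute"],
                        ["explain", "calculate", "list"]))
```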

As baselines, other existing methods should be evaluated on the same 200-question dataset to show how well the current approach outperforms the previous approaches.

Sending the draft to a proofreading service would improve the quality of the English language.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

The authors enriched the analysis reported in the paper, as requested in the review, and explained the methodology for multi-sentence question processing in more detail. These interventions improved the overall quality of the paper and made the presentation clearer.

However, from an NLP perspective, the proposed methodology is too simplistic and not original. The work does not reflect the current state of the art in the field, nor is it original in the application scenario of classifying questions into Bloom's Taxonomy. For this reason, I do not think the paper contributes sufficiently to the research field.

Minor: it should be stated clearly from the beginning that this is part of ongoing work that has already been partially published. See the conclusions: "This paper presents a continuous study of the framework proposed by Goh et al. [17]".

Reviewer 3 Report

The authors have addressed all of my concerns.
