Article
Peer-Review Record

Thematic Analysis: A Corpus-Based Method for Understanding Themes/Topics of a Corpus through a Classification Process Using Long Short-Term Memory (LSTM)

Appl. Sci. 2023, 13(5), 3308; https://doi.org/10.3390/app13053308
by Yaser Altameemi 1,* and Mohammed Altamimi 2
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3: Anonymous
Submission received: 25 January 2023 / Revised: 27 February 2023 / Accepted: 27 February 2023 / Published: 5 March 2023

Round 1

Reviewer 1 Report

The manuscript uses the same data as Altameemi's thesis, which is beneficial for verifying the submitted method. However, there is no comparative analysis in the results (Sections 7 and 8). If possible, it is recommended that a comparison be added.

Author Response

Dear Reviewer 1,

We would like to thank the reviewers for their constructive feedback, which has strengthened the quality of the paper. In the attached file, we present how the comments and suggestions provided by Reviewer 1 have been addressed. After the table, we present the revised article, which takes into account all points, including those raised by the other reviewers. Please note that the final version of the paper, including the proofreading by MDPI, will be sent once the reviewers have accepted the required changes.

All the best,

Yaser Altameemi

Author Response File: Author Response.docx

Reviewer 2 Report

Overall

This paper describes thematic analysis using a corpus-based method to understand the themes of a corpus using LSTM. The current version of the paper leaves me rather confused and I am uncertain of what the contribution to the research literature is. I have a number of major issues for the authors to address. I also found the paper difficult to follow in a number of places due to the language choices made.  

Major issues

1. Lack of clear contribution

The authors claim “the main contribution of the article is the use of LTSM [sic] for the automatic thematic analysis of a corpus”, yet LSTM can categorize and classify any data.

To be worthy of publication the paper must offer something that is novel, substantial, rigorous and/or important. However, I am unsure what the authors believe their contribution offers in any of these four aspects. The first paper to introduce LSTM was published over 25 years ago, and so simply using LSTM on a new dataset is far from novel. From the perspective of a corpus linguist, perhaps their contribution is to extract the KWIC concordance lines and use only those excerpts for thematic analysis. However, given that no details are provided on extracting lines, perhaps the authors simply used the whole corpus.

2.  Lack of baseline

The authors claim to have created a novel methodology that achieves good results, yet they have not provided any comparison data. The authors should use an existing methodology on the same dataset, and then compare their results to that. Alternatively, if they cannot do so, they could at least compare some alternative algorithms and show that their LSTM outperforms the others. Currently, the claim appears to be: we made a new algorithm, it achieves an accuracy of XX%, which we state is good.  This is not robust enough to be published in Applied Sciences.
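As an illustration of the weakest kind of comparison this comment asks for, the following is a minimal sketch of a majority-class baseline evaluated with the same accuracy measure as a model. The theme labels, the data, and the function names are hypothetical and invented for illustration; they are not taken from the manuscript.

```python
from collections import Counter


def accuracy(gold, predicted):
    """Fraction of items whose predicted label equals the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("gold and predicted must be non-empty and equally long")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test item."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return [most_common_label] * n_test


# Hypothetical theme labels, invented for illustration only.
train_labels = ["economy", "economy", "security", "economy", "culture"]
test_gold = ["economy", "security", "economy", "culture"]
# Stand-in for the output of any classifier under evaluation.
model_predictions = ["economy", "security", "culture", "culture"]

baseline_predictions = majority_baseline(train_labels, len(test_gold))
baseline_accuracy = accuracy(test_gold, baseline_predictions)
model_accuracy = accuracy(test_gold, model_predictions)
print(f"baseline={baseline_accuracy:.2f} model={model_accuracy:.2f}")
```

A reported accuracy is only informative relative to such a reference point: a model that does not clearly exceed the majority-class figure on the same test items has learned little about the themes.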

3. Literature review

The authors provided two brief reviews of related works. However, neither of the reviews showed the readers the specific niche or weakness that their research addresses. The CL and TA review detailed a number of papers, but did not use them to justify their research.

The LSTM review is similarly shallow. The authors claim that “These studies do not contribute in providing detailed and complex thematic analysis for a corpus around a particular topic”. Yet, using LSTM to classify a corpus into themes is a straightforward task.

4. Method

This needs a far more detailed explanation.

a. There is insufficient detail given about parameters and the justification for the choices of the settings for the LSTM. It is impossible to replicate this research even if the dataset is shared, since there is too little information given.

b. There are sparse details about the dataset: not even the number of tokens is given.

c. The authors mix lines (e.g. line 226) and concordance lines. The lines given in Table 1 are all sentences and not concordance lines. The concordance line is the span of text around the keyword regardless of punctuation.

d. Were the 685 lines mentioned (line 226) used as training data, or are they the test data? If the data were divided into training and test sets, what was the split?

e. I may be wrong, but I am left with the impression that the authors are using LSTM on a dataset that is far too small. 
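Points c and d can be made concrete with a short sketch, assuming a simple regular-expression tokenizer and an arbitrary window size. It extracts concordance lines as a fixed span of tokens around the keyword, ignoring sentence punctuation, and then divides them into training and test sets. The function names, the sample text, and the split fraction are hypothetical and do not describe the authors' procedure.

```python
import random
import re


def kwic(text, keyword, window=5):
    """Return (left, keyword, right) spans of `window` tokens around each hit.

    The span ignores sentence boundaries, unlike a sentence-based extract.
    """
    tokens = re.findall(r"\w+|[^\w\s]", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append((" ".join(left), token, " ".join(right)))
    return lines


def split_lines(lines, test_fraction=0.2, seed=0):
    """Shuffle reproducibly and return (train, test) lists."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Invented sample text for illustration only.
sample = "The council met. Members of the council voted, and the council adjourned."
concordance = kwic(sample, "council", window=3)
train, test = split_lines(concordance, test_fraction=0.34)

for left, node, right in concordance:
    print(f"{left:>20} [{node}] {right}")
```

Note that the first concordance line runs past the full stop into the next sentence, which is the difference between a concordance line and a sentence. Reporting the window size, the split fraction, and the random seed would be enough for a reader to reproduce this step.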

Minor issues

1. Self-citation

There are twelve citations/references to the as yet unpublished PhD thesis of the first author, and so the veracity of any claims cannot be checked. The author must refer to the thesis, but perhaps this can be done more efficiently.

2. CDA

Although CDA is a fascinating area and I can understand how this relates to the content of the texts, this paper does not deal with CDA and so I fail to understand why this term is mentioned on five separate occasions.

3. Identical quotation repeated

“According to Baker [1, p. 96] collocation is a “way of understanding meanings and associations between words which are otherwise difficult to ascertain from a small-scale analysis of a single text.””

The authors rightly quote Paul Baker, but there is no reason to repeat the identical quote on subsequent pages.

Language issues

Although the article is generally grammatically accurate, the whole paper needs to be carefully proofread to ensure that meaning is expressed clearly and concisely.

Some of the more humorous errors include:

1. Lines 161-162

However, these studies do not provide a mythology for a thematic analysis

2. Line 146

recreant neural networks (RNN).

3. Line 265

Evaluation Measurers

4. Line 387

LTSM  (= Learning Teaching Subject Material)

However, there are far more language errors that impact meaning negatively leaving the reader confused.

Author Response

Dear Reviewer 2,

We would like to thank you for the constructive feedback, which has strengthened the quality of the paper. In the attached file, we present how the comments and suggestions provided by Reviewer 2 have been addressed. After the table, we present the revised article, which takes into account all points, including those raised by the other reviewers. Please note that the final version of the paper, including the proofreading by MDPI, will be sent once the reviewers have accepted the required changes.

All the best,

Yaser Altameemi

Author Response File: Author Response.docx

Reviewer 3 Report

In general, the article needs to be revised.

1. In the introduction, the contribution of this paper should be emphasized. At present, the authors do not highlight the innovation of the article.

2. At present, only 25 articles are cited, which is seriously insufficient.

3. The authors can combine “Corpus Linguistics and Thematic Analysis” and “Long Short-Term Memory Networks” into a single literature review.

4. The following articles on Corpus Linguistics and Thematic Analysis methods are recommended to the authors.

In line 100, the authors propose “the corpus approach through using the Mutual Information score to measure the word associations in the corpora”.

This article (10.1109/ACCESS.2019.2920091) uses the corpus method, expanding the corpus by means of mutual information to improve the accuracy of topic analysis.

In line 105, the authors state that “Analyzing the collocation is also used as a key tool for analyzing the lexical-thematic analysis.”

This article (10.1109/ACCESS.2019.2919734) surveys the main themes in e-commerce teahouses. Using the LDA method, it classifies topics according to frequent word groups.

5. The following article on LSTM methods is recommended to the authors. It can be used to demonstrate the advantages of LSTM over other algorithms.

In line 105, the authors state that “Long Short-Term Memory (LSTM) has exceled recently in the field of artificial intelligence and deep learning. It is an advanced method of recreant neural networks (RNN).”

This article (10.3390/app11136199) points out that the LSTM model has advantages over the SVM and BP neural network methods in prediction.

6. “Experiments”, “Evaluation Measurers”, and “Results” should be combined into a single chapter.

7. The table should be presented more clearly, and the authors need to adjust the font.

8. The authors should add a comparison model to demonstrate the superiority of LSTM in the experiments. This is the key point.

9. The conclusion is not good enough. It should be discussed in combination with the results.

10. The authors need a professional English editor to revise this article.
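For reference on the Mutual Information score mentioned in comment 4, the following is a minimal sketch of one common form of the measure, the base-2 logarithm of the observed co-occurrence frequency over the frequency expected if the two words were independent. This is an assumption about the formula: corpus tools differ, and many also scale the expected frequency by the collocation span. The counts are invented for illustration.

```python
import math


def mutual_information(pair_count, node_count, collocate_count, corpus_size):
    """Pointwise mutual information: log2(observed / expected co-occurrence).

    expected = node_count * collocate_count / corpus_size (no span adjustment).
    """
    if min(pair_count, node_count, collocate_count, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    return math.log2(pair_count * corpus_size / (node_count * collocate_count))


# Invented counts: the pair occurs 8 times in a 100,000-token corpus.
score = mutual_information(
    pair_count=8, node_count=40, collocate_count=50, corpus_size=100_000
)
# A pair that co-occurs exactly as often as chance predicts scores 0.
chance_score = mutual_information(
    pair_count=2, node_count=40, collocate_count=50, corpus_size=1_000
)
print(f"MI={score:.2f} chance={chance_score:.2f}")
```

Because the score rewards rare words, it is usually reported together with a minimum frequency threshold, which is another parameter worth stating in the method section.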

Author Response

Dear Reviewer 3,

We would like to thank you for the constructive feedback, which has strengthened the quality of the paper. In the attached file, we present how the comments and suggestions provided by Reviewer 3 have been addressed. After the table, we present the revised article, which takes into account all points, including those raised by the other reviewers. Please note that the final version of the paper, including the proofreading by MDPI, will be sent once the reviewers have accepted the required changes.

All the best,

Yaser Altameemi

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Second review

Overall

The authors have made a good effort in addressing the issues raised by reviewers in the first round of reviews. I can now more easily understand the intended contribution.  The authors have sufficiently dealt with many of the issues regarding content that I raised in the first round. Language issues still make it slightly difficult to follow but I am sure the professional proofreading service will be able to improve that. The main outstanding issue relates to the lack of baseline.

Outstanding issues

Original comment

2.  Lack of baseline

The authors claim to have created a novel methodology that achieves good results, yet they have not provided any comparison data. The authors should use an existing methodology on the same dataset, and then compare their results to that. Alternatively, if they cannot do so, they could at least compare some alternative algorithms and show that their LSTM outperforms the others. Currently, the claim appears to be: we made a new algorithm, it achieves an accuracy of XX%, which we state is good. This is not robust enough to be published in Applied Sciences.

I have clarified my original comment below

2.1 baseline – human vs computer

The authors note that they compared the results of their LSTM experiment with the manual results of Altameemi. I suggest that the authors provide a clear side-by-side tabular comparison showing the results of the LSTM and Altameemi. Presumably, Altameemi provided evidence of the accuracy of the results in his/her thesis. Manual annotation is fraught with accuracy issues, hence the necessity for double annotation. The lack of a ground truth further exacerbates the problem of comparing automatic and human annotation/classification.
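One way to quantify the agreement discussed here, whether between two human annotators or between the manual and automatic classifications, is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is illustrative only: the label sequences are invented, and nothing in the review indicates which agreement measure, if any, the authors or the thesis used.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two label sequences."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("sequences must be non-empty and equally long")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    # Chance agreement from each rater's marginal label distribution.
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences use one identical label throughout
    return (observed - expected) / (1 - expected)


# Invented labels for eight concordance lines, for illustration only.
manual = ["A", "A", "B", "B", "A", "B", "A", "B"]
automatic = ["A", "A", "B", "A", "A", "B", "B", "B"]

kappa = cohens_kappa(manual, automatic)

# Side-by-side listing of the two classifications.
print("line  manual  automatic")
for index, (m, a) in enumerate(zip(manual, automatic), start=1):
    print(f"{index:>4}  {m:<6}  {a}")
print(f"kappa={kappa:.2f}")
```

In this invented example the raw agreement is 75%, yet kappa is 0.5, because half of the agreement would be expected by chance with two equally frequent themes.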

2.2 baseline – reported LSTM experiment vs different LSTM experiments

The authors report one experiment, yet when performing deep learning, it is usual to test multiple algorithms/methods and select the best-performing configuration. Does the lack of reporting on this mean that the authors used only one configuration? If so, the authors should provide cogent arguments to support why that configuration was deemed the most accurate. However, a better resolution is for the authors to provide details of other experiments with different machine learning algorithms.

Author Response

Dear Reviewer 2, 

First, we would like to thank you for the constructive feedback, which has strengthened the quality of the paper. In the attached file, we present how the comments have been addressed. We started directly with the specific points mentioned by the Reviewer, as he/she stated: “I have clarified my original comment below”. The uploaded article shows all the required changes as tracked changes and takes into account all points, including those raised by the other reviewers. Please note that the final version of the paper, including the proofreading by MDPI, will be sent once the reviewers have accepted the required changes.

All the best,

Yaser Altameemi

Author Response File: Author Response.docx

Reviewer 3 Report

The authors have carefully revised the article, and the results of this work are worthy of recognition. However, there are still some problems to be corrected.

1. The authors answered: “The result of the article is already compared with the Altameemi’s [3] findings who used the same data. Altameemi analyzed the concordance lines manually. Then, he verified the findings by testing the classification of the concordance lines from two linguistic experts. At the end of his thesis, Altameemi suggested that the analyzed concordance lines are randomly selected, and he did not analyze all the concordance lines of the keywords due to the time limit. He proposed the importance of automatic thematic analysis of the lines to make the results of the analysis more representative by analyzing all the concordance lines. This point is discussed not only for Altameemi’s work but, this is an issue for the approach of thematic analysis using the collocational network. Therefore, the central contribution of the article is about testing the effectiveness of applying LSTM in the thematic analysis of concordance lines.”

It is suggested that the authors highlight the results of this comparison in the text or in a table; otherwise, this information is easily overlooked.

2. We have not found a neural network comparison model in the article. The authors should use the same dataset with a different neural network model, such as an RNN, and carry out comparative experiments to demonstrate the advantages of the LSTM model. This is important and easy to implement.

Author Response

Dear Reviewer 3, 

First, we would like to thank you for the constructive feedback, which has strengthened the quality of the paper. In the attached file, we present how the comments have been addressed. The uploaded article shows all the required changes as tracked changes and takes into account all points, including those raised by the other reviewers. Please note that the final version of the paper, including the proofreading by MDPI, will be sent once the reviewers have accepted the required changes.

All the best,

Yaser Altameemi

Author Response File: Author Response.docx
