Article
Peer-Review Record

Methodology for Analyzing the Traditional Algorithms Performance of User Reviews Using Machine Learning Techniques

Algorithms 2020, 13(8), 202; https://doi.org/10.3390/a13080202
by Abdul Karim 1, Azhari Azhari 1, Samir Brahim Belhaouri 2,*, Ali Adil Qureshi 3 and Maqsood Ahmad 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 12 May 2020 / Revised: 23 June 2020 / Accepted: 2 July 2020 / Published: 18 August 2020
(This article belongs to the Special Issue Advanced Data Mining: Algorithms and Applications)

Round 1

Reviewer 1 Report

Dear authors,

here my comments.

1) I think that the framework should be described in a single Section, while in the result Section, the authors should follow the schema of the framework highlighting the most important steps.

2) In the preprocessing step, lemmatization should also be applied; in linguistics, this is the process of grouping together the inflected forms of a word so they can be analyzed as a single item, identified by the word's lemma, or dictionary form.

3) Logistic regression is for a Boolean response, so you should describe the class comparison in the theory (e.g., one class versus the others, and so on).

4) In this paper:
"Data miners' little helper: data transformation activity cues for cluster analysis on document collections", T Cerquitelli, E Di Corso, F Ventura, S; this paper describes interesting metrics able to capture the lexical richness of the collection under analysis. I think that a comparison could be interesting, especially to compare the lexical richness of the different categories.


5) For the sentiment analysis, the three well-known lexicons are

- AFINN from Finn Årup Nielsen
- bing from Bing Liu and collaborators
- nrc from Saif Mohammad and Peter Turney

All three of these lexicons are based on unigrams (or single words). These lexicons contain many English words and the words are assigned scores for positive/negative sentiment, and also possibly emotions like joy, anger, sadness, and so forth. I think that a comparison could be interesting for the presented collection.


6) What are the tuning parameters for each algorithm? The hyper-parameter tuning should be reported to avoid overfitting.

7) What about LDA or embedding algorithms? Moreover, using an unsupervised learning algorithm could help the authors understand the main topics in the document collection. Have you considered applying BERT embeddings?


8) In the state-of-the-art, I expected to find the following papers:

For the different weighting strategies and not only TF-IDF which is well-known:
1) P. Nakov, A. Popova, and P. Mateev. Weight functions impact on LSA performance.
In EuroConference RANLP 2001, pages 187--193.

For the other methodologies:
1) Sparse PCA: Convex Relaxations, Algorithms and Applications Youwei Zhang, Alexandre d’Aspremont , and Laurent El Ghaoui.

2) Latent Dirichlet allocation. DM Blei, AY Ng, MI Jordan.

3) Probabilistic latent semantic analysis. T. Hofmann.

For innovative visualisation techniques of the textual results (specifically, graph views, t-SNE):
1) Towards automated visualisation of scientific literature
E Di Corso, S Proto, T Cerquitelli, S Chiusano
European Conference on Advances in Databases and Information Systems, 28-36


Minor comments:

1) reduce the font of the tables and adapt the page to the column width.

2) remove space before " ," in several pages.

3) title"regular expression" ... "of'routine". Read carefully the paper to remove extra spaces and add them when necessary.

Author Response

Cover Letter in Response to Reviewer’s Comments

 

 

Professor Ms. Alina Chen

Assistant Editor

Name of Journal: MDPI Algorithms for Databases, and Data Structures

Manuscript ID: algorithms-817069
Type of manuscript: Article
Title: Methodology for Analyzing the Traditional Algorithms Performance of
User’s Reviews Using Machine Learning Techniques

Date: 4 June 2020

 

Respected Professor Alina Chen

 

 

Thank you for the opportunity to revise our work. Please see our response to the reviewer’s comments (in the yellow highlighted text). We also greatly appreciate the reviewers for their complimentary comments and suggestions. We have carried out the experiments that the reviewers suggested and revised the manuscript accordingly. Please find attached a point-by-point response to reviewer’s concerns. We hope that you find our responses satisfactory and that the manuscript is now acceptable for publication.

 

We are very thankful to the reviewers; we have updated our manuscript according to their comments. We are glad to have learned so much from them: the comments gave us a great deal of knowledge, further enhanced our skills, and strengthened our research and study. We are grateful to the reviewers for making this possible; your comments opened our minds, and our interest and learning grew with them. Thank you very much.

 

Reviewer 1:

 

We appreciate the reviewer’s comments. The following are our point-by-point responses:

 

  • I think that the framework should be described in a single Section, while in the result Section, the authors should follow the schema of the framework highlighting the most important steps.

Response: Thank you for bringing this inconsistency to our attention. The framework is now described in a single section, and the most important steps are highlighted in the results section.

  • In the preprocessing step, lemmatization should also be applied; in linguistics, this is the process of grouping together the inflected forms of a word so they can be analyzed as a single item, identified by the word's lemma, or dictionary form.

Response: As suggested by the reviewer, we have included lemmatization in the preprocessing step and in the figure.
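For illustration, a minimal sketch of this step, assuming NLTK's WordNetLemmatizer (the figure does not prescribe a specific library, and the example words are only illustrative):

import nltk
from nltk.stem import WordNetLemmatizer

nltk.download("wordnet")    # one-time download of the WordNet data
nltk.download("omw-1.4")    # needed by newer NLTK releases

lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("reviews"))            # review (default part of speech is noun)
print(lemmatizer.lemmatize("crashes", pos="v"))   # crash  (treated as a verb)
print(lemmatizer.lemmatize("running", pos="v"))   # run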

  • Logistic regression is for a Boolean response, so you should describe the class comparison in the theory (e.g., one class versus the others, and so on).

Response: We use the multiclass parameter of logistic regression (one-vs-rest), which outputs a probability for each class. When a text review is given to the trained logistic regression model, it calculates a probability for each class and assigns the review to the class with the highest probability; for example, if one class scores 0.02 and another 0.01, the class with 0.02 is chosen.
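For illustration, a minimal scikit-learn sketch of this one-vs-rest prediction rule, using the parameter settings we report under the hyper-parameter question below; the reviews, labels, and variable names are only illustrative, not our actual data:

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

train_reviews = ["great goalkeeper game", "video call quality is poor", "nice photo filters"]
train_labels = ["Sports", "Communication", "Photography"]

vectorizer = CountVectorizer()            # TF (term-frequency) features
X_train = vectorizer.fit_transform(train_reviews)

# multi_class="ovr" is the setting reported below; newer scikit-learn versions
# handle the multiclass case automatically and may deprecate this argument.
clf = LogisticRegression(random_state=2, multi_class="ovr")
clf.fit(X_train, train_labels)

X_test = vectorizer.transform(["the game keeps crashing"])
probs = clf.predict_proba(X_test)[0]      # one probability per class
print(dict(zip(clf.classes_, probs.round(3))))
print("Predicted category:", clf.classes_[np.argmax(probs)])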

  • In this paper:


"Data miners' little helper: data transformation activity cues for cluster analysis on document collections", T Cerquitelli, E Di Corso, F Ventura, S; this paper describes interesting metrics able to capture the lexical richness of the collection under analysis. I think that a comparison could be interesting, especially to compare the lexical richness of the different categories.

Response: Thank you for these comments. Some of the suggested papers are behind a paywall, but we have read this paper and added its reference to our literature review. It really increased our knowledge.

  • For the sentiment analysis, the three well-known lexicons are:

- AFINN from Finn Årup Nielsen
- bing from Bing Liu and collaborators
- nrc from Saif Mohammad and Peter Turney

 

All three of these lexicons are based on unigrams (or single words). These lexicons contain many English words and the words are assigned scores for positive/negative sentiment, and also possibly emotions like joy, anger, sadness, and so forth. I think that a comparison could be interesting for the presented collection.

Response: Thank you for drawing our attention to lexicon-based sentiment analysis. In this study we do not use any lexicon-based sentiment analysis, so we cannot directly compare AFINN with our proposed technique. However, following your comment, we applied it to our collection for our own understanding, and the results are below:

Category Name              AFINN Sentiment Analysis
Sports                     2.338
Communication              1.770
Action                     2.120
Arcade                     2.530
Video players & editors    2.210
Weather                    2.318
Card                       2.459
Photography                2.576
Shopping                   1.407
Health & fitness           3.387
Finance                    1.328
Casual                     2.896
Medical                    2.847
Racing                     2.553
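As an aside, a small sketch of how such per-category averages could be computed with the afinn Python package; the package choice and the reviews shown are illustrative assumptions, not the pipeline used in the paper:

from afinn import Afinn

afinn = Afinn()
reviews_by_category = {
    "Sports": ["great game, love it", "too many ads"],
    "Finance": ["app crashes on login", "useful but slow"],
}

for category, reviews in reviews_by_category.items():
    scores = [afinn.score(text) for text in reviews]    # AFINN score per review
    print(category, round(sum(scores) / len(scores), 3))  # mean score per category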

 

  • What are the tuning parameters for each algorithm? The hyper-parameter tuning should be reported to avoid overfitting.

Response: Thank you for bringing this point to our attention. The parameters are:

Learning Algorithm        Hyperparameters
LR                        random_state=2, multi_class="ovr"
RF                        n_estimators=300, random_state=2, max_depth=200
Naïve Bayes Multinomial   default parameter settings
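For clarity, a sketch of how these settings map onto scikit-learn estimators (assuming scikit-learn, which the parameter names suggest):

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB

# multi_class="ovr" is the reported setting; newer scikit-learn versions may deprecate it.
lr = LogisticRegression(random_state=2, multi_class="ovr")
rf = RandomForestClassifier(n_estimators=300, random_state=2, max_depth=200)
nb = MultinomialNB()   # default parameters, as stated above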

 

  • What about LDA or embedding algorithms? Moreover, using an unsupervised learning algorithm could help the authors understand the main topics in the document collection. Have you considered applying BERT embeddings?

Response: LDA is a topic-modeling algorithm, whereas our research follows a classification approach; now that the paper is written, it would be difficult to also perform topic modeling. However, following your comment, we have started working on BERT and will try to use this concept in our next paper.

  • In the state-of-the-art, I expected to find the following papers:

 

For the different weighting strategies and not only TF-IDF which is well-known:
1) P. Nakov, A. Popova, and P. Mateev. Weight functions impact on LSA performance.
In EuroConference RANLP 2001, pages 187—193.

 

For the other methodologies:
1) Sparse PCA: Convex Relaxations, Algorithms and Applications Youwei Zhang, Alexandre d’Aspremont , and Laurent El Ghaoui.

 

2) Latent Dirichlet allocation. DM Blei, AY Ng, MI Jordan.

 

3) Probabilistic latent semantic analysis. T. Hofmann

 

For innovative visualisation techniques of the textual results (specifically, graph views, t-SNE):
1) Towards automated visualisation of scientific literature
E Di Corso, S Proto, T Cerquitelli, S Chiusano
European Conference on Advances in Databases and Information Systems, 28-36

 

Response: We agree with the reviewer; we found these papers to be an excellent starting point and state-of-the-art material for the research.

Minor comments:

  • reduce the font of the tables and adapt the page to the column width
  • remove space before " ," in several pages
  • title"regular expression" ... "of'routine". Read carefully the paper to remove extra spaces and add them when necessary

Response: Yes, we have updated the manuscript according to the format and the reviewer’s comments.

 

Thank you so much for such knowledgeable comments.

 

 

 

All the above changes have been reflected in the manuscript in yellow highlighted text.

We hope you find the revised manuscript acceptable for publication. Thank you once again for your consideration.

 

 

 

Sincerely,

 

Abdul Karim

PhD Computer Science Student

University Gadjah Mada

Yogyakarta

Indonesia

 

Corresponding Author & Proofreading:

Dr. Samir Brahim Belhaouri

Division of Information & Computer Technology, College of Science & Engineering, Hamad Bin Khalifa University, Doha, Qatar;

 

 

Author Response File: Author Response.docx

Reviewer 2 Report

The topic is interesting and the findings are unique which will contribute to the industry. Please see the detailed comments.

The quality of writing could be improved. There are several instances of poor grammar and awkward sentences throughout the paper.

 

In the introduction, the point is not clear. Please provide a strong argument about the purpose of this study. The readers want to know why this study is important and significant.

 

In line 58, what kind of scraping technique has been used? Please mention it in the text.

 

In the literature review, it looks like the information is scattered.
I think it would help readers to read this paper more easily if this part listed the academic works by keyword.

 

In line 182, how did you scrape the reviews? Please state what kind of program you have used.

 

In line 187, you said you have removed some characters in the preprocessing stage. But you need to explain in detail what kinds of characters were deleted, with examples, and why you had to delete the single characters.

 

In line 211, it is not good to see general information that belongs in the introduction suddenly mentioned in the methodology. The methodology is complicated enough to write without general information.
This general information should be described in the introduction or literature review to support why this study was done in this way.

 

In line 229, you have downloaded 148 apps in 14 categories. I want to ask why it is 148 apps, and is it acceptable to mix 14 different categories? Does this sample represent other apps as well? There should be a logical reason why you downloaded 148 apps in 14 categories.

 

In line 240, you said this library works quickly and saves the programmer's time as shown in Table 1, but Table 1 shows how many reviews have been extracted for each category. How does this table relate to saving the programmer's time?

 

In Section 5, results and experiments, you are explaining methods such as logistic regression. This part could be added to the methodology, and Section 6 can be the results.

 

In the conclusion, the findings are not discussed from a practical standpoint, and the relationship with previous studies is missing. In addition, I recommend indicating why this work is relevant to the scientific community.

 

Overall, I think it is a study that shows the creativity and originality of the subject. However, the logical basis for defining terms is poor. Also, it seems hard to understand the process of deriving the objectives.

 

Author Response

Cover Letter in Response to Reviewer’s Comments

 

 

Professor Ms. Alina Chen

Assistant Editor

Name of Journal: MDPI Algorithms for Databases, and Data Structures

Manuscript ID: algorithms-817069
Type of manuscript: Article
Title: Methodology for Analyzing the Traditional Algorithms Performance of
User’s Reviews Using Machine Learning Techniques

Date: 4 June 2020

 

Respected Professor Alina Chen

 

 

Thank you for the opportunity to revise our work. Please see our response to the reviewer’s comments (in the yellow highlighted text). We also greatly appreciate the reviewers for their complimentary comments and suggestions. We have carried out the experiments that the reviewers suggested and revised the manuscript accordingly. Please find attached a point-by-point response to the reviewer’s concerns. We hope that you find our responses satisfactory and that the manuscript is now acceptable for publication.

 

We are very thankful to the reviewers; we have updated our manuscript according to their comments. We are glad to have learned so much from them: the comments gave us a great deal of knowledge, further enhanced our skills, and strengthened our research and study. We are grateful to the reviewers for making this possible; your comments opened our minds, and our interest and learning grew with them. Thank you very much.


Reviewer 2:

 

We appreciate the reviewer’s comments. The following are our point-by-point responses:

 

  • The quality of writing could be improved. There are several instances of poor grammar and awkward sentences throughout the paper.

Response: Thank you for bringing this inconsistency to our attention. We have now improved the quality of the writing throughout the manuscript according to the points you mentioned.

  • In the introduction, the point is not clear. Please provide a strong argument about the purpose of this study. The readers want to know why this study is important and significant.

Response: Thank you for raising this point. As suggested by the reviewer, we have included a paragraph in the introduction that clarifies the purpose of this study, so that readers can easily understand its importance and significance.

  • In line 58, what kind of scraping technique has been used? Please mention it in the text.

Response: As the reviewer points out, this needed to be mentioned: we used Beautiful Soup 4 (bs4), the requests library, and regular expressions (re) to scrape the reviews.

 

  • In the literature review, it looks like the information is scattered.
    I think it would help readers to read this paper more easily if this part listed the academic works by keyword.

Response: Thank you for these comments. This point helped us focus closely on the pattern and formatting; the literature review has been updated according to the reviewer’s comments.

  • In line 182, how did you scrape the reviews? Please state what kind of program you have used.

Response: Thank you for drawing our attention to this. First, we create a function that sets the URL of the Google Play store, builds regular expressions to find the reviews, and loops over the matches to scrape them. A second function builds a list by calling the first function once per page. A final function calls the second one: when we run the program, we enter the application id and the number of pages we want to scrape; every page yields 40 reviews, so entering 10 pages scrapes 400 reviews. This is the code we use:

Code:

import json
import re

import requests


def getchReview(app_id, page):
    # Request one page of reviews from the Play Store review endpoint.
    data = requests.post(
        "https://play.google.com/store/getreviews?authuser=0",
        headers={"CONSENT": "YES+EN.us+20160117-18-0"},
        data={
            "reviewType": 0,
            "pageNum": page,
            "id": app_id,
            "reviewSortOrder": 4,
            "xhr": 1,
            "hl": "en",
        },
    )
    # Extract the review bodies and the star ratings with regular expressions.
    reviews = re.findall("(review-title)(.*?)(review-link)", data.text)
    ratings = re.findall("Rated (.*?) stars out of five stars", data.text)
    page_reviews = []
    for i, (_, body, _) in enumerate(reviews):
        page_reviews.append({"rating": int(ratings[i]),
                             "review": body[25:-24].replace("span", "")})
    print("[I] Fetched " + str(len(page_reviews)) + " reviews")
    return page_reviews


def getPages(app_id, n):
    # Collect the reviews of all requested pages (each page holds 40 reviews).
    all_reviews = []
    for page in range(n):
        all_reviews.extend(getchReview(app_id, page))
    return all_reviews


def main():
    # The original script read these values from GUI text fields;
    # they are read from the console here so that the snippet is self-contained.
    app_id = input("Application id: ")
    pages = int(input("Number of pages: "))
    output = "rev.json"

    print("[I] Fetching " + str(pages) + " pages of " + app_id)
    reviews = getPages(app_id, pages)

    with open(output, "w") as output_file:
        json.dump({"results": reviews}, output_file)


if __name__ == "__main__":
    main()

  • In line 187, you said you have removed some characters in the preprocessing stage. But you need to explain in detail what kinds of characters were deleted, with examples, and why you had to delete the single characters.

Response: We remove all the single characters. For instance, when we remove the punctuation mark from "David's" and replace it with a space, we get "David" and the single character "s", which has no meaning. To remove such single characters, we use the regular expression \s+[a-zA-Z]\s+, which substitutes every single character that has spaces on both sides with a single space.
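A small illustration of this substitution (the sample sentence is only an example):

import re

# "David's" becomes "David s" after punctuation removal; the stray "s" is then dropped.
text = "David s phone works great"
cleaned = re.sub(r"\s+[a-zA-Z]\s+", " ", text)
print(cleaned)   # David phone works great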

 

  • In line 211, it is not good to see general information that belongs in the introduction suddenly mentioned in the methodology. The methodology is complicated enough to write without general information.
    This general information should be described in the introduction or literature review to support why this study was done in this way.

Response: We have made this change. The extra information has been deleted, and only relevant information has been kept. The manuscript has been updated according to the reviewer’s comments.

  • In line 229, you have downloaded 148 apps in 14 categories. I want to ask why it is 148 apps, and is it acceptable to mix 14 different categories? Does this sample represent other apps as well? There should be a logical reason why you downloaded 148 apps in 14 categories.

Response: In this initial version of the paper we use 14 categories and a mix of 148 applications; this is sample data that we chose randomly in order to examine some user trends. We plan to use deep learning next, and for that research we will need a much larger amount of data. After the publication of this version, the authors intend to work on another version that uses more categories and more applications. Beyond that, there is no particular logical reason for these numbers.

  • In line 240, you said this library works quickly and saves the programmer's time as shown in Table 1, but Table 1 shows how many reviews have been extracted for each category. How does this table relate to saving the programmer's time?

Response: The text has been updated following the reviewer’s comment.

  • In Section 5, results and experiments, you are explaining methods such as logistic regression. This part could be added to the methodology, and Section 6 can be the results.

Response: We have made this change following the reviewer’s comment.

  • In the conclusion, the findings are not discussed from a practical standpoint, and the relationship with previous studies is missing. In addition, I recommend indicating why this work is relevant to the scientific community.

Response: Key findings have been added to the conclusion, and separate sections have been added to the manuscript. The changes have been made according to the reviewer’s comment.

  • Overall, I think it is a study that shows the creativity and originality of the subject. However, the logical basis for defining terms is poor. Also, it seems hard to understand the process of deriving the objectives.

Response: Thank you very much for these kind words. Thank you for your appreciation.

 

Thank you so much for such knowledgeable comments.

All the above changes have been reflected in the manuscript in yellow highlighted text.

We hope you find the revised manuscript acceptable for publication. Thank you once again for your consideration.

 

 

 

Sincerely,

 

Abdul Karim

PhD Computer Science Student

University Gadjah Mada

Yogyakarta

Indonesia

 

Corresponding Author & Proofreading:

Dr. Samir Brahim Belhaouri

Division of Information & Computer Technology, College of Science & Engineering, Hamad Bin Khalifa University, Doha, Qatar;

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Thank you for the reply and the new paper version. However, several points should be improved:

- the SOTA, or state-of-the-art, section is really poor. Moreover, the entire section should be rewritten. I suggested a lot of papers, which are not discussed at all. Also, in "The author discusses [21] By using topic modeling", please describe each part better, including correct capitalization. You should try to explain how your model is different with respect to the others, not only copy and paste parts of abstracts.

- you should try to explain why TF is better than TF-IDF. Moreover, use bold to highlight the main results in the tables; otherwise it is not possible to identify the best model.

- all the numbers in a table should be in the same line. Please, arrange them.

- It is not clear how logistic regression is applied to the multi-class problem. Do you compare one class with all the others? If yes, you should try to characterise the main words used in each category. What is the distribution of the TF and TF-IDF values?

- bar charts and tables include the same information; use only one of them. Moreover, try more complex visualisations.

- precision and recall are very low. You should try to explain why. Moreover, looking at the results, the accuracy highlights a class-imbalance problem.

- the hyper-parameters should be reported in the paper, with a sensitivity analysis to measure the impact of fluctuations in the parameters of each mathematical model.

 


Several English misspellings:
- line 81: and tendencies " " along - extra space
- line 90: to explain "this futher", users reviews ... -> further
- line 91: Recent researches "showned"
- line 97: "negetive" rating
- line 100: of project is possible" ",
- line 111: this may "helps" rectify the fact
- line 124: Describing "a" regular expressions
- line 200: The result is that store rating "are" not energetic
- etc.

Author Response

Cover Letter in Response to Reviewer’s Comments

 

 

Professor Ms. Alina Chen

Assistant Editor

Name of Journal: MDPI Algorithms for Databases, and Data Structures

Manuscript ID: algorithms-817069
Type of manuscript: Article
Title: Methodology for Analyzing the Traditional Algorithms Performance of
User’s Reviews Using Machine Learning Techniques

Date: 22 June 2020

 

Respected Professor Alina Chen

 

 

Thank you for the opportunity to revise our work once again. Please see our response to the reviewer’s comments (in the green highlighted text, Review Round 2). We also greatly appreciate the reviewers for their complimentary comments and suggestions. We have carried out the experiments that the reviewers suggested and revised the manuscript accordingly. Please find attached a point-by-point response to the reviewer’s concerns. We hope that you find our responses satisfactory and that the manuscript is now acceptable for publication.

 

We are very thankful to the reviewers; we have updated our manuscript according to their comments. We are glad to have learned so much from them: the comments gave us a great deal of knowledge, further enhanced our skills, and strengthened our research and study. We are grateful to the reviewers for making this possible; your comments opened our minds, and our interest and learning grew with them. Thank you very much.


Reviewer 1:

 

We appreciate the reviewer’s comments. The following are our point-by-point responses:

 

  • the SOTA, or state-of-the-art, section is really poor. Moreover, the entire section should be rewritten. I suggested a lot of papers, which are not discussed at all. Also, in "The author discusses [21] By using topic modeling", please describe each part better, including correct capitalization. You should try to explain how your model is different with respect to the others, not only copy and paste parts of abstracts.

Response: Thank you for bringing this inconsistency to our notice. We have now included your suggested papers in the literature review as [35] to [38]. We also explain how our model differs from the others, and the discussion of [21] has been updated according to your comments. Your comments and references have been very instructive for us.

  • you should try to explain why TF is better than TF-IDF. Moreover, use bold to highlight the main results in the tables; otherwise it is not possible to identify the best model.

Response: The TF vectors are easy for machine learning models to interpret; learning from TF features is not as complicated as learning from TF-IDF features. We also studied the following references and found:

Qaiser, S., & Ali, R. (2018). Text mining: use of TF-IDF to examine the relevance of words to documents. International Journal of Computer Applications, 181(1), 25-29.

In this paper, the authors identify weaknesses of TF-IDF and propose an improved version of it; some of these weaknesses may explain why plain TF can give better results.

Kim, S. W., & Gil, J. M. (2019). Research paper classification systems based on TF-IDF and LDA schemes. Human-centric Computing and Information Sciences, 9(1), 30.

In this paper, the authors describe TF-IDF and LDA schemes to calculate the importance of each article and to group the papers.

https://www.kaggle.com/occam19/tf-idf-vs-bag-of-words

TF-IDF has a positive effect on the correctness of the model, but the TF (bag-of-words) scheme and the TF-IDF scheme give only slightly different results.

https://stats.stackexchange.com/questions/153069/bag-of-words-for-text-classification-why-not-just-use-word-frequencies-instead

Sometimes TF-IDF achieves better results than TF when it is combined with certain supervised methods, but when it is not, TF results are often considered better.

https://towardsdatascience.com/3-basic-approaches-in-bag-of-words-which-are-better-than-word-embeddings-c2cbc7398016

https://medium.com/analytics-vidhya/fundamentals-of-bag-of-words-and-tf-idf-9846d301ff22

https://www.analyticsvidhya.com/blog/2020/02/quick-introduction-bag-of-words-bow-tf-idf/

These posts argue that TF (bag-of-words) approaches can perform better than word embeddings; TF also gives better results on this dataset.
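To make the difference concrete, a toy sketch contrasting raw term frequencies with TF-IDF weights, assuming scikit-learn's vectorizers; the corpus is illustrative only:

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = ["good game good graphics", "good app but slow", "slow video player"]

tf = CountVectorizer()        # raw term frequencies (TF)
tfidf = TfidfVectorizer()     # frequencies down-weighted by document frequency (TF-IDF)

print(tf.fit_transform(corpus).toarray())
print(tfidf.fit_transform(corpus).toarray().round(2))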

 

 

 

  • all the numbers in a table should be in the same line. Please, arrange them.

Response: Thank you for this comment. As suggested by the reviewer, we have updated the tables accordingly.

 

  • It is not clear how logistic regression is applied to the multi-class problem. Do you compare one class with all the others? If yes, you should try to characterise the main words used in each category. What is the distribution of the TF and TF-IDF values?

Response: We train the model separately on TF and TF-IDF features. First, we pass the TF features and train the model; when we pass a test example, the model generates a probability score for each of the 14 categories, and the category with the highest score is the final prediction. The same process is applied to the TF-IDF features: the model generates a probability score for each example, and the class with the highest probability is the final prediction (for example, the Sports class if it has the highest probability score).

 

  • bar charts and tables include the same information; use only one of them. Moreover, try more complex visualisations.

Response: Thank you for drawing our attention to this point. We have updated our manuscript according to your comments.

  • precision and recall are very low. You should try to explain why. Moreover, looking at the results, the accuracy highlights a class-imbalance problem.

Response: We agree with the reviewer. In this work we focus on classification; in a future study, we plan to apply class-balancing techniques to the dataset.

 

  • the hyper-parameters should be reported in the paper, with a sensitivity analysis to measure the impact of fluctuations in the parameters of each mathematical model.

Response: We have made this change, and the manuscript has been updated according to the reviewer’s comments. The results of the parameter fluctuations are below:

Random Forest algorithm on TF and TF-IDF bases after preprocessing (n_estimators=200, random_state=150, max_depth=100):

Category | TF: Accuracy / Precision / Recall / F1 | TF-IDF: Accuracy / Precision / Recall / F1
Sports   | 0.579 / 0.31 / 0.305 / 0.292           | 0.578 / 0.332 / 0.291 / 0.294
Action   | 0.675 / 0.325 / 0.30 / 0.301           | 0.682 / 0.339 / 0.299 / 0.290

Logistic Regression algorithm on TF and TF-IDF bases after preprocessing (parameters as reported: n_estimators=200, random_state=150, max_depth=100):

Category | TF: Accuracy / Precision / Recall / F1 | TF-IDF: Accuracy / Precision / Recall / F1
Sports   | 0.579 / 0.334 / 0.30 / 0.298           | 0.572 / 0.337 / 0.298 / 0.294
Action   | 0.679 / 0.322 / 0.298 / 0.301          | 0.684 / 0.334 / 0.292 / 0.292

Random Forest algorithm on TF and TF-IDF bases after preprocessing (n_estimators=400, random_state=250, max_depth=300):

Category | TF: Accuracy / Precision / Recall / F1 | TF-IDF: Accuracy / Precision / Recall / F1
Sports   | 0.577 / 0.30 / 0.301 / 0.290           | 0.572 / 0.331 / 0.290 / 0.291
Action   | 0.672 / 0.320 / 0.302 / 0.30           | 0.681 / 0.334 / 0.298 / 0.290

Logistic Regression algorithm on TF and TF-IDF bases after preprocessing (parameters as reported: n_estimators=400, random_state=250, max_depth=300):

Category | TF: Accuracy / Precision / Recall / F1 | TF-IDF: Accuracy / Precision / Recall / F1
Sports   | 0.577 / 0.330 / 0.299 / 0.294          | 0.561 / 0.312 / 0.288 / 0.291
Action   | 0.673 / 0.312 / 0.294 / 0.30           | 0.644 / 0.311 / 0.290 / 0.290

Random Forest algorithm on TF and TF-IDF bases without preprocessing (n_estimators=200, random_state=150, max_depth=100):

Category | TF: Accuracy / Precision / Recall / F1 | TF-IDF: Accuracy / Precision / Recall / F1
Sports   | 0.571 / 0.349 / 0.301 / 0.299          | 0.590 / 0.320 / 0.299 / 0.297
Action   | 0.653 / 0.333 / 0.299 / 0.302          | 0.672 / 0.329 / 0.291 / 0.293

Logistic Regression algorithm on TF and TF-IDF bases without preprocessing (parameters as reported: n_estimators=200, random_state=150, max_depth=100):

Category | TF: Accuracy / Precision / Recall / F1 | TF-IDF: Accuracy / Precision / Recall / F1
Sports   | 0.619 / 0.401 / 0.33 / 0.345           | 0.618 / 0.401 / 0.321 / 0.314
Action   | 0.70 / 0.399 / 0.29 / 0.310            | 0.701 / 0.397 / 0.299 / 0.290

Random Forest algorithm on TF and TF-IDF bases without preprocessing (n_estimators=400, random_state=250, max_depth=300):

Category | TF: Accuracy / Precision / Recall / F1 | TF-IDF: Accuracy / Precision / Recall / F1
Sports   | 0.570 / 0.343 / 0.297 / 0.293          | 0.581 / 0.319 / 0.292 / 0.291
Action   | 0.642 / 0.321 / 0.291 / 0.292          | 0.663 / 0.316 / 0.280 / 0.284

Logistic Regression algorithm on TF and TF-IDF bases after preprocessing (parameters as reported: n_estimators=400, random_state=250, max_depth=300):

Category | TF: Accuracy / Precision / Recall / F1 | TF-IDF: Accuracy / Precision / Recall / F1
Sports   | 0.561 / 0.292 / 0.285 / 0.290          | 0.556 / 0.308 / 0.272 / 0.286
Action   | 0.662 / 0.303 / 0.284 / 0.284          | 0.637 / 0.301 / 0.289 / 0.281
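For reference, a self-contained sketch of how such a sensitivity check over the two reported parameter settings could be scripted; it uses synthetic data and scikit-learn for illustration only and is not our actual experiment code:

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Synthetic multi-class data stands in for the review features.
X, y = make_classification(n_samples=600, n_classes=3, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

settings = [
    {"n_estimators": 200, "random_state": 150, "max_depth": 100},
    {"n_estimators": 400, "random_state": 250, "max_depth": 300},
]
for params in settings:
    clf = RandomForestClassifier(**params).fit(X_train, y_train)
    pred = clf.predict(X_test)
    acc = accuracy_score(y_test, pred)
    prec, rec, f1, _ = precision_recall_fscore_support(y_test, pred, average="macro")
    print(params, round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))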

 

8) Several English misspellings:

- line 81: and tendencies " " along - extra space
- line 90: to explain "this futher", users reviews ... -> further
- line 91: Recent researches "showned"
- line 97: "negetive" rating 
- line 100: of project is possible" ",
- line 111: this may "helps" rectify the fact 
- line 124: Describing "a" regular expressions
- line 200: The result is that store rating "are" not energetic 
- etc.

Response: Yes, we have updated the manuscript according to the format and the reviewer’s comments.

 

Thank you so much for such knowledgeable comments.

All the above changes have been reflected in the manuscript in the green highlighted text (Review Round 2), and the yellow one updates from Review Round 1.

We hope you find the revised manuscript acceptable for publication. Thank you once again for your consideration.

 

 

 

Sincerely,

 

Abdul Karim

PhD Computer Science Student

University Gadjah Mada

Yogyakarta

Indonesia.

 

Corresponding Author & Proofreading:

Dr. Samir Brahim Belhaouri

Division of Information & Computer Technology,

College of Science & Engineering,

Hamad Bin Khalifa University, Doha, Qatar.

Author Response File: Author Response.docx
