Article
Peer-Review Record

Marketing Insights from Reviews Using Topic Modeling with BERTopic and Deep Clustering Network

Appl. Sci. 2023, 13(16), 9443; https://doi.org/10.3390/app13169443
by Yusung An, Hayoung Oh * and Joosik Lee
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 4:
Submission received: 9 July 2023 / Revised: 17 August 2023 / Accepted: 17 August 2023 / Published: 21 August 2023

Round 1

Reviewer 1 Report

Summary

This research focuses on leveraging clustering algorithms with topic modeling to automate the extraction of consumer intentions, related products, and the pros and cons of products from review data. To achieve this, a review dataset was created by web crawling the Naver Shopping platform. The findings are expected to contribute to a more precise understanding of consumer sentiments, enabling marketers to make informed decisions across a wide range of products and services.

General Comment

Topic modeling is a natural language processing technique used to extract hidden topics from text data. Text data can contain various topics, and each document may be related to multiple topics. Topic modeling automatically identifies these topics and estimates which topics each document is composed of. This research embeds the review data into vectors using KcBERT, a BERT model trained on comments collected from Korean online news.

This paper introduces the process of extracting advantages and disadvantages in detail. In this process, LDA, DCN, HDBSCAN, BERTopic and other methods are compared, and the conclusion is drawn that DCN and BERTopic have better performance. In the product correlation analysis, the authors use clustering-based topic modeling algorithms and examine the results obtained using three models: Kmeans, DCN, and BERTopic.

The paper explores the innovative combination and comparison of different topic modeling and dimensionality reduction methods for practical problem-solving. However, the analysis of results is not outstanding, lacks practicality, and fails to convincingly integrate with the title's marketing insights. There is room for improvement in two key aspects.

1. Analysis of Pros and Cons

The comparison of extracted keywords using different methods is not intuitive due to the multitude of words and methods listed in tables. To enhance reader understanding, I suggest separately comparing representative words like "cleaning," "mold," "odor," and "removal" to provide a more direct and comprehensive view of the product characteristics. 

2. Product Analysis

The paper only lists extracted keywords for different topics without detailed explanations. It is advisable to elaborate on the meanings represented by each topic and their practical implications for marketing strategies and product improvements in real-world scenarios. Additionally, you can discuss the wide applicability of this method in market insights and how to implement it effectively.

3. Changes in the Autoencoder

In the original research, the authors encountered the problem of a trivial solution in the setting using the MNIST benchmark dataset, where all data was assigned to a single label. To address this issue, they reduced the size of the hidden layers in the autoencoder. The reason for modifying the autoencoder is unclear. I recommend providing an explanation of the rationale and benefits behind this change for reader comprehension.

4. The author should cite the following publication: Ying Ji, Yifan Ma. The robust maximum expert consensus model with risk aversion, Information Fusion, 2023, DOI: https://doi.org/10.1016/j.inffus.2023.101866

Finally, the paper has some formatting issues; I list some of them below, together with other specific and editorial comments.

Specific and editorial comments

1. Some formulas and tables are not completely centered; adjustments should be made for better appearance.

2. Table 1's row numbers overlap with the table content; reformatting is needed. The use of commas to separate table content makes it confusing and difficult to read. It is suggested to use row separation for clarity.

3. In Table 2, different methods' results are separated into rows without explanation of the meaning of each row and the differences between them. Please provide detailed clarification or consider modifying the table structure.

 

Need polish.

Author Response

Response to Reviewer 1 Comments 

We are sincerely grateful for your thorough consideration and scrutiny of our manuscript, “Marketing Insight from Review using Topic Modeling”, control number applsci-2523304. Through the accurate comments made by the reviewers, we better understand the issues in this paper. We have revised the manuscript according to the Reviewer’s suggestions.

Point 1: The comparison of extracted keywords using different methods is not intuitive due to the multitude of words and methods listed in tables. To enhance reader understanding, I suggest separately comparing representative words like "cleaning," "mold," "odor," and "removal" to provide a more direct and comprehensive view of the product characteristics. 

Response 1: Thank you for your good suggestion. We have revised Table 2 to incorporate both the key terms extracted from the analysis of advantages and disadvantages and the model from which these terms were derived. However, we have also retained the original table in the Appendix, as we believe that providing the original keywords extracted from each model can assist in comprehending the subsequent content, as elaborated later.

 

Point 2: The paper only lists extracted keywords for different topics without detailed explanations. It is advisable to elaborate on the meanings represented by each topic and their practical implications for marketing strategies and product improvements in real-world scenarios. Additionally, you can discuss the wide applicability of this method in market insights and how to implement it effectively. 

Response 2: We have additionally expounded upon the meaning of each topic and its practical implications for marketing strategies and real-world product enhancements within the Result and Discussion sections. 

 

Point 3: In the original research, the authors encountered the problem of a trivial solution in the setting using the MNIST benchmark dataset, where all data was assigned to a single label. To address this issue, they reduced the size of the hidden layers in the autoencoder. The reason for modifying the autoencoder is unclear. I recommend providing an explanation of the rationale and benefits behind this change for reader comprehension.

Response 3: We reduced the size of the hidden layers in the autoencoder because the original sizes appeared excessively large in comparison to the embedding vector dimension of 768 employed in our experiment. This discrepancy raised concerns akin to the curse of dimensionality, suggesting that the characteristics of each vector were being overlooked, resulting in potential issues. We have also added this explanation to the Experiment section.

 

Point 4: The author should cite the following publication: Ying Ji, Yifan Ma. The robust maximum expert consensus model with risk aversion, Information Fusion, 2023, DOI: https://doi.org/10.1016/j.inffus.2023.101866 

Response 4: We have incorporated the suggested paper into the Literature Review section.

 

Point 5: Some formulas and tables are not completely centered; adjustments should be made for better appearance. Table 1's row numbers overlap with the table content; reformatting is needed. The use of commas to separate table content makes it confusing and difficult to read. It is suggested to use row separation for clarity. In Table 2, different methods’ results are separated into rows without explanation of the meaning of each row and the differences between them. Please provide detailed clarification or consider modifying the table structure.

Response 5: All the formatting issues mentioned have been rectified. Table 1 has been converted into a graph, and all instances of comma usage within the tables have been addressed. Furthermore, the original Table 2, now relocated to the Appendix, has been augmented with detailed explanations.

 

In addition, we have tried to improve the inadequate English expressions in our paper through a professional English editing service.

Reviewer 2 Report

·        The paper clearly states the method of extracting and analyzing  marketing related data.  

·        Certain typographical errors could have been avoided. E.g., the caption "Figure 1. Process of extracting related products" uses "Figure 1", which is inconsistent with the term "Fig 1" used in the content. Figure 2 is wrongly labeled as Figure 1 in its caption and is referred to as "Fig 2" in the related content.

·        The resultant table  could have been depicted with two-dimensional graph plotting for better understanding and clarity.

·        References have been justified and are relevantly quoted. Certain older references (e.g., more than six years old) could have been avoided.

·        The paper could have been even more research oriented by mentioning the  research gap and the related future works in the conclusion.

Author Response

Response to Reviewer 2 Comments 

We are sincerely grateful for your thorough consideration and scrutiny of our manuscript, “Marketing Insight from Review using Topic Modeling”, control number applsci-2523304. Through the accurate comments made by the reviewers, we better understand the issues in this paper. We have revised the manuscript according to the Reviewer’s suggestions.


Point 1. Certain typographical errors could have been avoided. E.g., the caption "Figure 1. Process of extracting related products" uses "Figure 1", which is inconsistent with the term "Fig 1" used in the content. Figure 2 is wrongly labeled as Figure 1 in its caption and is referred to as "Fig 2" in the related content.

Response 1. We have corrected all the typographical errors you mentioned.


Point 2. The resultant table could have been depicted with two-dimensional graph plotting for better understanding and clarity.

Response 2. We changed Table 1 (topic modeling evaluation) to a two-dimensional graph for better understanding and clarity. Thank you for your good suggestion.


Point 3. References have been justified and are relevantly quoted. Certain older references (e.g., more than six years old) could have been avoided.

Response 3. We agree with you and have incorporated this suggestion throughout our paper. We have conducted further investigations into the relevant recent research and made efforts to incorporate these findings into the paper to reflect more current research trends. However, a few older references, despite being published earlier, have been included to explain concepts or methods that are still widely used up to the present.

Point 4. The paper could have been even more research oriented by mentioning the research gap and the related future works in the conclusion.

Response 4. We created a Discussion section and added content about research gaps, potential applications, and more to both the Discussion and Conclusion sections. Thank you for your good suggestion.

Reviewer 3 Report

1. I recommend the creation of a separate Literature Review section presenting the opinions of other authors who have published works relevant to the field addressed in the present paper, revealing the current state of research in this field and at the same time serving as a starting point for the research activity undertaken by the authors.

2. In order for the results of the study carried out by the authors to be as credible as possible, they should focus their efforts on collecting as large a volume of data as possible, on refining and structuring them, and on arranging them within volumes of data on which to be able to apply analysis methods such as big data analytics, qualitative and quantitative statistical analysis of data, with the help of appropriate software tools such as: SPSS, STATA, R language, etc.

3. I strongly believe that it is absolutely necessary to create a Results section in which the results of the experiments, analyses and other data processing methods carried out by the authors are presented in detail, insisting on those results which are much more expository. In addition to the fact that this section would increase the interest of potential readers, it would also represent a real operational and scientific context for rethinking, reformulating and restructuring the Conclusions section.

4. I recommend redoing the References section to include, mandatorily, a majority of works that have been published in the last five years.

========================

========================

 

1. The main idea of the paper is related to the collection of data related to the opinions expressed by consumers on electronic commerce platforms, the processing of these data and their processing, with certain tools, in order to substantiate marketing decisions. Moreover, I recommend setting some hypotheses of the research, to be followed and all the actions undertaken by the authors to aim at verifying and validating the manner and degree of their fulfillment.

2. The subject of the research is not original, and in this sense in the specialized literature there are many works in which the opinions of other authors regarding this subject are exposed. In order to become relevant, the authors of the paper should present an innovative method of storing and processing data from the information collected from consumers, through electronic commerce platforms. When I say this, I mean the creation of a database in which these data are stored, in an organized form, specific to these data collections and which later allows their processing with specific tools, such as big data analytics, data mining, SPSS, STATA, etc.

3. In its current version, the work does not add a significant scientific value to the field addressed. But with the proposed changes the scientific value of the work would increase and become interesting for readers.

4. Authors should focus their research efforts on how to collect, store, organize data from consumer opinions. In addition, the methods of processing these data should be improved and expanded in terms of the use of tools specific to Data Science methods, especially oriented towards methods of quantitative and qualitative statistical analysis of data that are carried out with appropriate software tools: SPSS, STATA, Python, R language, etc. These tools allow much more complex processing, are efficient in terms of the volume of data processed and the time required for processing, and allow providing results that are much more expository and easier to understand and interpret for readers.

5. In its current form, the conclusions presented by the authors are devoid of scientific relevance and do not provide a fair perspective on how the authors fulfilled the research objectives (they verified and validated the study's hypotheses). In this sense, I recommend creating a section of Results and discussions in which the results obtained by the authors following the specific research actions will be presented in detail. This section would provide a real foundation for the conclusions section and would significantly increase the scientific value of the paper and, by default, the interest of the readers.

6. The bibliographic references are not adequate. The papers presented are out of date and do not reflect the latest opinions expressed by authors who have published papers relevant to the field addressed. In this sense, I recommend rebuilding the bibliographic references section and including within it some significant works for the field, which in the vast majority have been published in the last five years.

Author Response

Response to Reviewer 3 Comments 

We are sincerely grateful for your thorough consideration and scrutiny of our manuscript, “Marketing Insight from Review using Topic Modeling”, control number applsci-2523304. Through the accurate comments made by the reviewers, we better understand the issues in this paper. We have revised the manuscript according to the Reviewer’s suggestions.


Point 1. I recommend the creation of a separate Literature Review section presenting the opinions of other authors who have published works relevant to the field addressed in the present paper, revealing the current state of research in this field and at the same time serving as a starting point for the research activity undertaken by the authors.

Response 1. We created the Literature Review, and made an effort to investigate recent research papers in the relevant field to incorporate the current research trends.


Point 2. Authors should focus their research efforts on how to collect, store, organize data from consumer opinions. In addition, the methods of processing these data should be improved and expanded in terms of the use of tools specific to Data Science methods, especially oriented towards methods of quantitative and qualitative statistical analysis of data that are carried out with appropriate software tools: SPSS, STATA, Python, R language, etc. These tools allow much more complex processing, are efficient in terms of the volume of data processed and the time required for processing, and allow providing results that are much more expository and easier to understand and interpret for readers.

Response 2. We utilized Python to access the review API of the 'Naver Shopping' platform and collected approximately 500,000 consumer reviews. Subsequently, we conducted data preprocessing using tools such as pandas and the MeCab morphological analyzer. Data crawling, implementation and experimentation of all models used Python. The basic BERTopic was utilized using the provided library, while the DCN-based model was adapted from the official implementation code by the DCN researchers to suit the data of this study. The Kmeans algorithm employed the code from the sklearn library, and LDA as well as the evaluation metric calculations were performed using OCTIS. We also added the detailed information about these studies and experiments to the Method section. Additionally, to provide fundamental insights into the review data itself, we analyzed the distribution of ratings, the frequency of each morpheme's appearance, and visualized the results using pandas and matplotlib. These findings were incorporated into the Method section. Thank you for your comment.
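The exploratory step described at the end of Response 2 (rating distribution and morpheme frequency) can be illustrated with a minimal sketch. It is not the authors' code: it uses only the Python standard library instead of pandas, and it assumes each review is already a dictionary with a hypothetical `rating` field and a hypothetical `morphemes` list produced beforehand by a morphological analyzer such as MeCab.

```python
from collections import Counter

def rating_distribution(reviews):
    """Return the share of reviews per rating, ordered by rating."""
    counts = Counter(review["rating"] for review in reviews)
    total = sum(counts.values())
    return {rating: counts[rating] / total for rating in sorted(counts)}

def morpheme_frequency(reviews, stopwords=frozenset()):
    """Count how often each morpheme appears across all reviews."""
    freq = Counter()
    for review in reviews:
        freq.update(m for m in review["morphemes"] if m not in stopwords)
    return freq

# Hypothetical, already-tokenized sample data (not taken from the paper).
sample_reviews = [
    {"rating": 5, "morphemes": ["mold", "removal", "good"]},
    {"rating": 5, "morphemes": ["odor", "good"]},
    {"rating": 1, "morphemes": ["odor", "bad"]},
]

print(rating_distribution(sample_reviews))
print(morpheme_frequency(sample_reviews, stopwords={"good"}).most_common(2))
```

In a pandas-based workflow the same counts would typically come from `value_counts()`, but the sketch keeps the logic explicit and dependency-free.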


Point 3. I strongly believe that it is absolutely necessary to create a Results section in which the results of the experiments, analyses and other data processing methods carried out by the authors are presented in detail, insisting on those results which are much more expository. In addition to the fact that this section would increase the interest of potential readers, it would also represent a real operational and scientific context for rethinking, reformulating and restructuring the Conclusions section.

Response 3. As suggested, we have created the Results section and made an effort to provide a more detailed description of the experimental process and outcomes.


Point 4. The subject of the research is not original, and in this sense in the specialized literature there are many works in which the opinions of other authors regarding this subject are exposed. In order to become relevant, the authors of the paper should present an innovative method of storing and processing data from the information collected from consumers, through electronic commerce platforms. When I say this, I mean the creation of a database in which these data are stored, in an organized form, specific to these data collections and which later allows their processing with specific tools, such as big data analytics, data mining, SPSS, STATA, etc.

Response 4. We acknowledge that the subject of our research may seem familiar within the specialized literature, as various works have explored opinions from different authors on this topic. Utilizing topic modeling in marketing is already a widely adopted approach. However, the distinctive features of our study are as follows: 1) We conducted experiments by applying a recently introduced neural network clustering algorithm, DCN, to BERTopic, and 2) We employed a clustering-based topic modeling technique to connect clustered products with topics and extract related products based on user perceptions. Throughout this process, we employed Python for data collection, model implementation, training, and analysis. We have further enhanced these contents and added them to the paper. We appreciate your feedback.
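The idea of connecting clusters to topic keywords, as in clustering-based topic modeling, can be sketched with a class-based TF-IDF weighting. The code below is an illustration under stated assumptions, not the authors' implementation or the BERTopic library itself: cluster labels are assumed to come from some clustering step (for example Kmeans or DCN), term frequencies are raw counts, and the weight is taken in the commonly cited form tf(t, c) * log(1 + A / tf(t)), where A is the average number of tokens per cluster. The function names `class_tfidf` and `top_terms` are hypothetical.

```python
import math
from collections import Counter

def class_tfidf(cluster_tokens):
    """Weight terms per cluster.

    cluster_tokens maps a cluster id to the list of tokens of all
    documents assigned to that cluster (concatenated).
    Returns {cluster_id: {term: weight}}.
    """
    tf_per_cluster = {c: Counter(tokens) for c, tokens in cluster_tokens.items()}
    total_tf = Counter()
    for counts in tf_per_cluster.values():
        total_tf.update(counts)
    # Average number of tokens per cluster (the "A" in the weighting).
    avg_tokens = sum(total_tf.values()) / len(tf_per_cluster)
    return {
        c: {t: f * math.log(1 + avg_tokens / total_tf[t]) for t, f in counts.items()}
        for c, counts in tf_per_cluster.items()
    }

def top_terms(weights, n=3):
    """Return the n highest-weighted terms per cluster (ties broken alphabetically)."""
    return {
        c: [t for t, _ in sorted(w.items(), key=lambda kv: (-kv[1], kv[0]))[:n]]
        for c, w in weights.items()
    }

# Hypothetical clusters of review tokens (not data from the paper).
clusters = {
    0: ["mold", "odor", "mold", "good"],
    1: ["delivery", "fast", "good", "delivery"],
}
print(top_terms(class_tfidf(clusters), n=2))
```

Terms that appear in every cluster, such as "good" here, receive a lower weight than terms concentrated in one cluster, which is what makes the top terms usable as topic keywords.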


Point 5. In its current form, the conclusions presented by the authors are devoid of scientific relevance and do not provide a fair perspective on how the authors fulfilled the research objectives (they verified and validated the study's hypotheses). In this sense, I recommend creating a section of Results and discussions in which the results obtained by the authors following the specific research actions will be presented in detail. This section would provide a real foundation for the conclusions section and would significantly increase the scientific value of the paper and, by default, the interest of the readers.

Response 5. As suggested, we have created the Results section and the Discussion section. We have also added content regarding the objectives and significance of our study, practical application approaches, and research gaps.


Point 6. The bibliographic references are not adequate. The papers presented are out of date and do not reflect the latest opinions expressed by authors who have published papers relevant to the field addressed. In this sense, I recommend rebuilding the bibliographic references section and including within it some significant works for the field, which in the vast majority have been published in the last five years.

Response 6. We agree with you and have incorporated this suggestion throughout our paper. We have conducted further investigations into the relevant recent research and made efforts to incorporate these findings into the paper to reflect more current research trends. However, a few older references have been included, despite being published earlier, to explain concepts or methods that are still widely used up to the present. Thank you for providing these insights.

Reviewer 4 Report

General considerations

A classification model has been used to classify reviews about products and services. In my opinion, the topic is of interest, the work has been developed correctly and the results obtained are applicable. On the other hand, I think that the writing of the paper should be improved, and some minor doubts should also be clarified. For this reason, I suggest publication after major revision (rewrite the introduction).

 

Specific considerations

1.- In general, the introduction is where the topic is identified, the context is provided, the related research is explained and the objectives are established. In my opinion, this is what is presented in sections 1, 2 and 3 (Introduction, Cluster Algorithm, Topic Modelling, TF-IDF and Related Research). For this reason, I think the introduction should be rewritten (unify the mentioned sections, remove the repetition of objectives and renumber the sections).

2.- Three statistics (NPMI, TD and WE) have been used to evaluate the performance of the models. Given that these statistics provide complementary information, it would be desirable to discuss what each indicator provides and what its target value is.

 

3.- In my opinion, the work would improve if a discussion was added in relation to the application of the results obtained (Section 5).

Author Response

Response to Reviewer 4 Comments 

We are sincerely grateful for your thorough consideration and scrutiny of our manuscript, “Marketing Insight from Review using Topic Modeling”, control number applsci-2523304. Through the accurate comments made by the reviewers, we better understand the issues in this paper. We have revised the manuscript according to the Reviewer’s suggestions.


Point 1. In general, the introduction is where the topic is identified, the context is provided, the related research is explained and the objectives are established. In my opinion, this is what is presented in sections 1, 2 and 3 (Introduction, Cluster Algorithm, Topic Modelling, TF-IDF and Related Research). For this reason, I think the introduction should be rewritten (unify the mentioned sections, remove the repetition of objectives and renumber the sections).

Response 1. As suggested, we have combined the content of Sections 1, 2, and 3 and revised the introduction so that it better reflects the topic, context, and objectives of the research. We have retained only a basic explanation of the algorithms in Section 2 and restructured Section 3 into a Literature Review section that provides information on the latest trends in the relevant field.


Point 2. Three statistics (NPMI, TD and WE) have been used to evaluate the performance of the models. Given that these statistics provide complementary information, it would be desirable to discuss what each indicator provides and what its target value is.

Response 2. NPMI and WE are Topic Coherence metrics, which measure how closely related in meaning the keywords extracted within a single topic are. Topic Diversity (TD), on the other hand, assesses how distinct different topics are from each other. Both types of metrics are commonly employed to evaluate the performance of topic modeling. More detailed explanations and interpretations regarding these metrics have been added to the Results section.
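To make the distinction between coherence and diversity concrete, the two metric families can be sketched in a few lines. This is a simplified illustration, not the OCTIS implementation used by the authors: Topic Diversity is computed as the fraction of unique words among the top words of all topics, and NPMI is computed from document-level co-occurrence as log(p(a,b) / (p(a) p(b))) / (-log p(a,b)). The handling of word pairs that never co-occur (scored -1) or always co-occur (scored 1) is an assumption of this sketch.

```python
import math
from itertools import combinations

def topic_diversity(topics, top_k=10):
    """Fraction of unique words among the top_k words of all topics (1.0 = no overlap)."""
    top_words = [word for topic in topics for word in topic[:top_k]]
    return len(set(top_words)) / len(top_words)

def npmi_coherence(topic, documents, top_k=10):
    """Mean NPMI over all word pairs of one topic, using document co-occurrence."""
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Share of documents that contain every given word.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic[:top_k], 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # assumption: pairs that never co-occur get the minimum
        elif p12 == 1:
            scores.append(1.0)   # assumption: avoids dividing by log(1) = 0
        else:
            scores.append(math.log(p12 / (p1 * p2)) / -math.log(p12))
    return sum(scores) / len(scores)

# Hypothetical tokenized reviews and topics (not data from the paper).
docs = [
    ["mold", "odor", "removal"],
    ["mold", "odor"],
    ["delivery", "fast"],
    ["delivery", "fast", "box"],
]
topics = [["mold", "odor"], ["delivery", "fast"]]
print(topic_diversity(topics))
print([npmi_coherence(t, docs) for t in topics])
```

Higher values are better for both: NPMI ranges from -1 to 1, with values near 1 indicating that a topic's keywords tend to appear together, while Topic Diversity ranges from 0 to 1, with 1 meaning that no keyword is shared between topics.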


Point 3. In my opinion, the work would improve if a discussion was added in relation to the application of the results obtained (Section 5).

Response 3. As suggested, we created the Discussion section and included additional content such as insights, real-world applications of the results, and research gaps.

Round 2

Reviewer 1 Report

The references in Korean should be replaced with English ones.

Good

Author Response

Response to Reviewer 1 Comments

The references in Korean have been replaced with English ones.

We sincerely appreciate your thorough review of our manuscript and your valuable feedback. Your insights have undoubtedly contributed to enhancing the scientific quality of our paper.

Once again, thank you for your time, effort, and constructive feedback.

Reviewer 3 Report

Although the authors did not fully implement the changes and improvements that I recommended in the first review, I believe that the scientific quality of the paper has improved and can be published in its current form.

Author Response

Response to Reviewer 3 Comments

We sincerely appreciate your thorough review of our manuscript and your valuable feedback. Your insights have undoubtedly contributed to enhancing the scientific quality of our paper.

Once again, thank you for your time, effort, and constructive feedback.

Reviewer 4 Report

The manuscript has been improved. In my opinion, it is acceptable for publication.

Author Response

Response to Reviewer 4 Comments

We sincerely appreciate your thorough review of our manuscript and your valuable feedback. Your insights have undoubtedly contributed to enhancing the scientific quality of our paper.

Once again, thank you for your time, effort, and constructive feedback.
