Article
Peer-Review Record

Improved Text Summarization of News Articles Using GA-HC and PSO-HC

Appl. Sci. 2021, 11(22), 10511; https://doi.org/10.3390/app112210511
by Muhammad Mohsin 1, Shazad Latif 1, Muhammad Haneef 2, Usman Tariq 3, Muhammad Attique Khan 4,*, Sefedine Kadry 5, Hwan-Seung Yong 6 and Jung-In Choi 7
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 15 September 2021 / Revised: 3 November 2021 / Accepted: 5 November 2021 / Published: 9 November 2021

Round 1

Reviewer 1 Report

This manuscript presents two hierarchical clustering-based models for multi-document summarization. The proposed pipeline is classical and conventional: big vectors are created to represent sentences, which are then ranked through optimization. Overall, the manuscript is well structured and easy to follow; however, the content is not strong enough and the conclusions cannot be justified on the basis of the manuscript. I cannot recommend that it proceed through the peer-review process.

[MAJOR]

Regarding multi-document summarization, rule-based methods have been developed for many years. In the most recent five years, machine learning / deep learning-based approaches have made greater progress in this area compared with the traditional ones. Focusing on this paper, neither GA nor PSO is a novel technique. Therefore, the entire paper lacks novelty and might not be interesting to readers.

Regarding methodology, the proposed model does not seem to use machine learning techniques. Therefore, more evidence should be provided to prove the generality or robustness of the rule-based method, since it is not difficult to design a method that reaches a higher evaluation metric than other baselines. It is therefore not sufficient to use just one dataset (DUC 2007) for the experiments. I would recommend using more relevant datasets to make the experimental results more convincing.

Regarding evaluation, I wonder if it is feasible to compare the proposed model with some recent models instead of the ones in the paper (PKUSUMSUM-2016, TextRank-2004, OPINOSIS-2010). Though the semantic text summarizer was published in 2020, it is not a state-of-the-art approach for a strong baseline comparison. Further, it has been shown that the ROUGE score cannot effectively measure the quality of automatically generated summaries. Therefore, it is also necessary to conduct a qualitative evaluation with human judges.

[MODERATE]

The related work section is not clear. It reads like a mixture of various types of summarization (single/multi-document, abstractive/extractive). It would be better to use subsections to produce a clearer literature review. In addition, since the manuscript presents a multi-document summarization approach, more of the content should relate to this area. The authors could also highlight how their approach compares with others.

More details should be added to the section "Big Vectors"; I am confused by the entire section. Also, Algorithm 1, as referenced in this section, points to the agglomerative clustering, which might not be correct.

[MINOR]

Please double-check the English writing, since there are many grammar and spelling errors.

Please increase the font size in Figure 3 to make it more readable.

Table 2 and Figures 4-6 describe the same thing. Please remove one of them.

Author Response

Response Sheet

Reviewer 1


Major

 

Comment: Regarding multi-document summarization, rule-based methods have been developed for many years. In the most recent five years, machine learning / deep learning-based approaches have made greater progress in this area compared with the traditional ones. Focusing on this paper, neither GA nor PSO is a novel technique. Therefore, the entire paper lacks novelty and might not be interesting to readers.

Response: Precise automatic text summarization is considered a non-convex, NP-hard problem. Meta-heuristic approaches are well suited to such problems; therefore, modified GA and PSO are used to solve it.
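For context, extractive summarization can be framed as a combinatorial selection of sentences that maximizes relevance while minimizing redundancy, which makes meta-heuristic search a natural fit. The sketch below illustrates that framing with a plain binary-encoded genetic algorithm; the fitness terms, parameters, and toy data are assumptions of this illustration and do not reproduce the paper's GA-HC model.

# Minimal GA sketch: pick a subset of sentences (binary mask) that balances
# relevance against pairwise redundancy. Illustrative only, not GA-HC.
import random

def fitness(mask, sim, relevance, max_sentences=5):
    chosen = [i for i, bit in enumerate(mask) if bit]
    if not chosen or len(chosen) > max_sentences:
        return 0.0
    coverage = sum(relevance[i] for i in chosen)                          # reward relevant picks
    redundancy = sum(sim[i][j] for i in chosen for j in chosen if i < j)  # penalize overlap
    return coverage - redundancy

def ga_select(sim, relevance, pop_size=30, generations=100, p_mut=0.05):
    n = len(relevance)
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda m: fitness(m, sim, relevance), reverse=True)
        parents = pop[: pop_size // 2]                                    # keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n)                                  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (random.random() < p_mut) for bit in child]    # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=lambda m: fitness(m, sim, relevance))
    return [i for i, bit in enumerate(best) if bit]

# Toy usage with an invented 4-sentence document:
sim = [[1.0, 0.8, 0.1, 0.2],
       [0.8, 1.0, 0.2, 0.1],
       [0.1, 0.2, 1.0, 0.7],
       [0.2, 0.1, 0.7, 1.0]]
relevance = [0.9, 0.6, 0.8, 0.5]
print(ga_select(sim, relevance))  # indices of the selected summary sentences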

Comment: Regarding methodology, the proposed model does not seem to use machine learning techniques. Therefore, more evidence should be provided to prove the generality or robustness of the rule-based method, since it is not difficult to design a method that reaches a higher evaluation metric than other baselines. It is therefore not sufficient to use just one dataset (DUC 2007) for the experiments. I would recommend using more relevant datasets to make the experimental results more convincing.

Response: The CNN/Daily Mail dataset is also included in the revised paper; simulations are conducted on this dataset and the results are compared across the various algorithms.
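For readers who wish to reproduce experiments on this corpus, a minimal sketch assuming the Hugging Face datasets package (this is not necessarily the authors' data pipeline):

# Sketch: load the CNN/Daily Mail summarization corpus.
# Assumes the "datasets" package and the public "cnn_dailymail" dataset (config "3.0.0").
from datasets import load_dataset

ds = load_dataset("cnn_dailymail", "3.0.0", split="test")
sample = ds[0]
article = sample["article"]       # source news article
reference = sample["highlights"]  # human-written reference summary
print(len(ds), "test articles; first article has", len(article.split()), "words")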

Comment: Regarding evaluation, I wonder if it is feasible to compare the proposed model with some recent models instead of the ones in the paper (PKUSUMSUM-2016, TextRank-2004, OPINOSIS-2010). Though the semantic text summarizer was published in 2020, it is not a state-of-the-art approach for a strong baseline comparison. Further, it has been shown that the ROUGE score cannot effectively measure the quality of automatically generated summaries. Therefore, it is also necessary to conduct a qualitative evaluation with human judges.

Response: ROUGE is the standard evaluation criterion in the text summarization domain. Human-based evaluation can vary from person to person and so cannot be set as a standard. In addition, a recent relevant paper from 2021 is included for comparison in the revised paper.
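As a concrete illustration of the ROUGE criterion, a minimal sketch using the rouge-score package (an assumed, commonly used implementation; the example strings are invented, not taken from the paper):

# Sketch: ROUGE-1/2/L precision, recall, and F1 between a reference and a candidate summary.
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
reference = "the cabinet approved the new budget on monday"
candidate = "the new budget was approved by the cabinet"
scores = scorer.score(reference, candidate)
for name, s in scores.items():
    print(f"{name}: P={s.precision:.3f} R={s.recall:.3f} F1={s.fmeasure:.3f}")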

Moderate

 

Comment: The related work section is not clear. It reads like a mixture of various types of summarization (single/multi-document, abstractive/extractive). It would be better to use subsections to produce a clearer literature review. In addition, since the manuscript presents a multi-document summarization approach, more of the content should relate to this area. The authors could also highlight how their approach compares with others.

Response: The literature review has been improved. However, it is difficult to segregate the works strictly by single/multi-document and abstractive/extractive categories; for example, single-document summarization work can utilize both abstractive and extractive techniques, and vice versa.

Comment: More details should be added to the section "Big Vectors"; I am confused by the entire section. Also, Algorithm 1, as referenced in this section, points to the agglomerative clustering, which might not be correct.

Response: More detail on big vectors is included in the revised paper. Agglomerative clustering, also known as the bottom-up approach, is one of the standard clustering approaches and is the one considered in this paper.
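To illustrate the bottom-up (agglomerative) clustering referred to here, a minimal sketch that clusters TF-IDF sentence vectors with SciPy's hierarchical clustering; scikit-learn and SciPy are assumed, and the sentences, linkage choice, and cluster count are illustrative rather than the paper's big-vector setup:

# Sketch: agglomerative (bottom-up) clustering of sentences via TF-IDF vectors.
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster

sentences = [
    "The court upheld the ruling on appeal.",
    "Judges confirmed the appeal verdict.",
    "Heavy rain flooded the northern districts.",
    "Flood waters closed roads in the north.",
]
X = TfidfVectorizer().fit_transform(sentences).toarray()
Z = linkage(X, method="average", metric="cosine")  # repeatedly merge the two closest clusters
labels = fcluster(Z, t=2, criterion="maxclust")    # cut the merge tree into 2 flat clusters
for sent, label in zip(sentences, labels):
    print(label, sent)

The merge tree is built bottom-up from individual sentences, and cutting it yields topic-like groups from which representative sentences can then be ranked and selected.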

Minor

 

Comment: Please double-check the English writing, since there are many grammar and spelling errors.

Response: The write-up has been thoroughly checked and improved in the revised document.

Comment: Please increase the font size in Figure 3 to make it more readable.

Response: The font size in Figure 3 has now been increased.

Comment: Table 2 and Figures 4-6 describe the same thing. Please remove one of them.

Response: Table 2 has now been removed.

 

Author Response File: Author Response.docx

Reviewer 2 Report

This paper proposes two automatic text summarization models, Genetic Algorithm with Hierarchical Clustering (GA-HC) and Particle Swarm Optimization with Hierarchical Clustering (PSO-HC). The proposed models are compared with several existing models, such as PKUSUMSUM, TextRank, and OPINOSIS, on the DUC 2007 dataset. The results show that the proposed models outperform the compared approaches. The contribution of this paper is obvious. However, it is not well written and should be substantially improved. My detailed comments are as follows:

1. In the introduction section, there is a lack of references. The authors make many statements but with very few references; two examples are automatic text summarization (ATS) and abstractive text summarization. This should be improved.

2. The authors only briefly introduce the text summarization techniques in the introduction section. More important content, namely the research questions, the proposed solution/approach, and the novelty and contribution of the study, is missing. The introduction section should be extensively improved.

3. In the related works, the authors simply list the existing text summarization techniques without any logic connecting them. It would be better if these techniques were at least categorized.

4. It would be better to use some examples to explain each step of the proposed model; otherwise, it is difficult to understand. What is the purpose of clustering? Only the classic GA algorithm is introduced, and it is unclear how the GA algorithm is used in the proposed approach. Figure 2 is missing.

5. In the evaluation part, the authors just present the final result without any further analysis of the results of their proposed models or of the compared approaches. Furthermore, some real examples would also help readers understand the results. Figure 3 is blurred, with a low resolution.

 

6. The paper has a lot of basic grammar errors, which should be carefully checked throughout the paper. Just to name a few:

-Different kinds of text summarization techniques have been proposed in the literature which are;

- There are two kinds for summarization techniques, one is extractive text summarization approach.

- After assigning weightage to the uses k-means clustering algorithm for generating clusters and then mines frequent itemsets using Apriori Algorithm; for identification and selection of most important sentences. Finally, selects top sentence with high score and generate summary.

- Two techniques studied in [14].

 

Author Response

Reviewer 2

 

1. In the introduction section, there is a lack of references. The authors make many statements but with very few references; two examples are automatic text summarization (ATS) and abstractive text summarization. This should be improved.

Response: The introduction section has now been revised.

 

2. The authors only briefly introduce the text summarization techniques in the introduction section. More important content, namely the research questions, the proposed solution/approach, and the novelty and contribution of the study, is missing. The introduction section should be extensively improved.

Response: The introduction section has now been revised.

3. In the related works, the authors simply list the existing text summarization techniques without any logic connecting them. It would be better if these techniques were at least categorized.

Response: The literature review has been revised.

4. It would be better to use some examples to explain each step of the proposed model; otherwise, it is difficult to understand. What is the purpose of clustering? Only the classic GA algorithm is introduced, and it is unclear how the GA algorithm is used in the proposed approach. Figure 2 is missing.

Response: A modified non-sorting genetic algorithm technique is applied and explained in the proposed model section. The figure numbering has now been revised.

5. In the evaluation part, the authors just present the final result without any further analysis of the results of their proposed models or of the compared approaches. Furthermore, some real examples would also help readers understand the results. Figure 3 is blurred, with a low resolution.

Response: The resolution of Figure 2 has been enhanced. The discussion of the results has also been revised.

6. The paper has a lot of basic grammar errors, which should be carefully checked throughout the paper. Just to name a few:

-Different kinds of text summarization techniques have been proposed in the literature which are;

- There are two kinds for summarization techniques, one is extractive text summarization approach.

- After assigning weightage to the uses k-means clustering algorithm for generating clusters and then mines frequent itemsets using Apriori Algorithm; for identification and selection of most important sentences. Finally, selects top sentence with high score and generate summary.

- Two techniques studied in [14].

 

Response: The paper's write-up has now been thoroughly revised.

 

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The revised version of the manuscript has addressed some of my concerns, including the experimental datasets, the literature review, the big vectors section, and other minor errors.

The rest of my concerns are listed below:

1. The current related work section must be reorganized and improved; it looks disorganized and lacks a logical structure. I still recommend applying some categorization to make it clear. For example, based on the type of input, summarization is of two types: Single-Document Summarization (SDS) and Multi-Document Summarization (MDS). One way of performing either SDS or MDS is through extractive summarization...

2. Please be very careful with the "proposed model" section, since most stages and methods are quite similar to [25]. However, there is no citation at the beginning of this section. I would recommend citing [25] at the very beginning and thoroughly checking and rephrasing the content (including the equations).

3. It would be better to have a native English speaker revise the entire manuscript since there are still many errors, uncommon expressions, and phrases.

4. Regarding human evaluation, I will not insist on it for this submission, but would like to state that it could further strengthen the assessment of model performance. It is feasible to measure summaries along multiple dimensions, including readability, correctness, completeness, and compactness. Also, human bias could be alleviated through crowdsourcing or other techniques.

Author Response

The response sheet is attached. Thanks.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors have addressed most of my concerns except the English grammar errors; I can still find many such errors. Extensive editing of the English language and style is required.

Author Response

The response sheet is attached.

Author Response File: Author Response.pdf
