Article
Peer-Review Record

A Novel Statistic-Based Corpus Machine Processing Approach to Refine a Big Textual Data: An ESP Case of COVID-19 News Reports

Appl. Sci. 2020, 10(16), 5505; https://doi.org/10.3390/app10165505
by Liang-Ching Chen 1,2, Kuei-Hu Chang 3,4,* and Hsiang-Yu Chung 3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 14 July 2020 / Revised: 2 August 2020 / Accepted: 7 August 2020 / Published: 9 August 2020

Round 1

Reviewer 1 Report

The article provides a well-detailed study of a statistic-based corpus machine processing approach for big text data. The characteristics and advantages of the proposed solution are discussed and supported with arguments.
The only issue that needs additional work in Chapter 1 is relating the Industry 4.0 concept more closely to the current study, that is, providing more detail on the ideas and domains in the Industry 4.0 direction. The Industry 4.0 concept is not well grounded in relation to the current work and the case study. Some additional references should be considered:
(e.g. Nicolae, A.; Korodi, A.; Silea, I. Identifying Data Dependencies as First Step to Obtain a Proactive Historian: Test Scenario in the Water Industry 4.0. Water 2019, 11, 1144.
Sung, S.-I.; Kim, Y.-S.; Kim, H.-S. Study on Reverse Logistics Focused on Developing the Collection Signal Algorithm Based on the Sensor Data and the Concept of Industry 4.0. Appl. Sci. 2020, 10, 5016.)

Author Response

We sincerely appreciate the time that the editor and associate editor of Applied Sciences and the anonymous referees spent reviewing our manuscript. We have incorporated the reviewers’ comments into our revision (highlighted in blue). Our point-by-point responses are listed below.

Responses to Reviewer #1

(1) Q: The only issue that needs additional work in Chapter 1 is relating the Industry 4.0 concept more closely to the current study, that is, providing more detail on the ideas and domains in the Industry 4.0 direction. The Industry 4.0 concept is not well grounded in relation to the current work and the case study. Some additional references should be considered:

      (e.g. Nicolae, A.; Korodi, A.; Silea, I. Identifying Data Dependencies as First Step to Obtain a Proactive Historian: Test Scenario in the Water Industry 4.0. Water 2019, 11, 1144.

      Sung, S.-I.; Kim, Y.-S.; Kim, H.-S. Study on Reverse Logistics Focused on Developing the Collection Signal Algorithm Based on the Sensor Data and the Concept of Industry 4.0. Appl. Sci. 2020, 10, 5016.)

A: Thank you for pointing this out. As suggested, we have updated the references, and both journal papers are now cited in the paper. Please refer to pp. 1-2 of the revised paper.

    For example, Nicolae et al. [4] found that current systems based on the Industrial Internet of Things (IIoT) concept for processing data in the water industry, such as drinking water treatment plants (DWTPs), appear to lack the intelligence needed to reduce costs and to support quality control. Their research therefore developed an algorithm intended to make such systems smarter and more comprehensible in processing the data. Moreover, Sung et al. [5] created a collection algorithm that uses a collection box; their algorithm, based on an experimental design method with multiple embedded sensors, could be used to address current collection problems and to decrease logistics costs.


Reviewer 2 Report

This paper proposes a statistics-based corpus machine processing technique for COVID-19 news reports; that is, the technique tries to eliminate function words and meaningless words by machine. The paper describes various corpus-based approaches and compares some of them with the proposed technique in detail.

However, although the proposed technique shows an improvement, the evidence is not sufficient to judge it superior. The approach described in Figure 3 is quite similar to the existing technique, although it is expressed differently. Thus, I think the authors need to further highlight the performance or the differentiation of the proposed technique; Table 9 alone does not seem to be enough.
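
The function-word elimination that Reviewer 2 summarizes can be illustrated with a minimal sketch. It is not the authors' algorithm: the tiny `FUNCTION_WORDS` set, the tokenizer, and the function names are assumptions made for illustration only, and the paper's statistical criteria are not reproduced here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list (an assumption for
# illustration; it is NOT the list or the statistic used in the paper).
FUNCTION_WORDS = frozenset(
    {"the", "of", "and", "a", "an", "in", "to", "is", "are", "for", "on", "by", "with", "that"}
)


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens, allowing inner hyphens (e.g. 'covid-19')."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def refined_word_list(text, function_words=FUNCTION_WORDS):
    """Return (word, frequency) pairs, most frequent first, with function words removed."""
    counts = Counter(tok for tok in tokenize(text) if tok not in function_words)
    return counts.most_common()
```

Calling `refined_word_list` on a news sentence returns a frequency-ranked word list in which only content-bearing tokens remain, which is the kind of refined list the reviewer's summary refers to.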

Author Response

We sincerely appreciate the time that the editor and associate editor of Applied Sciences and the anonymous referees spent reviewing our manuscript. We have incorporated the reviewers’ comments into our revision (highlighted in blue). Our point-by-point responses are listed below.

Responses to Reviewer #2

(1) Q: This paper proposes a statistics-based corpus machine processing technique for COVID-19 news reports; that is, the technique tries to eliminate function words and meaningless words by machine. The paper describes various corpus-based approaches and compares some of them with the proposed technique in detail.

However, although the proposed technique shows an improvement, the evidence is not sufficient to judge it superior. The approach described in Figure 3 is quite similar to the existing technique, although it is expressed differently. Thus, I think the authors need to further highlight the performance or the differentiation of the proposed technique; Table 9 alone does not seem to be enough.

A: Thank you for pointing this out. We have added and revised the highlights of the performance by comparing the approach with three listing methods, presenting the data discrepancy, and demonstrating knowledge extraction from the big textual data (see subsection 4.3 of the revised paper).


Reviewer 3 Report

This paper proposes a novel statistic-based corpus machine processing approach to refining news reports related to coronavirus. The presented research is interesting and relevant. My detailed comments are presented below.

I had some difficulty understanding the emphasis placed on certain keywords, as I discuss below.

First, the paper dedicates considerable space to emphasizing the importance of the coronavirus. While the virus case is indeed relevant, I believe the emphasis is exaggerated. In place of some of this content, I suggest that the authors also address concepts that are essential to better grasp the importance of the research, and a fortiori of its application to the coronavirus. These include concepts such as English for specific purposes and corpus analysis.

Furthermore, I believe that the possibility of applying this work to other cases is very important. The authors briefly mention this in the last paragraph. I would have preferred to see it discussed in more detail, perhaps throughout the paper, highlighting why the approach followed makes this possible. This would be very interesting.

Second, from my perspective, the contributions of this work to the section of the journal to which it was submitted are not clear. They should be more evident to the reader both in the introduction, where they would serve as a more effective motivation, and in the conclusions.

Lastly, the authors make extensive use of the terms big data and big text data. However, to the best of my understanding of the work, the data, the technologies, and the approaches considered in this research do not fit the concept of Big Data. What was the total volume? Does it consider streaming data? Of what volume? I am excluding the type of data, as it seems that only text data is considered. Is the volume of data so large that traditional tools cannot handle it efficiently, hence the need for Big Data concepts and technologies? From what was presented, I believe the answer is no. Therefore, I have difficulty understanding the emphasis placed on the term big data. In my opinion, it is inappropriate. However, if I got any of this wrong, then I believe that additional details should be included in the manuscript.

Regarding presentation, layout, and language, I believe that some minor corrections, proofreading, and editing are required, as I identified some sentences that do not clearly convey their message to the reader.

Finally, I would like to stress what in my opinion is a strong positive point of this work, which is presented in Table 9. The authors present a comparison with established state-of-the-art methods. The technical results obtained also seem to be sound.

Author Response

We sincerely appreciate the time that the editor and associate editor of Applied Sciences and the anonymous referees spent reviewing our manuscript. We have incorporated the reviewers’ comments into our revision (highlighted in blue). Our point-by-point responses are listed below.

Responses to Reviewer #3

(1) Q: First, the paper dedicates considerable space to emphasizing the importance of the coronavirus. While the virus case is indeed relevant, I believe the emphasis is exaggerated. In place of some of this content, I suggest that the authors also address concepts that are essential to better grasp the importance of the research, and a fortiori of its application to the coronavirus. These include concepts such as English for specific purposes and corpus analysis.

A: Thank you for pointing this out. We have carefully revised the paper to raise its standard. Please refer to p. 1 and pp. 15-18 of the revised paper.

(2) Q: Furthermore, I believe that the possibility of applying this work to other cases is very important. The authors briefly mention this in the last paragraph. I would have preferred to see it discussed in more detail, perhaps throughout the paper, highlighting why the approach followed makes this possible. This would be very interesting.

A: Thank you for pointing this out. We have added future research directions to raise the standard of our paper. Please refer to p. 19 of the revised paper.

   In the future, the proposed approach can be widely adopted to optimize corpus analysis results or to enhance the efficiency of corpus-based approaches, especially when extracting domain-oriented lexical units. From a linguistic angle, this paper’s results provide valuable English linguistic patterns that can be utilized in ICT fields such as machine learning, machine translation, deep learning, NLP, and AI. In addition, techniques for streaming data are an important direction for future development, because they can effectively and rapidly capture the latest data (especially in the case of COVID-19) and so expand and update the big textual database in a timely manner. This will allow corpus-based approaches in big data analysis to deliver more accurate, efficient, and up-to-date analytical results.

(3) Q: Second, from my perspective, the contributions of this work to the section of the journal to which it was submitted are not clear. They should be more evident to the reader both in the introduction, where they would serve as a more effective motivation, and in the conclusions.

A: Thank you for pointing this out. As suggested, we have rewritten the Abstract and Conclusion sections to raise the standard of our paper. Please refer to p. 1 and p. 19 of the revised paper.

(4) Q: Lastly, the authors make extensive use of the terms big data and big text data. However, to the best of my understanding of the work, the data, the technologies, and the approaches considered in this research do not fit the concept of Big Data. What was the total volume? Does it consider streaming data? Of what volume? I am excluding the type of data, as it seems that only text data is considered. Is the volume of data so large that traditional tools cannot handle it efficiently, hence the need for Big Data concepts and technologies? From what was presented, I believe the answer is no. Therefore, I have difficulty understanding the emphasis placed on the term big data. In my opinion, it is inappropriate. However, if I got any of this wrong, then I believe that additional details should be included in the manuscript.

A: Thank you for pointing this out. Big data computing can generally be divided into two types according to processing requirements: big data batch computing and big data stream computing (Holmlund et al., 2020; Balakrishna et al., 2020). Our research focuses on big data batch computing: we collected news reports related to COVID-19 from December 2019 to April 2020 as a big data batch (i.e., the corpus data) in order to shed light on refining big textual data. The connection between big textual data and big data is also discussed in Li et al. (2019). Nevertheless, your valuable feedback reminded us that COVID-19 is still ongoing. To ensure that COVID-19 information can be collected rapidly and updated continuously, we have listed streaming data as an important future research direction for handling the large volume of incoming COVID-19 information (see section 5 of the revised paper).

Holmlund, M.; Van Vaerenbergh, Y.; Ciuchita, R.; Ravald, A.; Sarantopoulos, P.; Ordenes, F.V.; Zaki, M. Customer experience management in the age of big data analytics: A strategic framework. J. Bus. Res. 2020, 116, 356-365.

Balakrishna, S.; Thirumaran, M.; Solanki, V.K.; Nunez-Valdez, E.R. Incremental Hierarchical Clustering driven Automatic Annotations for Unifying IoT Streaming Data. Int. J. Interact. Multimed. Artif. Intell. 2020, 6(2), 56-70.

Li, Q.; Li, S.B.; Zhang, S.; Hu, J.; Hu, J.J. A Review of Text Corpus-Based Tourism Big Data Mining. Appl. Sci. 2019, 9(16), 3300.
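
The distinction between batch and stream computing drawn in the response to comment (4) can be illustrated with a small sketch. It rests on assumptions: the function names and the whitespace tokenization are hypothetical, and the code does not reproduce the authors' pipeline or any specific streaming framework. Batch processing counts words over a fixed, already collected set of reports, whereas stream processing updates the counts as each new report arrives.

```python
from collections import Counter
from typing import Iterable, Iterator


def count_batch(documents: Iterable[str]) -> Counter:
    """Batch style: the whole collection is available, and one final result is returned."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
    return counts


def count_stream(documents: Iterable[str]) -> Iterator[Counter]:
    """Stream style: yield an updated snapshot of the counts after each arriving document."""
    counts: Counter = Counter()
    for doc in documents:
        counts.update(doc.lower().split())
        yield counts.copy()  # snapshot, so later updates do not alter earlier results
```

Once every document has arrived, the last snapshot from `count_stream` equals the result of `count_batch`; the difference is that the streaming version makes intermediate results available while collection is still ongoing.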

(5) Q: Regarding presentation, layout, and language, I believe that some minor corrections, proofreading, and editing are required, as I identified some sentences that do not clearly convey their message to the reader.

A: Thank you for pointing this out. We have rechecked the grammar, spelling, and semantic usage to improve overall readability. Moreover, we have also requested professional editing services.

(6) Q: Finally, I would like to stress what in my opinion is a strong positive point of this work, which is presented in Table 9. The authors present a comparison with established state-of-the-art methods. The technical results obtained also seem to be sound.

A: Thank you for your positive evaluation. We have carefully revised the paper to raise its standard. Please refer to pp. 15-18 of the revised paper.

 
