Peer-Review Record

Research on Coal and Gangue Recognition Model Based on CAM-Hardswish with EfficientNetV2

Appl. Sci. 2023, 13(15), 8887; https://doi.org/10.3390/app13158887
by Na Li *, Jiameng Xue *, Sibo Wu, Kunde Qin and Na Liu
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Submission received: 8 June 2023 / Revised: 25 July 2023 / Accepted: 30 July 2023 / Published: 2 August 2023
(This article belongs to the Section Computing and Artificial Intelligence)

Round 1

Reviewer 1 Report

This manuscript proposes an improved EfficientNetV2 network for coal and gangue recognition. The improved method is easy to train, converges and trains quickly, and thus achieves high recognition accuracy even when the dataset is insufficient. However, I have the following concerns regarding this paper:

(1) The title of Figure 5 contains a spelling error and does not match the explanation in the text.

(2) The inference time (or inference speed) and FPS (number of images inferred per second) are redundant.

(3) The dataset counts are inconsistent: "After performing data enhancement and other operations on the raw dataset, a total of 6272 coal and gangue images were obtained, comprising 3223 coal images and 3048 gangue images." The two class counts sum to 6271, not 6272.

(4) The naming of models in Figure 9 and Table 3 is confusing. The abscissa in Figure 9 represents epochs, while the text refers to iterations.

(5) The mainstream models presented in Table 5 are not introduced in the Introduction, and the comparative data lack training time.

(6) Table 3 is missing the recall, F1 Score, training time, and inference time, while Table 4 lacks the recall and F1 Score.
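As an illustration of comments (3) and (6), the quoted class counts can be checked against the stated total, and recall and F1 score follow directly from confusion-matrix counts. The sketch below is a minimal Python example under stated assumptions: the function names `counts_consistent` and `precision_recall_f1` are hypothetical, and only the three image counts come from the quoted sentence.

```python
def counts_consistent(total, per_class):
    """Return True when the per-class counts add up to the stated total."""
    return sum(per_class) == total


def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true-positive, false-positive
    and false-negative counts (hypothetical helper, not from the manuscript)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Counts quoted in comment (3): 3223 coal images and 3048 gangue images.
print(3223 + 3048)                            # 6271
print(counts_consistent(6272, [3223, 3048]))  # False
```

Reporting recall and F1 alongside accuracy matters most when the two classes are unevenly represented, since accuracy alone can hide weak performance on one class.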

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 2 Report

Through detailed experiments, model building, data collection, and data analysis, this paper introduces a neural network with high efficiency and high accuracy. The authors modified the attention module of the original model by adopting the CAM attention module, selected the Hardswish activation function, updated the Block structure in the network, and added optimized hyperparameters through the choice of network structure, improving the training speed of the network while maintaining the accuracy of the model. Finally, the authors achieve efficient and effective screening of coal and gangue using the optimized neural network. However, some aspects of this manuscript need further discussion. The following comments and suggestions should help the authors revise the paper:

1) Abstract

In the abstract, the authors should concisely introduce the main technical innovations, algorithm optimizations, and performance improvements of the paper, as well as the reasons for selecting the optimization modules. In particular, they should briefly explain why the CAM attention module and the Hardswish activation function were chosen, which would make the paper more attractive to scholars in the field.

2) Introduction

In paragraph 5, the authors propose an improved recognition network based on EfficientNetV2, so they should briefly introduce the origin and development background of the original EfficientNetV2 network to make the article more cohesive.

3) Image data preprocessing

In the image preprocessing section, the authors apply only image rotation; brightness, chroma, and contrast adjustment; and histogram equalization. The last of these changes the gray level of each pixel by reshaping the image histogram and is mainly used to enhance the contrast of images with a small dynamic range. In my opinion, this preprocessing of the coal and gangue images is too simple, and more powerful image preprocessing algorithms should be applied. For example, a convolutional neural network can be used to remove noise from the coal and gangue images, and image enhancement can then be carried out on the denoised images. Enhancement algorithms such as frequency-domain enhancement can be used to further improve image quality. In short, image recognition can proceed reliably only when the preprocessing is of high quality and stable. The authors should therefore further select and optimize the algorithms in this part.
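For reference, the histogram equalization step mentioned in this comment can be sketched in a few lines. This is a minimal pure-Python illustration of the standard CDF-based mapping on a flat list of gray levels, not the authors' implementation; the function name `equalize_histogram` and the sample pixel values are assumptions made for the example.

```python
def equalize_histogram(pixels, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels - 1]."""
    if not pixels:
        return []
    # Count how often each gray level occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Build the cumulative distribution of gray levels.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = len(pixels)
    if n == cdf_min:
        # Constant image: there is no contrast to stretch.
        return list(pixels)
    # Map each level so the occupied range spreads over [0, levels - 1].
    scale = (levels - 1) / (n - cdf_min)
    return [round((cdf[p] - cdf_min) * scale) for p in pixels]


# Hypothetical low-contrast patch whose gray levels span only 100..120.
patch = [100, 100, 110, 120]
stretched = equalize_histogram(patch)
print(min(stretched), max(stretched))  # 0 255
```

The example shows why the technique mainly helps images with a small dynamic range: it redistributes existing gray levels but does not remove noise, which is the limitation the comment raises.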

 

By combining algorithm optimization with parameter measurement and evaluation, this paper shows that the improved EfficientNetV2 network designed by the authors performs excellently in coal and gangue identification, with high efficiency and high stability, and that this network helps to advance the solution of coal and gangue identification problems. The data in this paper are abundant and reliable, and the ideas are clear and rigorous. I would like to recommend acceptance of this manuscript once the comments above have been resolved.

Comments for author File: Comments.doc

 Minor editing of English language required

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 3 Report

Detecting the boundary between coal and gangue is essential. This detection must be carried out while the shearer is operating at the coal face in the mine. The working environment in a mine excavation is characterized by heavy dust, moisture, vibrations induced by the shearer's cutting head, and very poor lighting, which makes it extremely difficult to use visual methods to identify the boundary between coal and rock. For these reasons, the possibilities of implementing the methodology for recognizing the coal/rock boundary described in the article are very limited. Some discussion of this issue would be very important.

The problem of building a system based on a deep neural network is solved correctly. Some definitions, such as accuracy, are well known and can simply be cited. However, it must be stressed that the conditions under which the images of the coal and gangue samples were taken differ greatly from the natural conditions in which the final system would have to work in a coal mine, which makes the implementation very questionable.

The literature search is limited to Chinese sources only, which is a significant shortcoming. Furthermore, it would be advisable to include other methods of detecting the coal/gangue boundary in the search, e.g. methods based on acoustic signals (as mentioned in the paper: P. Kiljan, W. Moczulski and K. Kalinowski: Initial Study into the Possible Use of Digital Sound Processing for the Development of Automatic Longwall Shearer Operation, Energies 2021, 14, 2877).

Some editorial corrections to the paper are required, such as a table split between two pages, many hard hyphenations of words in the text, etc.

Author Response

Please see the attachment

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Major revision

Title: An Improved EfficientNetV2 Network for Coal and Gangue Recognition

This manuscript proposes an enhanced version of the EfficientNetV2 network for coal and gangue recognition. The improved EfficientNetV2 method is easy to train and exhibits fast convergence and training speeds, thus achieving high recognition accuracy even when the dataset is insufficient. However, the paper raises the following concerns:

(1) Repetition of Terms: The terms "inference time" (or "inference speed") and "FPS" (number of images inferred per second) are redundant. Training speed can be better understood through three metrics: model training time, inference time (or inference speed), and FPS. Shorter training and inference times indicate better performance, as does a higher number of images inferred per second. Furthermore, the current FPS value is reported to be 52 imgs/sec, a notable increase from the previous 39 imgs/sec.

(2) Inconsistent Terminology: Figure 9 labels the abscissa as "epoch," while the text refers to "iterations." Clarification is needed to confirm whether the iteration cycle represents 100 epochs or another unit of measurement.

(3) The authors may cite more state-of-the-art CV application articles for the completeness of the manuscript (Detection and Counting of Banana Bunches by Integrating Deep Learning and Classic Image-Processing Algorithms; Computers and Electronics in Agriculture. Novel visual crack width measurement based on backbone double-scale features for improved detection automation; Engineering Structures).

(4) Inconsistent Model Naming: The model names in Figure 9 do not align with the description in the text. The original EfficientNetV2-S network, which used the SE attention mechanism and the ReLU activation function, was compared with the EfficientNetV2-S network that used the CAM attention mechanism and the ReLU activation function, the EfficientNetV2-S network that employed the CBAM attention mechanism, and the EfficientNetV2-CAMHardswish network. Figure 9 displays the test accuracy graph, where EV2 stands for EfficientNetV2-S. However, the activation function of EV2-CBAM is not specified and needs to be provided.

(5) Model Name Discrepancy in Table 4: The model names in Table 4 do not match the capitalization of the names used in the text, and they lack sufficient detail. The paper claims to have conducted a comparative analysis of the training speed between the S, M, and L versions of the EfficientNetV2 network and the improved network to ensure that the latter did not experience any loss in performance. However, further details regarding the model names and their corresponding configurations are necessary to make this comparison clear and informative.
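To make comments (1) and (2) concrete: per-image inference time is the reciprocal of FPS, so the two figures carry the same information, and an iteration count follows from the epoch count only once the dataset size and batch size are known. The Python sketch below illustrates both relations. The function names are hypothetical, the 52 and 39 imgs/sec values are those quoted in comment (1), and the training-set size and batch size are assumed values chosen purely for illustration.

```python
import math


def latency_ms(fps):
    """Per-image inference time in milliseconds for a given throughput."""
    return 1000.0 / fps


def relative_change(old, new):
    """Fractional change from old to new."""
    return (new - old) / old


def iterations_for(epochs, num_samples, batch_size):
    """Total optimizer steps when each epoch covers the training set once."""
    return epochs * math.ceil(num_samples / batch_size)


# FPS values quoted in comment (1).
print(round(latency_ms(52), 2))         # 19.23
print(round(latency_ms(39), 2))         # 25.64
print(f"{relative_change(39, 52):.1%}")  # 33.3%

# Hypothetical training-set size and batch size (not stated in the review).
print(iterations_for(epochs=100, num_samples=5000, batch_size=32))  # 15700
```

Under these assumed values, 100 epochs correspond to 15700 iterations, which is why the axis label and the text need to agree on one unit.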

 

Author Response

Please see the attachment

Author Response File: Author Response.docx

Reviewer 3 Report

It would be good to state in the introduction that the coal and gangue are to be sorted after being transported to the mine's processing plant.

Author Response

Please see the attachment

Author Response File: Author Response.docx
