Article
Peer-Review Record

Self-Attention-Mechanism-Improved YoloX-S for Briquette Biofuels Object Detection

Sustainability 2023, 15(19), 14437; https://doi.org/10.3390/su151914437
by Yaxin Wang 1, Xinyuan Liu 1, Fanzhen Wang 1, Dongyue Ren 1, Yang Li 1, Zhimin Mu 2, Shide Li 2 and Yongcheng Jiang 1,*
Reviewer 1:
Reviewer 2:
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Submission received: 1 August 2023 / Revised: 15 September 2023 / Accepted: 18 September 2023 / Published: 3 October 2023

Round 1

Reviewer 1 Report

1. Section 2.2.3 discusses the Convolutional Neural Network, but its title is "Contextual transformer network," which does not correspond to the content of the section.

2. Section 2.1.4 mentions five evaluation metrics, but they are not reflected in the experiments described in Section 3.1. The authors should provide additional information on the evaluation metrics in Figure 7.

3. In Figure 8, the loss curve still shows a downward trend, so it is unclear whether model training has truly converged. The authors should increase the number of training epochs and observe the subsequent trend in the loss.

4. Other logic problems

(1) In line 117, the authors write "the intersection over union (IOU) is set to 0". IOU is used to evaluate how closely the predicted box matches the ground truth. The authors should reconsider the chosen IOU value in light of the formula.

(2) In line 126, Recall is described as follows: "The Recall is the ratio of TP and the sum of TP and FP". This description differs from formula (3) for Recall; the description in the text evidently corresponds to Precision, not Recall.

(3) In line 143, the Resblock body structure mentioned by the authors is not illustrated in Figure 2.

(4) In Figure 2, the CSPLayer and CBS structures are named but not explained in detail in either the article or the figure.

(5) In line 182, the ResNet structure is not explained in detail in the article.

(6) In line 245, the authors state that "the accuracy of the straw block is only 29%", which contradicts the value shown for the straw block in Figure 7.

(7) Figure 7 presents the Average Precision (AP) values for different objects, not the accuracy values described in lines 244-246. This indicates confusion in the authors' description of the evaluation metric.

(8) Lines 274-277 are supposed to describe Figure 10, but line 274 incorrectly refers to Figure 11. Additionally, the labels mentioned in lines 275 and 276 do not match the labels in Figure 10.

(9) In lines 286-290, the authors indicate that detection results are compared on the same image using different models. However, in Figure 11, the images shown on the left and right sides of (b), (c), (d), and (e) are not the same, so the figure fails to accurately represent the improved model's detection performance.
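
For reference, the metric definitions behind comments (1) and (2) can be sketched as follows. This is a minimal illustration based on the standard definitions of IoU, Precision, Recall, and F1; the function names, the (x1, y1, x2, y2) box format, and the example numbers are assumptions made for illustration and are not taken from the manuscript.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    # Corners of the overlapping rectangle (if any).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision(tp, fp):
    """Share of predicted positives that are correct: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of ground-truth objects that are found: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


if __name__ == "__main__":
    # Hypothetical boxes and counts, chosen only to exercise the functions.
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))
    print(precision(tp=8, fp=2), recall(tp=8, fn=8))
```

Under these definitions, the ratio TP / (TP + FP) quoted from line 126 is Precision, whereas Recall divides by TP + FN. A predicted box is usually counted as a true positive only when its IoU with a ground-truth box exceeds a positive threshold, which is why a threshold of 0 deserves a second look.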

There are some grammatical and spelling mistakes in the paper; it is recommended that a native English speaker correct them.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors improve Yolox-s to build a fuel detector. Overall, the paper makes sense, but the following points still need to be improved.

1. In lines 26-39, each reference should be cited only once, for example [3] and [5]. References are also cited multiple times elsewhere in the text.

2. I suggest moving your Section 2.1 into the experimental section. Related work generally covers material related to your proposed methodology; items such as the dataset and evaluation metrics can be placed later, in an experimental preparation section.

3. On line 127, write "Equation 1", not "(1)".

4. Your model improves performance by only about two points compared with Yolov7 and one point compared with Yolox-s, yet it slows down training considerably. Are you just looking for the slightest improvement in performance without caring about speed? Why not give up a little performance in favour of fast detection? Please explain this!

5. The Yolo series has now reached Yolov8, which needs to be included in the comparison experiments.

6. Your Section 3.2 is actually an analysis of an ablation experiment. I hope you will show your results in a table; you can learn from how others have written this kind of paragraph.

7. I hope that your model will be compared experimentally with current state-of-the-art models; there are too few comparative experiments in your paper.

8. Your Fig. 7 and Fig. 10, for example, could be represented as tables.

9. There are some areas of language expression that require revision.

Moderate editing of English language required

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

In this manuscript, the authors introduced a pipeline based on YOLOX for biomass fuel types detection. In detail, the YOLOX algorithm is improved and self-attention is added to build a new backbone feature extraction network by replacing 3×3 convolution in ResNet with a COT module. Extensive experiments have been conducted for evaluation. The manuscript is easy to follow and the proposed method sounds convincing. However, there are some suggestions:

1. The key idea to introduce self-attention into the existing YOLOX backbone is similar to many existing works [a][b][c]. Therefore, it is worthwhile discussing or at least mentioning them in the manuscript. 

[a] Shaw, Peter, Jakob Uszkoreit, and Ashish Vaswani. "Self-attention with relative position representations." arXiv preprint arXiv:1803.02155 (2018).

[b] Zhao, Hengshuang, Jiaya Jia, and Vladlen Koltun. "Exploring self-attention for image recognition." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.

[c] Wang, Sinong, et al. "Linformer: Self-attention with linear complexity." arXiv preprint arXiv:2006.04768 (2020).

2. The authors chose the YOLOX as the backbone to balance the performance and inference speeds of the model. Therefore, it would be better if the authors could also discuss or at least mention the existing works [d][e][f] to balance the performance and inference speeds of models for object detection.

[d] Tan, Mingxing, Ruoming Pang, and Quoc V. Le. "Efficientdet: Scalable and efficient object detection." Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2020.

[e] Cui, Yiming, Linjie Yang, and Ding Liu. "Dynamic proposals for efficient object detection." arXiv preprint arXiv:2207.05252 (2022).

[f] Chen, Guobin, et al. "Learning efficient object detection models with knowledge distillation." Advances in neural information processing systems 30 (2017).

3. It would be better if the authors could provide some examples to show the challenges of biomass fuel types detection, compared to general object detection, to highlight the contributions and motivations of the manuscript, like intra/inter-class variances.

4.  It would be better if the authors could try to explain the reason for the failure cases in Figure 11 since those objects undetected are not too challenging.

5. For the datasets, it seems that the object categories in one image are always the same. Are there any demo examples in which different object categories appear in the same image?

Good

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Highlight changes in yellow in a next revision, please. No track changes.

 

Dear authors,  

Unfortunately, it will not be possible to review a paper with such high similarity.

Similarity needs to be dropped, and then perhaps the paper can be revised.

It is just too high.             

Not possible.

 

The final list of references is very limited; the references all come from particular publications.

Not assessed

Author Response

We sincerely apologize for the high similarity of the article. We have rechecked and modified the contents using a similarity-checking tool, and hope that the requirements are now met. We would like to thank the referee again for taking the time to review our manuscript. Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

There are still a lot of unanswered questions about this paper.

Do all the authors of the paper belong to one organization? The affiliation of each author needs to be indicated. Take a good look at how others have written the author information section.

The abstract is not coherent or logical and needs to be rewritten. Read more references and see how others write their abstracts.

Lines 38 and 42 cite the same document. Is this document formatted correctly?

There are still many places where references are cited multiple times; you did not address my comment 1.

What does the '-' in front of multiple words in lines 49, 51, 53, 55, 67, etc. mean?

The work in question must not be the model you propose. Read more papers; don't write just for the sake of writing.

There are still multiple spaces in the paper; has the typesetting not been checked?

Table 3 - what does it mean?

Have you checked your Tables 2, 3, and 4? Don't they seem to have a big problem?

A discussion needs to be added for analysis.

Extensive editing of English language required

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report

This is a manuscript after revision. I am grateful for the authors' responses which have addressed all my concerns.

Author Response

Thank you very much for sending us the reviews.

Reviewer 4 Report

Highlight changes in yellow in a next revision, please. No track changes.

 

Dear authors, unfortunately, the similarity remains too high.

It needs to be significantly lowered

 

General comments then:

 

Reference style needs to be corrected.

“[1], [2].”

 

English needs to be corrected:

“the use of biomass for bioenergy production have been”

 

English is terrible; see:

“The agricultural crops and forestry biomass wastes are produced more than 900 mil-

tons per year in China [8],”

 

Complete similarity here. Note that the cited sources vary, but the original source is the same.

“YOLO series algorithm is a "one stage" target detection algorithm that reach a balance between the detection accuracy and speed [21]. Among them, YOLOX [22] developed by MEGVII is an open-source high-performance detection algorithm, which can exceed the AP of Yolov3, Yolov4 and Yolov5 and achieve an extremely competitive spee”

 

...

“YOLOX-S model is a derivative version of YOLOX model whose network structure is divided into three parts: the backbone feature extraction network (CSPDarknet), the enhanced feature extraction network (FPN), and the classifier and regression (YOLO head) [23].”

 

Please check all typos:

“the channels (Conv2D_BN_SiLU i”

 

Other papers used these measures too. Why are they not cited?

“(80×80×256), (40×40×512) and (20×20×1024), respectively.”

 

I do not really understand this caption:

“Figure 1. YOLOX Network Structure.”

 

Compare it with the caption in the other paper, with which the similarity was found; that one makes sense:

Figure 1. Structure of original YOLOX-S model.

 

Here we go again:

 

Similarity all over.

“2.1.2. Contextual transformer network”

And no citations in several cases.

 

“Convolutional Neural Network (CNN) is a kind of feedforward neural network, whose artificial neurons can respond to a part of the surrounding units within the coverage range, and have excellent performance in large image processing [28].”

“Group convolution of k×k is performed on the KeyMap to obtain ?i?1 with context information representation. This ?i?1 can be regarded as static modeling of local information. Then the concat operation is performed, in which ?i?1 and Q were concatenated, and its results were convolved twice in succession to get an Attention Map with rich context information. Then, this Attention Map was multiplied by V to get ?i?2 of dynamic context modeling. The”

“3.1. Image datasets 3.1.1. Data sets making and processing”

“3.1.2. Environment setup”

“3.1.4. Evaluation metrics”

I must make it very clear that I do not understand this. Note that equations are being presented as if they were original; no reference is indicated. In any case, the similarity needs to be entirely addressed.

 

And it goes on:

“3.2. Comparative Experiment”

 

etc

etc

 

I do not even understand how to join this together. Authors seem to be referring to methods and results at the same time.

“3. Experiment and Results Analysis”

So if it is about results, we cannot have similarity.

 

Captions need to be self-explanatory; they are not.

“Table 2. Yolov7 training results.”

“Table 3. Ablation experiments.”

etc

 

?!: “the improved strategy has improved the”

 

Grouped figures need to include a subcaption by letter after the main caption, not below the figures.

“Figure 9. Comparison of the target missed detection rate.”

 

This says nothing; it is not enlightening. The term "comparison" should never appear in a caption. The comparison needs to be made by the authors.

“Figure 10. Comparison of the RecallAP and F1 evaluation.”

 

Are our readers supposed to guess?!

“(f) image with different object categories”

 

The conclusion section, please: use the plural at all times. It cannot contain any similarity, and it does.

It should start with a brief contextualization and methodology, so that it is clear why the study was developed and how. Then come the main findings, practical implications, limitations, and future prospects. Novelty, innovation, and originality must be highlighted.

 

I hope that the authors understand that similarity is a big issue here. Beyond that, the other comments above show the authors how to make the text more relevant.

extensive revision

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report

I agree that your paper can be accepted with major revisions, but you will need to address the issues given below carefully. I will look at them again once they are resolved, and if the changes are not appropriate, I will make a comment to the editor.

1. For some of the content I raised again, in terms of how it should be written, I would like you to refer to the following paper:

https://doi.org/10.3390/e25091280

2. There are still multiple citations in your paper that are not standardized, and I hope you will revise them carefully.

3. Before you submit your paper, look carefully at its figures and tables. For example, you have a problem with Table 2.

4. What does '---' mean in Table 3? Your notes on Table 3 are also hard to understand. The same kind of problem appears in other tables and figures, so please refer carefully to the reference in comment 1 when revising.

5. Where are the contributions you made in your paper?

Extensive editing of English language required

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Highlight changes in yellow in a next revision, please. No track changes.

 

Dear authors, the similarity remains significant. It needs to be lowered, again.

I really do not understand this way of duplicating headings from another paper, for example. Why?!

The methods section is full of similarity, including entire sentences without a corresponding reference that need to be rephrased.

If this is the methods section and it describes how the work was carried out, without involving specifications, then the authors' own words should be used to write it.

Partial rewriting is obviously not enough.

Please do not start captions with "The".

In Figure 10, the authors cannot use the same letter for two different figures.

 

I understand that changes have been made to the manuscript. I would like the authors to understand what is at stake here. Similarity is not an option.

References should be made more international because authors do want to publish in an internationally indexed journal and be cited by an international audience.

To be revised

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 4

Reviewer 2 Report

no

Minor editing of English language required

Author Response

We have refined the language again in the newly submitted paper. The modified parts are highlighted in yellow. We would like to sincerely thank the reviewers for their careful review of, and guidance on, our recently submitted paper. Under your guidance, our paper is more accurate and clear, and our research results are better conveyed. Thanks again for your help!

Reviewer 4 Report

Dear authors, more than the percentage of similarity, the issue is the kind of similarity detected in parts of the text that should be in the authors' own words.

The similarity found in the methods section is extensive.

It should be further addressed.

 

I really hope the authors have learned something from this.

Moderate

Author Response

We have carefully revised the paper according to the similarity report you gave us. The modified parts are highlighted in yellow. We would like to sincerely thank the reviewers for their careful review of, and guidance on, our recently submitted paper. Under your guidance, our paper is more accurate and clear, and our research results are better conveyed. Thanks again for your help!
