Peer-Review Record

PatchRLNet: A Framework Combining a Vision Transformer and Reinforcement Learning for The Separation of a PTFE Emulsion and Paraffin

Electronics 2024, 13(2), 339; https://doi.org/10.3390/electronics13020339
by Xinxin Wang 1,2, Lei Wu 1,2, Bingyu Hu 3, Xinduoji Yang 3, Xianghui Fan 1,2, Meng Liu 3, Kai Cheng 1,2, Song Wang 3, Jianqiang Miao 1,2 and Haigang Gong 2,*
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 4 December 2023 / Revised: 3 January 2024 / Accepted: 5 January 2024 / Published: 12 January 2024
(This article belongs to the Special Issue Advances in Artificial Intelligence Engineering)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The study focuses on a significant challenge in the production of PTFE emulsion: the need for efficient and safe detection of the separation between PTFE emulsion and liquid paraffin. To address this, the authors propose an automated detection framework named PatchRLNet, which combines a vision transformer with reinforcement learning to focus on essential features while minimizing background noise.

1. Introduction:

The introduction would benefit from a brief but more explicit description of how reinforcement learning and the vision transformer synergize in the proposed solution.

2. Related work:

A deeper analysis of why certain methods are more effective in specific scenarios would enhance understanding. For example, more detailed explanations of the limitations of CNNs in noisy environments and why vision transformers excel in these conditions would be beneficial.

While the section discusses various applications of CNNs and vision transformers, it lacks a direct connection to how these technologies specifically apply to PTFE emulsion and paraffin separation.

There seems to be a lack of critical evaluation of the methods discussed. Including insights into their potential biases and inaccuracies would strengthen this section.

3. Materials and Methods:

The section briefly mentions various data augmentation techniques used but lacks details on how each technique specifically contributes to improving the model’s performance or the rationale behind choosing these specific augmentation methods.

The approach to address class imbalance through oversampling is mentioned, but the method and its potential impact on the model's performance are not thoroughly discussed. Oversampling can sometimes lead to overfitting, and how this was mitigated or accounted for is not clear.

The integration of reinforcement learning is a critical part of the methodology, but there's insufficient detail on how the reinforcement learning algorithm was specifically tailored or optimized for this application.

4. Results:

The description of the loss curve and the model's initial training dynamics is somewhat vague.

The section mentions one false positive but doesn't provide information on false negatives, which are equally important in understanding the model's reliability. Additionally, the reasons behind the false positive and how such errors could be mitigated in future iterations of the model are not sufficiently explored.
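Reviewer 1's oversampling point (under Materials and Methods) can be made concrete with a minimal sketch of naive random oversampling. The helper `random_oversample`, the toy image IDs, and the class names are hypothetical illustrations, not the method used in the manuscript. The sketch shows why overfitting is a risk: every added item is a copy of an existing minority-class sample, so no new information is introduced.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Naive random oversampling (hypothetical helper, not the authors' method).

    Duplicates randomly chosen samples of each minority class until every
    class has as many samples as the largest class.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # Only existing samples of this class can be drawn: copies, not new data.
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels


# Toy, imbalanced training split: 8 "idle" vs 2 "paraffin" images (IDs only).
train_x = list(range(10))
train_y = ["idle"] * 8 + ["paraffin"] * 2

bal_x, bal_y = random_oversample(train_x, train_y)
print(Counter(bal_y))
```

After balancing, each of the two paraffin images appears about four times, which makes memorization easy. Applying oversampling only to the training split, after the train/test split, keeps duplicated images out of the evaluation data so that any resulting overfitting remains visible in the test metrics.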

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The following are some comments and suggestions for improving this manuscript.

1. In Section 3.2, the presentation style of the proposed method should be changed as follows:

  Firstly, the framework and its components are briefly explained.

  Then, the details of the components or blocks of the framework are presented in the sub-sections.

2. The presentation style of the experimental results section should be changed as follows:

  The title of Section 4.1 should be "Performance Evaluation Method".

  4.2. Performance Evaluation of Vision Transformer

  4.3. Performance Evaluation of PatchRLNet

  4.4. Performance Evaluation of PTFE Emulsion Paraffin Separation


3. The titles of the following sections should be removed, and their content should be integrated into the corresponding sub-sections:

  4.4. Loss Curve

  4.5. Explainability

4. In the experimental results section, the authors should describe some sample images of the obstructed, paraffin, emulsion, and idle classes from the two perspectives.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The paper proposes a framework called PatchRLNet that combines a vision transformer and reinforcement learning for automated detection of the separation between PTFE emulsion and liquid paraffin. I have the following comments for the authors to address.

1) The reported average testing accuracy of 99.00% ± 2.04% for the proposed PatchRLNet model raises concerns, as classification accuracy by definition cannot exceed 100%. There appears to be an error in the calculation or reporting of the average accuracy metric: the reported value should neither exceed 100% nor carry a standard deviation that extends the range above 100%. The authors should double-check their evaluation methodology.

2) While reporting accuracy provides an intuitive measure of classification performance, it has limitations when dealing with imbalanced data classes. To address this, I recommend additionally including the F1-score as an evaluation metric. 

3) Since the accuracy is very high on the PTFE emulsion data, more testing is needed to determine whether the integrated model can maintain its performance on other liquids or materials. One of the major issues I found with the manuscript is that overfitting could limit the applicability of the model to new use cases.

4) Limitations of the work and potential risks should be mentioned. For example, reliance on two camera perspectives may be infeasible in space-constrained industrial environments.
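The arithmetic behind Reviewer 3's point 1) can be illustrated with a short sketch. The per-fold accuracies below are purely hypothetical, not the values reported in the manuscript; they show how a mean near the 100% ceiling combined with one weaker fold yields a mean plus one standard deviation above 100%, even though no individual fold exceeds it.

```python
import statistics

# Hypothetical per-fold test accuracies (%), NOT the values reported in the paper.
fold_acc = [100.0, 100.0, 100.0, 95.0, 100.0]

mean = statistics.mean(fold_acc)          # 99.0
sd = statistics.pstdev(fold_acc)          # population standard deviation: 2.0
sample_sd = statistics.stdev(fold_acc)    # sample standard deviation: about 2.24
upper = mean + sd                         # 101.0, above the 100% ceiling

print(f"{mean:.2f}% ± {sd:.2f}% (upper end {upper:.2f}%)")
print("best single fold:", max(fold_acc))
```

In this toy case every fold is at most 100%, yet mean plus one standard deviation is not. Whether the reported 99.00% ± 2.04% arises this way or from a reporting error is what point 1) asks the authors to check; listing the per-fold values alongside the summary would make this transparent.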
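Reviewer 3's point 2) on accuracy versus F1-score under class imbalance can be shown with a minimal, dependency-free sketch of per-class and macro-averaged F1. The class names echo the image categories mentioned by Reviewer 2, but the test set and the always-majority classifier are invented for illustration. In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same macro average.

```python
def per_class_f1(y_true, y_pred, cls):
    """F1 for one class treated as the positive class (one-vs-rest)."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so a small class counts as much as a large one."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(per_class_f1(y_true, y_pred, c) for c in classes) / len(classes)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical, imbalanced test set: 90 "emulsion" vs 10 "paraffin" images,
# scored against a degenerate classifier that always predicts the majority class.
y_true = ["emulsion"] * 90 + ["paraffin"] * 10
y_pred = ["emulsion"] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.3f}")   # 0.900
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")   # 0.474
```

The classifier never detects paraffin, yet its accuracy is 90%. Macro-F1 drops to about 0.47 because the paraffin class contributes an F1 of zero, which is the kind of failure that accuracy alone can hide on imbalanced data.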

Comments on the Quality of English Language

Minor editing of English language required

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript has been sufficiently improved to warrant publication in Electronics.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

I have confirmed that the authors modified the manuscript according to the comments and suggestions of the previous review.

Therefore, I would encourage publication of this manuscript.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have thoroughly and satisfactorily addressed all of my comments on the previous version of the manuscript in their revisions. 

Author Response

Please see the attachment.

Author Response File: Author Response.docx
