Article
Peer-Review Record

Real-Time Face Mask Detection Method Based on YOLOv3

Electronics 2021, 10(7), 837; https://doi.org/10.3390/electronics10070837
by Xinbei Jiang 1, Tianhan Gao 1,*, Zichen Zhu 1 and Yukang Zhao 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 5: Anonymous
Submission received: 11 March 2021 / Revised: 27 March 2021 / Accepted: 29 March 2021 / Published: 1 April 2021
(This article belongs to the Section Computer Science & Engineering)

Round 1

Reviewer 1 Report

Dear Authors

Congratulations on your work; it is good, especially in this time of the COVID pandemic. The use of masks will continue for a long time and must be mandatory, and the degree of COVID contagion depends on their use.

Your research is well designed; it has a very well explained mathematical component and a well-founded experimental component.

A suggestion: provide more information on the detection time of your algorithm compared to the others. For example, how many milliseconds does your proposal take versus the rest of the algorithms?

Author Response

Dear reviewer:

 

We gratefully appreciate your advice and thoughtful comments on our manuscript. In addition to the modifications that specifically address the reviewer’s comments, we have also made other changes to the manuscript to describe our findings more clearly. In the memo, we have addressed each of the reviewer’s comments on a point-by-point basis, and all of the revised parts have been highlighted in red.

Author Response File: Author Response.pdf

Reviewer 2 Report

This study presents a properly-wearing-masked face detection dataset. This is an interesting paper, but some corrections are required.

  1. Consider changing the title of the manuscript. There should not be abbreviations.
  2. Abstract - consider revising the background to reflect the specific goal of this paper. The abstract should include information about the background, aim, methods, and results.
  3. The Introduction section needs to be rewritten.
    - There needs to be a stronger motivation why the stated aim of the study is relevant or useful for people.
    - Points 1-4 should be moved to the Methods section.
  4. Methods: Points 4.1-4.2 should be moved to the Methods section. This section should present the results.
  5. Results: Fig. 6 should be moved to the Results and Discussion section. Please add the evaluation metrics results.
  6. Your discussion section should be constructed as follows. Consider revising it to include this information: Rephrase the question followed by the answer that was reached from the results. Describe how the data support the answers to the questions. Compare to other studies. Present the strengths and limitations. Combine the information in the previous paragraphs into a coherent whole, within the framework of the hypotheses.
  7. Make a conjecture of what this study suggests in a larger scope in the Conclusion section.

Author Response

Dear reviewer:

 

Thank you again for your positive comments and valuable suggestions to improve the quality of our manuscript. In addition to the modifications that specifically address the reviewer’s comments, we have also made other changes to the manuscript to describe our findings more clearly. In the memo, we have addressed each of the reviewer’s comments on a point-by-point basis, and all of the revised parts have been highlighted in red.

Point 1: Consider changing the title of the manuscript. There should not be abbreviations.

 

Response 1: Thank you very much for your kind suggestion. We have removed “SE-YOLOv3” in the title.

 

 

Point 2: Abstract - consider revising the background to reflect the specific goal of this paper. The abstract should include information about the background, aim, methods, and results.

 

Response 2: We have added more explanation of the background to the abstract in lines 4-5: “which has also led to a growing demand for automatic real-time mask detection services instead of manual reminding.” We have also revised the other parts of the abstract.

 

 

Point 3: The Introduction section needs to be rewritten.

- There needs to be a stronger motivation why the stated aim of the study is relevant or useful for people.

- Points 1-4 should be moved to the Methods section.

 

Response 3: We would like to sincerely thank you for your advice and constructive comments. We have revised the Introduction section and added more explanation of the importance and aim of our study in lines 27-33: “According to World Health Organization (WHO), the right way to wear a mask is adjusting the mask to cover our mouth, nose, and chin. The protection will be greatly reduced if masks are not worn properly. At presents, security guards are arranged at entrances of public places to remind people to wear masks. However, this measure not only exposes the guards to the air that may contain the virus, but also leads to overcrowding at the entrance due to its inefficiency. Therefore, a fast and effective method is needed to address the situation.”

After discussion and careful consideration, we have kept points 1-4 in their current position because we hope that readers can gain a full understanding of the paper's contributions from the introduction. We have also summarized the main idea of points 2-4 in lines 129-133: “Firstly, we introduce an attention mechanism to the backbone network, which can help the network allocate more resources to important features. Then we employ GIoU and focal loss to accelerate the training process and further improve the performance. Finally, we adopt suitable data augmentation techniques for face mask detecting to achieve robustness.”
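For readers unfamiliar with the two loss terms named in the quoted passage, the following is a minimal sketch of generalized IoU (GIoU) for axis-aligned boxes and of the binary focal loss. It is illustrative only and is not the manuscript's implementation: the function names, the (x1, y1, x2, y2) box layout, and the defaults alpha = 0.25 and gamma = 2.0 are assumptions taken from the commonly used formulations.

```python
import math


def giou(box_a, box_b):
    """Generalized IoU of two boxes given as (x1, y1, x2, y2). Range: (-1, 1]."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = max(0.0, ax2 - ax1) * max(0.0, ay2 - ay1)
    area_b = max(0.0, bx2 - bx1) * max(0.0, by2 - by1)
    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    # Smallest box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    if union <= 0.0 or enclose <= 0.0:
        return 0.0  # degenerate boxes: assumed convention for this sketch
    iou = inter / union
    # Penalize the empty part of the enclosing box, so disjoint boxes still get a signal.
    return iou - (enclose - union) / enclose


def giou_loss(box_a, box_b):
    """Loss form: 0 for identical boxes, approaching 2 for distant boxes."""
    return 1.0 - giou(box_a, box_b)


def focal_loss(p, target, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss for one predicted probability p and a 0/1 target."""
    p = min(max(p, eps), 1.0 - eps)
    p_t = p if target == 1 else 1.0 - p
    alpha_t = alpha if target == 1 else 1.0 - alpha
    # (1 - p_t) ** gamma down-weights examples the model already classifies well.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
```

Unlike plain IoU, GIoU stays informative when the predicted and ground-truth boxes do not overlap, and the focal term reduces the contribution of easy examples; both properties are consistent with the authors' stated aim of faster training.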

 

 

Point 4: Methods: Points 4.1-4.2 should be moved to the Methods section. This section should present the results.

Response 4: Thank you for your suggestion. After careful consideration, we think it is better to keep points 4.1-4.2 in their current position because they are related to the experiment and, as supplementary material, they make the experiment section more readable for general readers.

 

 

Point 5: Results: Fig. 6 should be moved to the Results and Discussion section. Please add the evaluation metrics results.

Response 5: Fig. 6 (now Fig. 7) has been moved to Section 4.4, Comparison with Other State-of-The-Art Detectors. The evaluation metrics results are shown in Table 2 and Table 3.

 

 

Point 6: Your discussion section should be constructed as follows. Consider revising it to include this information: Rephrase the question followed by the answer that was reached from the results. Describe how the data support the answers to the questions. Compare to other studies. Present the strengths and limitations. Combine the information in the previous paragraphs into a coherent whole, within the framework of the hypotheses.

Response 6: Thank you very much for your kind suggestion. As you point out, there are several problems that need to be addressed. Following your suggestions, we have made extensive corrections to our discussion part. We hope the revised manuscript is acceptable to you.

 

Point 7: Make a conjecture of what this study suggests in a larger scope in The Conclusion section.

 

Response 7: We have revised the conclusion in lines 430-432: “Besides, we are going to take parameters and flops into consideration and deploy SE-YOLOv3 on lightweight devices, which can further contribute to global health.”

Author Response File: Author Response.pdf

Reviewer 3 Report

The paper presents an application of the machine learning algorithm SE-YOLOv3 to face mask detection. The topic is very useful. The method in the paper is appropriate. I have the following suggestions:

(1) The authors should provide more information on the application of the current method in a real-time manner (e.g., computational time, computational cost, etc.).

(2) The language of the current paper needs to be further improved. The introduction of the existing work should be further expanded in section 1.

(3) The machine learning concepts and terms (e.g., k-means, convolutional network, etc.) need to be explained for general readers.

 

Author Response

Dear reviewer:

 

We gratefully appreciate your advice and nice comments on our manuscript. In addition to the modifications that specifically address the reviewer’s comments, we have also made other changes to the manuscript to describe our findings more clearly. In this memo, we have addressed each of the reviewer’s comments on a point-by-point basis, and all of the revised parts have been highlighted in red.

Point 1: The authors should provide more information on the application of the current method in a real-time manner (e.g., computational time, computational cost, etc.).

 

Response 1: Thank you so much for your kind suggestion. We have added a comparison of detection time between our model and the others in Table 3.

We have also added the average execution time of the system for processing one image frame in lines 419-421: “And the average execution time of the system for processing one image frame is 0.13s…”
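As a rough illustration of how such a per-frame figure can be obtained and read, the sketch below averages wall-clock time over a sequence of frames and converts a latency into frames per second. The helper names and the `process_frame` callable are hypothetical stand-ins; the authors' actual measurement procedure is not described in this record.

```python
import time


def average_frame_time(process_frame, frames):
    """Average wall-clock seconds spent in process_frame per frame."""
    count = 0
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)  # hypothetical detector call
        count += 1
    elapsed = time.perf_counter() - start
    if count == 0:
        raise ValueError("no frames were processed")
    return elapsed / count


def fps_from_latency(seconds_per_frame):
    """Convert a per-frame latency into a throughput in frames per second."""
    return 1.0 / seconds_per_frame
```

Under this conversion, the reported 0.13 s per frame corresponds to roughly 7.7 frames per second for the whole prototype system.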

 

 

Point 2: The language of the current paper needs to be further improved. The introduction of the existing work should be further expanded in section 1.

 

Response 2: We sincerely appreciate the valuable comments. We have corrected spelling and grammatical errors in the paper and improved the details of our manuscript. Existing work has been added in lines 41-44: “There has been some recent research addressing mask detection. RetinaMask [1] introduces context attention module and transfer learning technique to RetinaNet, achieving 1.9% higher average accuracy than the baseline model. D.Chiang el al. [2] improves SSD with a lighter backbone network to perform real-time mask detection.”

 

 

Point 3: The machine learning concepts and terms (e.g., k-means, convolutional network, etc.) need to be explained for general readers.

 

Response 3: We have added an explanation of the concepts and terms of convolutional networks in lines 78-84: “Convolutional Neural Network (CNN) is a class of deep neural network which is inspired by biological processes [1]. A CNN consists of a series of building blocks, such as convolutional layer, pooling layer, fully connected layer, and is capable of learning automatically and adaptively learn spatial hierarchies of features through a backpropagation algorithm. The kernels of an CNN are shared across all the image positions which makes it highly efficient in parameters. These properties make CNN an ideal solution for computer vision tasks.”

The two-stage method is explained in lines 93-95: “The two-stage method first generates a large number of region proposals for each image through a heuristic algorithm or CNN network, and then classifies them and regresses these candidate regions.”

The YOLOv3 method is explained in lines 151-166: “YOLOv3 [2] is an incremental version from YOLO family. Followed the anchor mechanism introduced in YOLO9000, YOLOv3 make predictions from feature maps of 3 different levels. The feature maps are divided into grids and the cells of grids are placed with anchors of different size and aspect ratio, which are acquired by performing K-Means on the dataset…”

K-means is explained in lines 199-205: “K-means is a commonly used clustering algorithm based on Euclidean distance. The hypothesis of K-means is that the data is generated from k exact centers, and some Gaussian noise. It first picks k random points from the data as centroids, and then assign all the points to the closest cluster centroids. The centroids will be recomputed on the newly formed clusters after the assignments. The assignment and centroids computation will be repeated until the maximum number of iterations are reached or centroids of newly formed cluster do not change.”
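The K-means procedure described in the quoted passage can be sketched in a few lines. This is a minimal pure-Python illustration on 2-D points, such as (width, height) pairs of bounding boxes, using Euclidean distance as the passage states. The function name and the `max_iters`, `seed`, and `init` parameters are assumptions made for this example and are not taken from the manuscript.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0, init=None):
    """Euclidean K-means on a list of 2-D points; returns (centroids, assignment)."""
    rng = random.Random(seed)
    # Step 1: pick k random data points as centroids (or use caller-supplied ones).
    if init is not None:
        centroids = [tuple(c) for c in init]
    else:
        centroids = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(max_iters):
        # Step 2: assign every point to its closest centroid.
        assignment = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if not members:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
                continue
            new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
        # Step 4: stop once the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment
```

With random initialization the result can depend on the starting centroids, which is why the sketch exposes `seed` and `init`. When applied to box sizes, the returned centroids would play the role of anchor sizes.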

 

 

REFERENCES:

  1. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning (Vol. 1); MIT Press: Cambridge, 2016; pp. 326–366.
  2. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, abs/1804.02767.

Reviewer 4 Report

REVIEW ELECTRONICS-1159996

 

Title: SE-YOLOv3: Real-Time Face Mask Detection Method Based on YOLOv3

 

In this paper, the authors propose an object detection method based on YOLOv3, named Squeeze-and-Excitation YOLOv3 (SE-YOLOv3). The proposed method aims to locate a face in real time and assess how the mask is being worn to aid the control of the pandemic in public areas.

 

 

COMMENTS

 

The authors present an interesting approach to detecting, in real time, whether a person is wearing a face mask properly in order to avoid spreading the COVID-19 pandemic.

The paper is readable, well written and structured and of sufficient interest to the audience of Electronics.

The authors’ contribution may not be considered valuable or very significant but, in my view, it is sufficient given the current interest in protecting the population from the COVID-19 pandemic.

 

  1. I would like the authors to elaborate a bit more on the practical applications of their contribution in an industrial, commercial or social environment. For example, automatic identification and immediate notification (or warning) of persons/employees/customers who are improperly wearing their masks.
  2. One issue which has not been addressed or even slightly discussed by the authors relates to explainable or interpretable machine learning. Interpretability and explainability of predictions have evolved to very significant issues in most applications of ML and one such case is the one discussed by the authors in this paper. In the case of improper mask wearing, it is important to inform the identified person why he/she is wearing the mask wrongly. In my view, the authors should add a few lines on the value and importance of explainability of their SE-YOLOv3 predictions either in their discussion section or, at least, in their future research directions. Some indicative research in the area is given below:
    1. Explainable machine learning framework for image classification problems: case study on Glioma cancer prediction,
    2. A Grey-box model exploiting Black-box accuracy and White-box intrinsic interpretability,
    3. A novel explainable image classification framework: case study on Skin cancer and Plant disease prediction
  3. Some of the mathematics given in section 3 are not really needed and can be found in text books but their presence does not bother me.
  4. Some parts of the text are not clear enough: e.g. lines 25-26: “Consequently, mask detection has become a vital computer vision task to help global societies.”
  5. Lines 29-30 “… the condition of incorrect wearing”.
  6. Line 40: “For different situation, we analyze the environment …” ??
  7. Line 183: “The formulated …” you mean “The formulation”?

 

Author Response

Dear reviewer:

 

We sincerely thank you for your valuable feedback, which we have used to improve the quality of our manuscript. In addition to the modifications that specifically address the reviewer’s comments, we have also made other changes to the manuscript to describe our findings more clearly. In the memo, we have addressed each of the reviewer’s comments on a point-by-point basis, and all of the revised parts have been highlighted in red.

Point 1: I would like the authors to elaborate a bit more on the practical applications of their contribution in an industrial, commercial or social environment. For example, automatic identification and immediate notification (or warning) of persons/employees/customers who are improperly wearing their masks.

 

Response 1: We would like to sincerely thank you for your advice and constructive comments. We have added a section describing our access control gate system prototype built for mask detection. It is Section 4.5, Application on Face Mask detection, in lines 403-426: “To evaluate the effectiveness and practicality of the proposed method, in this part, we present an access control gate system prototype equipped with SE-YOLOv3, which can be deployed to public places’ entrances…And the average execution time of the system for processing one image frame is 0.13s, which shows that our has system practical application value.”

 

 

Point 2: One issue which has not been addressed or even slightly discussed by the authors relates to explainable or interpretable machine learning. Interpretability and explainability of predictions have evolved to very significant issues in most applications of ML and one such case is the one discussed by the authors in this paper. In the case of improper mask wearing, it is important to inform the identified person why he/she is wearing the mask wrongly. In my view, the authors should add a few lines on the value and importance of explainability of their SE-YOLOv3 predictions either in their discussion section or, at least, in their future research directions. Some indicative research in the area is given below:

Explainable machine learning framework for image classification problems: case study on Glioma cancer prediction,

A Grey-box model exploiting Black-box accuracy and White-box intrinsic interpretability,

A novel explainable image classification framework: case study on Skin cancer and Plant disease prediction

 

Response 2: Thank you so much for your great advice. Your suggestion really means a lot to us. We have learned a lot from the suggested research and revised Section 4.3, Ablation Experiment. We have added a visualization of the feature maps in SE-YOLOv3 and YOLOv3 (Figure 6) and tried to explain what the network attends to in lines 362-372: “Then, as mentioned in [1], reasoning and explanation is essential to trust models used for critical predictions. The face mask detector is used for protecting public health safety, whose explainability should be addressed as well. We present the visualization of feature maps from different layer of SE-YOLOv3 and YOLOv3 in Figure 6. ... we find that SE-YOLOv3 looks at the central area of the face, such as eyes, nose and chin, while YOLOv3 looks at the relatively marginal area. This result shows that SE-YOLOv3 tends to focus more on relevant regions than YOLOv3 when making prediction.”

 

 

Point 3: Some of the mathematics given in section 3 are not really needed and can be found in text books but their presence does not bother me.

 

Response 3: Thank you for your kind suggestion. After careful consideration, we think it is better to keep the mathematics for general readers.

 

 

Point 4: Some parts of the text are not clear enough: e.g. lines 25-26: “Consequently, mask detection has become a vital computer vision task to help global societies.”

Response 4: We apologize for the unclear writing. We have revised the introduction and added more explanation of the importance of our study and the reasons for it in lines 27-40: “According to World Health Organization (WHO), the right way to wear a mask is adjusting the mask to cover our mouth, nose, and chin. The protection will be greatly reduced if masks are not worn properly. At presents, security guards are arranged at entrances of public places to remind people to wear masks. However, this measure not only exposes the guards to the air that may contain the virus, but also leads to overcrowding at the entrance due to its inefficiency. Therefore, a fast and effective method is needed to address the situation…”

 

 

Point 5: Lines 29-30 “… the condition of incorrect wearing”.

 

Response 5: The sentence has been revised in line 47, “To guide the public away from the pandemic, detection for incorrect masks wearing is needed to be discussed.”

 

 

Point 6: Line 40: “For different situation, we analyze the environment …” ??

 

Response 6: Thank you for your reminder. We have revised this paragraph to better explain the effect of mixup in lines 69-71: “To adjust to the real application scenarios, we apply mixup technique to further improve the robustness of the model. As shown in with the result gaining ablation experiment, mixup training method results in a 1.8% improvement in mAP.”
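For readers new to the technique, mixup blends two training images pixel-wise with a random weight. The sketch below is a generic illustration rather than the authors' training code: the function names, the nested-list image layout, the Beta(1.5, 1.5) sampling of the weight, and the choice to keep the boxes of both images (a convention often used when mixup is applied to detection) are all assumptions.

```python
import random


def mixup_images(img_a, img_b, lam):
    """Blend two equally sized images (nested lists of pixel values) with weight lam."""
    return [
        [lam * pa + (1.0 - lam) * pb for pa, pb in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


def mixup_sample(sample_a, sample_b, alpha=1.5, rng=random):
    """Mix two (image, boxes) samples; lam is drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    image = mixup_images(sample_a[0], sample_b[0], lam)
    # Assumed detection convention: the mixed image keeps the boxes of both inputs.
    boxes = list(sample_a[1]) + list(sample_b[1])
    return image, boxes, lam
```

For example, `mixup_images([[0, 100]], [[100, 0]], 0.25)` gives `[[75.0, 25.0]]`, since each output pixel is 0.25 times the first image plus 0.75 times the second.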

 

 

Point 7: Line 183: “The formulated …” you mean “The formulation”?

 

Response 7: We apologize for the oversight. We have changed “formulated” to “formulation”. Thank you for the correction.

 

 

REFERENCES:

  1. Pintelas, Emmanuel G., et al. “Explainable Machine Learning Framework for Image Classification Problems: Case Study on Glioma Cancer Prediction.” Journal of Imaging, vol. 6, no. 6, 2020, p. 37.

Author Response File: Author Response.pdf

Reviewer 5 Report

Dear Authors.

This is a good paper (SE-YOLOv3: Real-Time Face Mask Detection Method Based on YOLOv3). However, my decision is to reconsider after major revision.

Strengths of this paper:

- The topic is very interesting.

- I think "Face Mask Detection" is new.


Weaknesses of this paper:
1. This paper is too short for journal publication.
- Just 14 pages.
- You need over 17 pages...

2. I recommend adding to or rewriting the "Introduction".

3. I recommend adding to or rewriting the "Abstract and contribution".
- More contributions.
- Contributions need to be supported by data and results.

4. Related work: Improve
Important aspect has been mentioned.

4.1. Face recognition at a distance for a stand-alone access control system. Sensors, 20(3), 785.

4.2. Mixed YOLOv3-LITE: A lightweight real-time object detection method. Sensors, 20(7), 1861.

- Explain clearly what "Method Based on YOLOv3" is.

- Also, more to 8 new papers published from 2019~2021 by major publishers such as IEEE, ACM, Springer, Elsevier, MDPI, and Wiley.

5. Results need to be presented clearly.

6. Scientific Soundness: Average.

7. English: Moderate English changes required.

Author Response

Dear reviewer:

 

We gratefully appreciate your advice and thoughtful comments on our manuscript. In addition to the modifications that specifically address the reviewer’s comments, we have also made other changes to the manuscript to describe our findings more clearly. In the memo, we have addressed each of the reviewer’s comments on a point-by-point basis, and all of the revised parts have been highlighted in red.

Point 1: This paper is too short for journal publication.

- Just 14 pages.

- You need over 17 pages...

Response 1: Thank you for pointing this out. We have added more content to our paper to make our results clearer and more convincing. We have added Section 4.5, Application on Face Mask detection, which introduces an application prototype of the proposed method. We added a paragraph on feature map visualization to Section 4.3, Ablation Experiment, to show the effectiveness of the SE-Block of our model. We added explanations of technical terms for general readers. As a result, the final manuscript reaches 17 pages.

 

Point 2: I recommend adding to or rewriting the "Introduction".

Response 2: Thanks for your suggestion. We have added material to make the introduction clearer and to emphasize the importance of our work. We supplemented data and revised the language. We have tried our best to polish the language in the revised manuscript.

 

 

Point 3: I recommend adding to or rewriting the "Abstract and contribution".

- More contributions.

- Contributions need to be supported by data and results.

Response 3: Thank you for this suggestion. We have supplemented data from the experiment to make our contributions more solid. In the abstract, we also state our contributions and have added data to support them.

 

 

Point 4: Related work: Improve

Important aspect has been mentioned.

4.1. Face recognition at a distance for a stand-alone access control system. Sensors, 20(3), 785.

4.2. Mixed YOLOv3-LITE: A lightweight real-time object detection method. Sensors, 20(7), 1861.

- Explain clearly what "Method Based on YOLOv3" is.

-Also, more to 8 new papers published from 2019~2021 by major publishers such as IEEE, ACM, Springer, Elsevier, MDPI, and Wiley.

 

Response 4: We sincerely appreciate the valuable comments. We have added Section 2.3, Methods Based on YOLOv3, and referenced some recent extended work based on YOLOv3 in lines 172-193: “YOLOv3 is competitive in both accuracy and speed, and it is robust in detecting different types of objects. As a result, YOLOv3 has been widely applied in industries, such as manufacturing and military. Many extended works on YOLOv3 have been proposed, which are mainly focus on three aspects, speed, accuracy and model's size. To fulfill the real-time requirement in electronic components manufacturing, Huang et al. [1] incorporated MobileNet network framework to lighten the YOLOv3. ... Zhao et al. [2] proposed Mixed YOLOv3-LITE, which aims for embedded and mobile smart devices. Mixed YOLOv3-LITE employs the shallow backbone of YOLO-LITE to replace the DarkNet-53 and adds a residual structure and parallel high-to-low-resolution subnetworks to achieve the fusion of shallow and deep features. The size of Mixed YOLOv3-LITE is reduced to 20.5 MB, which is 91.70% smaller than the original YOLOv3 model. GC-YOLOv3 [3] introduces a global context block and learnable fusion between the feature extraction network and the feature pyramid network. ...”.

We have added four more papers published from 2019 to 2021; they are listed in the references below. The fourth paper is cited in lines 362-364: “Then, as mentioned in [4], reasoning and explanation is essential to trust models used for critical predictions. The face mask detector is used for protecting public health safety, whose explainability should be addressed as well.”

 

 

Point 5: Results need to be presented clearly.

Response 5: We sincerely thank the reviewer for the careful reading. We have added more data and an ablation study to support the results. We have also added an explanation of the evaluation metrics to make our results clearer.

 

 

Point 6: Scientific Soundness: Average.

 

Response 6: Thank you for your suggestion. We have added more details on the experimental settings and explanations of the evaluation metrics to make our results more convincing.

 

 

Point 7: English: Moderate English changes required.

 

Response 7: Thanks for your suggestion. We have corrected spelling and grammatical errors in the paper and improved the details of our manuscript, so that readers can understand the proposed scheme clearly.

 

 

REFERENCES:

  1. Huang, Rui, et al. “A Rapid Recognition Method for Electronic Components Based on the Improved YOLO-V3 Network.” Electronics, vol. 8, no. 8, 2019, p. 825.
  2. Zhao, Haipeng, et al. “Mixed YOLOv3-LITE: A Lightweight Real-Time Object Detection Method.” Sensors, vol. 20, no. 7, 2020, p. 1861.
  3. Yang, Yang, and Hongmin Deng. “GC-YOLOv3: You Only Look Once with Global Context Block.” Electronics, vol. 9, no. 8, 2020, p. 1235.
  4. Pintelas, Emmanuel G., et al. “Explainable Machine Learning Framework for Image Classification Problems: Case Study on Glioma Cancer Prediction.” Journal of Imaging, vol. 6, no. 6, 2020, p. 37.

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

I accept all corrections. No more comments.

Reviewer 5 Report

Dear Authors.

The revision adequately addresses the concerns expressed in the last review.
I therefore recommend this revised manuscript for publication. (Accept in present form)
