Article
Peer-Review Record

Deep Learning and YOLOv8 Utilized in an Accurate Face Mask Detection System

Big Data Cogn. Comput. 2024, 8(1), 9; https://doi.org/10.3390/bdcc8010009
by Christine Dewi *, Danny Manongga *, Hendry, Evangs Mailoa and Kristoko Dwi Hartomo
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 28 November 2023 / Revised: 6 January 2024 / Accepted: 10 January 2024 / Published: 16 January 2024

Round 1

Reviewer 1 Report (New Reviewer)

Comments and Suggestions for Authors

The study presents the effectiveness of YOLOv8 in facial mask detection to mitigate the spread of COVID-19 in public spaces. The proposed model is reliable in accurately detecting and identifying facial masks, surpassing previous research in terms of performance.

Overall, the manuscript is well-written and organized. However, I have several comments to further improve the quality of the manuscript: 

- The authors should ensure that all comparisons in Table 4 are made under the same conditions, utilizing both the FMD and MMD databases.

- Additional methods for comparison under the same conditions should be included.

- A related work section should also be added, summarizing the methods presented in the comparison table (Table 4: Previous Research Comparison).

Comments on the Quality of English Language

Minor editing of English language required

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 2 Report (Previous Reviewer 1)

Comments and Suggestions for Authors

1-     “A corresponding text file with the same name as each image file in the same directory will be generated as a.txt file.” – This statement is confusing. Please state which software tool generates this file. What is the significance of this information with respect to the problem at hand? More importantly, is the name of the file “a.txt” or the same as that of the image?

2-     The authors have added some definitions related to the training curves given in Figure 5 in the resubmitted paper. However, the shape of each curve needs to be discussed in order to ascertain its value with respect to the proposed approach. E.g., it can be noticed that the box, cls and dfl losses are still decreasing at the 100th epoch. Why did the authors choose to stop the training at this epoch? Please describe the rationale behind this hyperparameter choice and how it affects the overall performance.

3-     The authors have not provided an adequate answer to the following query raised in the previous review cycle. “The performance comparison given in Table 4 needs to be clarified and expanded. E.g. the first row gives a lone result on a dataset titled “Our Database of Faces (ORL)”. No explanation about this particular dataset has been given in the text. Moreover, the compared technique, i.e. PCA, has been used in the reference work for face recognition with/without mask which is different from the problem at hand i.e. mask detection. The provided result i.e. 70% AP needs to be explained as well. Similarly, the second row provides detection results for a totally different dataset i.e. MAFA which has also not been discussed in the text. The only valid comparison is with reference works [8] and [9] which have used the same dataset as authors.” It should be understood that the proposed work should be compared against only the relevant efforts described in the literature. For instance, “Face recognition with and without mask” [24] is a totally different problem than the one stated in this manuscript.

4-     The resubmitted manuscript has still not addressed the following query raised in the previous review cycle i.e. “The contribution of the proposed work is not clear. The authors have trained a publicly available model i.e. yolov8m without any modifications on the publicly available datasets. Moreover, the stated accuracy on “Face Mask Detection Dataset” at Kaggle page is 94% which is higher than that reported in this paper albeit on the combination of FMD and MMD dataset. So, it is advisable to strengthen the experimental comparisons with reference works on identical datasets. At present, the manuscript only provides results of training a well-known publicly available detector on publicly available datasets.”
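Comment 1 above concerns a per-image text file that shares the image's file name. A minimal sketch of one common reading follows, assuming (the record does not confirm this) that the file is a YOLO-format label file derived from a PASCAL VOC bounding box. The function names, the example path, and the box values are hypothetical.

```python
from pathlib import Path


def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a PASCAL VOC pixel box to YOLO's normalized
    (x_center, y_center, width, height) representation."""
    x_center = (xmin + xmax) / 2.0 / img_w
    y_center = (ymin + ymax) / 2.0 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return x_center, y_center, width, height


def yolo_label_line(class_id, box):
    """Format one object as a line of a YOLO label file."""
    x_center, y_center, width, height = box
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


def label_path_for(image_path):
    """The label file shares the image's base name, with a .txt extension."""
    return Path(image_path).with_suffix(".txt")


# Hypothetical example: one box in a 400x500 image.
box = voc_box_to_yolo(100, 50, 300, 250, img_w=400, img_h=500)
line = yolo_label_line(0, box)
label_file = label_path_for("images/img001.png")
```

Under this reading, the file is named after the image (here `img001.txt`), not literally `a.txt`.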

Comments on the Quality of English Language

The manuscript still has a lot of problems related to the tutorial and use of language. E.g.:

1-     “Next, predict the width(w) and height(h) of the boxes x, y, and anchor box (dw and dh).”

2-     “The FMD dataset is comprised of all 853 photos that are included in this dataset, and they are all saved in the PASCAL VOC format”

3-     “the combination of MMD and FMD techniques” – are these techniques or datasets?

4-     “mask_weared” should be “mask_worn” in figure 3?

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 3 Report (New Reviewer)

Comments and Suggestions for Authors

The article is devoted to a current topic related to the development of a computer vision system. The work uses the YOLOv8 framework - the most modern version of the YOLO series frameworks. This framework is widely used and is an effective tool for solving various problems of object recognition and detection. The authors coped with the task posed in the study. Overall, the study is of interest, since wearing masks during an epidemic is a fairly common practice in many countries. Such systems will certainly make it possible to automate a number of processes related to identifying violators. As a remark, we can note the insufficient overview of the comparison of the developed system with existing ready-made analogues. In general, the article can be accepted for publication after minor revision.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Reviewer 4 Report (New Reviewer)

Comments and Suggestions for Authors

The manuscript addresses the facial mask detection problem using a SOTA object detection framework YOLOv8. Superior detection performance has been achieved in terms of both accuracy and precision. Some issues in the current manuscript are listed as follows:

-- In the second-to-last paragraph of the introduction section, the contribution list is numbered 1, 2 and 4, with no item 3.

-- The subscripts in lines 218 and 219 are not formatted properly.

-- In line 229, the equation is garbled.

-- It is suggested to give the details of hyperparameter values for training different models.

-- It is unclear how the training and test data are split. How much training data is there for each category, and how does the split ratio affect the detection performance?
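The last point, on how the data are split per category, can be illustrated with a stratified split. This is a sketch only, not the authors' procedure: it assumes one class label per image (a simplification, since detection images may contain several classes), and the function name, class names, and 80/20 ratio are hypothetical.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices so each class keeps roughly the same
    train/test ratio. Returns (train_indices, test_indices)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)


# Hypothetical image-level labels: 10 "mask" images and 5 "no_mask" images.
labels = ["mask"] * 10 + ["no_mask"] * 5
train_idx, test_idx = stratified_split(labels, train_fraction=0.8)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
```

Reporting `train_counts` and `test_counts` for each split ratio tried would answer the per-category question directly.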

Comments on the Quality of English Language

There are some issues or errors in the typesetting.

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report (New Reviewer)

Comments and Suggestions for Authors

The manuscript can be accepted in the current form.

Author Response

Thanks to the reviewer for the comment and for accepting this paper.

Reviewer 2 Report (Previous Reviewer 1)

Comments and Suggestions for Authors

1-     In Figure 5, the loss curves show that the loss is still decreasing. In the revised manuscript, the authors have still not explained why they stopped the training at the 100th epoch.

2-     The question of novelty remains unanswered. The manuscript states that “A novel deep-learning detection model has been built and showcased”. YOLOv8 is a well-known detection model and the authors have not made any changes to it other than training it on . The following papers and GitHub pages demonstrate the exact same functionality as described in this paper.

a.      https://github.com/harikris001/Mask-Detector

b.       https://github.com/Vaaanc/Face-mask-detection---YOLOv8/blob/cadd94cad33be3ca4ff40e8ca6348481e05b274e/face-mask-detection-yolov8.ipynb

c.      https://ijisae.org/index.php/IJISAE/article/view/2966
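On comment 1, a fixed epoch count is commonly justified or replaced by a patience-based stopping rule on the validation loss. The sketch below illustrates that rule under stated assumptions; it is not the authors' training setup, and the function name, patience value, and loss values are hypothetical.

```python
def early_stop_epoch(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops at the first epoch where the loss has not improved by
    more than min_delta for `patience` consecutive epochs. stop_epoch is
    None if that never happens, i.e. the loss was still improving.
    """
    best_loss = float("inf")
    best_epoch = -1
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            best_epoch = epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return None, best_epoch


# Hypothetical curve that plateaus after epoch 2.
plateau = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
stop_plateau = early_stop_epoch(plateau, patience=3)

# Hypothetical curve that is still decreasing when training ends.
still_improving = [1.0, 0.9, 0.8]
stop_improving = early_stop_epoch(still_improving, patience=3)
```

A `None` stop epoch corresponds to the situation described in comment 1: the loss is still falling when training ends, so the chosen epoch budget needs a separate justification.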

Comments on the Quality of English Language

n/a

Author Response

Please see the attachment

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report (Previous Reviewer 1)

Comments and Suggestions for Authors

I have no more comments on the revised manuscript. 

 

This manuscript is a resubmission of an earlier submission. The following is a list of the peer review reports and author responses from that submission.

 

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1-     Equations 1-6 require more explanation. Not all of the terms have been defined.

2-     Is the dataset prepared by the authors?

3-     Figure 5 plots various training/loss curves but these have not been discussed in the accompanying text. What is the significance of these results?

4-     The performance comparison given in Table 4 needs to be clarified and expanded. E.g. the first row gives a lone result on a dataset titled “Our Database of Faces (ORL)”. No explanation about this particular dataset has been given in the text. Moreover, the compared technique, i.e. PCA, has been used in the reference work for face recognition with/without mask which is different from the problem at hand i.e. mask detection. The provided result i.e. 70% AP needs to be explained as well. Similarly, the second row provides detection results for a totally different dataset i.e. MAFA which has also not been discussed in the text. The only valid comparison is with reference works [8] and [9] which have used the same dataset as authors.

5-     The link to reference work 38 is broken.

6-     The contribution of the proposed work is not clear. The authors have trained a publicly available model i.e. yolov8m without any modifications on the publicly available datasets. Moreover, the stated accuracy on “Face Mask Detection Dataset” at Kaggle page is 94% which is higher than that reported in this paper albeit on the combination of FMD and MMD dataset. So, it is advisable to strengthen the experimental comparisons with reference works on identical datasets. At present, the manuscript only provides results of training a well-known publicly available detector on publicly available datasets.

Comments on the Quality of English Language

Language related problems. E.g.:

1-     “This research employ YOLOv8 to identify face mask identification”

2-     “improved by the suggested model to a class "Good" level”

3-     “This is accomplished through the usage of the term "data augmentation."” – redundant sentence.

4-     “we have introduced the efficiency of the proposed model in detecting masked faces in medical contexts.”

Reviewer 2 Report

Comments and Suggestions for Authors

1. The paper lacks novelty and there have been too many related studies. In addition, the advantages of the proposed method were not highlighted in this manuscript, and only the detection effects of several YOLO models were compared without any improvements made to the models.

2. The introduction should describe the background and significance of the research, as well as the current research status, relevant research, current problems, and then introduce the methods used in this article.

3. Section 2.1 introduces the research methods of others, and should not be extensively introduced in the materials and methods section. Instead, it should be included in the introduction.

4. The description of the loss function in the results section should be placed in the materials and methods section.

5. The discussion of the results is insufficient; further discussion is needed on the reasons for the misidentification.

6. The content structure of the article needs to be adjusted, paying attention to tense.

Comments on the Quality of English Language

Minor editing of English language required
