Article
Peer-Review Record

Object Detection Related to Irregular Behaviors of Substation Personnel Based on Improved YOLOv4

Appl. Sci. 2022, 12(9), 4301; https://doi.org/10.3390/app12094301
by Jingxin Fang and Xuwei Li *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 7 March 2022 / Revised: 17 April 2022 / Accepted: 23 April 2022 / Published: 24 April 2022
(This article belongs to the Special Issue Applied Artificial Intelligence (AI))

Round 1

Reviewer 1 Report

This is a well-written article with a significant scientific contribution, improving the performance of deep learning techniques for the detection of irregular behavior at power substations. A custom dataset was created, which should be shared with the research community so that others can reproduce the results and compare against the proposed techniques.

Author Response

Thank you very much for your comments and suggestions. Based on your suggestions, we contacted the State Grid Corporation of China subsidiary to which the dataset belongs to determine whether the dataset can be shared. However, we have been informed that the dataset is covered by a non-disclosure agreement, so we are very sorry that it cannot be shared.

Reviewer 2 Report

This is an interesting contribution, both in terms of the proposed enhancements to the deep learning architecture and the actual application area. The majority of works published in the state of the art are concerned with the detection of helmets/PPE; in this work, however, the problem is tackled systematically in a multi-class setting.

The manuscript is very well written and presented. The results, in particular the ablation study, shed light on the workings of the proposed system and provide a good comparison with traditional deep learning methods.

Minor comments:

Technical:

  1. My main concern is the influence of gamma correction on the performance of the system. From the experimental results, it is clear that based on the specific dataset and object classes, the optimal choice is a gamma of 1.2, which increases the mAP. Could the authors comment on an appropriate method for the automated selection of the gamma correction parameter?
  2. The second area for improvement is to shed more light on the performance of the system and its comparison with traditional methods by enhancing the reporting with additional statistical measures, e.g., precision, recall, F1-score, Cohen's kappa score, AUC, etc., which are particularly important for understanding performance in each of the object classes.
  3. Please include recommendations for further work in the conclusions section.

Non-technical:

  1. Line 108: change to "with a total of 494 images."
  2. Line 109: fix the repetition: "from the from the photographs"
  3. Line 115: provide a reference for the LabelImg tool.
  4. Line 114 (and subsequently): What is meant by pre-compensation/pre-compensated?
  5. Figure 1: please add some comments in the associated text with regard to the impact of the various values of gamma in the context of object detection.
  6. Line 386: legend of Table 4 needs to be updated.

Author Response

Point 1: My main concern is the influence of gamma correction on the performance of the system. From the experimental results, it is clear that based on the specific dataset and object classes, the optimal choice is a gamma of 1.2, which increases the mAP. Could the authors comment on an appropriate method for the automated selection of the gamma correction parameter?

Reply 1:

Thank you very much for your comments and suggestions. We have also analyzed and discussed the selection of γ in gamma correction, which is one of our ongoing research topics. Our current idea is to convert the input image into the HSV color space, use a Gaussian function to obtain the pixel means of the H, S, and V channels, and then average the three channel means to obtain the value of γ. Finally, the image is converted from the HSV color space back to the RGB color space, and adaptive gamma correction is performed with the obtained value of γ. We believe this approach can solve the setting and selection of γ and achieve automatic gamma correction of images.

These related statements have been added to the further work part of the conclusions section in the revised manuscript; see Page 14, the text highlighted in light blue.
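A minimal sketch of this idea, assuming OpenCV and NumPy are available; the plain channel means below stand in for the Gaussian-weighted means mentioned above, and the mapping from the mean level to γ is a hypothetical choice, not the authors' final method:

```python
import cv2
import numpy as np

def adaptive_gamma_correction(bgr_image):
    """Sketch of the adaptive gamma selection described above."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV).astype(np.float32)

    # Mean of each of the H, S, and V channels, normalized to [0, 1]
    # (a plain mean here; the reply mentions a Gaussian-weighted one).
    channel_means = hsv.reshape(-1, 3).mean(axis=0) / np.array([179.0, 255.0, 255.0])

    # Average the three channel means and map the result to a gamma value
    # (hypothetical mapping: brighter inputs get gamma > 1 and are darkened).
    gamma = 0.5 + channel_means.mean()

    # Apply the power-law correction to the image normalized to [0, 1].
    corrected = np.power(bgr_image.astype(np.float32) / 255.0, gamma)
    return (corrected * 255.0).astype(np.uint8), gamma
```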

 

Point 2: The second area for improvement is to shed more light on the performance of the system and its comparison with traditional methods by enhancing the reporting with additional statistical measures, e.g., precision, recall, F1-score, Cohen's kappa score, AUC, etc., which are particularly important for understanding performance in each of the object classes.

Reply 2:

Thank you for your comment and advice. The F1-score performance evaluation index has been added in the revised manuscript. The details are as follows:

The fourth evaluation index is the F1-score, which is the harmonic mean of precision and recall and also reflects the detection accuracy of the model on each class.

The above contents have been added in the revised manuscript; see Pages 9-10, the text highlighted in light blue.
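For reference, the F1-score reported below is the harmonic mean of a class's precision P and recall R:

```latex
F_1 = 2 \cdot \frac{P \cdot R}{P + R}
```

For example, the improved YOLOv4's glove F1 of 0.74 in Table 1* summarizes that class's precision and recall in a single number.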

At the same time, Table 1* has been added to the comparison of different detection algorithms. The improved YOLOv4 is generally better than Faster RCNN, SSD, YOLOv3, and YOLOv4. The F1-score of the improved YOLOv4 for the glove category is only 0.08 lower than that of YOLOv4, and its F1-score for the person category is only 0.02 lower than that of YOLOv3. The relevant contents can be seen on Page 12, Section 4.2.3, the second paragraph, highlighted in light blue.

Table 1*. The F1-scores of each class under different detection methods.

| Methods         | badge | glove | Helmet | operatingbar | person | powerchecker |
|-----------------|-------|-------|--------|--------------|--------|--------------|
| Faster RCNN     | 0.29  | 0.50  | 0.79   | 0.64         | 0.78   | 0.41         |
| SSD             | 0.51  | 0.56  | 0.86   | 0.60         | 0.90   | 0.53         |
| YOLOv3          | 0.82  | 0.69  | 0.91   | 0.59         | 0.91   | 0.55         |
| YOLOv4          | 0.81  | 0.82  | 0.93   | 0.61         | 0.90   | 0.58         |
| Improved YOLOv4 | 0.83  | 0.74  | 0.93   | 0.74         | 0.89   | 0.62         |

 

Point 3: Please include recommendations for further work in the conclusions section.

Reply 3:

Thank you very much for your comments and suggestions. We have added the further work to the conclusions section in the revised manuscript. The details are as follows:

Some problems should be further studied and solved in the future. The indoor and outdoor environments of substations produce images with various backgrounds, so effective and automatic image preprocessing methods need to be studied. The accuracy and robustness of the detection model depend strongly on the training image samples, so more image samples should be added to the dataset. In addition, the detection model will be deployed on computing devices to develop a visual image analysis and real-time detection system, so related hardware issues and practical application effects will be studied and verified in the future.

The above contents have been added in the revised manuscript; see Page 14, the sentences highlighted in light blue.

 

Point 4: Non-technical:

  1. Line 108: change to “with a total of 494 images.”
  2. Line 109: fix the repetition: “from the from the photographs”
  3. Line 115: provide a reference for the LabelImg tool.
  4. Line 114 (and subsequently): What is meant by pre-compensation/pre-compensated?
  5. Figure 1: please add some comments in the associated text with regard to the impact of the various values of gamma in the context of object detection.
  6. Line 386: legend of Table 4 needs to be updated.

Reply 4:

Thank you very much for your comments and suggestions. We have made the following changes based on your suggestions:

  1. “with a total of” -> “with a total of 494 images.” See Page 3, the text highlighted in light blue.
  2. “from the from the photographs” -> “from the photographs”. See Page 3, the text highlighted in light blue.
  3. The reference for the LabelImg tool has been added. See Page 15, the text highlighted in light blue.
  4. The meaning of pre-compensation is to generate a nonlinear distortion of the original input image through the characteristic exponent γ, where the degree of distortion is determined by the value of γ (a formula sketch follows this list).
  5. “When γ tends to 0 or 1, the image is extremely bright or extremely dark, and the excessive correction results in low image contrast.” This sentence has been added in the associated text; see Page 4, the text highlighted in light blue.
  6. The legend of Table 4 has been updated to “Detection results under different values of γ”.
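For clarity on item 4, pre-compensation here refers to the standard power-law (gamma) transform applied to intensities normalized to [0, 1]; values of γ below 1 brighten the image, while values above 1 (such as the γ = 1.2 found optimal in the experiments) darken it:

```latex
I_{\text{out}} = I_{\text{in}}^{\gamma}, \qquad I_{\text{in}} \in [0, 1]
```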

Author Response File: Author Response.docx

Reviewer 3 Report

The presented paper describes modifications of different parts of an algorithm used for object detection. In my opinion, it must be improved before publishing. Some hints that can be taken into consideration:

  • The title of the given paper leads to misunderstanding. Irregular Behavior Detection points more toward action recognition, but the paper describes object detection instead.
  • Irregular Behavior Detection... should have an abbreviation that is used throughout the text instead of the whole name.
  • What is the reason for using a 9:1 train/test split? This division strongly favors the training set. Most often one meets ratios of 8:2 or 7:3, or, as in the work "Analysis of Semestral Progress in Higher Technical Education with HMM Models" by Lach et al., even 66:34. Such a division makes sense; the problem in this paper is much easier.
  • Why, in such a case, is there no cross validation? Both the ratio and cross validation (at least 5-fold) should be calculated in order to draw any conclusions.
  • What would the final results be if one added some images with a hat or cap to test set data containing normal helmets (or other objects)?
  • A description of why the authors can detect irregular behavior based on object detection should be added.
  • Are the models trained from scratch or from already trained models with a known benchmark such as COCO?

Author Response

Point 1: The title of the given paper leads to misunderstanding. Irregular Behavior Detection points more toward action recognition, but the paper describes object detection instead.

Reply 1:

Thank you very much for your advice. The title of this paper has been changed to “Object Detection Related to Irregular Behaviors of Substation Personnel Based on Improved YOLOv4” in the revised manuscript, and the title is highlighted in purple on Page 1.

 

Point 2: Irregular Behavior Detection... should have an abbreviation that is used throughout the text instead of the whole name.

Reply 2:

Thank you very much for your suggestion. We added the abbreviation "IBD" where "Irregular Behavior Detection" first appears and changed all other occurrences of "Irregular Behavior Detection" in the text to "IBD" in the revised manuscript; the text is highlighted in purple.

 

Point 3: What is the reason for using a 9:1 train/test split? This division strongly favors the training set. Most often one meets ratios of 8:2 or 7:3, or, as in the work "Analysis of Semestral Progress in Higher Technical Education with HMM Models" by Lach et al., even 66:34. Such a division makes sense; the problem in this paper is much easier.

Reply 3:

Thank you very much for your suggestion. To explain why this paper chooses 9:1 as the sample ratio, we have added a new Section 4.2.2, “Detection results under different sample ratios”, to Section 4.2, “Influence Factors and Result Analysis”. The details are as follows:

In order to analyze the influence of different sample ratios of the IBD dataset on the detection precision, the ratios of the train_val set to the test set were taken as 9:1, 8:2, 7:3, 6:4, and 5:5. The performance indexes of the test results under different sample ratios, including the AP of each class and the mAP, are shown in Table 2*.

Table 2*. Detection results (AP of each class and mAP, %) under different sample ratios.

| Ratios | badge | glove | Helmet | operatingbar | person | powerchecker | mAP   |
|--------|-------|-------|--------|--------------|--------|--------------|-------|
| 5:5    | 82.15 | 74.51 | 92.97  | 62.27        | 92.02  | 59.94        | 77.31 |
| 6:4    | 83.53 | 76.40 | 94.40  | 63.63        | 92.41  | 63.03        | 78.90 |
| 7:3    | 84.01 | 74.80 | 94.93  | 65.84        | 91.95  | 60.31        | 78.64 |
| 8:2    | 83.92 | 76.88 | 95.51  | 68.56        | 93.15  | 70.07        | 81.35 |
| 9:1    | 84.88 | 75.96 | 94.30  | 74.96        | 92.80  | 65.76        | 81.44 |

As can be seen from Table 2*, as the ratio of the train_val set to the test set increases, the AP of each class and the mAP generally show an upward trend, with the largest variations in the operatingbar and powerchecker categories. When the ratio is 9:1, the mAP reaches its maximum value of 81.44%. In order to improve the generalization ability of the model, as many training samples as possible should be selected to improve the detection accuracy of the model in practical applications, so the sample ratio in this paper was set to 9:1.

The above contents have been added in the revised manuscript; see Page 11, Section 4.2.2, highlighted in purple.
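As a quick arithmetic check on Table 2*, each mAP value is the mean of the six per-class APs in its row; a minimal verification for the 9:1 row:

```python
# Per-class APs (%) for the 9:1 ratio, copied from Table 2*.
aps = [84.88, 75.96, 94.30, 74.96, 92.80, 65.76]
print(round(sum(aps) / len(aps), 2))  # 81.44, matching the mAP column
```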

 

Point 4: Why, in such a case, is there no cross validation? Both the ratio and cross validation (at least 5-fold) should be calculated in order to draw any conclusions.

Reply 4:

Thank you very much for your comments. We used 10-fold cross validation in the experimental process; that is, 90% of the train_val set is used as the training set and 10% of the train_val set is used as the validation set. In each epoch of the model training process, 10-fold cross validation is randomly performed to optimize the model parameters and improve the detection precision of the model.
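A minimal sketch of the split described above, assuming scikit-learn is available; reading "randomly performed in each epoch" as a re-shuffled 10-fold split with one 90%/10% fold drawn per epoch is our interpretation, not the authors' exact code:

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical index array standing in for the train_val images
# (roughly 90% of the 494-image IBD dataset at the 9:1 ratio).
train_val_indices = np.arange(445)
num_epochs = 100  # hypothetical

for epoch in range(num_epochs):
    # Re-shuffled 10-fold split: every fold is a 90%/10%
    # train/validation partition of the train_val set.
    kfold = KFold(n_splits=10, shuffle=True, random_state=epoch)
    train_idx, val_idx = next(kfold.split(train_val_indices))
    train_set = train_val_indices[train_idx]  # ~90% for training
    val_set = train_val_indices[val_idx]      # ~10% for validation
    # ... train on train_set, validate on val_set ...
```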

 

Point 5: What would the final results be if one added some images with a hat or cap to test set data containing normal helmets (or other objects)?

Reply 5:

Thank you very much for your comments. There are many people wearing hats or caps in the IBD dataset in this paper. Since hats and caps are not marked as a specific category during the labeling process, the trained model will not detect them; however, some people wearing hats or caps may be mistakenly detected as people wearing helmets, which reflects the model's strong dependence on the data. If a large number of relevant samples are added to the dataset, it is possible to improve the performance of the model and greatly reduce false detections.

 

Point 6: A description of why the authors can detect irregular behavior based on object detection should be added.

Reply 6:

Thank you very much for your comments. In the revised manuscript, we added a description of why irregular behaviors can be detected based on object detection. The details are as follows:

By detecting the objects that substation personnel should wear and carry when working, the system can reflect whether the substation personnel exhibit irregular behaviors. If the control center finds irregular behaviors, it can stop them in time to avoid personal safety accidents.

The contents have been added to the revised manuscript; see Page 3, Section 2.1, the first paragraph, highlighted in purple.
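A minimal sketch of how detections could be mapped to irregular-behavior flags, using the paper's six class names; the specific rule set (a person must appear together with a badge, gloves, and a helmet) is our illustrative assumption:

```python
# Hypothetical rule: personnel must be detected together with these objects.
REQUIRED_OBJECTS = {"badge", "glove", "Helmet"}

def find_irregular_behaviors(detected_classes):
    """Return the required objects missing from one frame's detections."""
    detected = set(detected_classes)
    if "person" not in detected:
        return []  # no personnel in the frame, nothing to check
    return sorted(REQUIRED_OBJECTS - detected)

# Example: a person with a helmet and an operating bar, but no gloves or badge.
print(find_irregular_behaviors(["person", "Helmet", "operatingbar"]))
# -> ['badge', 'glove']
```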

 

Point 7: Are the models trained from scratch or from already trained models with a known benchmark such as COCO?

Reply 7:

Thank you for your comment. The model was obtained by training on the IBD dataset starting from a pre-trained model, as mentioned in the description of the experimental process. The pre-trained model was trained on the large public ImageNet dataset, which is similar to datasets such as COCO and VOC.
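A minimal sketch of fine-tuning from an ImageNet-pretrained model, assuming PyTorch/torchvision; YOLOv4 is not part of torchvision, so the ResNet-50 below only illustrates the workflow and is not the authors' implementation:

```python
import torch
from torchvision.models import resnet50, ResNet50_Weights

# Load a backbone pre-trained on ImageNet (a stand-in for the
# pre-trained YOLOv4 weights mentioned in the reply).
model = resnet50(weights=ResNet50_Weights.IMAGENET1K_V1)

# Replace the head for the six IBD classes:
# badge, glove, Helmet, operatingbar, person, powerchecker.
model.fc = torch.nn.Linear(model.fc.in_features, 6)

# Fine-tune on the IBD dataset at a small learning rate (hypothetical value).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```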

Author Response File: Author Response.docx

Round 2

Reviewer 3 Report

Thank you for the corrections.
