Article
Peer-Review Record

Improved Vehicle Detection Using Weather Classification and Faster R-CNN with Dark Channel Prior

Electronics 2023, 12(14), 3022; https://doi.org/10.3390/electronics12143022
by Ershang Tian and Juntae Kim *
Reviewer 1: Anonymous
Reviewer 2:
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 10 May 2023 / Revised: 2 July 2023 / Accepted: 7 July 2023 / Published: 10 July 2023

Round 1

Reviewer 1 Report

This paper proposes a novel method for vehicle detection based on the dark channel prior. Overall, this research is interesting. Some minor modifications should be made before publication.

1) Please highlight your contributions and the limitations of the work in the paper. In addition, the parameter size and inference time should be provided for the comparison.

2) Detecting objects in the vehicle's surrounding environment is more challenging than detecting objects in natural scene images. Some recent research has attempted to enhance detection performance by introducing an attention module and an additional detection head. For instance: YOLOv5-Tassel: detecting tassels in RGB UAV imagery with improved YOLOv5 based on transfer learning. Including this work in the related work section would be valuable.

 

3) By utilizing the shared visual information from multiple vehicles and infrastructure sensors, object detection accuracy can be significantly improved. This approach is particularly effective in overcoming limitations such as occlusion and limited range of view, as it allows for the exchange of information with surrounding vehicles and infrastructure. Further details on related work can be found in: an automated driving systems data acquisition and analytics platform. Given the potential benefits of this approach, it would be appropriate to discuss this work in detail in the introduction.

Author Response

The previous response is updated.

Please refer to the attached file.

--------------------

Point 1: Please highlight your contributions and the limitations of the work in the paper. In addition, the parameter size and inference time should be provided for the comparison.

 

Response 1: According to the reviewer's feedback, we have added a paragraph highlighting our contribution at the end of the introduction, and a sentence mentioning limitations and future directions at the end of the conclusion. We have also added a more detailed description of our model in Sections 3.1 and 4.1.

 

Point 2: Detecting objects in the vehicle's surrounding environment is more challenging than detecting objects in natural scene images. Some recent research has attempted to enhance detection performance by introducing an attention module and an additional detection head. For instance: YOLOv5-Tassel: detecting tassels in RGB UAV imagery with improved YOLOv5 based on transfer learning. Including this work in the related work section would be valuable.

 

Response 2: According to the reviewer's feedback, we have added the following paper: W. Liu, K. Quijano and M. M. Crawford, "YOLOv5-Tassel: Detecting Tassels in RGB UAV Imagery With Improved YOLOv5 Based on Transfer Learning," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 15, 2022.

 

Point 3: By utilizing the shared visual information from multiple vehicles and infrastructure sensors, object detection accuracy can be significantly improved. This approach is particularly effective in overcoming limitations such as occlusion and limited range of view, as it allows for the exchange of information with surrounding vehicles and infrastructure. Further details on related work can be found in: an automated driving systems data acquisition and analytics platform. Given the potential benefits of this approach, it would be appropriate to discuss this work in detail in the introduction.

 

Response 3: According to the reviewer's feedback, we have added the following paper: Xin Xia, Zonglin Meng, Xu Han, Hanzhao Li, Takahiro Tsukiji, Runsheng Xu, Zhaoliang Zheng, Jiaqi Ma, "An automated driving systems data acquisition and analytics platform," Transportation Research Part C: Emerging Technologies, Volume 151, 2023.

 

Author Response File: Author Response.pdf

Reviewer 2 Report

1. The novelty of this paper is limited. The proposed improved Faster R-CNN replaces VGG16 with ResNet101, and the authors adopted the feature pyramid design for object detection. These are popular model architecture designs; I would not claim them as a major contribution. Besides, to prove the effectiveness of the authors' particular design, there should be many other comparisons in the experiments, e.g., ResNet50, VGG, Inception, etc.

2. The authors also propose to classify the image first, then apply de-hazing operations, and then use Faster R-CNN. What is the classification model design, and what is the performance of the classification? What if the classification model makes a wrong decision? These questions are not answered, and there are no experiments to support them. How do you train the model, and what is the data split?

3. Typo in figure 2: fully "connectde" -> connected.

4. Line 73: "Faster R-CNN is an object detection model that combines a convolutional neural network (CNN) with a region proposal network (RPN)": The statement is strange; the region proposal network itself is a convolutional network.

5. Line 219 states that the IFRCNN is then applied to both clear and hazy images for vehicle detection. This statement is confusing: is it just for experiment and comparison, or is it the proposed workflow? Do you merge the results in the final step?

6. The authors also mention that the proposed method performs better in detecting small objects; it would be more convincing if experiments were provided.

7. The paper is lacking details in experiment design, such as which dataset is used for training and which for testing. Is any data augmentation used in training? If the FRCNN takes input from de-hazed images, how many such images are used in training? What is the detailed breakdown of image types?

8. The authors also mention in line 307: "In order to improve the efficiency we classify the weather inside the images, focusing on images with haze for preprocessing to make the detection more efficient." There should then be an experiment showing how much speedup the proposed method achieves by adding the classification model to filter out sunny weather.

 

Overall: The idea of classifying the image first, then preprocessing it and feeding it to an FRCNN, is straightforward and makes sense. However, more experiments are required to show the effectiveness of this method, and modifications are needed to make the presentation clearer.

Author Response

The previous response is updated.

Please refer to the attached file.

------------------------

Point 1: The novelty of this paper is limited. The proposed improved Faster R-CNN replaces VGG16 with ResNet101, and the authors adopted the feature pyramid design for object detection. These are popular model architecture designs; I would not claim them as a major contribution. Besides, to prove the effectiveness of the authors' particular design, there should be many other comparisons in the experiments, e.g., ResNet50, VGG, Inception, etc.

 

Response 1: The reviewer's comment is correct. However, the major contribution of this paper is to combine Faster R-CNN with a weather classification model to improve detection under different weather conditions. According to the reviewer's feedback, we have added a paragraph highlighting our contribution at the end of the introduction. We also added a more detailed description of our model in Sections 3.1 and 3.3.
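[Editor's note: for concreteness, the combined workflow described in this response can be sketched as follows. The function and argument names are illustrative placeholders, not the authors' actual implementation.]

```python
def detect_with_weather_gating(image, classify, dehaze, denoise, detect):
    """Vehicle detection gated by weather classification (sketch).

    `classify`, `dehaze`, `denoise`, and `detect` are caller-supplied
    callables; this only illustrates the control flow, assuming that
    only images classified as hazy are preprocessed.
    """
    if classify(image) == "haze":        # weather classification step
        image = dehaze(image)            # dark channel prior de-hazing
        image = denoise(image)           # BM3D noise suppression
    return detect(image)                 # improved Faster R-CNN (IFRCNN)
```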

 

Point 2: The authors also propose to classify the image first, then apply de-hazing operations, and then use Faster R-CNN. What is the classification model design, and what is the performance of the classification? What if the classification model makes a wrong decision? These questions are not answered, and there are no experiments to support them. How do you train the model, and what is the data split?

 

Response 2: The classification model is the renowned VGG16 architecture, and the test accuracy of weather classification is 0.70. If the classification model makes a wrong decision, the image is not processed for de-hazing, so the accuracy remains the same as that of the base model. In training the model, 4,000 training and 1,000 test images were used. According to the reviewer's feedback, we have added sentences explaining these points in Sections 3.1 and 4.1.
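[Editor's note: since the responses state that Keras on TensorFlow was used, a VGG16 fine-tuning setup of this kind might look as follows. The classifier head, the binary haze/clear labeling, and the hyperparameters are assumptions for illustration, not the authors' reported configuration.]

```python
import tensorflow as tf

# Hedged sketch: fine-tune an ImageNet-pretrained VGG16 as a binary
# haze/clear weather classifier. Layer sizes and the optimizer are
# illustrative choices, not the manuscript's settings.
base = tf.keras.applications.VGG16(
    weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False  # freeze the convolutional features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # haze vs. clear
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)
```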

 

Point 3: Typo in figure 2: fully "connectde" -> connected.

 

Response 3: According to the reviewer's feedback, Figure 2 has been corrected.

 

Point 4: Line 73: "Faster R-CNN is an object detection model that combines a convolutional neural network (CNN) with a region proposal network (RPN)": The statement is strange; the region proposal network itself is a convolutional network.

 

Response 4: The reviewer's comment is correct. We have modified the paragraph as: "Faster R-CNN introduced the region proposal network and integrated feature extraction, candidate box extraction, bounding box regression, and classification into a single network, resulting in significant improvements in overall performance" in Section 2.
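[Editor's note: as background for this point, the RPN is itself a small convolutional head that scores a fixed grid of anchor boxes over the shared feature map. Below is a minimal NumPy sketch of anchor generation, using the defaults from the original Faster R-CNN paper rather than this manuscript's settings.]

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Generate RPN anchor boxes (x1, y1, x2, y2) for a feature map.

    `ratios` are height/width aspect ratios; defaults follow
    Ren et al. (2015), not necessarily this paper's configuration.
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Anchor center in input-image coordinates.
            cx, cy = (x + 0.5) * stride, (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w, h = s / np.sqrt(r), s * np.sqrt(r)
                    anchors.append([cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2])
    return np.array(anchors)  # shape: (feat_h * feat_w * 9, 4)
```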

 

Point 5: Line 219 states that the IFRCNN is then applied to both clear and hazy images for vehicle detection. This statement is confusing: is it just for experiment and comparison, or is it the proposed workflow? Do you merge the results in the final step?

 

Response 5: It means that the IFRCNN is applied to the images without de-hazing. We agree that the wording is confusing. According to the reviewer's feedback, we have modified the sentence in Section 3.3.

 

Point 6: The authors also mention that the proposed method performs better in detecting small objects; it would be more convincing if experiments were provided.

 

Response 6: Although we do not have comprehensive experiments on that, the results in Figure 6 and Table 3 show qualitatively that the proposed method can detect more small objects. According to the reviewer's feedback, we added sentences mentioning this in Section 4.4.

 

Point 7: The paper is lacking details in experiment design, such as which dataset is used for training and which for testing. Is any data augmentation used in training? If the FRCNN takes input from de-hazed images, how many such images are used in training? What is the detailed breakdown of image types?

 

Response 7: According to the reviewer's feedback, we added a more detailed description of the dataset in Section 4.1.

 

Point 8: The authors also mention in line 307: "In order to improve the efficiency we classify the weather inside the images, focusing on images with haze for preprocessing to make the detection more efficient." There should then be an experiment showing how much speedup the proposed method achieves by adding the classification model to filter out sunny weather.

 

Response 8: The sentence actually means that we improve accuracy, not efficiency. According to the reviewer's feedback, we modified the sentence in the conclusion.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper proposes a method of improving the accuracy of vehicle detection using the Faster R-CNN model in adverse weather conditions. Experiments show the effectiveness of the proposed method.

I think the proposed method is reasonable and effective. However, there are some important references missing:

 Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey. Neurocomputing 2021.

 

And the figures should be reorganized, such as Figure 4.

Author Response

The previous response is updated.

Please refer to the attached file.

--------------------

Point 1: This paper proposes a method of improving the accuracy of vehicle detection using the Faster R-CNN model in adverse weather conditions. Experiments show the effectiveness of the proposed method.

I think the proposed method is reasonable and effective. However, there are some important references missing:

 Deep Learning for Weakly-Supervised Object Detection and Object Localization: A Survey. Neurocomputing 2021.

 

Response 1: According to the reviewer's feedback, we have added the following paper: Feifei Shao, Long Chen, Jian Shao, Wei Ji, Shaoning Xiao, Lu Ye, Yueting Zhuang, Jun Xiao, "Deep Learning for Weakly-Supervised Object Detection and Localization: A Survey," Neurocomputing, Volume 496, 2022.

 

Point 2: And the figures should be reorganized, such as Figure 4.

 

Response 2: The size of Figure 4 has been adjusted.

 

Author Response File: Author Response.pdf

Reviewer 4 Report

1. Chapter 3. The overall structure of the proposed method is shown (Fig. 2). However, information about the libraries used and the computational environment is also needed.

2. Page 3, line 112. It is not clear what dataset of images was used and how it was expanded. Is this dataset available publicly, allowing for the reproduction of results? What is the resolution of the pictures?

3. Chapter 3.2. For the preprocessing, DCP and BM3D are proposed. In the Conclusion it is said: "Different de-fogging algorithms, including image enhancement-based methods and physical model recovery-based methods, were compared, and it was found that the dark channel prior (DCP) was the most efficient one." However, detailed research on that issue can hardly be noticed in the manuscript.

4. Page 8, line 273. The FPN network is mentioned but not included in Table 2. However, it is further mentioned in the conclusion. More details on the COCO dataset should be given as well.

 

5. What library was used for IFRCNN, and what are other applications of the network known in the literature?

p. 3, l. 107: "In the previous introduction..." should be "In the introduction..."; there is only one.

Author Response

The previous response is updated.

Please refer to the attached file.

--------------------

Point 1: Chapter 3. The overall structure of the proposed method is shown (Fig. 2). However, information about the libraries used and the computational environment is also needed.

 

Response 1: In this paper, the Keras deep learning framework, based on TensorFlow, was used to construct and train the model. The model was developed in a PyCharm and PyQt5 environment. According to the reviewer's feedback, we have modified the description of the libraries used and the computational environment in Sections 3.1 and 4.1.

 

Point 2: Page 3, line 112. It is not clear what dataset of images was used and how it was expanded. Is this dataset available publicly, allowing for the reproduction of results? What is the resolution of the pictures?

 

Response 2: We have used the COCO dataset as described in Section 3. In addition to the COCO dataset, we have gathered images from online repositories such as the Baidu Image Library. These images encompass diverse modes, including aerial views, camera captures, news footage, traffic accidents, and car data recorders, captured from multiple angles to depict complex scenes. The resolution of the images is 224×224 pixels. According to the reviewer's feedback, we have added more description of the dataset in Section 3.1 and Section 4.

 

Point 3: Chapter 3.2. For the preprocessing, DCP and BM3D are proposed. In the Conclusion it is said: "Different de-fogging algorithms, including image enhancement-based methods and physical model recovery-based methods, were compared, and it was found that the dark channel prior (DCP) was the most efficient one." However, detailed research on that issue can hardly be noticed in the manuscript.

 

Response 3: The reviewer's point is correct. According to the reviewer's feedback, we have modified the sentence to state that we used the dark channel prior (DCP), which is a physical model recovery-based method and one of the most efficient de-fogging algorithms, and that we also used BM3D to enhance the image by handling noise, in the conclusion section.
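[Editor's note: for reference, below is a compact NumPy/OpenCV sketch of the standard dark channel prior recovery (He et al., CVPR 2009). The patch size and the omega/t0 constants are common defaults, not necessarily the values used in the manuscript; BM3D denoising, mentioned above, would be applied as a separate step.]

```python
import cv2
import numpy as np

def dehaze_dcp(img, patch=15, omega=0.95, t0=0.1):
    """De-haze an RGB image with values in [0, 1] via the dark channel prior.

    Standard single-image formulation; constants are the commonly used
    defaults, not the paper's exact settings.
    """
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    dark = cv2.erode(img.min(axis=2), kernel)          # dark channel

    # Atmospheric light A: mean color of the brightest 0.1% dark-channel pixels.
    n = max(1, dark.size // 1000)
    idx = np.unravel_index(np.argsort(dark, axis=None)[-n:], dark.shape)
    A = img[idx].mean(axis=0)

    # Transmission estimate: t(x) = 1 - omega * dark_channel(I / A).
    t = 1.0 - omega * cv2.erode((img / A).min(axis=2), kernel)

    # Scene radiance: J = (I - A) / max(t, t0) + A.
    t = np.clip(t, t0, 1.0)[..., None]
    return np.clip((img - A) / t + A, 0.0, 1.0)
```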

 

Point 4: Page 8, line 273. The FPN network is mentioned but not included in Table 2. However, it is further mentioned in the conclusion. More details on the COCO dataset should be given as well.

 

Response 4: In Table 2, IFRCNN (ours) is the model including FPN. According to the reviewer's feedback, we have added a sentence to make this clear in Section 4.3. We also added more description of the COCO dataset in Section 3.1 and Section 4.
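[Editor's note: for readers unfamiliar with the FPN design referenced here, each top-down step fuses an upsampled coarse map with a finer backbone map through a 1x1 lateral convolution and element-wise addition. Below is a minimal Keras sketch of one such merge, with illustrative layer sizes rather than IFRCNN's exact configuration.]

```python
import tensorflow as tf

def fpn_merge(top_down, lateral, channels=256):
    """One FPN top-down merge step: upsample + 1x1 lateral conv + add.

    Illustrative sketch of the standard FPN design (Lin et al., 2017),
    applied to Keras tensors such as backbone feature maps C2..C5;
    not the exact layer configuration used in IFRCNN.
    """
    lat = tf.keras.layers.Conv2D(channels, 1, padding="same")(lateral)
    up = tf.keras.layers.UpSampling2D(size=2)(top_down)
    merged = tf.keras.layers.Add()([up, lat])
    # 3x3 conv to smooth aliasing introduced by upsampling.
    return tf.keras.layers.Conv2D(channels, 3, padding="same")(merged)
```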

 

Point 5: What library was used for IFRCNN, and what are other applications of the network known in the literature?

 

Response 5: The Keras deep learning framework, based on TensorFlow, was used to construct and train the model. Our research primarily focuses on vehicle detection from CCTV images. Additionally, it can be extended to ground vehicle detection and identification by unmanned aerial vehicles (UAVs). We have mentioned this in the conclusion according to the reviewer's feedback.

 

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

Thank you for the response. 

However, I still found the comments author makes are not very convincing. 

1. The claim of improving Faster R-CNN (see Chapter 3.3) is not convincing and not supported by experiments. The experiment in Table 2 is also lacking a fair comparison: what about Faster R-CNN with a ResNet backbone, etc.?

2. The most significant weakness is that there are no details for the weather classification model. The authors just mentioned that they use a CNN, which is very vague. Do the authors use VGG, ResNet, or other networks for classification? Is the model pretrained on another dataset? What is the accuracy or performance of the classification model? I believe these details are important for the proposed framework, since the classification model decides the next operations for each image.

2-1. Chapter 4.1 is very confusing and hard to understand. The authors did not provide performance figures for the classification model; Figure 4 shows the training and validation loss and accuracy during training. What is the test set mentioned in line 242? How many images are in it? What is the split?

3. Line 245: "Thus, the accuracy and loss function of the weather detection model are improved by model training, which leads to improved model efficiency." What does this mean? How does it lead to improved model efficiency? What is meant by "loss function of the weather detection model is improved"?

4. There are multiple places where the authors claim improved efficiency, yet there are few experiments supporting the claim.

5. Typo still shown in figure 2: fully "connectd" -> connected.

 

Author Response

The round 1 response is also updated.

This is the round 2 response. A file is also attached.

--------------------------

Point 1: The claim of improving Faster R-CNN (see Chapter 3.3) is not convincing and not supported by experiments. The experiment in Table 2 is also lacking a fair comparison: what about Faster R-CNN with a ResNet backbone, etc.?

 

Response 1: In Table 2, "Feature Extraction Network" and "Region Proposal Network" together denote Faster R-CNN. The results show that using ResNet and FPN increases performance. According to the reviewer's feedback, we changed the expression in Table 2.

 

Point 2: The most significant weakness is that there are no details for the weather classification model. The authors just mentioned that they use a CNN, which is very vague. Do the authors use VGG, ResNet, or other networks for classification? Is the model pretrained on another dataset? What is the accuracy or performance of the classification model? I believe these details are important for the proposed framework, since the classification model decides the next operations for each image.

 

Response 2: The classification model is the VGG16 architecture, and the test accuracy of weather classification is 0.70. We used the pretrained model and fine-tuned it with our weather dataset, which consists of 3,000 images from the COCO dataset and 2,000 collected and augmented images. According to the reviewer's feedback, we added these descriptions in Sections 3.1 and 4.1.

 

Point 2-1: Chapter 4.1 is very confusing and hard to understand. The authors did not provide performance figures for the classification model; Figure 4 shows the training and validation loss and accuracy during training. What is the test set mentioned in line 242? How many images are in it? What is the split?

 

Response 2-1: For the training of the weather classification model, we used a dataset with 4,000 training and 1,000 test images. The test accuracy of weather classification is 0.70. According to the reviewer's feedback, we added these descriptions in Section 4.1.

 

Point 3: Line 245: "Thus, the accuracy and loss function of the weather detection model are improved by model training, which leads to improved model efficiency." What does this mean? How does it lead to improved model efficiency? What is meant by "loss function of the weather detection model is improved"?

 

Response 3: The reviewer's comment is correct. The sentence that the reviewer pointed out is inappropriate. It actually means that the accuracy is improved through training. We have changed the sentences in Section 4.1.

 

Point 4: There are multiple places where the authors claim improved efficiency, yet there are few experiments supporting the claim.

 

Response 4: The reviewer's comment is correct. It actually means improving accuracy, not efficiency. According to the reviewer's feedback, we modified all the sentences mentioning efficiency to refer to accuracy.

 

Point 5: Typo still shown in figure 2: fully "connectd" -> connected.

 

Response 5: It has been corrected.

 

Author Response File: Author Response.pdf

Round 3

Reviewer 2 Report

Thanks for addressing the concerns. 
