Article
Peer-Review Record

A New Real-Time Detection and Tracking Method in Videos for Small Target Traffic Signs

Appl. Sci. 2021, 11(7), 3061; https://doi.org/10.3390/app11073061
by Shaojian Song *, Yuanchao Li, Qingbao Huang and Gang Li
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 28 February 2021 / Revised: 19 March 2021 / Accepted: 26 March 2021 / Published: 30 March 2021
(This article belongs to the Special Issue Artificial Intelligence for Computer Vision)

Round 1

Reviewer 1 Report

The proposed method is not well explained and does not seem to differ substantially from previous methods. However, the results seem to be better and more effort should be made to highlight the strengths of the proposed method.

Errata have been found in the text and some of the figures.

Author Response

Reviewer #1, Concern #1: The proposed method is not well explained and does not seem to differ substantially from previous methods.

Author responses: Thank you very much for your suggestions. The authors have modified the structure of the proposed improved YOLOv3 and added a detailed description of the method so as to explain its essential differences from existing approaches more clearly (see pages 5 and 6 of the revised manuscript). Furthermore, trading off real-time performance against high-precision detection, recognition, and tracking in videos is a very challenging task for autonomous vehicles and assisted driving systems. Although YOLOv3 is the most popular architecture in this field, it has several drawbacks: 1) low detection and recognition accuracy for small signs in large images; 2) difficulty in balancing real-time performance and detection precision, owing to the depth of the architecture and its high computational cost; 3) the poor tracking capability of the original YOLOv3.

Hence, to make the architecture of YOLOv3 better suited to detecting small traffic signs and to give it stronger tracking capability, the authors modified the architecture of the original YOLOv3 (see Figure 3 and Figure 4). To highlight the small-target detection performance of the proposed method, we added the prediction results of each output layer to Figure 3 and revised the depiction of the feature-map changes to make the overall structure clearer. The detailed steps are as follows:

1) Adding one extra layer in place of one of the original layers in the CNN architecture and removing an unnecessary layer, because traffic signs in real traffic scenes are mostly small objects. On the one hand, this enhances the detection of small targets; on the other hand, deleting the redundant layer reduces the computational cost and improves real-time performance.

2) Significant external factors such as camera movement, occlusion, and blur are inherent issues affecting traffic signs in real traffic environments during video detection: the bounding box is prone to flickering and losing targets, resulting in missed and false detections. Hence, Deep-Sort is integrated into the detection pipeline of our improved YOLOv3 to reduce false and missed detections in video detection.
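To illustrate the tracking-by-detection idea described above, the following is a minimal sketch of associating per-frame detections with existing tracks. It is a deliberately simplified stand-in (greedy IoU matching) for Deep-Sort, which additionally uses Kalman-filter motion prediction and appearance features; the function names are hypothetical and not from the authors' implementation.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    x1 = max(box_a[0], box_b[0]); y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2]); y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def associate(tracks, detections, iou_thresh=0.3):
    """Greedily match existing tracks to new detections by IoU.

    tracks: dict of track_id -> last known box
    detections: list of boxes from the current frame
    Returns (matches, unmatched_track_ids, unmatched_det_indices).
    Unmatched tracks can coast for a few frames (suppressing missed
    detections); unmatched detections spawn new tracks only after
    repeated confirmation (suppressing false detections).
    """
    matches, used = [], set()
    for tid, tbox in tracks.items():
        best_j, best_iou = None, iou_thresh
        for j, dbox in enumerate(detections):
            if j in used:
                continue
            score = iou(tbox, dbox)
            if score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matches.append((tid, best_j))
            used.add(best_j)
    matched_ids = {m[0] for m in matches}
    unmatched_tracks = [t for t in tracks if t not in matched_ids]
    unmatched_dets = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched_tracks, unmatched_dets
```

In the full Deep-Sort algorithm, the greedy IoU step is replaced by a cascade that combines Mahalanobis distance on predicted motion with cosine distance on learned appearance embeddings, which is what makes tracks robust to the occlusion and blur mentioned above.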

Moreover, most current state-of-the-art methods detect only 45 categories of common traffic signs, which falls far short of the requirements of autonomous vehicles and assisted driving systems in real-world scenes. Our method detects many more traffic sign categories (152), and it addresses the low frequency of some traffic signs in the TT100K dataset by using several data augmentation methods, including Gaussian noise, shot noise, impulse noise, defocus blur, frosted glass blur, motion blur, zoom blur, snow, frost, fog, brightness, contrast, elastic transform, pixelation, and JPEG compression.
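A few of the simpler augmentations in that list can be sketched in NumPy as below. This is only an illustrative sketch, not the authors' augmentation code; the function names and parameter values are hypothetical, and images are assumed to be float arrays in [0, 1].

```python
import numpy as np

def gaussian_noise(img, sigma=0.08, rng=None):
    """Add zero-mean Gaussian noise and clip back to [0, 1]."""
    rng = rng or np.random.default_rng(0)
    noisy = img + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0.0, 1.0)

def adjust_contrast(img, factor=0.5):
    """Scale pixel deviations from the per-channel mean; factor < 1 lowers contrast."""
    mean = img.mean(axis=(0, 1), keepdims=True)
    return np.clip(mean + factor * (img - mean), 0.0, 1.0)

def adjust_brightness(img, delta=0.2):
    """Shift all pixels by a constant; positive delta brightens."""
    return np.clip(img + delta, 0.0, 1.0)
```

Applying several such corruptions to the rare TT100K classes yields additional training samples without collecting new images; weather effects such as snow, frost, and fog require more elaborate compositing than shown here.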

Our method achieves better results in terms of small targets, multi-class detection, and real-time performance than current state-of-the-art methods.

 Reviewer #1, Concern #2: However, the results seem to be better and more effort should be made to highlight the strengths of the proposed method.

Author responses: Thank you for your suggestion. The authors found that the advantages of the method presented in this paper were not explicitly highlighted in the manuscript. We have further emphasized the advantages of the proposed method in the Results and Conclusion sections of the revised manuscript.

 Reviewer #1, Concern #3: Errata have been found in the text and some of the figures.

Author responses: The authors thank the reviewer for this comment. We have carefully rechecked the figures and text in the manuscript and corrected the mistakes we found. Please see the revised manuscript.

Reviewer 2 Report

The authors present a methodology that is an improvement of the well-established YOLOv3 network. The network is modified in order to be able to discriminate traffic signs (small objects in the visual field) from a camera feed. The proposed approach seems to perform well and achieve a high-precision classification performance. Furthermore, it is nice to see that the authors not only compared their approach to the original YOLOv3 but to other state-of-the-art methodologies providing an even better idea of their improvement.

Author Response

Reviewer #2, Concern #1: The authors present a methodology that is an improvement of the well-established YOLOv3 network. The network is modified in order to be able to discriminate traffic signs (small objects in the visual field) from a camera feed. The proposed approach seems to perform well and achieve a high-precision classification performance. Furthermore, it is nice to see that the authors not only compared their approach to the original YOLOv3 but to other state-of-the-art methodologies providing an even better idea of their improvement.

 

Author responses: The authors appreciate the reviewer's comment. We have added a more detailed description of the proposed method. In addition, we have asked a professor who is a native English speaker to polish the manuscript so as to make it more fluent and easier to understand. Please see the revised manuscript.

Reviewer 3 Report

In the paper, a new learning method combining an improved YOLOv3 (You Only Look Once) with a multi-object tracking algorithm (Deep-Sort) is proposed to detect small target traffic signs. My concerns are as follows:

1) Why do the authors use YOLOv3 instead of a newer version of YOLO?

2) Method 1 should be better explained in the paper. 

3) Authors should read the manuscript carefully to correct typos and English mistakes.

 

Author Response

Reviewer #3, Concern #1: Why do the authors use YOLOv3 instead of a newer version of YOLO?

Author responses: The authors appreciate the reviewer's comment. The main reason we use YOLOv3 is that it is a typical one-stage object detection algorithm that meets the accuracy and speed requirements of unmanned driving tasks. Although YOLOv4 has been proposed, compared with YOLOv3 it mainly transplants tricks from other object detection algorithms; it is an integrated innovation, and there is no essential difference between the two methods. What is more, the range of applications of YOLOv4 is limited by its incorporation of the best tricks of current detection algorithms. YOLOv3, because of its simple structure and its support across a wide range of industrial hardware, is better suited to the real-world unmanned driving task addressed in this paper. The same reasoning applies to YOLOv5.

Reviewer #3, Concern #2: Method 1 should be better explained in the paper.

Author responses: The authors thank the reviewer for this comment, which helped us to explain Method 1. We have updated the manuscript by adding further explanation of the detection algorithm to aid comprehension of Method 1 (see pages 5 and 6 of the revised manuscript).

Reviewer #3, Concern #3: Authors should read the manuscript carefully to correct typos and English mistakes.

Author responses: We apologize for the typos and grammatical errors in the previous manuscript. We have asked a professor who is a native English speaker to polish the manuscript so as to make it more fluent and easier to understand. Please see the revised manuscript.

 

Round 2

Reviewer 3 Report

Authors addressed my concerns in the paper.
