Communication
Peer-Review Record

Short Communication: Detecting Heavy Goods Vehicles in Rest Areas in Winter Conditions Using YOLOv5

by Margrit Kasper-Eulaers *,†, Nico Hahn, Stian Berger, Tom Sebulonsen, Øystein Myrland and Per Egil Kummervold *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Algorithms 2021, 14(4), 114; https://doi.org/10.3390/a14040114
Submission received: 28 February 2021 / Revised: 26 March 2021 / Accepted: 29 March 2021 / Published: 31 March 2021
(This article belongs to the Special Issue Machine-Learning in Computer Vision Applications)

Round 1

Reviewer 1 Report

In the abstract it should say “Proper”, not “Propper”; please do a spell check.

In line 42, YOLO should be spelled out as You Only Look Once (YOLO), and the YOLO website and seminal paper should be cited. This is a serious issue.

https://pjreddie.com/darknet/yolo/

Redmon 2016 CVPR “You Only Look Once: Unified, Real-Time Object Detection”

In line 50, I do not believe complexity is a real motivation for the use of CNNs; rather, ease of use, pretrained models that can be tuned, availability, and performance are stronger motivators. If complexity is REALLY why it was chosen, the alternatives should be cited.

I believe it would have been interesting to compare the results with a YOLO detector “out of the box”, as they already detect cars and trucks without further training. The thermal image modality is different from what the pretrained models were trained on, but it would be further validation of the performance improvements the authors obtained.

The Colab notebook should be cited in the references. It should also be made clear that the algorithm is not training a YOLO network ‘from scratch’; I suspect it is performing a “transfer learning” function where only the upper layers are trained.

The abstract mentions “snowy conditions”, and I incorrectly assumed that meant snow was actively falling. I am curious to know how active snowfall impacts the algorithm, and I would imagine that would be a very interesting case, since that is when people may be likely to stop for a break. This may be beyond the scope of the paper, but I think it is worth mentioning in the Discussion section.

Author Response

Dear reviewer,

We are very grateful for your time and valuable input in your review of our manuscript, entitled “Detecting Heavy Goods Vehicles in Rest Areas under Winter Conditions Using YOLOv5”. We appreciate your constructive remarks, which have increased the quality of the manuscript and our project.

In the following, we (indicated as AUTHORS) address each point brought up by the reviewer (indicated as REV) and outline the revisions we have made to the manuscript (alongside line numbers referring to the document revised_with_change_tracking).

 

Open Review 01 Mar 2021 14:06:24

REV: In the abstract it should say “Proper”, not “Propper”; please do a spell check.

AUTHORS: Thank you for pointing out this typo. We spell-checked the entire manuscript and had two additional proofreaders check for language mistakes and typos.

 

REV: In line 42, YOLO should be spelled out as You Only Look Once (YOLO), and the YOLO website and seminal paper should be cited. This is a serious issue.

https://pjreddie.com/darknet/yolo/

Redmon 2016 CVPR “You Only Look Once: Unified, Real-Time Object Detection”

AUTHORS: Thank you for pointing this out; we agree. We have now spelled out You Only Look Once (YOLO) at its first occurrence at line 48. We also added the two references; please see line 63 and numbers 51 and 52 in the references.

 

REV: In line 50, I do not believe complexity is a real motivation for the use of CNNs; rather, ease of use, pretrained models that can be tuned, availability, and performance are stronger motivators. If complexity is REALLY why it was chosen, the alternatives should be cited.

AUTHORS: Thank you for challenging us on this question. See lines 57-63 and 66-73 for our changes, where we explain why this pre-trained network fits well for detecting trucks in the images of our case study:
“The decision to use convolutional neural networks was made due to their ease of use. There are a number of pre-trained models that can be tuned for a variety of tasks. They are also readily available, computationally inexpensive and show good performance metrics. Object recognition systems from the YOLO family [51,52] are often used for vehicle recognition tasks, e.g. [27–29,37] and have been shown to outperform other target recognition algorithms [53,54]. YOLOv5 has proven to significantly improve the processing time of deeper networks [50]. This attribute will gain in importance when moving forward with the project to bigger datasets and real-time detection. YOLOv5 is pre-trained on the Common Objects in Context (COCO) dataset, an extensive dataset for object recognition, segmentation, and labelling. This dataset contains over 200k labelled images with 80 different classes, including the classes car and truck [50,55]. Therefore, YOLOv5 can be used as such to detect heavy goods vehicles and can be used as a starting point for an altered model to detect heavy goods vehicle features like their front and rear.”
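
As an editorial illustration of this point, a minimal sketch of out-of-the-box detection with a COCO-pretrained YOLOv5 model, loaded via torch.hub, might look as follows. This is not code from the paper, and the image filename is a hypothetical placeholder:

```python
# Minimal sketch: COCO-pretrained YOLOv5 detecting cars/trucks without any
# fine-tuning. Editorial illustration only; "rest_area.jpg" is hypothetical.
import torch

# Load YOLOv5s with its default COCO weights (80 classes, incl. car and truck).
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

results = model("rest_area.jpg")       # inference on a single image
detections = results.pandas().xyxy[0]  # bounding boxes as a pandas DataFrame

# Keep only the COCO classes relevant to this use case.
vehicles = detections[detections["name"].isin(["car", "truck"])]
print(vehicles[["name", "confidence", "xmin", "ymin", "xmax", "ymax"]])
```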

 

REV: I believe it would have been interesting to compare the results with a YOLO detector “out of the box”, as they already detect cars and trucks without further training. The thermal image modality is different from what the pretrained models were trained on, but it would be further validation of the performance improvements the authors obtained.

AUTHORS: Thank you very much for this suggestion. We added the comparison with a baseline model (YOLOv5 with COCO weights) to the manuscript to illustrate the improvements we could reach with our approach; see lines 144-157, Table 2 and Figure 7: “To evaluate the model trained with the rest area dataset, we compared it to YOLOv5 [50] without any additional training as a baseline model, using only COCO weights. This model contains, amongst other classes, the car and truck classes; however, it does not distinguish between truck_front and truck_back. Table 2 shows the accuracy of the baseline and the altered model for the four available classes. The baseline model, which is trained on heavy goods vehicles as a whole, had difficulties detecting them on the test images of the rest area dataset. It either did not recognize the trucks or did so with much less certainty than the altered model with the two new classes. The additional training also improved the detection of heavy goods vehicles on images in which the cabin was cut off. Some examples of the detection on the test data for the two models are shown in Figure 7.”
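
For readers who want to reproduce this kind of baseline comparison, a hedged sketch follows. “best.pt” stands in for the re-trained weights and “test_image.jpg” for a test image; both names are hypothetical, and this is not the authors' evaluation code:

```python
# Sketch of a baseline-vs-fine-tuned comparison on a single test image.
import torch

baseline = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)
finetuned = torch.hub.load("ultralytics/yolov5", "custom", path="best.pt")

for label, model in [("baseline (COCO)", baseline), ("fine-tuned", finetuned)]:
    det = model("test_image.jpg").pandas().xyxy[0]
    print(label)
    # The fine-tuned model should report truck_front/truck_back classes,
    # while the baseline reports only generic COCO classes such as truck.
    print(det[["name", "confidence"]])
```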

 

REV: The Colab notebook should be cited in the references.

AUTHORS: We added the notebook as reference number 58, cited at line 114.

 

REV: It should also be made clear that the algorithm is not training a YOLO network ‘from scratch’; I suspect it is performing a “transfer learning” function where only the upper layers are trained.

AUTHORS: Thank you for pointing out this potential misunderstanding. To clarify how we proceeded, we state in the abstract (line 8) that we perform transfer learning, and we explain it as well in the training section at lines 114-116: “We used a notebook developed by Roboflow.ai [58] which is based on YOLOv5 [50] and uses pre-trained COCO weights. We added the rest area dataset and adjusted the number of epochs to be trained as well as the stack size to train the model for our classes.”
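
As a rough editorial sketch of such a transfer-learning run (assuming a local checkout of the ultralytics/yolov5 repository; the dataset config “rest_area.yaml” and all hyperparameter values are illustrative assumptions, not the authors' settings):

```python
# Launch a YOLOv5 fine-tuning run starting from COCO-pretrained weights,
# i.e. transfer learning rather than training from scratch.
import subprocess

subprocess.run(
    [
        "python", "train.py",
        "--data", "rest_area.yaml",  # hypothetical custom dataset config
        "--weights", "yolov5s.pt",   # start from COCO-pretrained weights
        "--epochs", "100",           # illustrative value only
        "--batch-size", "16",        # illustrative value only
        "--img", "640",
    ],
    check=True,  # raise an error if training exits unsuccessfully
)
```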

 

REV: The abstract mentions “snowy conditions”, and I incorrectly assumed that meant snow was actively falling. I am curious to know how active snowfall impacts the algorithm, and I would imagine that would be a very interesting case, since that is when people may be likely to stop for a break. This may be beyond the scope of the paper, but I think it is worth mentioning in the Discussion section.

AUTHORS: This is a very interesting question. As part of the project, we were in touch with two truck drivers’ associations and the road authorities and learned that trucks are well equipped for winter conditions and that roads are quickly cleared of falling snow. According to what we have learned, active snowfall does not seem to be a challenge, and truck traffic remains highly schedule-driven in winter. We added this information at lines 20-21 to stress that the time when trucks arrive at the rest area depends on their schedule rather than on the weather: “Due to these regulations and contractual delivery agreements, heavy goods vehicle traffic is highly schedule driven.” To explain more thoroughly why we refer to challenging winter conditions, we added information about snow-covered markings at lines 41-42 (“In winter, the markings of the parking spots are covered by snow and ice and therefore not visible”) and added Figure 2 to show the effect of precipitation on the images of the thermal camera: images with precipitation look foggy. To describe in more detail what we mean by winter conditions, we also added the information that winters in Northern Scandinavia include the polar night. The last polar night of the winter fell at the beginning of the project, so most of the day was still dark during data collection. We mention the polar night at line 5 and refer to the different light conditions at lines 163-165: “Furthermore, the camera delivered usable images for all light and weather conditions that occurred during the project period.”

 

While reviewing, we noticed that it is difficult to see the augmentation described at lines 106-108 in Figure 4 (“Examples of augmented training data illustrating horizontal mirroring, resizing and changes of the grey scale.”). We have therefore exchanged the pictures in Figure 4 for more characteristic ones that better illustrate the augmentation performed before applying YOLOv5.
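
As a hedged illustration of the augmentation types named in that caption (the paper used Roboflow's pipeline; this torchvision sketch is not the authors' code, and “thermal_frame.jpg” is a hypothetical file):

```python
# The three augmentation types from Figure 4, expressed with torchvision.
from PIL import Image
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),               # horizontal mirroring
    transforms.Resize((640, 640)),                        # resizing
    transforms.ColorJitter(brightness=0.2, contrast=0.2), # grey-scale changes
])

image = Image.open("thermal_frame.jpg")
augmented = augment(image)
```

Note that for object detection, mirroring and resizing must also be applied to the bounding-box labels; tools such as Roboflow or Albumentations handle this automatically, whereas plain torchvision transforms as above do not.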

 

We wish to thank you again for your time and your critical reading of our work, which has improved our manuscript. We hope that you appreciate our replies and changes and agree that the manuscript now meets the standard for publication.

 

Yours sincerely,

 

Margrit Kasper-Eulaers (on behalf of all co-authors)

Author Response File: Author Response.docx

Reviewer 2 Report

The paper implements a CNN based on YOLO to detect the front and rear of heavy goods vehicles. The paper is written clearly and organized well. However, the reported accuracy in Table 2 is not competitive for typical vision-based object detection, especially for only three categories. Other concerns are as follows.

  1. The literature review is not up to date; most reference papers are from before 2019.
  2. Figure 4 is not explained in the manuscript. What is the difference between Box, Objectness, and Classification?
  3. The promising detection stated in the abstract is not supported by the experimental results.
  4. A conclusion should be added at the end.

Author Response

Dear reviewer,

We are very grateful for your time and valuable input in your review of our manuscript, entitled “Detecting Heavy Goods Vehicles in Rest Areas under Winter Conditions Using YOLOv5”. We appreciate your constructive remarks, which have increased the quality of the manuscript and our project.

In the following, we (indicated as AUTHORS) address each point brought up by the reviewer (indicated as REV) and outline the revisions we have made to the manuscript (alongside line numbers referring to the document revised_with_change_tracking).

 

Open Review 12 Mar 2021 07:19:56

REV: The paper implements a CNN based on YOLO to detect the front and rear of heavy goods vehicles. The paper is written clearly and organized well.

AUTHORS: Thank you, we appreciate the positive feedback as well as the time for reviewing our manuscript.

 

REV: However, the reported accuracy in Table 2 is not competitive for typical vision-based object detection, especially for only three categories.

AUTHORS: Thank you very much for your feedback. We are aware that the model still needs to be improved, and in the discussion we describe ideas for improving it further. We added the comparison with a baseline model (YOLOv5 with COCO weights) to the manuscript to illustrate that the detection of trucks in our dataset is not a trivial task that can be solved with an out-of-the-box model. Compared to the baseline model, we could show improvements with our approach; please see lines 144-157, Table 2 and Figure 7: “To evaluate the model trained with the rest area dataset, we compared it to YOLOv5 [50] without any additional training as a baseline model, using only COCO weights. This model contains, amongst other classes, the car and truck classes; however, it does not distinguish between truck_front and truck_back. Table 2 shows the accuracy of the baseline and the altered model for the four available classes. The baseline model, which is trained on heavy goods vehicles as a whole, had difficulties detecting them on the test images of the rest area dataset. It either did not recognize the trucks or did so with much less certainty than the altered model with the two new classes. The additional training also improved the detection of heavy goods vehicles on images in which the cabin was cut off. Some examples of the detection on the test data for the two models are shown in Figure 7.”

 

REV: The literature review is not up to date; most reference papers are from before 2019.

AUTHORS: We have now updated our literature review; please see lines 31-39. In total, 23 of the 49 references are now from 2019 or later.

 

REV: Figure 4 is not explained in the manuscript. What is the difference between Box, Objectness, and Classification?

AUTHORS: Thank you for pointing out this missing information. We have added an explanation at lines 120-126: “There are three different types of loss shown in Figure 5: box loss, objectness loss and classification loss. The box loss represents how well the algorithm can locate the centre of an object and how well the predicted bounding box covers an object. Objectness is essentially a measure of the probability that an object exists in a proposed region of interest. If the objectness is high, it means that the image window is likely to contain an object. Classification loss gives an idea of how well the algorithm can predict the correct class of a given object.”
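
To make the relationship between the three terms concrete, here is a toy editorial sketch of how a YOLO-style objective combines them; the loss functions and gain values are simplified stand-ins (YOLOv5's actual implementation uses IoU-based box regression and per-scale weighting):

```python
# Toy composite YOLO-style loss: box + objectness + classification.
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()

def total_loss(pred_box, true_box, pred_obj, true_obj, pred_cls, true_cls):
    # Box loss: how well predicted box location/size matches ground truth
    # (YOLOv5 uses an IoU-based loss; MSE is used here only for brevity).
    box_loss = nn.functional.mse_loss(pred_box, true_box)
    # Objectness loss: probability that a proposed region contains any object.
    obj_loss = bce(pred_obj, true_obj)
    # Classification loss: which class, e.g. truck_front vs. truck_back.
    cls_loss = bce(pred_cls, true_cls)
    # Weighted sum; the gains here mirror YOLOv5's default hyperparameters.
    return 0.05 * box_loss + 1.0 * obj_loss + 0.5 * cls_loss
```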

 

REV: The promising detection stated in the abstract is not supported by the experimental results.

AUTHORS: We have now used more nuanced wording in the abstract so as not to overpromise on our results and to be more precise about what we improve. Please see lines 12-14: “we show an improvement in detecting heavy goods vehicles using their front and rear instead of the whole vehicle”.

 

REV: A conclusion should be added at the end.

AUTHORS: At lines 200-209, we give a brief summary of our conclusions: “Section 4 shows that there are many steps that still need to be taken to improve the detection of heavy goods vehicles at rest areas. However, we have already shown that when analyzing images from small-angle cameras to detect objects that occur in groups and have a high number of overlaps and cut-offs, the model can be improved by detecting certain characteristic features instead of the whole object. Furthermore, the usage of thermal network cameras has proven to be valuable given the purpose of the project and the dark and snowy winter conditions in Northern Scandinavia. We are confident that with a bigger training set and the implementation of the changes suggested in Section 4, the algorithm can be improved even further.”

 

While reviewing, we noticed that it is difficult to see the augmentation described at lines 106-108 in Figure 4 (“Examples of augmented training data illustrating horizontal mirroring, resizing and changes of the grey scale.”). We have therefore exchanged the pictures in Figure 4 for more characteristic ones that better illustrate the augmentation performed before applying YOLOv5.

 

We wish to thank you again for your time and your critical reading of our work, which has improved our manuscript. We hope that you appreciate our replies and changes and agree that the manuscript now meets the standard for publication.

 

Yours sincerely,

 

Margrit Kasper-Eulaers (on behalf of all co-authors)

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The suggested changes and issues were resolved adequately.
