Article
Peer-Review Record

Traffic Sign Detection Based on the Improved YOLOv5

Appl. Sci. 2023, 13(17), 9748; https://doi.org/10.3390/app13179748
by Rongyun Zhang 1,*, Kunming Zheng 1, Peicheng Shi 1,2, Ye Mei 1, Haoran Li 1 and Tian Qiu 1
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 16 July 2023 / Revised: 21 August 2023 / Accepted: 23 August 2023 / Published: 29 August 2023

Round 1

Reviewer 1 Report

Abstract:

 

“Traditional traffic sign detection method cannot have satisfied the demand of intelligent driving at this time.”

 

“cannot have satisfied”? Rewrite the full sentence. What do you mean by “traditional traffic sign detection method”?

 

Although the majority of detection algorithms run at a fast rate, their detection accuracy is low.

 

Does it mean a minority of them have high accuracy?

 

Instead of using some generic words (e.g., traditional, some, most), write the name or details of the work.

 

Introduction:

“With the advancement of Intelligent Technology, intelligent driving systems have become a controversial issue among researchers.”

 

Why controversial? If you are referring to any social or trust issue, is that scope of your paper?

 

“key component in smart driving”

Is this the right term?

 

“sends timely information to the driver”

Make sure how the traffic

 

“In [11],”

 

You have used “In [],” many times. Please use the appropriate reference format and grammatical structure.

 

“Therefore, for the problem that detection accuracy and speed cannot be well balanced, this research illustrates a new YOLOv5 for detecting traffic signs.”

 

What is your main contribution?

 

Overall:

 

The model is not properly described. Try to use a model flow chart or an algorithm to describe your model.

 

Equations 7-12 require further explanation including a description of the notations and usefulness of the equations in your model.

Describe equations 18 and 19.

 

The result analysis sections need further explanation describing your results.

 

There are a number of spelling and grammatical mistakes in this report.

 

Author Response

August 11, 2023

 

Dear Editors and Reviewers:

Thank you for your letter and for the reviewers’ comments concerning our manuscript (Manuscript ID: applsci-2515915, Title: Traffic Sign Detection Based on the Improved YOLOv5). The comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections that we hope will meet with your approval. In addition, other parts of the text have been revised, including improvements to the figures and other passages that needed attention. The revised portions are marked in yellow in the paper. The main corrections in the paper and the responses to the reviewers’ comments are as follows:

  1. Response to comment (Reviewer 1): (“Traditional traffic sign detection method cannot have satisfied the demand of intelligent driving at this time.” “cannot have satisfied”? Rewrite the full sentence. What do you mean by “traditional traffic sign detection method”?)

Response: The traditional traffic sign detection method refers to detection based on the color and shape of traffic signs. Modification: “Although the detection method of traffic signs based on color or shape can achieve recognition of large categories of signs such as prohibitions and warnings, the recognition categories are few and the accuracy is not high.”

  2. Response to comment (Reviewer 1): (Although the majority of detection algorithms run at a fast rate, their detection accuracy is low. Does it mean a minority of them have high accuracy? Instead of using some generic words (e.g., traditional, some, most), write the name or details of the work.)

Response: Modification: “Although the traffic sign detection algorithm based on color or shape requires little computation and has good real-time performance, the color features are greatly affected by light and weather.”

  3. Response to comment (Reviewer 1): (“With the advancement of Intelligent Technology, intelligent driving systems have become a controversial issue among researchers.” Why controversial? If you are referring to any social or trust issue, is that scope of your paper?)

Response: Modification: “With the development of intelligent technology, the rapid detection and recognition of traffic signs has become a hot topic for researchers at this stage.”

  4. Response to comment (Reviewer 1): (“key component in smart driving” Is this the right term?)

Response: Modification: “Traffic sign detection is an important part of smart driving.”

  5. Response to comment (Reviewer 1): (“sends timely information to the driver” Make sure how the traffic.)

Response: Modification: Traffic sign detection plays an important role in intelligent driving; after the intelligent driving system collects traffic sign information, it can make the car automatically plan its path and adjust its speed during driving, so as to avoid traffic accidents.

  6. Response to comment (Reviewer 1): (“In [11],” You have used “In [],” many times. Please use the appropriate reference format and grammatical structure.)

Response: Addition: Appropriate sentence changes have been made in the text.

Example: Hu J. et al. designed a compact model and proposed a squeeze-and-excitation module to strengthen salient features by learning the correlations between channels [11].
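For readers unfamiliar with the squeeze-and-excitation idea cited here, a minimal PyTorch-style sketch of channel reweighting could look as follows (an illustration only, not the implementation from [11]; the class name and reduction ratio are assumptions):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal squeeze-and-excitation block: re-weights channels by learned importance."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)      # "squeeze": global average over H x W
        self.fc = nn.Sequential(                 # "excitation": learn channel correlations
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                             # scale each channel by its learned weight
```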

  7. Response to comment (Reviewer 1): (“Therefore, for the problem that detection accuracy and speed cannot be well balanced, this research illustrates a new YOLOv5 for detecting traffic signs.” What is your main contribution?)

Response: In this paper, the SIoU loss function is used to optimize the training model, and an attention mechanism is added to enhance the feature extraction ability of the model and improve the detection accuracy of small targets. Finally, ACONC is used as the activation function of YOLOv5, which gives YOLOv5 better convergence and improves model robustness. The main contributions of this paper are as follows.

- The CBAM is fused with the CSP1_3 module in YOLOv5 to form a new CSP1_3CBAM module, which not only highlights key features but also suppresses invalid features, achieving effective detection of small target regions (a generic CBAM sketch is given after this list for illustration).

- The SiLU activation function in the YOLOv5 network is replaced with the ACONC activation function, which makes the activation more flexible, gives better convergence, and improves the robustness of YOLOv5.

- The loss function in the YOLOv5 network is replaced with the SIoU loss function, which optimizes the training of the model and improves the detection accuracy for small targets.
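As an illustration of the CBAM attention referred to in the first contribution, here is a minimal sketch of a generic CBAM block (channel attention followed by spatial attention). It is not the authors' CSP1_3CBAM fusion, and the names and hyperparameters are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(F.adaptive_avg_pool2d(x, 1))   # pooled descriptors share one MLP
        mx = self.mlp(F.adaptive_max_pool2d(x, 1))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)             # channel-wise average map
        mx, _ = x.max(dim=1, keepdim=True)            # channel-wise max map
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, both applied multiplicatively."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca = ChannelAttention(channels)
        self.sa = SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)     # emphasize informative channels
        return x * self.sa(x)  # emphasize informative spatial locations (e.g. small signs)
```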

  8. Response to comment (Reviewer 1): (The model is not properly described. Try to use a model flow chart or an algorithm to describe your model.)

Response: A flow chart of the algorithm has been added to the paper.

Figure 2. Flow chart of the YOLOv5 algorithm.

  9. Response to comment (Reviewer 1): (Equations 7-12 require further explanation including a description of the notations and usefulness of the equations in your model. Describe equations 18 and 19.)

Response: The SIoU loss function consists of four cost terms: the IoU cost, the distance cost, the angle cost and the shape cost. The overall loss is given in formula (7):

$L_{box} = 1 - IoU + \frac{\Delta + \Omega}{2}$  (7)

where $IoU$ is the IoU cost, $\Delta$ is the distance cost and $\Omega$ is the shape cost.

The distance cost is defined in formula (8):

$\Delta = \sum_{t=x,y} \left(1 - e^{-\gamma \rho_t}\right)$  (8)

The parameters used in the distance cost are defined in formulas (9)-(11):

$\rho_x = \left(\frac{b^{gt}_{c_x} - b_{c_x}}{c_w}\right)^2$  (9)

$\rho_y = \left(\frac{b^{gt}_{c_y} - b_{c_y}}{c_h}\right)^2$  (10)

$\gamma = 2 - \Lambda$  (11)

where $c_w$ and $c_h$ are the width and height of the smallest rectangle bounding the two boxes, $\rho_t$ is the normalized center distance, and $\Lambda$ is the angle cost.

The angle cost is given by formula (12):

$\Lambda = 1 - 2\sin^2\left(\arcsin\left(\frac{c_h}{\sigma}\right) - \frac{\pi}{4}\right)$  (12)

where $c_h$ here is the height difference between the centers of the true box and the predicted box, and $\sigma$ is the distance between the centers of the true box and the predicted box.

The shape cost is given by formula (13):

$\Omega = \sum_{t=w,h} \left(1 - e^{-\omega_t}\right)^{\theta}$  (13)

The parameters used in the shape cost are defined in formulas (14)-(15):

$\omega_w = \frac{\left|w - w^{gt}\right|}{\max\left(w, w^{gt}\right)}$  (14)

$\omega_h = \frac{\left|h - h^{gt}\right|}{\max\left(h, h^{gt}\right)}$  (15)

where $w, h$ and $w^{gt}, h^{gt}$ are the width and height of the predicted box and the ground-truth box, respectively, and $\theta$ controls the weight given to the shape cost.
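To make the role of formulas (7)-(15) concrete, a minimal single-box-pair Python sketch of the SIoU computation might look as follows (an illustration that follows the equations above, not the authors' training code; the epsilon and clamping choices are assumptions):

```python
import math

def siou_loss(pred, gt, theta: float = 4.0, eps: float = 1e-7):
    """SIoU loss for one (x1, y1, x2, y2) box pair; sketch of Eqs. (7)-(15)."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt

    # IoU term
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = (px2 - px1) * (py2 - py1) + (gx2 - gx1) * (gy2 - gy1) - inter + eps
    iou = inter / union

    # Box centers and smallest enclosing rectangle
    pcx, pcy = (px1 + px2) / 2, (py1 + py2) / 2
    gcx, gcy = (gx1 + gx2) / 2, (gy1 + gy2) / 2
    cw = max(px2, gx2) - min(px1, gx1)               # enclosing box width
    ch = max(py2, gy2) - min(py1, gy1)               # enclosing box height

    # Angle cost, Eq. (12)
    sigma = math.hypot(gcx - pcx, gcy - pcy) + eps   # center-to-center distance
    sin_alpha = abs(gcy - pcy) / sigma
    angle = 1 - 2 * math.sin(math.asin(min(sin_alpha, 1.0)) - math.pi / 4) ** 2

    # Distance cost, Eqs. (8)-(11)
    gamma = 2 - angle
    rho_x = ((gcx - pcx) / (cw + eps)) ** 2
    rho_y = ((gcy - pcy) / (ch + eps)) ** 2
    dist = (1 - math.exp(-gamma * rho_x)) + (1 - math.exp(-gamma * rho_y))

    # Shape cost, Eqs. (13)-(15)
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1
    omega_w = abs(pw - gw) / max(pw, gw)
    omega_h = abs(ph - gh) / max(ph, gh)
    shape = (1 - math.exp(-omega_w)) ** theta + (1 - math.exp(-omega_h)) ** theta

    # Eq. (7): total SIoU loss
    return 1 - iou + (dist + shape) / 2
```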

  10. Response to comment (Reviewer 1): (The result analysis sections need further explanation describing your results.)

Response: It has been modified in the paper.

 

 

 

We have tried our best to improve the manuscript and have made a number of changes. These changes do not affect the content or framework of the paper. Rather than listing every change here, we have marked them in the revised paper. We sincerely appreciate the Editors’ and Reviewers’ hard work, and we hope that the corrections will meet with approval. Once again, thank you very much for your comments and suggestions.

Yours sincerely,

Zhang Rongyun

The School of Mechanical Engineering, Anhui Polytechnic University

Beijing Middle Road, Wuhu 241000, China 

Phone: 15805605696 

Email Address:  [email protected]

 

Reviewer 2 Report

The basic problem is that the second chapter starts with the improved version. For me, the literature review and the former models are missing; without these I cannot accept the paper. Without these elements, the new results cannot be compared in depth with former models and results.

Another problem is that the improvement rate does not seem very high. Please explain in detail how the detection accuracy can be improved to 100%. For use in real life and for avoiding any accidents, only 100% accuracy can be relied on.

One more major comment:

I cannot see the difference in Figure 7; please explain in more detail what we should see.

Other minor comments:

Next time please add the line numbers.

Instead of [1-3], please use [1]-[3], since 1-3 = -2 could be read as a mathematical operation. Another request: please discuss each reference in more detail. The same applies to references cited in a single bracket, such as [4-5], etc.

What are the model and resolution of the driving recorder? Can the resolution affect the detection accuracy?

You described a Windows 10 system with CPU, GPU, RAM, etc. However, this device is rather big for a car. How can you build this device into a car? I think building it into the car is necessary for real-time control.

Can this algorithm be used only for sign plates, or also for traffic signs painted onto the roadway?

What is the case if the traffic sign is partly hidden, e.g. by a tree?

Author Response

August 11, 2023

 

Dear Editors and Reviewers:

Thank you for your letter and for the reviewers’ comments concerning our manuscript (Manuscript ID: applsci-2515915, Title: Traffic Sign Detection Based on the Improved YOLOv5). The comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections that we hope will meet with your approval. In addition, other parts of the text have been revised, including improvements to the figures and other passages that needed attention. The revised portions are marked in yellow in the paper. The main corrections in the paper and the responses to the reviewers’ comments are as follows:

  1. Response to comment (Reviewer 2): (The basic problem is that the second chapter starts with the improved version. For me, the literature review and the former models are missing; without these I cannot accept the paper. Without these elements, the new results cannot be compared in depth with former models and results.)

Response: The literature review and previous models are introduced in Section 1.

  2. Response to comment (Reviewer 2): (Another problem is that the improvement rate does not seem very high. Please explain in detail how the detection accuracy can be improved to 100%. For use in real life and for avoiding any accidents, only 100% accuracy can be relied on.)

Response: The TT100K traffic sign dataset is used in this paper; it contains 45 classes of traffic signs and 9176 traffic sign pictures. The annotations therefore inevitably contain a certain amount of noise, and the network cannot fit the data perfectly, so no normal model can reach 100% accuracy from a practical point of view. If the detection accuracy were to be pushed towards 100%, the number of images and categories in the dataset would have to be reduced, in which case the accuracy would likely approach 100%.

  3. Response to comment (Reviewer 2): (I cannot see the difference in Figure 7; please explain in more detail what we should see.)

Response: In this picture, for example, the traffic sign in the first picture on the left is the 20 km/h speed limit sign. A correct prediction result contains three pieces of information: the category, the confidence, and the location of the predicted target. The box in the figure represents the predicted position, pl30 indicates that the predicted result is a speed limit of 30 km/h, and 0.73 is the confidence level. The greater the confidence, the closer the predicted result of the model is to the actual result.

  4. Response to comment (Reviewer 2): (Next time please add the line numbers.)

Response: The line numbers are already included in the paper.

  5. Response to comment (Reviewer 2): (Instead of [1-3], please use [1]-[3], since 1-3 = -2 could be read as a mathematical operation. Another request: please discuss each reference in more detail. The same applies to references cited in a single bracket, such as [4-5], etc.)

Response: Modification: This has been changed in the article.

  6. Response to comment (Reviewer 2): (What are the model and resolution of the driving recorder? Can the resolution affect the detection accuracy?)

Response: The pictures in the dataset captured by the driving recorder have a resolution of 2048×2048, and the resolution does affect the detection accuracy: the higher the resolution, the higher the detection accuracy, but the higher the required computer performance. In this paper, when training on the dataset, the picture size is uniformly set to 640×640; training on these lower-resolution pictures improves the detection accuracy for low-resolution traffic signs.
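As a small illustration of the preprocessing described above, one way to resize the 2048×2048 recorder frames to the 640×640 training resolution is sketched below (the file paths are hypothetical, and YOLOv5's own data loader may handle resizing differently):

```python
import cv2  # OpenCV

# Hypothetical file names; the actual dataset layout is not specified in the response.
img = cv2.imread("tt100k/images/00001.jpg")                          # e.g. a 2048x2048 frame
small = cv2.resize(img, (640, 640), interpolation=cv2.INTER_LINEAR)  # downscale for training
cv2.imwrite("tt100k/images_640/00001.jpg", small)                    # 640x640 training image
```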

  7. Response to comment (Reviewer 2): (You described a Windows 10 system with CPU, GPU, RAM, etc. However, this device is rather big for a car. How can you build this device into a car? I think building it into the car is necessary for real-time control.)

Response: Since the dataset contains nearly 10,000 pictures and training requires a large amount of computing resources, this paper uses a Windows 10 system with a GPU for training; a computer with limited computing resources would need a huge amount of training time. The weight file trained in this paper is only 14.3 MB. For real-time prediction in the car, a CPU-based processor is sufficient, or the algorithm can be made lightweight by reducing the size and number of parameters of the model, so that the trained weight file requires even fewer computing resources.

  8. Response to comment (Reviewer 2): (Can this algorithm be used only for sign plates, or also for traffic signs painted onto the roadway?)

Response: The indicator lines on the ground are not included in the training dataset used in this paper, so due to this limitation of the dataset the algorithm cannot recognize ground marking lines. To recognize ground markings, it would only be necessary to add a ground-marking dataset to the original dataset and then train the algorithm on the new dataset.

  9. Response to comment (Reviewer 2): (What is the case if the traffic sign is partly hidden, e.g. by a tree?)

Response: For the detection of partially occluded traffic signs, the attention mechanism module added to the convolutional layers in this paper allows the network to focus purposefully on the important information in the feature map, so partially occluded traffic signs can be detected more reliably.

 

 

We have tried our best to improve the manuscript and have made a number of changes. These changes do not affect the content or framework of the paper. Rather than listing every change here, we have marked them in the revised paper. We sincerely appreciate the Editors’ and Reviewers’ hard work, and we hope that the corrections will meet with approval. Once again, thank you very much for your comments and suggestions.

Yours sincerely,

Zhang Rongyun

The School of Mechanical Engineering, Anhui Polytechnic University

Beijing Middle Road, Wuhu 241000, China 

Phone: 15805605696 

Email Address:  [email protected]

 

 

Reviewer 3 Report

The proposed paper introduces an enhanced YOLOv5 approach for detecting traffic signs. The proposed method utilizes the SIoU loss function in the YOLOv5 model, resulting in accurate traffic sign detection outcomes.

To further pique readers' interest, the following points should be addressed:

1. It is essential to conduct experiments using different training datasets for traffic sign detection. This would demonstrate that the proposed method is not restricted to the TT100K Chinese traffic sign dataset and can be effectively applied in diverse road environments.

2. A detailed description of the learning parameters and preconditions used in the various models compared in Table 2 is necessary. Providing information on anchor boxes, iterations, initial parameters, and epochs used in the learning process for each model would enhance the paper's clarity.

3. To validate the performance of the proposed method, a comparison of computing resources such as memory and CPU usage for each learning model is required. This analysis would offer insights into the computational efficiency of the proposed method.

There is a need to express the summary's English sentences in a more natural manner.

Author Response

August 11, 2023

 

Dear Editors and Reviewers:

Thank you for your letter and for the reviewers’ comments concerning our manuscript (Manuscript ID: applsci-2515915, Title: Traffic Sign Detection Based on the Improved YOLOv5). The comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections that we hope will meet with your approval. In addition, other parts of the text have been revised, including improvements to the figures and other passages that needed attention. The revised portions are marked in yellow in the paper. The main corrections in the paper and the responses to the reviewers’ comments are as follows:

  1. Response to comment (Reviewer 3): (It is essential to conduct experiments using different training datasets for traffic sign detection. This would demonstrate that the proposed method is not restricted to the TT100K Chinese traffic sign dataset and can be effectively applied in diverse road environments.)

Response: Addition: The same training was carried out on the GTSDB traffic sign dataset. The German Traffic Sign Detection Benchmark (GTSDB) is widely used for the evaluation of traffic sign detection. It includes 900 images with a resolution of 1360×800 (600 for training and 300 for testing), and the mAP after training is 92.5%. The results show that the algorithm has good generalization ability.

  2. Response to comment (Reviewer 3): (A detailed description of the learning parameters and preconditions used in the various models compared in Table 2 is necessary. Providing information on anchor boxes, iterations, initial parameters, and epochs used in the learning process for each model would enhance the paper's clarity.)

Response: Addition.

Table 2. Partial training parameters of the models.

Models         Input size   Learning rate   Epoch   Batch size
YOLOv5         640×640      0.001           150     16
SSD300         300×300      0.001           300     32
Faster R-CNN   416×416      0.001           2500    20
Zhu            -            -               -       -
YOLOv3         416×416      0.001           200     8
YOLOv4         416×416      0.001           200     8
[25]           416×416      0.001           200     8
Ours           640×640      0.001           150     16

The learning rate of all models is set to 0.001; the input size, epochs and batch size are shown in Table 2; the anchor boxes are kept at their original settings. The YOLOv5 model adopts the same parameter setting strategy as ours for training and parameter adjustment: the input size is 640×640, the batch size is 16, and the number of epochs is 150.

  3. Response to comment (Reviewer 3): (To validate the performance of the proposed method, a comparison of computing resources such as memory and CPU usage for each learning model is required. This analysis would offer insights into the computational efficiency of the proposed method.)

Response: Addition.

Model    SIoU   ACONC   CBAM   Parameters   GFLOPs   P/%    R/%    mAP@0.5/%
YOLOv5   -      -       -      7066239      16.4     73.2   74.2   75.7
A        ✓      -       -      7066239      16.4     79.3   75.8   79.7
B        -      ✓       -      7469903      16.7     77.2   74.4   78.7
C        -      -       ✓      7110151      16.5     78.0   73.5   78.6
Ours     ✓      ✓       ✓      7513815      16.8     81.9   77.2   81.9

(✓ indicates that the corresponding improvement is applied; the assignment of A, B and C follows the description in the paragraph below.)

When only the loss function is improved, the parameters and GFLOPs do not change, indicating that the number of layers in the network does not change when the loss function is changed; the mAP increases from 75.7% to 79.7%, an increase of 4.0 percentage points. When only the activation function is improved, the parameters increase, the GFLOPs increase, and the hardware computation efficiency is also higher; the mAP increases from 75.7% to 78.7%, an increase of 3.0%. When only the feature enhancement module is fused, the parameters and GFLOPs increase slightly, the hardware computing efficiency also increases slightly, and the mAP increases from 75.7% to 78.6%, an increase of 2.9%. When all three are added to the original algorithm, the number of parameters and GFLOPs of the network increase and the hardware computing efficiency also increases; the precision of the improved YOLOv5 increases from 73.2% to 81.9%, an increase of 8.7%; the recall increases from 74.2% to 77.2%, an increase of 3.0%; and the mAP increases from 75.7% to 81.9%, an increase of 6.2%. After calculation, the FPS also increases from 26.88 to 30.42 frames per second. Overall, both the detection speed and the accuracy of the improved model are effectively increased.
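For context on how FPS figures such as 26.88 and 30.42 can be measured, a minimal timing sketch is shown below (the model and the list of preprocessed image tensors are placeholders, not the authors' evaluation script):

```python
import time
import torch

def measure_fps(model, images, device: str = "cuda"):
    """Average frames per second over a list of preprocessed CHW image tensors."""
    model = model.to(device).eval()
    with torch.no_grad():
        for img in images[:5]:                      # warm-up so startup cost is excluded
            model(img.unsqueeze(0).to(device))
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for img in images:
            model(img.unsqueeze(0).to(device))
        if device == "cuda":
            torch.cuda.synchronize()
        elapsed = time.perf_counter() - start
    return len(images) / elapsed                    # e.g. ~26.9 baseline vs ~30.4 improved
```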

 

 

We have tried our best to improve the manuscript and have made a number of changes. These changes do not affect the content or framework of the paper. Rather than listing every change here, we have marked them in the revised paper. We sincerely appreciate the Editors’ and Reviewers’ hard work, and we hope that the corrections will meet with approval. Once again, thank you very much for your comments and suggestions.

Yours sincerely,

Zhang Rongyun

The School of Mechanical Engineering, Anhui Polytechnic University

Beijing Middle Road, Wuhu 241000, China 

Phone: 15805605696 

Email Address:  [email protected]

 

 

Round 2

Reviewer 1 Report

None

Comments on the Quality of English Language: Authors must go through the manuscript thoroughly to improve the quality of the writing.

Author Response

August 22, 2023

 

Dear Editors and Reviewers:

Thank you for your letter and for the reviewers’ comments concerning our manuscript (Manuscript ID: applsci-2515915, Title: Traffic Sign Detection Based on the Improved YOLOv5). The comments are all valuable and very helpful for revising and improving our paper, and they provide important guidance for our research. We have studied the comments carefully and have made corrections that we hope will meet with your approval. In addition, other parts of the text have been revised, including improvements to the figures and other passages that needed attention. The revised portions are marked in red in the paper. The main corrections in the paper and the responses to the reviewer’s comments are as follows:

  1. Response to comment (Reviewer 1): (Comments on the Quality of English Language: “Authors must go through the manuscript thoroughly to improve the quality of the writing.”)

Response: The writing has been revised throughout the article, and the revisions are marked in red.

 

 

We have tried our best to improve the manuscript and have made a number of changes. These changes do not affect the content or framework of the paper. Rather than listing every change here, we have marked them in the revised paper. We sincerely appreciate the Editors’ and Reviewers’ hard work, and we hope that the corrections will meet with approval. Once again, thank you very much for your comments and suggestions.

Yours sincerely,

Zhang Rongyun

The School of Mechanical Engineering, Anhui Polytechnic University

Beijing Middle Road, Wuhu 241000, China 

Phone: 15805605696 

Email Address:  [email protected]

 

 
