5.1. Experimental Data
In the experiments, the UA-DETRAC dataset is selected to verify the effectiveness of the algorithm. The dataset contains images of cars, trucks, and buses, with a total of 140,000 images and 8250 manually labeled vehicles. The computing environment used in this paper is the Windows 10 operating system with CUDA 11.7 and a GeForce RTX 3080 graphics card. Python 3.7 is used for network development, and the whole development process is completed in the PyCharm integrated development environment. In this dataset, vehicles are divided into four categories: cars, buses, vans, and others. Weather is divided into four categories: cloudy, night, sunny, and rainy. Object scale is defined by the square root of the bounding-box area (in pixels) and divided into three levels: small (0–50 pixels), medium (50–150 pixels), and large (more than 150 pixels). The degree of occlusion is defined by the fraction of the vehicle bounding box that is occluded and divided into three categories: no occlusion, partial occlusion (1–50%), and severe occlusion (more than 50%). The cutoff rate is defined as the extent to which the vehicle extends beyond the image frame.
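The scale and occlusion categories above follow simple threshold rules. As an illustration (the function names are our own and not from the paper's code), they can be implemented as:

```python
import math

def scale_category(box_area_px: float) -> str:
    """Scale level from the square root of the bounding-box area in pixels."""
    side = math.sqrt(box_area_px)
    if side <= 50:
        return "small"
    elif side <= 150:
        return "medium"
    return "large"

def occlusion_category(occluded_fraction: float) -> str:
    """Occlusion level from the fraction of the vehicle box that is occluded."""
    if occluded_fraction == 0:
        return "no occlusion"
    elif occluded_fraction <= 0.5:
        return "partial occlusion"
    return "severe occlusion"
```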
To maintain consistency, the number of iterations is 100 and the batch size is 16. The DDSC_V3_YOLOv5s, IBi_YOLOv5s, and DV3_IBi_YOLOv5s algorithms are each trained for 100 epochs, and the change curves of their training and validation loss functions are shown in Figure 5.
As shown in Figure 5a, the loss curve of the YOLOv5 algorithm after adding the lightweight DDSC_V3 module is smoother: during the first 10 epochs the loss decreases rapidly, and it then stabilizes without obvious fluctuations, indicating that the improved model is more stable and converges better. As shown in Figure 5b, during the first 10 epochs the loss curve shows an exponential downward trend and then levels off; around epoch 70 it declines rapidly again before finally converging without significant fluctuation. As shown in Figure 5c, the loss decreases rapidly within the first 5 epochs and converges to a relatively stable value after about 70 epochs.
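Smoothness of the kind visible in such loss curves is often obtained by plotting an exponential moving average of the raw per-epoch losses. A minimal sketch (the paper does not state which smoothing, if any, was applied to Figure 5, so this is purely illustrative):

```python
def ema_smooth(values, alpha=0.9):
    """Exponential moving average of a loss series, as commonly used to
    smooth training/validation curves for display. alpha closer to 1
    gives a smoother (slower-reacting) curve."""
    smoothed, prev = [], values[0]
    for v in values:
        prev = alpha * prev + (1 - alpha) * v
        smoothed.append(prev)
    return smoothed
```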
5.2. Ablation Experiment
To verify the effectiveness of the proposed algorithm, ablation experiments are carried out on each module.
- (1)
Ablation Experiment of the MobileNetv3 module
The effectiveness of the DDSC-based MobileNetv3 module (DDSC_V3 for short), a component of DDSC_V3_YOLOv5s, was verified by ablation experiments.
As shown in Table 1, on the UA-DETRAC dataset the introduction of the MobileNetv3 module improves the detection speed by 3.59% and reduces the detection accuracy by 1.75% compared with the benchmark algorithm YOLOv5s. Compared with the MobileNetv3 model, the DDSC_V3 module improves the detection accuracy by 5.20% and the detection speed by 11.69%. Compared with the benchmark YOLOv5s, its detection accuracy is reduced by 1.75% while its detection speed is increased by 15.32%. Integrating MobileNetv3 with DDSC thus improves the detection speed with only a slight loss in detection accuracy. These experiments confirm the effectiveness of the proposed DDSC_V3 module.
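The relative changes quoted in the ablation tables are plain percentage differences against a baseline. A hypothetical helper (not from the paper's code) makes the computation explicit:

```python
def pct_change(new: float, baseline: float) -> float:
    """Relative change of `new` versus `baseline`, in percent.
    Positive values are improvements over the baseline."""
    return (new - baseline) / baseline * 100.0
```

For example, a precision of 86.02% against the YOLOv5s baseline of 82.77% (Table 4) yields the +3.93% improvement reported in Section 5.3.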
- (2)
Ablation Experiment of the BiFPN_ICBAM module
The effectiveness of the ICBAM attention mechanism and the BiFPN_ICBAM module, components of IBi_YOLOv5s, was verified by ablation experiments.
As shown in Table 2, when the FPN is improved by adding an attention mechanism, the detection accuracy increases and the detection speed decreases compared with the benchmark algorithm YOLOv5s. The best combination is BiFPN with ICBAM (BiFPN_ICBAM): compared with the benchmark YOLOv5s, the detection accuracy is improved by 5.29%, and the detection speed reaches 96.19 fps. The experiments show that the proposed ICBAM and BiFPN_ICBAM optimization scheme is effective.
- (3)
Ablation Experiment of the DDSC_V3 and IBi modules
To verify the effectiveness of the components of the proposed DV3_IBi_YOLOv5s algorithm, ablation experiments were carried out on the DDSC_V3 and IBi modules.
As shown in Table 3, on the UA-DETRAC dataset the introduction of the DDSC_V3 module increased the detection speed by 3.59% and reduced the detection accuracy by 1.75% compared with the benchmark algorithm YOLOv5s. The introduction of the IBi module reduced the detection speed by 11.19% and improved the detection accuracy by 4.29% compared with the benchmark. Introducing the DDSC_V3 and IBi modules together improved the detection accuracy by 7.34% and the detection speed by 10.80% over the benchmark. These ablation experiments demonstrate the effectiveness of the proposed optimization scheme for the backbone and neck networks.
5.3. Analysis of Algorithm Effectiveness
In this section, the detection performance of YOLOv5s is compared with that of the proposed DV3_IBi_YOLOv5s, DDSC_V3_YOLOv5s, and IBi_YOLOv5s algorithms on the Car, Truck, and Bus categories of the UA-DETRAC dataset.
- (1)
Performance evaluation results
To evaluate performance, the precision (%), recall (%), average precision (AP, %), mean average precision (mAP, %), F1 score, parameter count (MB), and FPS (frames/s) of DV3_IBi_YOLOv5s are compared with those of other algorithms on the Car, Truck, and Bus categories. First, DV3_IBi_YOLOv5s is compared with YOLOv5s and the single-improvement algorithms DDSC_V3_YOLOv5s and IBi_YOLOv5s in terms of precision, recall, and AP. It is also compared with several other detection methods, namely YOLOv6, YOLOv7, YOLOv8, SSD, and Faster RCNN, to verify its performance. The experimental results are shown in Table 4.
As shown in Table 4, the average precision, recall, and AP of the four algorithms over the three vehicle categories are as follows: YOLOv5s, 82.77%, 53.43%, and 66.49%; DDSC_V3_YOLOv5s, 82.62%, 52.65%, and 65.17%; IBi_YOLOv5s, 86%, 54.23%, and 69.81%; and DV3_IBi_YOLOv5s, 86.02%, 54.39%, and 71.19%. Compared with the benchmark algorithm YOLOv5s, the proposed DV3_IBi_YOLOv5s improves the overall precision, recall, and AP by 3.93%, 1.79%, and 7.34%, respectively. The overall precision, recall, and AP of DV3_IBi_YOLOv5s are also clearly better than those of the two single-improvement models.
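The AP values compared in Table 4 are areas under per-class precision–recall curves. As an illustration (the exact evaluation protocol is not specified in this section), a minimal all-point interpolated AP in the Pascal-VOC style can be written as:

```python
def average_precision(recalls, precisions):
    """All-point interpolated AP: area under the precision-recall curve,
    after making precision monotonically non-increasing (the convention
    used by Pascal VOC 2010+ style evaluation)."""
    # Add sentinel points at recall 0 and 1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Build the precision envelope from right to left.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangular areas wherever recall increases.
    ap = 0.0
    for i in range(1, len(r)):
        ap += (r[i] - r[i - 1]) * p[i]
    return ap
```

The mAP reported later in Table 6 is then simply the mean of the per-class APs.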
Second, to evaluate the algorithm on the F1 score metric, DV3_IBi_YOLOv5s was compared with the other algorithms, as shown in Table 5.
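The F1 score used in Table 5 is the harmonic mean of precision and recall; a minimal sketch:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall
    (both given in the same units, e.g. percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, the average precision and recall of DV3_IBi_YOLOv5s from Table 4 (86.02% and 54.39%) give an F1 of roughly 66.6.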
To evaluate the algorithm on the mAP, parameter count, and FPS metrics, DV3_IBi_YOLOv5s was compared with the other algorithms, as shown in Table 6.
As shown in Table 6, the DV3_IBi_YOLOv5s algorithm increases the mAP value by 7.34%, reduces the number of parameters by 47.95%, and increases the detection speed by 10.80% compared with the YOLOv5s algorithm. The data in the table show that DV3_IBi_YOLOv5s achieves a high mAP value; although it is inferior to DDSC_V3_YOLOv5s in parameter count and detection speed, the gap is small. The experimental results fully verify the effectiveness of the algorithm in improving the accuracy and speed of vehicle detection.
- (2)
Lightweight network visualization results
Figure 6 shows representative visualizations of the detection results: the left side shows the results of YOLOv5s and the right side those of DDSC_V3_YOLOv5s, where (a)–(d) represent different vehicle detection scenarios. Figure 6a,b shows vehicle detection in a dark environment. Compared with DDSC_V3_YOLOv5s, YOLOv5s has missed detections and fails to identify the vehicle at the bottom of the image. In this scenario, DDSC_V3_YOLOv5s correctly identifies the car with a confidence of 0.86, indicating that the detection accuracy of the improved DDSC_V3_YOLOv5s is not degraded under insufficient light. Comparing the detection results of Figure 6c,d, the missed detection of distant small-target vehicles is significant in Figure 6c, while Figure 6d performs relatively well and captures distant vehicle targets, although the confidence is still not high.
- (3)
Multiscale network visualization results
Figure 7 compares the detection results of the YOLOv5s and IBi_YOLOv5s algorithms in a dark nighttime environment and in a street environment with many vehicles, where (a)–(d) represent different vehicle detection scenarios. In Figure 7a,b, IBi_YOLOv5s detects the bus on the left and small, distant car-like targets in a complex environment more accurately than YOLOv5s. The visualization results show that, to a certain extent, IBi_YOLOv5s outperforms the original YOLOv5s for vehicle detection in complex environments. In Figure 7c,d, IBi_YOLOv5s successfully locates and correctly classifies all vehicles, while YOLOv5s locates all vehicles but does not classify them all correctly; specifically, in Figure 7c, YOLOv5s misclassifies a Truck as a Car. This verifies that IBi_YOLOv5s has an advantage over the original YOLOv5s in classifying detected targets.
In Figure 8, the left side shows the results of YOLOv5s and the right side those of the proposed IBi_YOLOv5s. Comparing Figure 8a,b, the bus is not correctly detected and located in (a), where one vehicle is mistakenly detected as two buses. In contrast, Figure 8b shows high confidence and more accurately detects and locates the bus in the image. The results show that IBi_YOLOv5s perceives bus features better than the original YOLOv5s to a certain extent. Comparing Figure 8c,d, the recognition rate of the car in Figure 8c is not high: garbage cans in roadside shadows are incorrectly recognized as vehicles, and the recognition confidence of cars is slightly lower than in Figure 8d. The results show that the vehicle detection performance of IBi_YOLOv5s is better than that of the original YOLOv5s, especially in shadowed areas.
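Duplicate detections such as one bus being boxed twice in Figure 8a are normally removed by non-maximum suppression (NMS), the standard post-processing step in YOLO-family detectors. A minimal greedy sketch (an illustration, not the paper's implementation):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop any remaining box
    that overlaps it by more than iou_thresh; repeat. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep
```

With two heavily overlapping "bus" boxes, only the higher-confidence one survives, which is exactly the failure mode visible in Figure 8a when suppression is ineffective.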
- (4)
Lightweight multiscale network visualization results
In Figure 9, the left side shows the detection results of YOLOv5s and the right side those of DV3_IBi_YOLOv5s, where (a)–(d) represent different vehicle detection scenarios. Comparing Figure 9a,b, both can identify the numerous small-scale car targets on the street; however, there are false detections in Figure 9a, for example, the bus on the left is falsely detected as a car. To some extent, DV3_IBi_YOLOv5s has better feature extraction ability and a better detection effect. Comparing Figure 9c,d, the detection box in Figure 9c covers only part of the bus rather than framing it completely, while the box in Figure 9d is more accurate and closer to the actual size of the bus. Overall, DV3_IBi_YOLOv5s achieves a better detection effect and more accurate target localization in the detection task.
The experimental data verify that the multistage improvement of the YOLOv5s algorithm in this paper is substantially effective. Compared with the original YOLOv5s, the performance improvement of DV3_IBi_YOLOv5s is reflected not only in the quantitative results but also in the visual comparison of prediction results. This means that applying the DV3_IBi_YOLOv5s algorithm in real scenes will significantly improve the speed and accuracy of vehicle detection, thus greatly improving the overall detection effect.