Object Detection YOLO Algorithms and Their Industrial Applications: Overview and Comparative Analysis
Abstract
1. Introduction
1.1. Classification of Object Detection Algorithms
1.1.1. Classic Algorithms
1.1.2. Neural Network (DNN)-Based Algorithms
1.1.3. Zero/Few-Shot Learning Algorithms
- Introduce the main versions and improvement measures of YOLO series algorithms;
- Summarize the industrial application fields and application examples of YOLO series algorithms;
- Summarize the general improvement measures for the industrial application of YOLO series algorithms;
- Test the performance of the main versions of YOLO series algorithms;
- Point out the development directions and challenges of YOLO series algorithms.
2. Algorithm Introduction
2.1. YOLOv1
2.1.1. Detection Process
- Image Input: Input the original image.
- Preprocessing Process: Perform simple preliminary processing on the image, including resizing the image to an appropriate size, dividing the image into S × S grids, etc.
- Convolutional Neural Network: Through convolutional, pooling, and fully connected layers, the network outputs a three-dimensional tensor that encodes the prediction information. The specific framework of the convolutional neural network is shown in Figure 5.
- Postprocessing Process: Filter redundant bounding boxes by methods such as non-maximum suppression (a minimal sketch follows this list).
- Image Output: Integrate the processed data onto the image and output the result.
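The following is a minimal sketch of the non-maximum suppression step mentioned above, assuming boxes in [x1, y1, x2, y2] format and an illustrative IoU threshold; it is not YOLOv1's exact implementation:

```python
import numpy as np

def iou(box, boxes):
    # Boxes are [x1, y1, x2, y2]; returns IoU of `box` against each row of `boxes`.
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: keep the highest-scoring box, drop overlapping boxes, repeat.
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        overlaps = iou(boxes[i], boxes[order[1:]])
        order = order[1:][overlaps < iou_thresh]
    return keep
```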
2.1.2. Training Process
- Dataset Input: Input the images with labels.
- Preprocessing Process: Similar to the detection process, image resizing, grid division, and other preprocessing steps are performed.
- Convolutional Neural Network: Input the images from the dataset into the convolutional neural network to obtain a three-dimensional tensor containing the prediction information.
- Postprocessing Process: Filter redundant bounding boxes by methods such as non-maximum suppression.
- Network Parameter Adjustment: Through regression analysis, adjust the network parameters in the convolutional neural network so that the three-dimensional tensor output by the network moves closer to the ground truth (a simplified loss sketch follows this list).
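The parameter adjustment above is driven by a loss that compares the predicted tensor with the ground truth. Below is a simplified sketch of a YOLOv1-style sum-squared-error loss; the tensor layout, the single predictor per cell, and the mask handling are illustrative assumptions rather than the paper's full formulation:

```python
import torch

def yolo_v1_loss(pred, target, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    # pred, target: (N, S, S, 5 + C) tensors holding [x, y, w, h, conf, classes...].
    # obj_mask: (N, S, S) boolean tensor, True where a cell contains an object.
    obj = obj_mask.unsqueeze(-1).float()
    noobj = 1.0 - obj
    # Localization loss: square roots of w/h damp the penalty on large boxes.
    xy_loss = ((pred[..., 0:2] - target[..., 0:2]) ** 2 * obj).sum()
    wh_loss = ((pred[..., 2:4].clamp(min=0).sqrt()
                - target[..., 2:4].sqrt()) ** 2 * obj).sum()
    # Confidence loss, weighted down for cells without objects.
    conf_err = (pred[..., 4:5] - target[..., 4:5]) ** 2
    conf_loss = (conf_err * obj).sum() + lambda_noobj * (conf_err * noobj).sum()
    # Classification loss, only for cells that contain an object.
    cls_loss = ((pred[..., 5:] - target[..., 5:]) ** 2 * obj).sum()
    return lambda_coord * (xy_loss + wh_loss) + conf_loss + cls_loss
```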
2.2. YOLOv2
- Darknet-19: YOLOv2 uses Darknet-19 as the main framework of its convolutional neural network. Compared with the network structure of YOLOv1, this framework extracts the features of target objects more sufficiently, which helps YOLOv2 achieve higher detection accuracy and faster detection speed.
- Passthrough Layer: To address YOLOv1's difficulty in detecting small targets, YOLOv2 adds a passthrough layer to the convolutional neural network. This combines deep low-resolution features with shallow high-resolution features, which improves the algorithm's detection performance for small target objects.
- Anchor Boxes: YOLOv2 uses a different way to predict coordinates. It removes the fully connected layer and adds anchor boxes to the convolutional layers. By predicting the offsets of the anchor boxes, the objects to be detected are located indirectly, which simplifies the network's prediction of object position coordinates.
- Location Prediction: In the early stage of algorithm training, unrestricted coordinate prediction often cannot make the loss function converge quickly. Therefore, YOLOv2 uses the Logistic function to limit the bounding box offset relative to the anchor box, which improves the efficiency and stability in the initial training process of the algorithm.
- Dimension Clusters: Dimension clustering is used to solve the problem of weak adaptability of anchor boxes to objects of different sizes. To enable the anchor boxes to adapt to objects of different shapes and sizes, YOLOv2 uses dimensional clustering to determine the size and number of the anchor boxes. YOLOv2 uses an IoU-based distance to describe how well an anchor box fits objects of different sizes, as in Equation (4): d(box, centroid) = 1 − IoU(box, centroid). Through dimensional clustering, YOLOv2 modifies its predictive analysis mechanism by optimizing the size and number of the anchor boxes to speed up the convergence of the loss function during training (see the sketch after this list).
- High Resolution Classifier: The classifier is fine-tuned on high-resolution images so that the network adapts in advance to the high-resolution inputs used during detection. In this way, the algorithm achieves better detection performance.
- Batch Normalization: A batch normalization [29] layer is added after every convolutional layer. Although this increases the number of network parameters and the computational cost, it improves both the convergence speed and the accuracy of the network during training.
- Multiscale Training: During the training of the neural network, YOLOv2 adopts the method of training images of different scales to increase the robustness of the network for image recognition.
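The following is a minimal sketch of the dimension clustering behind Equation (4), assuming (width, height) pairs for the ground-truth boxes and that no cluster becomes empty during the iterations; a production implementation would use more careful initialization and a convergence test:

```python
import numpy as np

def iou_wh(wh, centroids):
    # IoU between a (w, h) box and each centroid, assuming shared top-left
    # corners, as in YOLOv2's dimension clustering.
    inter = np.minimum(wh[0], centroids[:, 0]) * np.minimum(wh[1], centroids[:, 1])
    union = wh[0] * wh[1] + centroids[:, 0] * centroids[:, 1] - inter
    return inter / union

def kmeans_anchors(box_wh, k=5, iters=100):
    # box_wh: (N, 2) array of ground-truth box widths and heights.
    centroids = box_wh[np.random.choice(len(box_wh), k, replace=False)]
    for _ in range(iters):
        # Equation (4): d(box, centroid) = 1 - IoU(box, centroid).
        dists = np.array([1 - iou_wh(wh, centroids) for wh in box_wh])
        assign = dists.argmin(axis=1)
        # Assumes every cluster keeps at least one box assigned to it.
        centroids = np.array([box_wh[assign == i].mean(axis=0) for i in range(k)])
    return centroids
```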
2.3. YOLOv3
- Darknet-53: YOLOv3 uses the Darknet-53 network framework for feature extraction. Compared with YOLOv2's Darknet-19, Darknet-53 greatly increases the number of convolutional layers, thereby further increasing the network's ability to extract image features.
- FPN Concept Block: YOLOv3 uses a similar concept to Feature Pyramid Networks (FPN) [30] to extract features from three different scales. This enables YOLOv3 to predict objects through three different scales. Furthermore, this enhancement also results in YOLOv3 exhibiting superior performance in detecting small objects.
- Bounding Box Prediction: During training, YOLOv3 calculates an objectness score for each bounding box predicted by the network through Logistic regression. The objectness score quantifies the alignment between the bounding box and the ground truth. Each grid only uses the anchor boxes with higher objectness scores for subsequent operations. The anchor boxes that are not used do not affect the coordinate and classification terms in the loss function. This reduces the instability of YOLOv3 in the early stage of training and improves its training speed.
- Cross-scale Prediction: To improve the prediction performance of the network for objects of different scales, YOLOv3 predicts objects on three scales. Three anchor boxes are adopted for prediction on each grid at each scale.
- Class Prediction: YOLOv3 uses independent Logistic classifiers for multilabel classification. This enables YOLOv3 to perform better on datasets with many overlapping labels (see the sketch after this list).
- Data Augmentation: During training, data augmentation produces more images for training, which makes the loss function easier to converge. Image processing methods, e.g., optical distortion and geometric distortion, are beneficial to the extraction of deep-level image features by convolutional neural networks.
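A minimal sketch of the independent logistic classifiers mentioned under Class Prediction, using PyTorch; the shapes, class count, and 0.5 decision threshold are illustrative assumptions:

```python
import torch
import torch.nn as nn

# YOLOv3 replaces the softmax over classes with independent logistic
# classifiers, so one box can carry several labels at once.
# Assumed shapes: logits (N, C) raw class scores per box,
# targets (N, C) multi-hot label vectors.
class_loss = nn.BCEWithLogitsLoss()

logits = torch.randn(8, 80)            # 8 boxes, 80 classes (illustrative)
targets = torch.zeros(8, 80)
targets[0, [0, 14]] = 1.0              # one box with two overlapping labels

loss = class_loss(logits, targets)
probs = torch.sigmoid(logits)          # per-class probabilities, not a distribution
predicted = probs > 0.5                # multilabel decision, threshold assumed
```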
2.4. YOLOv4
- CSP Structure: YOLOv4 adds a Cross Stage Partial (CSP) [31] structure to the backbone network and the branch network, which efficiently reduces the number of parameters in the whole network. Thereby, it optimizes the real-time performance of the algorithm.
- SPP Block: Enhanced receptive field mainly refers to enhancing the ability of the algorithms to perceive large area and global range information from the image. The main methods include Spatial Pyramid Pooling (SPP) [32], Atrous Spatial Pyramid Pooling (ASPP) [33], Receptive Field Block (RFB) [34], etc. YOLOv4 selects the SPP block to enhance its receptive field.
- PAN Block: In the process of object detection, it is not only necessary to utilize the deep level image features for forwarding inference, but also to utilize the underlying features to improve the prediction effect of the object detection algorithms. Commonly used feature integration methods include Feature Pyramid Networks (FPN), Scalewise Feature Aggregation Module (SFAM) [35], Adaptively Spatial Feature Fusion (ASFF) [36], BiFPN [37], etc. YOLOv4 selects PAN as its feature integration block.
- Spatial Attention Module: The attention mechanism is used to enable the object detection algorithms to retain important data information during detection process. At the same time, the networks also use it to suppress invalid data information so that the algorithm can analyze the data in a concentrated and effective manner. The main methods are Squeeze-and-Excitation (SE) [38], Spatial Attention Module (SAM) [39], etc. YOLOv4 uses SAM as its attention mechanism.
- CIoU Bounding Box Loss Function: How accurately the quality of the predicted bounding box is measured plays a key role in optimizing the overall performance of the algorithms. The main methods for describing the quality of a predicted bounding box include IoU-Loss, GIoU-Loss [40], DIoU-Loss [41], CIoU-Loss [41], etc. YOLOv4 uses CIoU-Loss as its bounding box loss function (a sketch follows this list).
- Focal Loss Function: During training, because different categories of images occupy different proportions of the dataset, the corresponding weights of different categories of objects may differ. This makes it easy for the algorithm to ignore objects that make up a small proportion of the dataset. The focal loss function balances the weight differences of the object detection algorithm across the object categories in the dataset.
- Mish Activation Function: The traditional tanh and sigmoid activation functions suffer from the vanishing gradient problem during training. To solve this, Nair and Hinton [42] proposed the ReLU activation function in 2010, which largely resolved the vanishing gradient problem. Derived methods include LReLU [43], PReLU [44], ReLU6 [45], Scaled Exponential Linear Unit (SELU) [46], Swish [47], hard-Swish [48], and Mish [49]. YOLOv4 uses Mish as its activation function, which helps YOLOv4 obtain better stability in the training process.
- DIoU-NMS Post-processing Method: Post-processing removes repeated predictions for the same object and retains the prediction with the highest confidence. It is the final filtering performed on the predicted bounding boxes before the detection result is output. The main methods are NMS, greedy NMS [24], soft NMS [50], DIoU-NMS [41], etc. YOLOv4 selects DIoU-NMS as its post-processing method, which improves YOLOv4's performance in detecting occluded objects.
- Cross mini-Batch Normalization: YOLOv4 proposed Cross mini-Batch Normalization (CmBN) [20] for the training process. CmBN is an optimization of Cross-Iteration Batch Normalization (CBN) [51]. CmBN collects statistics only between mini-batches within a single batch, so that the algorithm updates its parameters within one batch.
- Self-Adversarial Training: Self-adversarial training is an important way to increase the robustness of algorithms. In adversarial training, small perturbations are mixed into the input images. After training, the algorithms adapt to such changes, which makes them more robust.
- Data Augmentation: Data augmentation can increase the number and richness of datasets. In addition to traditional methods like optical distortion and geometric distortion, image data can also be enhanced by occluding target objects and generating adversarial networks. Some representative measures include random erase [52], CutOut [53], hide-and-seek [54], grid mask [55], DropOut [56], DropConnect [57], DropBlock [58], MixUp [59], CutMix [60], Style Transfer GAN [61], etc. YOLOv4 uses CutMix and Mosaic [20] as its data augmentation methods.
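Below is a sketch of the CIoU-Loss selected by YOLOv4, following the definition in [41] (overlap, center distance, and aspect-ratio consistency terms); the box format and epsilon handling are illustrative assumptions:

```python
import math
import torch

def ciou_loss(pred, target, eps=1e-9):
    # pred, target: (N, 4) boxes as [x1, y1, x2, y2].
    # CIoU-Loss = 1 - IoU + (center distance)^2 / (enclosing diagonal)^2 + alpha * v.
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(0) * (y2 - y1).clamp(0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Squared distance between box centers.
    cp = (pred[:, :2] + pred[:, 2:]) / 2
    ct = (target[:, :2] + target[:, 2:]) / 2
    rho2 = ((cp - ct) ** 2).sum(dim=1)
    # Squared diagonal of the smallest enclosing box.
    ex1 = torch.min(pred[:, 0], target[:, 0])
    ey1 = torch.min(pred[:, 1], target[:, 1])
    ex2 = torch.max(pred[:, 2], target[:, 2])
    ey2 = torch.max(pred[:, 3], target[:, 3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2 + eps
    # Aspect-ratio consistency term v and its trade-off weight alpha.
    v = (4 / math.pi ** 2) * (
        torch.atan((target[:, 2] - target[:, 0]) / (target[:, 3] - target[:, 1] + eps))
        - torch.atan((pred[:, 2] - pred[:, 0]) / (pred[:, 3] - pred[:, 1] + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return (1 - iou + rho2 / c2 + alpha * v).mean()
```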
2.5. YOLOv5
- Focus Block: The Focus block uses a slicing operation to down-sample the input image. Meanwhile, it increases the input channels of the image while keeping the input information unchanged. This allows the convolutional neural network to extract features more sufficiently (a sketch appears after this list).
- GIoU Bounding Box Loss Function: YOLOv5 uses GIoU-Loss for calculating the bounding box loss. This loss function effectively captures the disparity between the predicted bounding box and the ground truth, thereby enhancing the algorithm's training efficacy.
- Adaptive Anchor Box Calculation: The clustering process of anchor boxes is embedded into the training process. It automatically calculates the optimal value of anchor boxes in different datasets. This predictive analysis mechanism helps YOLOv5 improve the performance in the detection process.
- Adaptive Image Scaling: During detection, the input image is usually converted to the size specified by the network by scaling and padding. However, too much padding introduces redundant information. Therefore, YOLOv5 minimizes the padding redundancy, and the detection efficiency of the algorithm is improved by adaptive image scaling. During training, the algorithm still pads images to the specified size.
- Data Augmentation: Through random scaling, random cropping, and random arrangement of images, multiple images are integrated into one image. This greatly improves the diversity of input data. Furthermore, this also helps reduce overfitting in the training process.
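A minimal sketch of the Focus block's slicing operation, assuming a 640 × 640 RGB input; the kernel size and output channel count are illustrative choices:

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    # A sketch of YOLOv5's Focus block: slice the image into four interleaved
    # sub-images, concatenate along channels (3 -> 12), then convolve.
    def __init__(self, in_ch=3, out_ch=64, k=3):
        super().__init__()
        self.conv = nn.Conv2d(in_ch * 4, out_ch, k, stride=1, padding=k // 2)

    def forward(self, x):
        # Each slice takes every other pixel; spatial size halves, channels
        # quadruple, so no input information is discarded by the down-sampling.
        return self.conv(torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                                    x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1))

x = torch.randn(1, 3, 640, 640)
print(Focus()(x).shape)  # torch.Size([1, 64, 320, 320])
```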
2.6. YOLOv6
- A new network backbone called EfficientRep: For small networks, the RepBlock module built on RepVGG is used. For large networks, the authors designed a more efficient CSP-based block, CSPStackRep. The neck of YOLOv6 adopts the PAN topology following YOLOv4 and YOLOv5, augmented with RepBlocks or CSPStackRep blocks; the modified neck is called Rep-PAN. The decoupled head is simplified to make it more efficient.
- Label Assignment: The authors conducted extensive label assignment experiments on YOLOv6 and verified that Task Alignment Learning (TAL) is more effective.
- Loss Function: For each loss, the authors systematically experimented with the available techniques, finally selecting VariFocal Loss as the classification loss and SIoU/GIoU Loss as the regression loss.
2.7. YOLOv7
- Extended efficient layer aggregation networks: ELAN is a strategy that allows a deep model to learn and converge more efficiently by controlling the shortest and longest gradient paths. The E-ELAN proposed in YOLOv7 uses expand, shuffle, and merge cardinality to continuously enhance the learning ability of the network without destroying the original gradient path.
- Model scaling for concatenation-based models: The main purpose of model scaling is to adjust attributes of the model and generate models of different scales to meet the needs of different inference speeds. YOLOv7 has a concatenation-based architecture; thus, when scaling the depth up or down, the input width of the subsequent network layer changes, which alters the ratio of the input and output channels of the subsequent layer and decreases the hardware utilization of the model. Therefore, the authors proposed a compound model scaling method for concatenation-based models. When they scaled the depth factor of a computational block, they also calculated the resulting change in that block's output channels and applied width-factor scaling with the same amount of change to the transition layers. This compound scaling method maintains the properties the model had at the initial design and preserves the optimal structure.
- Planned reparameterized convolution: While RepConv achieves excellent performance on VGG, its accuracy drops significantly when applied directly to architectures such as ResNet and DenseNet, because the identity connection in RepConv destroys the residual path in ResNet and the concatenation in DenseNet. Based on this, the authors use RepConv without the identity connection (RepConvN) to design the planned reparameterized convolution structure.
- Coarse for auxiliary and fine for lead loss: The lead head is the head responsible for the final output, while the auxiliary head is the head responsible for assisting training.
2.8. Summary
3. Industrial Application
3.1. Industrial Application Examples of YOLO Series of Object Detection Algorithms
3.1.1. Product Quality and Defect Detection
3.1.2. Item Identification and Location Positioning
3.1.3. Industrial Scene Environment Monitoring
3.2. Summary of Improvement Measures for Industrial Application
- Increasing the Image Number: Datasets with more images reduce overfitting during training, so when optimizing a dataset, the number of images should be increased first. Obtaining a large number of images is therefore an important task in dataset construction. Because relatively few public datasets exist in industrial fields, training images are usually captured by the researchers themselves, which limits the number of acquired images. Methods to increase the image number include web crawling, extracting frames from video, etc.
- Data Distribution Balance: An important issue when constructing datasets is the data distribution. If the images in a dataset are unevenly distributed, such as an uneven distribution of large, medium, and small objects, the trained algorithm can exhibit poor robustness and generalization. Therefore, it is important to consider the influence of the image distribution on the training results while making the dataset.
- Image Enhancement: Image enhancement can effectively increase the quality of the images in the dataset so that the trained algorithm has strong robustness and generalization. Enhancement can be applied to the original images or to the feature maps. The main methods for image enhancement are Random Brightness, Cartoon Effect, Contrast Limited Adaptive Histogram Equalization (CLAHE) [77], Color Temperature Transformation, Random Contrast, Edge Enhancement, Horizontal Flip, HSV Transform, Perspective Transformation, Salt and Pepper Noise, etc. Figure 16 shows the comparison between the original images and the enhanced images. These methods are useful for making certain features in the image clearer than in the original (a CLAHE sketch follows this list).
- Image Preprocessing: For applications with more noise in the input image, it is not enough to use image enhancement methods only. It is also vital to preprocess the input images to optimize the performance of detection and training processes of the algorithm. For example, in an environment with more fog, a dark channel image dehazing [76] algorithm can be used to preliminarily process the image. The processing effect of the dark channel image dehazing algorithm is shown in Figure 17.
- Hyperparameter Optimization: One of the simplest optimization methods is to tune the hyperparameters involved in training. Typically, enhancing algorithm performance involves increasing the number of iterations and fine-tuning the learning rate during training. Common methods for optimizing the learning rate include Stochastic Gradient Descent [78], Adam [79], AdaGrad [80], etc.
- Transfer Learning: There are two approaches to transfer learning [81]. When the number of samples is insufficient, the parameters of the feature extraction part are frozen and only the remaining network parameters are trained. When the number of samples is sufficient, some feature extraction parameters are kept fixed for several rounds of training on the remaining parameters, after which some or all of the feature extraction parameters are "unfrozen" and the entire network is fine-tuned with a small learning rate. During training, pretrained models can be used to initialize the algorithms to speed up the decline of the loss function, which improves learning efficiency and reduces training time.
- Multiscale Training: During training, images of different scales are input into the networks. This gives the networks better robustness and generalization for inputs of different sizes. However, this method extends the training time of the algorithm.
- k-means Clustering: For a specific type of target objects to be detected, k-means dimension clustering is usually used to modify the size and number of anchor boxes. This could speed up the convergence of the loss function during the training process, and thereby, it can improve the stability of the algorithms. In addition, k-means++ [82] could also be used for modifying the size and number of the anchor boxes.
- Network Structure Increase: The optimization of increasing the structure of the network can improve the performance of the algorithm in specific aspects by adding some designed blocks to the network structure. The blocks and structures usually added to the network include the residual structure [83], DenseNet block, CSP structure, SPP block, FPN block, attention block, etc.
- Network Structure Reduction: Network structure reduction can be divided into the weight level [84], layer level [85], and channel level [86]. Weight-level pruning has high flexibility and generality and can achieve higher compression ratios; however, it usually requires special software or hardware accelerators for fast inference on sparse models. Layer-level pruning is simpler but less flexible, and removing layers is only effective when the network is deep enough. Channel-level pruning strikes a good balance between flexibility and ease of implementation, as it can be applied to convolutional or fully connected neural networks.
- Network Structure Replacement: By replacing some structures in the network, the networks can achieve better performance. Common replacement measures include replacing pooling operations with convolution operations, replacing traditional NMS with improved NMS (greedy NMS, soft NMS, DIoU-NMS, etc.), replacing the traditional IoU loss function with improved IoU loss functions (GIoU-Loss, DIoU-Loss, CIoU-Loss, etc.), replacing traditional convolution with Depthwise Separable Convolution [87], etc.
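As an example of the image enhancement measures above, the following is a minimal CLAHE sketch using OpenCV, applied to the luminance channel so colors are preserved; the clip limit, tile size, and file names are illustrative assumptions:

```python
import cv2

def enhance_clahe(bgr_image, clip_limit=2.0, tile=(8, 8)):
    # Convert to LAB and equalize only the L (lightness) channel.
    lab = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2LAB)
    l, a, b = cv2.split(lab)
    clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile)
    l = clahe.apply(l)
    return cv2.cvtColor(cv2.merge((l, a, b)), cv2.COLOR_LAB2BGR)

image = cv2.imread("workpiece.jpg")          # hypothetical input image
enhanced = enhance_clahe(image)
cv2.imwrite("workpiece_clahe.jpg", enhanced)
```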
3.3. Security Challenges in Industrial Applications
- Robust Model Training: Utilizing adversarial training techniques to make the model more resilient to adversarial attacks.
- Input Validation: Implementing strict input validation and preprocessing to ensure that only valid and clean data reaches the detection system (a minimal sketch follows this list).
- Monitoring and Logging: Continuously monitoring the system for unusual activity and logging all inputs and outputs for forensic analysis.
- Physical Security: Ensuring physical security measures are in place to prevent unauthorized access to the detection system.
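A minimal sketch of the input-validation idea above: frames that do not match the expected shape, dtype, or value statistics are rejected before reaching the detector. The accepted resolution and variance threshold are illustrative assumptions:

```python
import numpy as np

def validate_frame(frame, expected_shape=(640, 640, 3)):
    # Reject anything that is not a clean 8-bit image of the expected size.
    if not isinstance(frame, np.ndarray):
        raise ValueError("frame must be a NumPy array")
    if frame.dtype != np.uint8:
        raise ValueError(f"unexpected dtype {frame.dtype}")
    if frame.shape != expected_shape:
        raise ValueError(f"unexpected shape {frame.shape}")
    if frame.std() < 1.0:  # nearly constant image: sensor fault or tampering
        raise ValueError("frame has near-zero variance")
    return frame
```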
3.4. Summary
4. Practical Adaptations of YOLO Series Algorithms in Industrial Fields
4.1. Model Compression
- Pruning: Removing redundant neurons or filters to reduce model size.
- Quantization: Converting floating-point weights to lower precision (e.g., 8-bit integers); see the sketch after this list.
- Knowledge Distillation: Transferring knowledge from a large teacher model to a smaller student model.
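A minimal sketch of post-training quantization using PyTorch's dynamic quantization; note that this API covers Linear/LSTM layers, while convolutional backbones typically require static quantization with calibration. The toy model stands in for a detector head:

```python
import torch
import torch.nn as nn

# Illustrative stand-in for a detection head (not an actual YOLO model).
model = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 85))
model.eval()

# Convert Linear weights to 8-bit integers; activations stay in float.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 1024)
print(quantized(x).shape)  # torch.Size([1, 85])
```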
4.2. Edge Deployment
- Hardware Acceleration: Utilizing specialized hardware like GPUs, TPUs, or FPGAs to speed up inference.
- Inference Optimization: Techniques like batch processing, caching, and parallelization to improve efficiency.
- Edge Cloud Collaboration: Offloading computationally intensive tasks to cloud servers when necessary.
4.3. Data Enhancement and Preprocessing
- Data Augmentation: Applying transformations like rotation, scaling, and color jittering to increase dataset diversity (a sketch follows this list).
- Preprocessing: Techniques like image dehazing, noise reduction, and normalization to improve image clarity and reduce artifacts.
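A minimal sketch of the augmentations above using torchvision; the parameters are illustrative, and a real detection pipeline would also have to transform the bounding boxes alongside the images:

```python
from torchvision import transforms

# Rotation, scale, and color jitter composed into one image-only pipeline.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),
    transforms.RandomResizedCrop(size=640, scale=(0.8, 1.0)),
    transforms.ColorJitter(brightness=0.3, contrast=0.3,
                           saturation=0.3, hue=0.05),
])
```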
4.4. Real-Time Performance Optimization
- Multiscale Prediction: Utilizing multiple scales of input images to capture objects of varying sizes.
- Lightweight Network Structures: Employing architectures like MobileNet or EfficientNet to reduce computational load.
- Attention Mechanisms: Incorporating attention modules to focus on relevant features, enhancing detection accuracy (an SE-style sketch follows this list).
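A minimal sketch of a Squeeze-and-Excitation-style attention module as described above; the reduction ratio is the commonly used default, not a value from the surveyed papers:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    # Squeeze-and-Excitation: global average pooling squeezes spatial
    # information, two small linear layers produce per-channel weights
    # that rescale the feature map.
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        n, c, _, _ = x.shape
        w = self.fc(x.mean(dim=(2, 3)))      # squeeze: (N, C)
        return x * w.view(n, c, 1, 1)        # excite: channel-wise rescaling

print(SEBlock(64)(torch.randn(2, 64, 40, 40)).shape)  # torch.Size([2, 64, 40, 40])
```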
5. Algorithm Comparison
5.1. Test Environment
5.2. Datasets Introduction
5.3. Results Comparison
5.4. Summary
6. Development Directions and Challenges
6.1. Real-Time Trend
6.2. High-Precision Trend
6.3. Lightweight Trend
6.4. Multiscale Prediction Trend
6.5. High Reliability Trend
6.6. Generality Trend
6.7. Challenges
- Conflict Between Real Time and High Precision. Generally, improving the accuracy of object detection algorithms means reducing the speed of the algorithm. Conversely, increasing the speed of object detection algorithms means a decrease in algorithm accuracy. Therefore, in the process of algorithm development, a significant challenge is how to make the object detection algorithm balance the performance between speed and accuracy.
- Conflict Between Lightweight and Multiscale Prediction. Algorithms using multiscale prediction mechanisms usually possess complex network structures. However, the lightweight of the algorithms requires the network structure to be as simple as possible. Therefore, in the process of optimizing the algorithms, there is a conflict between the lightweight and the multiscale prediction of the algorithms. There needs to be a trade-off between the lightweight and the multiscale prediction.
- Conflict Between High Reliability and Generality. Algorithms with high reliability usually do not possess good generality, so algorithm improvements must also balance reliability against generality. This is especially true for algorithms applied in industrial fields: if the reliability of the algorithm is reduced too much in an industrial application, it may lead to hazards.
6.8. Security Against Adversarial Attacks
- Adversarial Training: Incorporating adversarial examples into the training set to improve the robustness of the model (an FGSM-style sketch follows this list);
- Defensive Distillation: Using a distilled model that is less sensitive to small perturbations;
- Detection of Adversarial Examples: Implementing mechanisms to detect and flag potentially adversarial inputs before they affect the system;
- Robust Model Architectures: Designing neural network architectures that are inherently more resistant to adversarial attacks.
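A minimal sketch of the adversarial-training idea above using the FGSM perturbation; `model` and `loss_fn` are assumed placeholders, and the epsilon is illustrative:

```python
import torch

def fgsm_example(model, loss_fn, x, y, epsilon=0.01):
    # Perturb the input along the sign of the loss gradient; training on the
    # perturbed example is one round of adversarial training.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x_adv), y)
    loss.backward()
    with torch.no_grad():
        x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in a valid range
```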
6.9. Alternative YOLO-Based Approaches
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Liu, R.; Yang, B.; Zio, E.; Chen, X. Artificial intelligence for fault diagnosis of rotating machinery: A review. Mech. Syst. Signal Process. 2018, 108, 33–47. [Google Scholar] [CrossRef]
- Gautam, D.; Mawardi, Z.; Elliott, L.; Loewensteiner, D.; Whiteside, T.; Brooks, S. Detection of Invasive Species (Siam Weed) Using Drone-Based Imaging and YOLO Deep Learning Model. Remote Sens. 2025, 17, 120. [Google Scholar] [CrossRef]
- Liu, G. Surface defect detection methods based on deep learning: A brief review. In Proceedings of the 2020 2nd International Conference on Information Technology and Computer Application (ITCA), Guangzhou, China, 18–20 December 2020; pp. 200–203. [Google Scholar]
- Yang, Y.; Ma, X.; Mu, C.; Wang, Z. Rapid Recognition and Localization Based on Deep Learning and Random Filtering. In Proceedings of the 2019 5th International Conference on Control, Automation and Robotics (ICCAR), Beijing, China, 19–22 April 2019; pp. 177–182. [Google Scholar] [CrossRef]
- Zhao, S.; Liu, J.; Bai, Z.; Hu, C.; Jin, Y. Crop pest recognition in real agricultural environment using convolutional neural networks by a parallel attention mechanism. Front. Plant Sci. 2022, 13, 839572. [Google Scholar] [CrossRef]
- Deng, L.; Mao, Z.; Li, X.; Hu, Z.; Duan, F.; Yan, Y. UAV-based multispectral remote sensing for precision agriculture: A comparison between different cameras. ISPRS J. Photogramm. Remote Sens. 2018, 146, 124–136. [Google Scholar] [CrossRef]
- Hesamian, M.H.; Jia, W.; He, X.; Kennedy, P. Deep Learning Techniques for Medical Image Segmentation: Achievements and Challenges. J. Digit. Imaging 2019, 32, 582–596. [Google Scholar] [CrossRef]
- Peng, Z.; Liu, W.; Ning, Z.; Zhao, Q.; Cheng, S.; Hu, J. 3D Multi-object Tracking in Autonomous Driving: A survey. In Proceedings of the 2024 36th Chinese Control and Decision Conference (CCDC), Xi’an, China, 25–27 May 2024; pp. 4964–4971. [Google Scholar]
- Reddy, J.; Niu, H.; Scott, J.L.L.; Bhandari, M.; Landivar, J.A.; Bednarz, C.W.; Duffield, N. Cotton Yield Prediction via UAV-Based Cotton Boll Image Segmentation Using YOLO Model and Segment Anything Model (SAM). Remote Sens. 2024, 16, 4346. [Google Scholar] [CrossRef]
- Huang, Y.; Wang, D.; Wu, B.; An, D. NST-YOLO11: ViT Merged Model with Neuron Attention for Arbitrary-Oriented Ship Detection in SAR Images. Remote Sens. 2024, 16, 4760. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
- Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Montreal, QC, Canada, 7–12 December 2015; pp. 802–810. [Google Scholar]
- Ravuri, S.; Lenc, K.; Willson, M.; Kangin, D.; Lam, R.; Mirowski, P.; Fitzsimons, M.; Athanassiadou, M.; Kashem, S.; Madge, S.; et al. Skillful Precipitation Nowcasting Using Deep Generative Models of Radar. Nature 2021, 597, 672–677. [Google Scholar] [CrossRef]
- Fu, C.Y.; Liu, W.; Ranga, A.; Tyagi, A.; Berg, A.C. DSSD: Deconvolutional Single Shot Detector. arXiv 2017, arXiv:1701.06659. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollar, P. Focal Loss for Dense Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 318–327. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You Only Look Once: Unified, Real-Time Object Detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLO9000: Better, Faster, Stronger. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar] [CrossRef]
- Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
- Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
- Yan, X.; Shen, B.; Li, H. Small objects detection method for UAVs aerial image based on YOLOv5s. In Proceedings of the 2023 IEEE 6th International Conference on Electronic Information and Communication Technology (ICEICT), Qingdao, China, 21–24 July 2023; pp. 61–66. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer International Publishing: Berlin/Heidelberg, Germany, 2015. [Google Scholar]
- Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-Shot Refinement Neural Network for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013. [Google Scholar]
- Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar]
- Character, L.; Ortiz, A.; Beach, T.; Luzzadder-Beach, S. Archaeologic Machine Learning for Shipwreck Detection Using Lidar and Sonar. Remote Sens. 2021, 13, 1759. [Google Scholar] [CrossRef]
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 6–11 July 2015; Volume 37, pp. 448–456. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 936–944. [Google Scholar] [CrossRef]
- Wang, C.Y.; Liao, H.Y.M.; Yeh, I.H.; Wu, Y.H.; Chen, P.Y.; Hsieh, J.W. CSPNet: A New Backbone that can Enhance Learning Capability of CNN. arXiv 2019, arXiv:1911.11929. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [Google Scholar] [CrossRef]
- Chen, L.C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 834–848. [Google Scholar] [CrossRef]
- Liu, S.; Huang, D.; Wang, Y. Receptive Field Block Net for Accurate and Fast Object Detection. arXiv 2017, arXiv:1711.07767. [Google Scholar] [CrossRef]
- Zhao, Q.; Sheng, T.; Wang, Y.; Tang, Z.; Chen, Y.; Cai, L.; Ling, H. M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network. arXiv 2018, arXiv:1811.04533. [Google Scholar] [CrossRef]
- Liu, S.; Huang, D.; Wang, Y. Learning Spatial Fusion for Single-Shot Object Detection. arXiv 2019, arXiv:1911.09516. [Google Scholar] [CrossRef]
- Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. arXiv 2019, arXiv:1911.09070. [Google Scholar] [CrossRef]
- Hu, J.; Shen, L.; Albanie, S.; Sun, G.; Wu, E. Squeeze-and-Excitation Networks. arXiv 2017, arXiv:1709.01507. [Google Scholar] [CrossRef]
- Ying, X.; Wang, Y.; Wang, L.; Sheng, W.; An, W.; Guo, Y. A Stereo Attention Module for Stereo Image Super-Resolution. IEEE Signal Process. Lett. 2020, 27, 496–500. [Google Scholar] [CrossRef]
- Rezatofighi, H.; Tsoi, N.; Gwak, J.; Sadeghian, A.; Reid, I.; Savarese, S. Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression. arXiv 2019, arXiv:1902.09630. [Google Scholar] [CrossRef]
- Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. arXiv 2019, arXiv:1911.08287. [Google Scholar] [CrossRef]
- Nair, V.; Hinton, G.E. Rectified Linear Units Improve Restricted Boltzmann Machines. In Proceedings of the ICML, Haifa, Israel, 21–24 June 2010. [Google Scholar]
- Maas, A.L. Rectifier Nonlinearities Improve Neural Network Acoustic Models. Proc. ICML 2013, 30, 3. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv 2015, arXiv:1502.01852. [Google Scholar] [CrossRef]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar] [CrossRef]
- Klambauer, G.; Unterthiner, T.; Mayr, A.; Hochreiter, S. Self-Normalizing Neural Networks. arXiv 2017, arXiv:1706.02515. [Google Scholar] [CrossRef]
- Ramachandran, P.; Zoph, B.; Le, Q.V. Searching for Activation Functions. arXiv 2017, arXiv:1710.05941. [Google Scholar] [CrossRef]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. arXiv 2019, arXiv:1905.02244. [Google Scholar] [CrossRef]
- Misra, D. Mish: A Self Regularized Non-Monotonic Activation Function. arXiv 2019, arXiv:1908.08681. [Google Scholar] [CrossRef]
- Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS—Improving Object Detection with One Line of Code. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5562–5570. [Google Scholar] [CrossRef]
- Yao, Z.; Cao, Y.; Zheng, S.; Huang, G.; Lin, S. Cross-Iteration Batch Normalization. arXiv 2020, arXiv:2002.05712. [Google Scholar] [CrossRef]
- Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random Erasing Data Augmentation. arXiv 2017, arXiv:1708.04896. [Google Scholar] [CrossRef]
- DeVries, T.; Taylor, G.W. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv 2017, arXiv:1708.04552. [Google Scholar] [CrossRef]
- Singh, K.K.; Lee, Y.J. Hide-and-Seek: Forcing a Network to be Meticulous for Weakly-Supervised Object and Action Localization. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 3544–3553. [Google Scholar] [CrossRef]
- Chen, P.; Liu, S.; Zhao, H.; Jia, J. GridMask Data Augmentation. arXiv 2020, arXiv:2001.04086. [Google Scholar] [CrossRef]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Wan, L.; Zeiler, M.; Zhang, S.; LeCun, Y.; Fergus, R. Regularization of Neural Networks Using Dropconnect. In Proceedings of the 30th International Conference on International Conference on Machine Learning—Volume 28, Atlanta, GA, USA, 17–19 June 2013; ICML’13. pp. III–1058–III–1066. [Google Scholar]
- Ghiasi, G.; Lin, T.Y.; Le, Q.V. DropBlock: A regularization method for convolutional networks. arXiv 2018, arXiv:1810.12890. [Google Scholar] [CrossRef]
- Zhang, H.; Cisse, M.; Dauphin, Y.N.; Lopez-Paz, D. mixup: Beyond Empirical Risk Minimization. arXiv 2017, arXiv:1710.09412. [Google Scholar] [CrossRef]
- Yun, S.; Han, D.; Oh, S.J.; Chun, S.; Choe, J.; Yoo, Y. CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features. arXiv 2019, arXiv:1905.04899. [Google Scholar] [CrossRef]
- Geirhos, R.; Rubisch, P.; Michaelis, C.; Bethge, M.; Wichmann, F.A.; Brendel, W. ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness. arXiv 2018, arXiv:1811.12231. [Google Scholar] [CrossRef]
- Li, J.Y.; Su, Z.F.; Geng, J.H.; Yin, Y.X. Real-time Detection of Steel Strip Surface Defects Based on Improved YOLO Detection Network. IFAC-PapersOnLine 2018, 51, 76–81. [Google Scholar] [CrossRef]
- Zhang, H.W.; Zhang, L.J.; Li, P.F.; Gu, D. Yarn-dyed Fabric Defect Detection with YOLOV2 Based on Deep Convolution Neural Networks. In Proceedings of the IEEE 7th Data Driven Control and Learning Systems Conference (DDCLS), Enshi, China, 25–27 May 2018; pp. 170–174. [Google Scholar]
- Kou, X.; Liu, S.; Cheng, K.; Qian, Y. Development of a YOLO-V3-based model for detecting defects on steel strip surface. Measurement 2021, 182, 109454. [Google Scholar] [CrossRef]
- Zheng, Z.; Zhao, J.; Li, Y. Research on Detecting Bearing-Cover Defects Based on Improved YOLOv3. IEEE Access 2021, 9, 10304–10315. [Google Scholar] [CrossRef]
- Li, J.; Yang, T.; Wang, H. An Industrial Automation Packaging Defect Detection Method Based on Deep learning. Packag. Eng. 2020, 41, 175–184. [Google Scholar] [CrossRef]
- Sun, Q.J.; Chen, D.L.; Wang, S.; Liu, S.X. Recognition Method for Handwritten Steel Billet Identification Number Based on Yolo Deep Convolutional Neural Network. In Proceedings of the 32nd Chinese Control and Decision Conference (CCDC), Hefei, China, 22–24 August 2020; pp. 5642–5646. [Google Scholar]
- Huang, J.Y.; Lu, Y.Y. A Method for Identifying and Classifying Resistors and Capacitors Based on YOLO Network. In Proceedings of the 4th IEEE International Conference on Signal and Image Processing (ICSIP), Wuxi, China, 19–21 July 2019; pp. 1–5. [Google Scholar]
- Wang, L.; Zhou, Q.; Wang, L.; Jiang, H.; Lin, S. Improved convolutional neural network algorithm for real-time recognition and location of mechanical parts. Intell. Comput. Appl. 2019, 9, 36–41+46. [Google Scholar]
- Huang, R.; Gu, J.N.; Sun, X.H.; Hou, Y.T.; Uddin, S. A Rapid Recognition Method for Electronic Components Based on the Improved YOLO-V3 Network. Electronics 2019, 8, 825. [Google Scholar] [CrossRef]
- Lu, Y.; Yang, B.; Gao, Y.; Xu, Z. An automatic sorting system for electronic components detached from waste printed circuit boards. Waste Manag. 2022, 137, 1–8. [Google Scholar] [CrossRef]
- Lin, X.; Li, Y.; Song, W. Target detection algorithm in industrial scene based on SlimYOLOv3. Appl. Res. Comput. 2021, 38, 1889–1893. [Google Scholar] [CrossRef]
- Wang, B.; Li, W.J.; Tang, H. Improved YOLO v3 Algorithm and Its Application in Helmet Detection. Comput. Eng. Appl. 2020, 56, 33–40. [Google Scholar]
- Deng, B.; Lei, X.; Ye, M. Safety helmet detection method based on YOLO v4. In Proceedings of the 2020 16th International Conference on Computational Intelligence and Security (CIS), Guangxi, China, 27–30 November 2020. [Google Scholar] [CrossRef]
- Wang, W.; Zhang, B.; Wang, Z.; Zhang, F.; Ren, H.; Wang, J. Intelligent identification method of mine fire video images based on YOLOv5. Ind. Mine Autom. 2021, 47, 53–57. [Google Scholar] [CrossRef]
- He, K.; Sun, J.; Tang, X. Single Image Haze Removal Using Dark Channel Prior. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 2341–2353. [Google Scholar] [CrossRef]
- Zuiderveld, K.J. Contrast Limited Adaptive Histogram Equalization. In Graphics Gems IV; Heckbert, P.S., Ed.; Academic Press Professional, Inc.: San Diego, CA, USA, 1994; pp. 474–485. [Google Scholar]
- Song, S.; Chaudhuri, K.; Sarwate, A.D. Stochastic gradient descent with differentially private updates. In Proceedings of the 2013 IEEE Global Conference on Signal and Information Processing, Austin, TX, USA, 3–5 December 2013; pp. 245–248. [Google Scholar] [CrossRef]
- Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Luo, L.; Xiong, Y.; Liu, Y.; Sun, X. Adaptive Gradient Methods with Dynamic Bound of Learning Rate. arXiv 2019, arXiv:1902.09843. [Google Scholar]
- Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2010, 22, 1345–1359. [Google Scholar] [CrossRef]
- Arthur, D.; Vassilvitskii, S. K-Means++: The Advantages of Careful Seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, New Orleans, LA, USA, 7–9 January 2007; SODA '07. pp. 1027–1035. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2015, arXiv:1510.00149. [Google Scholar]
- Li, H.; Kadav, A.; Durdanovic, I.; Samet, H.; Graf, H.P. Pruning Filters for Efficient ConvNets. arXiv 2016, arXiv:1608.08710. [Google Scholar]
- Wen, W.; Wu, C.; Wang, Y.; Chen, Y.; Li, H. Learning Structured Sparsity in Deep Neural Networks. Adv. Neural Inf. Process. Syst. 2016, 29, 2074–2082. [Google Scholar]
- Sifre, L.; Mallat, S. Rigid-Motion Scattering for Texture Classification. arXiv 2014, arXiv:1403.1687. [Google Scholar] [CrossRef]
- Printed Circuit Board Dataset. Available online: https://archive.ics.uci.edu/dataset/990/printed+circuit+board+processed+image (accessed on 23 February 2025).
- Hard Hat Workers Dataset. Available online: https://public.roboflow.com/object-detection/hard-hat-workers (accessed on 23 February 2025).
- Hot Rolled Strip Dataset. Available online: https://paperswithcode.com/dataset/uavdt (accessed on 23 February 2025).
- Metal Surface Defect Dataset. Available online: https://www.kaggle.com/datasets/fantacher/neu-metal-surface-defects-data (accessed on 23 February 2025).
- Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The unmanned aerial vehicle benchmark: Object detection and tracking. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 370–386. [Google Scholar]
Algorithm | BFLOPS | mAP@0.25 | mAP@0.50 | mAP@0.75 | IoU Threshold | Precision | Recall | F1-Score
---|---|---|---|---|---|---|---|---
YOLOv1 | 40.155 | 0.45% | 0.17% | 0.01% | 0.25 | 0.00 | 0.00 | 0.00
 | | | | | 0.50 | 0.00 | 0.00 | 0.00
 | | | | | 0.75 | 0.00 | 0.00 | 0.00
YOLOv2 | 62.688 | 15.55% | 6.40% | 0.02% | 0.25 | 0.24 | 0.06 | 0.09
 | | | | | 0.50 | 0.25 | 0.06 | 0.10
 | | | | | 0.75 | 0.02 | 0.00 | 0.01
YOLOv3 | 98.978 | 26.10% | 19.38% | 2.86% | 0.25 | 0.21 | 0.02 | 0.04
 | | | | | 0.50 | 0.14 | 0.01 | 0.03
 | | | | | 0.75 | 0.00 | 0.00 | 0.00
YOLOv4 | 90.281 | 39.39% | 32.78% | 3.67% | 0.25 | 0.53 | 0.18 | 0.27
 | | | | | 0.50 | 0.51 | 0.51 | 0.26
 | | | | | 0.75 | 0.16 | 0.06 | 0.08
YOLOv5x | 204.100 | 16.90% | 16.90% | 16.90% | 0.25 | 0.17 | 0.17 | 0.17
 | | | | | 0.50 | 0.17 | 0.17 | 0.17
 | | | | | 0.75 | 0.17 | 0.17 | 0.17
YOLOv6-M6 | 379.500 | 35.49% | 30.28% | 10.59% | 0.25 | 0.44 | 0.14 | 0.21
 | | | | | 0.50 | 0.43 | 0.14 | 0.21
 | | | | | 0.75 | 0.20 | 0.12 | 0.15
YOLOv7x | 188.790 | 78.83% | 62.54% | 46.43% | 0.25 | 0.10 | 0.64 | 0.17
 | | | | | 0.50 | 0.10 | 0.64 | 0.17
 | | | | | 0.75 | 0.10 | 0.64 | 0.17
Faster R-CNN | 300.000 | 75.00% | 60.00% | 40.00% | 0.25 | 0.08 | 0.60 | 0.15
Algorithm | BFLOPS | mAP@0.25 | mAP@0.50 | mAP@0.75 | IoU Threshold | Precision | Recall | F1-Score
---|---|---|---|---|---|---|---|---
YOLOv1 | 40.155 | 0.35% | 0.21% | 0.03% | 0.25 | 0.00 | 0.00 | 0.00
 | | | | | 0.50 | 0.00 | 0.00 | 0.00
 | | | | | 0.75 | 0.00 | 0.00 | 0.00
YOLOv2 | 62.673 | 55.62% | 39.08% | 3.60% | 0.25 | 0.46 | 0.19 | 0.27
 | | | | | 0.50 | 0.76 | 0.32 | 0.45
 | | | | | 0.75 | 0.29 | 0.12 | 0.17
YOLOv3 | 98.934 | 74.51% | 68.06% | 16.62% | 0.25 | 0.97 | 0.37 | 0.54
 | | | | | 0.50 | 0.97 | 0.37 | 0.54
 | | | | | 0.75 | 0.64 | 0.25 | 0.35
YOLOv4 | 90.237 | 87.55% | 85.50% | 45.19% | 0.25 | 0.91 | 0.70 | 0.79
 | | | | | 0.50 | 0.93 | 0.72 | 0.81
 | | | | | 0.75 | 0.69 | 0.53 | 0.60
YOLOv5x | 204.000 | 92.50% | 90.60% | 89.50% | 0.25 | 0.94 | 0.90 | 0.92
 | | | | | 0.50 | 0.93 | 0.90 | 0.92
 | | | | | 0.75 | 0.93 | 0.90 | 0.92
YOLOv6-M6 | 379.500 | 93.55% | 87.70% | 77.94% | 0.25 | 0.94 | 0.92 | 0.92
 | | | | | 0.50 | 0.92 | 0.90 | 0.90
 | | | | | 0.75 | 0.90 | 0.87 | 0.88
YOLOv7x | 188.790 | 95.64% | 93.22% | 93.18% | 0.25 | 0.96 | 0.94 | 0.94
 | | | | | 0.50 | 0.96 | 0.94 | 0.94
 | | | | | 0.75 | 0.96 | 0.94 | 0.94
Faster R-CNN | 300.000 | 75.00% | 60.00% | 40.00% | 0.25 | 0.08 | 0.60 | 0.15
Algorithm | BFLOPS | mAP@0.25 | mAP@0.50 | mAP@0.75 | IoU Threshold | Precision | Recall | F1-Score
---|---|---|---|---|---|---|---|---
YOLOv1 | 40.155 | 0.49% | 0.35% | 0.10% | 0.25 | 0.00 | 0.00 | 0.00
 | | | | | 0.50 | 0.00 | 0.00 | 0.00
 | | | | | 0.75 | 0.00 | 0.00 | 0.00
YOLOv2 | 62.688 | 7.26% | 1.72% | 0.11% | 0.25 | 0.00 | 0.00 | 0.00
 | | | | | 0.50 | 0.00 | 0.00 | 0.00
 | | | | | 0.75 | 0.00 | 0.00 | 0.00
YOLOv3 | 98.978 | 10.30% | 5.83% | 0.43% | 0.25 | 0.43 | 0.01 | 0.02
 | | | | | 0.50 | 0.57 | 0.01 | 0.02
 | | | | | 0.75 | 0.00 | 0.00 | 0.00
YOLOv4 | 90.281 | 69.41% | 43.89% | 5.32% | 0.25 | 0.75 | 0.06 | 0.11
 | | | | | 0.50 | 0.65 | 0.07 | 0.13
 | | | | | 0.75 | 0.35 | 0.04 | 0.07
YOLOv5x | 204.100 | 65.50% | 65.50% | 65.40% | 0.25 | 0.82 | 0.50 | 0.62
 | | | | | 0.50 | 0.82 | 0.50 | 0.62
 | | | | | 0.75 | 0.82 | 0.50 | 0.62
YOLOv6-M6 | 379.500 | 66.01% | 21.63% | 20.47% | 0.25 | 0.65 | 0.48 | 0.55
 | | | | | 0.50 | 0.60 | 0.45 | 0.51
 | | | | | 0.75 | 0.60 | 0.44 | 0.50
YOLOv7x | 188.790 | 79.57% | 77.83% | 77.64% | 0.25 | 0.90 | 0.72 | 0.80
 | | | | | 0.50 | 0.90 | 0.72 | 0.80
 | | | | | 0.75 | 0.90 | 0.72 | 0.80
Faster R-CNN | 300.000 | 75.00% | 60.00% | 40.00% | 0.25 | 0.08 | 0.60 | 0.15
Algorithm | BFLOPS | mAP@0.25 | mAP@0.50 | mAP@0.75 | IoU Threshold | Precision | Recall | F1-Score
---|---|---|---|---|---|---|---|---
YOLOv1 | 40.155 | 0.33% | 0.11% | 0.01% | 0.25 | 0.00 | 0.00 | 0.00
 | | | | | 0.50 | 0.00 | 0.00 | 0.00
 | | | | | 0.75 | 0.00 | 0.00 | 0.00
YOLOv2 | 62.702 | 0.22% | 0.13% | 0.02% | 0.25 | 0.24 | 0.06 | 0.09
 | | | | | 0.50 | 0.25 | 0.06 | 0.10
 | | | | | 0.75 | 0.02 | 0.00 | 0.01
YOLOv3 | 99.022 | 36.63% | 18.48% | 0.72% | 0.25 | 0.65 | 0.05 | 0.09
 | | | | | 0.50 | 0.43 | 0.03 | 0.06
 | | | | | 0.75 | 0.09 | 0.01 | 0.01
YOLOv4 | 90.325 | 62.96% | 38.04% | 5.54% | 0.25 | 0.57 | 0.23 | 0.32
 | | | | | 0.50 | 0.54 | 0.21 | 0.30
 | | | | | 0.75 | 0.18 | 0.07 | 0.10
YOLOv5x | 204.200 | 57.80% | 57.80% | 57.00% | 0.25 | 0.77 | 0.46 | 0.58
 | | | | | 0.50 | 0.77 | 0.46 | 0.58
 | | | | | 0.75 | 0.76 | 0.46 | 0.58
YOLOv6-M6 | 379.500 | 58.62% | 54.43% | 49.97% | 0.25 | 0.81 | 0.53 | 0.64
 | | | | | 0.50 | 0.80 | 0.53 | 0.63
 | | | | | 0.75 | 0.78 | 0.49 | 0.60
YOLOv7x | 188.790 | 76.84% | 72.95% | 72.38% | 0.25 | 0.88 | 0.70 | 0.77
 | | | | | 0.50 | 0.88 | 0.70 | 0.77
 | | | | | 0.75 | 0.88 | 0.70 | 0.77
Faster R-CNN | 300.000 | 75.00% | 60.00% | 40.00% | 0.25 | 0.08 | 0.60 | 0.15
Algorithm | BFLOPS | mAP@0.25 | mAP@0.50 | mAP@0.75 | IoU Threshold | Precision | Recall | F1-Score
---|---|---|---|---|---|---|---|---
YOLOv1 | 40.155 | 0.12% | 0.49% | 0.88% | 0.25 | 0.13 | 0.04 | 0.03
YOLOv2 | 62.688 | 15.55% | 6.40% | 0.02% | 0.25 | 0.24 | 0.06 | 0.09
YOLOv3 | 98.978 | 26.10% | 19.38% | 2.86% | 0.25 | 0.21 | 0.02 | 0.04
YOLOv4 | 90.281 | 39.39% | 32.78% | 3.67% | 0.25 | 0.53 | 0.18 | 0.27
YOLOv5x | 204.100 | 16.90% | 16.90% | 16.90% | 0.25 | 0.17 | 0.17 | 0.17
YOLOv6-M6 | 379.500 | 35.49% | 30.28% | 10.59% | 0.25 | 0.44 | 0.14 | 0.21
YOLOv7x | 188.790 | 78.83% | 62.54% | 46.43% | 0.25 | 0.10 | 0.64 | 0.17
Faster R-CNN | 300.000 | 75.00% | 60.00% | 40.00% | 0.25 | 0.08 | 0.60 | 0.15