
Search Results (9)

Search Parameters:
Keywords = Inner-Shape-IoU

20 pages, 1860 KB  
Article
An Improved YOLOv11n Model Based on Wavelet Convolution for Object Detection in Soccer Scenes
by Yue Wu, Lanxin Geng, Xinqi Guo, Chao Wu and Gui Yu
Symmetry 2025, 17(10), 1612; https://doi.org/10.3390/sym17101612 - 28 Sep 2025
Viewed by 321
Abstract
Object detection in soccer scenes serves as a fundamental task for soccer video analysis and target tracking. This paper proposes WCC-YOLO, a symmetry-enhanced object detection framework based on YOLOv11n. Our approach integrates symmetry principles at multiple levels: (1) The novel C3k2-WTConv module synergistically combines conventional convolution with wavelet decomposition, leveraging the orthogonal symmetry of Haar wavelet quadrature mirror filters (QMFs) to achieve balanced frequency-domain decomposition and enhance multi-scale feature representation. (2) The Channel Prior Convolutional Attention (CPCA) mechanism incorporates symmetrical operations—using average-max pooling pairs in channel attention and multi-scale convolutional kernels in spatial attention—to automatically learn to prioritize semantically salient regions through channel-wise feature recalibration, thereby enabling balanced feature representation. Coupled with InnerShape-IoU for refined bounding box regression, WCC-YOLO achieves a 4.5% improvement in mAP@0.5:0.95 and a 5.7% gain in mAP@0.5 compared to the baseline YOLOv11n while simultaneously reducing the number of parameters and maintaining near-identical inference latency (δ < 0.1 ms). This work demonstrates the value of explicit symmetry-aware modeling for sports analytics.
(This article belongs to the Section Computer)
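All of the IoU variants named across these results (InnerShape-IoU, Inner-Shape-IoU, Inner-WIoU) build on the plain intersection-over-union of two axis-aligned boxes. A minimal sketch of that baseline, assuming corner-format `(x1, y1, x2, y2)` boxes; the function name and box format are illustrative, not taken from any of the papers:

```python
def iou(box_a, box_b):
    """Intersection over Union for two axis-aligned boxes in (x1, y1, x2, y2) format."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Intersection rectangle; width/height clamp to 0 when the boxes are disjoint.
    iw = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0
```

Two unit boxes overlapping in a 1×1 corner share 1 unit of area out of a 7-unit union, giving IoU = 1/7.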

20 pages, 25324 KB  
Article
DGSS-YOLOv8s: A Real-Time Model for Small and Complex Object Detection in Autonomous Vehicles
by Siqiang Cheng, Lingshan Chen and Kun Yang
Algorithms 2025, 18(6), 358; https://doi.org/10.3390/a18060358 - 11 Jun 2025
Cited by 1 | Viewed by 2116
Abstract
Object detection in complex road scenes is vital for autonomous driving, facing challenges such as object occlusion, small target sizes, and irregularly shaped targets. To address these issues, this paper introduces DGSS-YOLOv8s, a model designed to enhance detection accuracy and high-FPS performance within the You Only Look Once version 8 small (YOLOv8s) framework. The key innovation lies in the synergistic integration of several architectural enhancements: the DCNv3_LKA_C2f module, leveraging Deformable Convolution v3 (DCNv3) and Large Kernel Attention (LKA) for better capture of complex object shapes; an Optimized Feature Pyramid Network structure (Optimized-GFPN) for improved multi-scale feature fusion; the Detect_SA module, incorporating spatial Self-Attention (SA) at the detection head for broader context awareness; and an Inner-Shape Intersection over Union (IoU) loss function to improve bounding box regression accuracy. These components collectively target the aforementioned challenges in road environments. Evaluations on the Berkeley DeepDrive 100K (BDD100K) and Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) datasets demonstrate the model's effectiveness. Compared to the baseline YOLOv8s, DGSS-YOLOv8s achieves mean Average Precision (mAP)@50 improvements of 2.4% (BDD100K) and 4.6% (KITTI). Significant gains were observed for challenging categories, notably 87.3% mAP@50 for cyclists on KITTI, and small object detection (AP-small) improved by up to 9.7% on KITTI. Crucially, DGSS-YOLOv8s achieved high processing speeds suitable for autonomous driving, operating at 103.1 FPS (BDD100K) and 102.5 FPS (KITTI) on an NVIDIA GeForce RTX 4090 GPU. These results highlight that DGSS-YOLOv8s effectively balances enhanced detection accuracy for complex scenarios with high processing speed, demonstrating its potential for demanding autonomous driving applications.
(This article belongs to the Special Issue Advances in Computer Vision: Emerging Trends and Applications)
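The "Inner" part of losses like the Inner-Shape IoU above rescales both the predicted and ground-truth boxes about their centres by a ratio and scores the resulting auxiliary boxes, which sharpens gradients for small or near-matching boxes. A minimal sketch of that idea, assuming corner-format boxes and an illustrative `ratio` default; this is the general Inner-IoU scheme, not the DGSS-YOLOv8s implementation:

```python
def inner_iou(box_a, box_b, ratio=0.8):
    """IoU of auxiliary boxes scaled by `ratio` about each box centre
    (the Inner-IoU idea: ratio < 1 shrinks the boxes, ratio > 1 enlarges them)."""
    def scaled(box):
        x1, y1, x2, y2 = box
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        hw, hh = (x2 - x1) * ratio / 2, (y2 - y1) * ratio / 2
        return (cx - hw, cy - hh, cx + hw, cy + hh)

    a, b = scaled(box_a), scaled(box_b)
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0
```

Because both boxes are scaled by the same factor, identical boxes still score 1.0 at any ratio, while partially overlapping boxes score lower as the ratio shrinks.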

20 pages, 8948 KB  
Article
Detection of Sealing Surface of Electric Vehicle Electronic Water Pump Housings Based on Lightweight YOLOv8n
by Li Sun, Yi Shen, Jie Li, Weiyu Jiang, Xiang Bian and Mingxin Yuan
Electronics 2025, 14(2), 258; https://doi.org/10.3390/electronics14020258 - 9 Jan 2025
Viewed by 923
Abstract
Due to the large size differences and shape variations in the sealing surfaces of electric vehicle electronic water pump housings, and the shortcomings of traditional YOLO defect detection models, such as large model size and low accuracy, a lightweight defect detection algorithm based on YOLOv8n (You Only Look Once version 8n) is proposed for the sealing surface of electric vehicle electronic water pump housings. First, building on the MobileNetv3 module, the YOLOv8n network structure is redesigned, which not only makes the network lightweight but also improves the detection accuracy of the model. Then, DualConv (Dual Convolution) is introduced and the CMPDual (Cross Max Pooling Dual) module is designed to further optimize the detection model, reducing redundant parameters and computational complexity. Finally, in response to the large size differences and shape variations in sealing surface defects, the Inner-WIoU (Inner-Wise-IoU) loss function is used instead of the CIoU (Complete-IoU) loss function in YOLOv8n, which improves the positioning accuracy of the defect-area bounding box and further enhances the detection accuracy of the model. An ablation experiment based on the dataset constructed in this paper shows that, compared with the YOLOv8n model, the weight of the proposed model is reduced by 61.9%, the computational complexity is reduced by 58.0%, the detection accuracy is improved by 9.4%, and the mAP@0.5 is improved by 6.9%. A comparison of detection results from different models shows that the proposed model achieves an average improvement of 6.9% in detection accuracy and 8.6% in mAP@0.5, indicating that it effectively improves defect detection accuracy while remaining lightweight.

21 pages, 10113 KB  
Article
An Improved Bird Detection Method Using Surveillance Videos from Poyang Lake Based on YOLOv8
by Jianchao Ma, Jiayuan Guo, Xiaolong Zheng and Chaoyang Fang
Animals 2024, 14(23), 3353; https://doi.org/10.3390/ani14233353 - 21 Nov 2024
Cited by 2 | Viewed by 2854
Abstract
Poyang Lake is the largest freshwater lake in China and plays a significant ecological role. Deep-learning-based video surveillance can effectively monitor bird species on the lake, contributing to local biodiversity preservation. To address the challenges of multi-scale object detection against complex backgrounds, such as high density and severe occlusion, we propose a new model known as YOLOv8-bird. First, we use Receptive-Field Attention convolution, which improves the model's ability to capture and utilize image information. Second, we redesign the feature fusion network, termed DyASF-P2, which enhances the network's ability to capture small-object features and reduces target information loss. Third, a lightweight detection head is designed to effectively reduce the model's size without sacrificing precision. Finally, the Inner-ShapeIoU loss function is proposed to address the multi-scale bird localization challenge. Experimental results on the PYL-5-2023 dataset demonstrate that YOLOv8-bird achieves precision, recall, mAP@0.5, and mAP@0.5:0.95 scores of 94.6%, 89.4%, 94.8%, and 70.4%, respectively. Additionally, the model outperforms other mainstream object detection models in terms of accuracy. These results indicate that the proposed YOLOv8-bird model is well suited for bird detection and counting tasks, enabling it to support biodiversity monitoring in the complex environment of Poyang Lake.
(This article belongs to the Section Birds)

26 pages, 11965 KB  
Article
AMFEF-DETR: An End-to-End Adaptive Multi-Scale Feature Extraction and Fusion Object Detection Network Based on UAV Aerial Images
by Sen Wang, Huiping Jiang, Jixiang Yang, Xuan Ma and Jiamin Chen
Drones 2024, 8(10), 523; https://doi.org/10.3390/drones8100523 - 26 Sep 2024
Cited by 20 | Viewed by 3631
Abstract
To address the low detection accuracy and slow detection speed of unmanned aerial vehicle (UAV) aerial image target detection tasks, caused by factors such as complex ground environments, varying UAV flight altitudes and angles, and changes in lighting conditions, this study proposes an end-to-end adaptive multi-scale feature extraction and fusion detection network, named AMFEF-DETR. Specifically, to extract target features from complex backgrounds more accurately, we propose an adaptive backbone network, FADC-ResNet, which dynamically adjusts dilation rates and performs adaptive frequency awareness. This enables the convolutional kernels to adapt effectively to varying scales of ground targets, capturing more detail while expanding the receptive field. We also propose a HiLo attention-based intra-scale feature interaction (HLIFI) module to handle high-level features from the backbone. This module uses dual-pathway encoding of high and low frequencies to enhance the focus on the details of dense small targets while reducing noise interference. Additionally, the bidirectional adaptive feature pyramid network (BAFPN) is proposed for cross-scale feature fusion, integrating semantic information and enhancing adaptability. The Inner-Shape-IoU loss function, designed to focus on bounding box shapes and incorporate auxiliary boxes, is introduced to accelerate convergence and improve regression accuracy. When evaluated on the VisDrone dataset, AMFEF-DETR demonstrated improvements of 4.02% and 16.71% in mAP50 and FPS, respectively, compared to the RT-DETR. The model also exhibited strong robustness, achieving mAP50 values 2.68% and 3.75% higher than the RT-DETR and YOLOv10, respectively, on the HIT-UAV dataset.
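The "Shape" part of such a regression loss adds centre-distance and shape-mismatch penalties weighted by the ground-truth box's aspect. A simplified sketch in the spirit of the published Shape-IoU formulation; the `scale` exponent, the 0.5 weight, and the function name are illustrative assumptions, not the AMFEF-DETR authors' exact loss:

```python
import math

def shape_iou_loss(pred, gt, scale=0.0):
    """Illustrative Shape-IoU-style loss: 1 - IoU plus a centre-distance term and a
    shape-mismatch term, both weighted by the ground-truth box's aspect ratio."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph, gw, gh = px2 - px1, py2 - py1, gx2 - gx1, gy2 - gy1
    # Plain IoU of the two boxes.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    union = pw * ph + gw * gh - inter
    iou = inter / union if union > 0 else 0.0
    # Shape weights derived from the ground-truth box (scale=0 makes them equal).
    ww = 2 * gw**scale / (gw**scale + gh**scale)
    hh = 2 * gh**scale / (gw**scale + gh**scale)
    # Centre distance, normalised by the enclosing box's squared diagonal.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)
    c2 = cw**2 + ch**2
    dist = (hh * ((px1 + px2 - gx1 - gx2) / 2) ** 2
            + ww * ((py1 + py2 - gy1 - gy2) / 2) ** 2) / c2
    # Shape mismatch between predicted and ground-truth width/height.
    ow = hh * abs(pw - gw) / max(pw, gw)
    oh = ww * abs(ph - gh) / max(ph, gh)
    shape = (1 - math.exp(-ow)) ** 4 + (1 - math.exp(-oh)) ** 4
    return 1 - iou + dist + 0.5 * shape
```

A perfect prediction yields a loss of 0; disjoint boxes are penalised beyond 1 by the distance term, which is what keeps gradients informative when IoU alone is zero.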

17 pages, 7206 KB  
Article
A Multi-Scale Content-Structure Feature Extraction Network Applied to Gully Extraction
by Feiyang Dong, Jizhong Jin, Lei Li, Heyang Li and Yucheng Zhang
Remote Sens. 2024, 16(19), 3562; https://doi.org/10.3390/rs16193562 - 25 Sep 2024
Cited by 3 | Viewed by 1682
Abstract
Black soil is a precious soil resource, yet it is severely affected by gully erosion, one of the most serious manifestations of land degradation. Determining the location and shape of gullies is crucial for gully erosion control. Traditional field measurement methods consume a large amount of human resources, so it is of great significance to use artificial intelligence techniques to automatically extract gullies from satellite remote sensing images. This study obtained the gully distribution map of the southwestern region of the Dahe Bay Farm in Inner Mongolia through field investigation and measurement and created a gully remote sensing dataset. We designed a multi-scale content-structure feature extraction network to analyze remote sensing images and achieve automatic gully extraction. The multi-layer information obtained through the ResNet34 network is input into our multi-scale structure extraction module and multi-scale content extraction module, respectively, yielding richer intrinsic information about the image. We designed a structure-content fusion network to further fuse structural and content features and deepen the model's understanding of the image. Finally, we designed a multi-scale feature fusion module to further fuse low-level and high-level information, enhance the model's comprehensive understanding, and improve its ability to extract gullies. The experimental results show that the multi-scale content-structure feature extraction network can effectively avoid the interference of complex backgrounds in satellite remote sensing images. Compared with the classic semantic segmentation models DeepLabV3+, PSPNet, and UNet, our model achieved the best results on several evaluation metrics, with an F1 score of 0.745, a recall of 0.777, and an intersection over union (IoU) of 0.586. These results prove that our method is a highly automated and reliable way to extract gullies from satellite remote sensing images, simplifying the extraction process and providing accurate guidance for locating gullies, analyzing their shapes, and, in turn, managing them.
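The segmentation metrics reported above (F1, recall, and pixel-wise IoU) all derive from the same confusion counts over predicted and ground-truth masks. A minimal sketch for binary masks; the function name and flat 0/1-list input format are illustrative:

```python
def mask_metrics(pred, gt):
    """Pixel-wise IoU, precision, recall, and F1 for binary masks given as
    flat sequences of 0/1 values of equal length."""
    tp = sum(p and g for p, g in zip(pred, gt))          # predicted 1, truth 1
    fp = sum(p and not g for p, g in zip(pred, gt))      # predicted 1, truth 0
    fn = sum((not p) and g for p, g in zip(pred, gt))    # predicted 0, truth 1
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return iou, prec, rec, f1
```

Note that IoU and F1 are linked by IoU = F1 / (2 − F1) over the same counts, which is why an F1 of 0.745 sits near an IoU of 0.59.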

25 pages, 11774 KB  
Article
CR-YOLOv9: Improved YOLOv9 Multi-Stage Strawberry Fruit Maturity Detection Application Integrated with CRNET
by Rong Ye, Guoqi Shao, Quan Gao, Hongrui Zhang and Tong Li
Foods 2024, 13(16), 2571; https://doi.org/10.3390/foods13162571 - 17 Aug 2024
Cited by 11 | Viewed by 2128
Abstract
Strawberries are a commonly used agricultural product in the food industry. In the traditional production model, labor costs are high, and extensive picking techniques can result in food safety issues, like poor taste and fruit rot. In response to the existing challenges of low detection accuracy and slow detection speed in the assessment of strawberry fruit maturity in orchards, a CR-YOLOv9 multi-stage method for strawberry fruit maturity detection was introduced. The composite thinning network, CRNet, is utilized for target fusion, employing multi-branch blocks to enhance images by restoring high-frequency details. To address the low computational efficiency of the multi-head self-attention (MHSA) model due to redundant attention heads, the design concept of CGA is introduced. This concept aligns input feature grouping with the number of attention heads, offering each attention head a distinct segment of the complete features and thereby reducing computational redundancy. A hybrid operator, ACmix, is proposed to enhance the efficiency of image classification and target detection. Additionally, the Inner-IoU concept, in conjunction with Shape-IoU, is introduced to replace the original loss function, enhancing the accuracy of detecting small targets in complex scenes. The experimental results demonstrate that CR-YOLOv9 achieves a precision rate of 97.52%, a recall rate of 95.34%, and an mAP@50 of 97.95%, higher than those of YOLOv9 by 4.2%, 5.07%, and 3.34%, respectively. Furthermore, the detection speed of CR-YOLOv9 is 84 FPS, making it suitable for the real-time detection of strawberry ripeness in orchards. These results demonstrate that the CR-YOLOv9 algorithm exhibits high detection accuracy and rapid detection speed, enabling more efficient and automated strawberry picking and meeting the public's requirements for food safety.

19 pages, 7563 KB  
Article
HR-YOLOv8: A Crop Growth Status Object Detection Method Based on YOLOv8
by Jin Zhang, Wenzhong Yang, Zhifeng Lu and Danny Chen
Electronics 2024, 13(9), 1620; https://doi.org/10.3390/electronics13091620 - 24 Apr 2024
Cited by 12 | Viewed by 3010
Abstract
Crop growth status detection is significant in agriculture and is vital in planting planning, crop yield, and reducing the consumption of fertilizers and labor. However, little attention has been paid to detecting the growth status of each crop, and accuracy remains a challenging problem due to the small size of individual targets in the image. This paper proposes an object detection model, HR-YOLOv8 (HR for High-Resolution), based on a self-attention mechanism to alleviate this problem. First, we add a new dual self-attention mechanism to the backbone network of YOLOv8 to improve the model's attention to small targets. Second, we use InnerShape(IS)-IoU as the bounding box regression loss, computed by focusing on the shape and size of the bounding box itself. Finally, we modify the feature fusion part by connecting the convolution streams from high resolution to low resolution in parallel instead of in series. As a result, our method can maintain a high resolution in the feature fusion part rather than recovering high resolution from low resolution, and the learned representation is more spatially accurate. Repeated multi-resolution fusion improves the high-resolution representation with the help of the low-resolution representation. Our proposed HR-YOLOv8 model improves detection performance on crop growth states. The experimental results show that, on the oilpalmuav dataset and the strawberry ripeness dataset, our model has fewer parameters than the baseline model, and its average detection accuracy is 5.2% and 0.6% higher, respectively. Our model's overall performance is much better than that of other mainstream models, and the proposed method effectively improves the ability to detect small objects.
(This article belongs to the Section Artificial Intelligence)

14 pages, 7691 KB  
Article
Improving Semantic Segmentation via Decoupled Body and Edge Information
by Lintao Yu, Anni Yao and Jin Duan
Entropy 2023, 25(6), 891; https://doi.org/10.3390/e25060891 - 2 Jun 2023
Cited by 1 | Viewed by 2991
Abstract
In this paper, we propose a method that uses the idea of decoupling and unites edge information for semantic segmentation. We build a new dual-stream CNN architecture that fully considers the interaction between the body and the edge of the object, and our method significantly improves the segmentation performance of small objects and object boundaries. The dual-stream CNN architecture mainly consists of a body-stream module and an edge-stream module, which process the feature map of the segmented object into two parts with low coupling: body features and edge features. The body stream warps the image features by learning a flow-field offset, warping body pixels toward object inner parts, completing the generation of the body features, and enhancing the object's inner consistency. In generating edge features, current state-of-the-art models process information such as color, shape, and texture under a single network, which can overlook important information. Our method separates the edge-processing branch in the network, i.e., the edge stream. The edge stream processes information in parallel with the body stream and effectively eliminates the noise of useless information by introducing a non-edge suppression layer to emphasize the importance of edge information. We validate our method on the large-scale public Cityscapes dataset, where it greatly improves the segmentation performance of hard-to-segment objects and achieves state-of-the-art results. Notably, our method achieves 82.6% mIoU on Cityscapes using only fine-annotated data.
