4.3. Ablation Experiment
To validate the effectiveness of GhostConv, C2f-RepGhost, and CBAM in improving the YOLOv8n model, ablation experiments were conducted on the multi-scale rice pest and disease dataset. The experiments involved eight models, with Case 1 denoting the unmodified YOLOv8n baseline, and compared the performance of each improved model against the original across the evaluation metrics. The experimental results are shown in Table 4 and were analyzed to assess the impact of each module on model performance.
Case 2: Replacing the convolutional layers in the base YOLOv8n network with GhostConv, keeping only the first layer as a regular convolution. The model showed a 9.46% reduction in parameter count and an 8.54% reduction in computational load; however, accuracy declined, with recall decreasing by 0.8 percentage points.
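For intuition on where GhostConv's savings come from, a back-of-the-envelope parameter count can be compared against a standard convolution. This is only a sketch: the channel counts, the ratio of 2, and the 5×5 depthwise kernel below are illustrative assumptions, not the exact YOLOv8n layer shapes.

```python
def conv_params(c_in, c_out, k):
    """Weights of a standard k x k convolution (bias omitted)."""
    return c_in * c_out * k * k

def ghostconv_params(c_in, c_out, k, dw_k=5, ratio=2):
    """GhostConv sketch: a primary conv produces c_out // ratio 'intrinsic'
    feature maps, then cheap depthwise dw_k x dw_k ops generate the rest."""
    intrinsic = c_out // ratio
    primary = c_in * intrinsic * k * k           # ordinary convolution part
    cheap = (c_out - intrinsic) * dw_k * dw_k    # depthwise "ghost" maps
    return primary + cheap

c_in, c_out, k = 128, 256, 3                     # illustrative layer shape
std = conv_params(c_in, c_out, k)
ghost = ghostconv_params(c_in, c_out, k)
print(f"standard: {std}, ghost: {ghost}, saving: {1 - ghost / std:.1%}")
# → standard: 294912, ghost: 150656, saving: 48.9%
```

The per-layer saving approaches 1/ratio for large channel counts; the 9.46% network-wide figure is smaller because only some layers are replaced.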
Case 3: Replacing the C2f layers in the YOLOv8n network with C2f-RepGhost layers. This reduced parameters by 26.10% and computational load by 23.17%, but all accuracy metrics declined, with recall dropping by 1.6 percentage points.
Case 4: Adding the CBAM module before the SPPF layer in the YOLOv8n network. This caused a 2.19% increase in parameter count and a 1.22% increase in computational cost. Notably, while mAP50 increased by 0.1 percentage points, recall decreased by 0.4 percentage points.
In Cases 2, 3, and 4, each modification introduced only one module compared to the original model. Cases 2 and 3, which use lightweight modules, achieved reductions in parameters and computational load; however, the simplification of feature extraction operations led to a decrease in recall. The introduction of the CBAM module in Case 4 also reduced recall. The main function of CBAM is to emphasize important features and suppress irrelevant ones, but here some important features were mistakenly regarded as irrelevant, lowering model accuracy.
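The small parameter overhead of CBAM in Case 4 can also be estimated with a rough count for a single block. The reduction ratio r = 16 and the 7×7 spatial kernel below are the common defaults from the CBAM design; the channel count is an illustrative assumption, not the actual width at the SPPF input.

```python
def cbam_params(c, reduction=16, spatial_k=7):
    """Rough CBAM parameter count: a shared two-layer MLP (C -> C/r -> C)
    for channel attention, plus one k x k conv over the 2-channel
    [avg-pool; max-pool] map for spatial attention (biases omitted)."""
    channel_mlp = c * (c // reduction) * 2       # two linear layers, shared
    spatial_conv = 2 * spatial_k * spatial_k     # 2 input -> 1 output channel
    return channel_mlp + spatial_conv

print(cbam_params(256))  # → 8290
```

Even at 256 channels the block adds only a few thousand weights, consistent with the ~2% parameter increase reported for Case 4.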
Case 5: Replacing the C2f layers with C2f-RepGhost layers and changing all regular convolution layers (except for the first layer) to GhostConv. This resulted in a 35.39% reduction in parameters and a 29.27% reduction in computational load, but all accuracy metrics declined, with recall decreasing by 2.6 percentage points.
Case 6: Replacing C2f layers with C2f-RepGhost layers and adding CBAM before the SPPF layer. This resulted in a 23.91% reduction in parameters and a 21.95% reduction in computational load. mAP50 increased by 1.6 percentage points, while accuracy dropped by 0.1 percentage points and recall improved by 0.8 percentage points.
Case 7: Replacing the convolutional layers in the base YOLOv8n network with GhostConv (keeping the first layer as a regular convolution) and adding CBAM before the SPPF layer. The model showed a 7.27% reduction in parameters and a 7.32% reduction in computational load. mAP50 increased by 1.9 percentage points, accuracy decreased by 0.5 percentage points, and recall increased by 2.5 percentage points.
In Cases 5, 6, and 7, two modules were combined to modify the original model. In Case 5, the combination of two lightweight modules led to notable reductions in both parameters and computational load. However, this combination resulted in the greatest decline in accuracy, with recall decreasing by 2.6 percentage points, indicating a severe issue with missed detections.
In Cases 6 and 7, which combine one lightweight module with CBAM, accuracy showed a slight decrease, but other metrics improved, and reductions in parameters and computational load were achieved. This indicates that combining a lightweight module with CBAM resulted in better model accuracy compared to adding CBAM alone. The lightweight module simplified the feature extraction process, reducing the number of features, while CBAM emphasized the extraction of important features, improving recall. However, some irrelevant features were still mistakenly identified as important, leading to a slight decrease in accuracy.
Comparing Case 6 and Case 7 reveals the differences between the Ghost and RepGhost networks. The Ghost network achieved greater improvements in recall and mAP but also resulted in a larger drop in accuracy. The RepGhost network, an improvement on Ghost, reduces the model’s dependence on data distribution by introducing the BN layer, thus enhancing generalization. Therefore, the accuracy decline was smaller in the RepGhost network.
RGC-YOLO: This model incorporates all the improved modules. It reduces parameters by 33.20% and computational load by 29.27%, while improving overall accuracy: mAP increased by 2.4 percentage points, accuracy improved by 1.8 percentage points, and recall increased by 2.1 percentage points. The two lightweight modules minimized the extraction of irrelevant features, while the CBAM module emphasized important features, together improving model accuracy. These results demonstrate that RGC-YOLO not only meets the lightweight design requirements but also improves accuracy.
Furthermore, the prediction results of the ablation experiment are shown in Figure 8. For larger-scale targets such as Bacterial Blight, the models improved with the lightweight modules GhostConv and C2f-RepGhost exhibited some missed detections; after adding the hybrid attention module, the missed-detection issue was notably reduced. On the other hand, owing to the Intersection over Union (IoU) threshold of 0.7 used in Non-Maximum Suppression (NMS), the Case 4 and Case 6 models produced overlapping predicted boxes when detecting brown spot disease. Introducing GhostConv reduced redundant feature maps and effectively alleviated the overlapping-box issue.
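The overlapping-box behavior follows directly from how greedy NMS applies the IoU threshold: a pair of boxes whose IoU falls below 0.7 both survive suppression. A minimal plain-Python sketch (the boxes and scores are illustrative, not values from the experiments):

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thr=0.7):
    """Greedy NMS: keep the best-scoring box, drop neighbours above iou_thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thr for j in keep):
            keep.append(i)
    return keep

# Box 1 overlaps box 0 heavily (IoU ~ 0.82) and is suppressed;
# box 2 overlaps box 0 only moderately (IoU ~ 0.43) and survives.
boxes = [(0, 0, 10, 10), (1, 0, 11, 10), (4, 0, 14, 10)]
print(nms(boxes, [0.9, 0.8, 0.7]))  # → [0, 2]
```

At a threshold of 0.7, duplicate predictions with moderate mutual IoU are both kept, which is exactly the overlapping-box artifact observed for Cases 4 and 6.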
For the detection of small-scale targets like Rice Planthopper, Case 1 showed both missed detections and false detections. This was mainly due to the functionality of the C2f module, which refines and enhances features through multiple layers to capture more complex details. However, this also led to some unimportant features being misclassified as target features, causing false detections. To address this issue, the attention mechanism was added. This improved the model’s ability to extract important target features and effectively suppress irrelevant features, resulting in more accurate detection of Rice Planthopper.
4.4. Heatmap Analysis of the Attention Mechanism
To further analyze the performance of the lightweight improved model RGC-YOLO and the impact of the different modules, heatmap testing was conducted on the test set. During testing, a confidence threshold of 0.0001 was set to generate clear heatmaps. Heatmaps visually demonstrate which areas of the image have the greatest influence on the model's predictions. To ensure the heatmaps come from the same output stage, the 21st layer was extracted for networks without the CBAM module and the 22nd layer for networks with it. In the heatmaps, red areas show where the model focuses most, indicating a strong contribution to detection; yellow areas receive less attention; and blue areas contribute minimally, marking them as redundant information. The feature visualization results are shown in Figure 9. For Case 1, Case 2, Case 3, and Case 5, without the CBAM module, the feature extraction results were relatively scattered and the areas of focus were not prominent. For Case 4, Case 6, Case 7, and RGC-YOLO, which include the CBAM module, the heatmaps show areas of focus close to rectangular shapes. The Case 4 model is the YOLOv8n framework with the CBAM module added directly; its attention is predominantly concentrated in a rectangular region near the center of the image, with progressively lower attention toward the edges. Case 6, based on Case 3 with CBAM added, enhanced attention to the edges. Case 7, based on Case 2 with CBAM added, also increased attention to the edge areas; however, its feature extraction for Blast did not display clear distribution patterns, because replacing Conv with GhostConv simplified feature map generation and left feature extraction incomplete. RGC-YOLO, which replaced the C2f layers with C2f-RepGhost layers, replaced all Conv layers (except the first) with GhostConv, and added the CBAM module before the SPPF layer in the backbone network, reduced the generation of redundant feature maps and increased attention to key features. Its areas of focus aligned closely with the pest and disease regions, yielding notable improvements in feature extraction.
Table 5 presents the training results of Case 1 and RGC-YOLO on the multi-scale dataset. Compared to the base network YOLOv8n (Case 1), RGC-YOLO improved or matched the recognition accuracy for the four types of rice pests and diseases. For the large-scale disease Rice Bacterial Blight, recognition accuracy did not improve: its disease features are prominent and the lesion area is relatively large, so the base network can already extract its lesion characteristics sufficiently. For medium-scale diseases such as Rice Blast and Brown Spot, however, the accuracy of RGC-YOLO improved. The challenge in recognizing Rice Blast lies in the variability of lesion shape across disease stages, while Brown Spot has smaller ground-truth bounding box sizes but more distinct lesion features. RGC-YOLO effectively focuses on important features through the hybrid attention mechanism while suppressing irrelevant ones, leading to a substantial improvement in recognition accuracy.
4.5. Comparative Experiment
To validate the performance of the proposed model, RGC-YOLO was compared with several state-of-the-art object detection models, including YOLOv5s, Faster RCNN, SSD, and the lightweight YOLOv8-Ghost model, which is built using GhostNet as its backbone. The comparison results are summarized in
Table 6.
In terms of mAP50, RGC-YOLO achieved the highest performance, surpassing YOLOv5s, Faster RCNN, SSD, and YOLOv8-Ghost by 6.6, 15.5, 4.8, and 4.8 percentage points, respectively. Notably, for recall, RGC-YOLO achieved an impressive 90.8%, outperforming other models and effectively addressing the issue of missed detections.
Regarding precision, RGC-YOLO and SSD both achieved a value of 88%, the highest among all models. In comparison, YOLOv5s, Faster RCNN, and YOLOv8-Ghost were 7.2, 27, and 1.5 percentage points lower than RGC-YOLO, respectively. These results highlight the superior overall performance of RGC-YOLO in terms of both precision and recall.
In terms of model parameters and floating-point operations (FLOPs), RGC-YOLO demonstrated a significant advantage over larger models such as Faster RCNN and SSD, with its parameter count being only 1/14 and 1/13 of theirs, respectively. Similarly, the FLOPs of RGC-YOLO were approximately one-tenth those of Faster RCNN and SSD. For memory usage, RGC-YOLO's weight file was only 4.21 MB, considerably smaller than Faster RCNN's (108 MB) and SSD's (91.7 MB). Although YOLOv8-Ghost used 14.25% less memory, 13.79% fewer FLOPs, and 14.76% fewer parameters than RGC-YOLO, its accuracy was notably lower, especially its recall, which reached only 78.6%. Such a low recall rate can lead to severe missed detections in real-time scenarios.
In terms of inference time, the Faster RCNN model takes the longest per image, approximately 151 times longer than RGC-YOLO. The SSD model is considerably faster than Faster RCNN but still takes approximately 23 times longer than RGC-YOLO. Among the YOLO-series models, YOLOv5s' inference time per image is 0.3 milliseconds shorter than RGC-YOLO's, but YOLOv5s performs worse on all accuracy metrics, with a recall rate 10 percentage points lower. Although the YOLOv8-Ghost model has fewer parameters and GFLOPs, its inference time per image is longer than RGC-YOLO's. This is because the BN layer in the RepGhost module of RGC-YOLO merges its parameters into the adjacent convolutional layer during inference, thus reducing memory usage.
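The BN-folding idea behind this speedup can be sketched for a single channel: the BatchNorm scale and shift are absorbed into the convolution's weight and bias, so inference needs only one fused operation. This is a one-channel toy with made-up numbers, not the actual RepGhost implementation.

```python
import math

def fuse_conv_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm (gamma, beta, running mean/var) into the
    preceding convolution's weight w and bias b for one channel."""
    scale = gamma / math.sqrt(var + eps)
    return w * scale, (b - mean) * scale + beta

# Check: conv followed by BN equals the fused conv on a sample input.
w, b = 0.5, 0.1                                  # toy conv weight and bias
gamma, beta, mean, var = 1.2, -0.3, 0.05, 0.8    # toy BN statistics
x = 2.0
conv_out = w * x + b
bn_out = gamma * (conv_out - mean) / math.sqrt(var + 1e-5) + beta
fw, fb = fuse_conv_bn(w, b, gamma, beta, mean, var)
print(abs(bn_out - (fw * x + fb)) < 1e-9)  # → True
```

Because the fused layer produces identical outputs, the BN layer can be removed entirely at deployment time, saving a pass over the feature map per layer.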
We compared RGC-YOLO with existing models such as Faster RCNN and SSD and found that RGC-YOLO not only has a significant advantage in accuracy but also has a smaller parameter count and computational cost, along with shorter inference time: per image, Faster RCNN and SSD are approximately 151 times and 23 times slower than RGC-YOLO, respectively. Furthermore, compared to other YOLO-series models such as YOLOv5s and YOLOv8-Ghost, RGC-YOLO exhibits superior recognition accuracy, outperforming them by 10 and 12.2 percentage points in recall, respectively. The differences in parameter size and computational cost are minimal, enabling RGC-YOLO to maintain a lightweight design while notably improving accuracy.