Article

YOLOv5s-ECCW: A Lightweight Detection Model for Sugarcane Smut in Natural Environments

1 School of Mathematics and Computing Science, Guilin University of Electronic Technology, Guilin 541004, China
2 Sugarcane Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, China
3 Key Laboratory of Sugarcane Biotechnology and Genetic Improvement (Guangxi), Ministry of Agriculture, Nanning 530007, China
4 Guangxi Key Laboratory of Sugarcane Genetic Improvement, Nanning 530007, China
5 Agricultural Science and Technology Information Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, China
6 Biotechnology Research Institute, Guangxi Academy of Agricultural Sciences, Nanning 530007, China
7 Baise Agricultural Scientific Research Institute, Baise 533612, China
* Authors to whom correspondence should be addressed.
† These authors have contributed equally to this work.
Agronomy 2024, 14(10), 2327; https://doi.org/10.3390/agronomy14102327
Submission received: 12 September 2024 / Revised: 2 October 2024 / Accepted: 7 October 2024 / Published: 10 October 2024
(This article belongs to the Special Issue In-Field Detection and Monitoring Technology in Precision Agriculture)

Abstract

Sugarcane smut, a serious disease caused by the fungus Sporisorium scitamineum, can result in 30% to 100% cane loss. The most affordable and efficient measure for preventing and managing sugarcane smut is to plant disease-resistant varieties. A comprehensive evaluation of disease resistance based on the incidence of smut is essential during the selection process, necessitating the rapid and accurate identification of sugarcane smut. Traditional identification methods, which rely on visual observation of symptoms, are time-consuming, costly, and inefficient. To address these limitations, we present a lightweight sugarcane smut detection model (YOLOv5s-ECCW) that incorporates several improvements. Specifically, EfficientNetV2 is incorporated into the YOLOv5 network as the backbone to compress the model while maintaining high detection accuracy. The convolutional block attention mechanism (CBAM) is added to the backbone network to improve its feature extraction capability and suppress irrelevant information. The C3STR module replaces the C3 module to enhance the ability to capture large global targets. The WIoU loss function is used in place of the CIoU loss to improve the accuracy of bounding box regression. The experimental results demonstrate that the YOLOv5s-ECCW model achieves a mean average precision (mAP) of 97.8% with only 4.9 GFLOPs and 3.25 M parameters. Compared with the original YOLOv5, our improvements yield a 0.2% increase in mAP, a 54% reduction in parameters, and a 70.3% decrease in computational requirements. The proposed model outperforms YOLOv4, SSD, YOLOv5, and YOLOv8 in terms of accuracy, efficiency, and model size. The YOLOv5s-ECCW model meets the urgent need for accurate real-time identification of sugarcane smut, supporting better disease management and the selection of resistant varieties.

1. Introduction

China is the third largest sugarcane grower in the world, after Brazil and India. Sugarcane has historically been the main source of sugar and is also a significant commercial crop in China [1]. Sugarcane smut is the disease with the greatest impact on sugarcane production and economic returns in China. The fungus Sporisorium scitamineum is the causal agent of the disease. Its most characteristic symptom is a downward-curling black whip at the tip of the susceptible cane, ranging in length from a few centimeters to tens of centimeters [2,3]. Sugarcane smut can be largely controlled by planting resistant varieties, and the quick and precise recognition of the black whip is a key step in selecting and breeding smut-resistant sugarcane varieties. Traditionally, sugarcane smut identification has relied heavily on manual inspection, which is flexible and simple. However, as the sugarcane planting area expands, manual identification inevitably requires more labor and becomes less efficient, making it increasingly impractical.
Image processing and deep learning technologies are gradually replacing visual observation of symptoms and are widely used in agriculture, where they can detect traits, such as pests and diseases, that negatively affect production. Therefore, this paper focuses on using these techniques to quickly and accurately identify sugarcane smut. Previous research on the image recognition of sugarcane diseases has primarily focused on common diseases and has yielded significant results. Ref. [4] segmented the disease symptoms in images using color transformation and color histograms and identified sugarcane diseases such as red streak, ring spot, orange rust, and red rot based on the maximum likelihood of occurrence of typical symptoms. Ref. [5] designed a mobile tool for identifying sugarcane yellow spot disease based on a support vector machine (SVM), describing and classifying the differences between healthy and diseased leaves. Ref. [6] developed a web application to detect sugarcane diseases, which employed the k-means clustering algorithm and adaptive histogram equalization to preprocess leaf images and segment lesion areas; an SVM classifier combined with multivariate feature extraction was then used to identify eye spot, red rot, and ring spot.
Deep learning surpasses traditional manual feature extraction by employing convolutional neural networks (CNNs) to automatically create feature extractors based on learned weight parameters, improving generalization. Therefore, deep learning methods have been increasingly applied to sugarcane diseases. Ref. [7] created a deep transfer learning model using quantum-behaved particle swarm optimization to accurately recognize and classify sugarcane leaf disease. Ref. [8] utilized the inception nadam L2 regularized gradient descent (NLRGD) CNN model to classify mosaic, yellow leaf, red rot, red-orange rust, and smut, achieving 96.75% accuracy. Ref. [9] evaluated VGG-19 and VGG-16, finding that VGG-19 more effectively classified diseases in images of healthy and diseased sugarcane. Ref. [10] proposed three methods for classifying sugarcane leaf diseases based on DenseNet121 and vision transformers, with accuracies of 92.87%, 93.34%, and 87.37%. Ref. [11] proposed the shuffle-convolutional-based lightweight vision transformer (SLViT), which achieved an accuracy of 87.64% in classifying sugarcane leaf diseases. Ref. [12] used VGG-16 for classifying healthy sugarcane and smut, reaching an accuracy of 94.74%.
While these methods can accurately classify sugarcane diseases, they cannot identify the precise location of the target disease. In contrast, object detection algorithms combine localization and classification to locate the target and obtain more specific information. Object detection algorithms fall into two types: single stage and two stage. Single-stage algorithms, such as you only look once (YOLO), offer faster detection and are therefore more practical, though they may be slightly less accurate than two-stage algorithms. For example, Ref. [13] used the YOLO convolutional neural network algorithm to construct a diagnosis system that automatically identified sugarcane red stripe. Ref. [14] evaluated four models, Faster R-CNN, YOLOR, DETR, and YOLOv5, for detecting sugarcane white leaf disease (WLD) and concluded that YOLOv5 was superior to the others in terms of accuracy, mAP@0.5, model size, and mAP@0.5:0.95, indicating that the YOLOv5 model is suitable for detecting WLD. The YOLO algorithm demonstrates superior performance in existing sugarcane disease detection tasks, providing a new direction for sugarcane smut detection.
The YOLO algorithm has gone through several iterations and improvements since it was first proposed, but YOLOv5 remains one of the most widely used versions across various fields owing to its high detection precision and inference speed. In this study, we aim to construct a lightweight object detection model based on the YOLOv5 algorithm to achieve fast and accurate identification of sugarcane smut in natural environments. To this end, we employ YOLOv5 as the object detection framework, adopt EfficientNetV2 as the backbone network, introduce the CBAM and C3STR modules to enhance the feature extraction capability, and adopt the WIoU loss function to accelerate convergence. With these optimizations, the proposed algorithm achieves efficient recognition of sugarcane smut at low energy consumption, which is advantageous for mobile device applications. To the best of our knowledge, this is the first study to apply an improved YOLOv5 model to sugarcane smut detection.
The rest of this paper is organized as follows: Section 2 describes the dataset’s collection and preprocessing, the proposed model, and the improvement of the corresponding module. The results and analysis of the experiment are presented in Section 3. Then, Section 4 discusses the research’s important findings and their significance in the context of previous studies and future directions. Finally, Section 5 summarizes the conclusions.

2. Materials and Methods

2.1. Dataset

The research object of this study is sugarcane smut in natural environments. The most characteristic symptom of the disease is the presence of a black whip; therefore, smut is identified by determining whether the sugarcane plants in an image contain black whips. Smut images were captured at 10 sugarcane planting bases located in Laibin, Chongzuo, Nanning, Liuzhou, Guigang, and Baise in Guangxi. Images were captured using a Canon R digital camera (manufactured by Canon Inc., Tokyo, Japan), with each image containing one or more black whip targets. To increase data diversity, the shooting angle and position were adjusted under natural light. After manually removing duplicate and blurred images from the collection, 1093 images were retained. The targets were then manually labeled using the LabelImg image annotation tool, the annotation results were saved in XML format, and the category label was set to smut. Data augmentation code was used to process the images and their annotation boxes by translation, brightness alteration, noise addition, cropping, and flipping, yielding a dataset of 3279 images. Finally, the dataset was randomly split 8:1:1 into a training set, a validation set, and a test set. The training set contained 2623 images, the validation set contained 328 images, and the test set contained 328 images.
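The split described above can be scripted in a few lines. The sketch below illustrates the 8:1:1 random split under assumed directory names (images/, annotations/, dataset/); the paths and layout are hypothetical, not the authors' actual pipeline.

```python
import random
import shutil
from pathlib import Path

# Hypothetical layout: every image in images/ has a same-named XML annotation in annotations/.
random.seed(0)
images = sorted(Path("images").glob("*.jpg"))
random.shuffle(images)

n = len(images)                             # 3279 images after augmentation in this study
n_train, n_val = int(0.8 * n), int(0.1 * n)
splits = {
    "train": images[:n_train],
    "val":   images[n_train:n_train + n_val],
    "test":  images[n_train + n_val:],
}

for split, files in splits.items():
    for img in files:
        xml = Path("annotations") / (img.stem + ".xml")
        for src, sub in ((img, "images"), (xml, "annotations")):
            dst_dir = Path("dataset") / split / sub
            dst_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst_dir / src.name)
```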

2.2. YOLOv5 Network

YOLOv5 is a one-stage object detection method that is widely applied in agriculture. Its flexibility and fast detection speed make it more suitable for deployment than many other detection models, so YOLOv5 was chosen for improvement in this study. The YOLOv5 network is divided into four versions based on network width and depth: s, m, l, and x [15]. The YOLOv5s model, which has the fewest parameters and the fastest training speed, was selected as the baseline in order to lower the hardware requirements. The YOLOv5 source code is available here: https://github.com/ultralytics/yolov5 (accessed on 12 July 2024).
The YOLOv5 network structure consisted of four components: input, backbone, neck, and head. The input image was preprocessed using mosaic data augmentation, adaptive anchor frame calculation, and adaptive image scaling. The backbone feature extraction framework employed an improved darknet network structure (CSPDarknet53), utilizing modules like Focus, Conv, C3, and SPP (spatial pyramid pooling). Figure 1 shows the specific operation of the Focus slice. The neck adopted the FPN + PAN feature fusion module to merge features from multiple layers in order to collect more comprehensive information. The head part included multiple convolutional and pooling layers for converting feature mappings into target frames, confidence scores, and category probabilities. In short, YOLOv5 can locate target objects in an image and classify them accordingly, with efficient, accurate, and fast object detection capabilities. Compared with the traditional YOLO series, YOLOv5 uses smaller convolutional kernels and fewer convolutional layers, reducing the model’s complexity and computation.
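As a concrete illustration of the Focus slicing in Figure 1, the minimal PyTorch sketch below samples every other pixel in both spatial dimensions and stacks the four sub-images along the channel axis; the convolution that normally follows this slice in YOLOv5 is omitted.

```python
import torch

def focus_slice(x: torch.Tensor) -> torch.Tensor:
    """Focus slicing: turn a (B, 3, H, W) input into a (B, 12, H/2, W/2) tensor
    by gathering the pixels at even/odd row and column positions."""
    return torch.cat(
        [x[..., ::2, ::2],     # even rows, even columns
         x[..., 1::2, ::2],    # odd rows, even columns
         x[..., ::2, 1::2],    # even rows, odd columns
         x[..., 1::2, 1::2]],  # odd rows, odd columns
        dim=1,
    )

x = torch.randn(1, 3, 640, 640)
print(focus_slice(x).shape)  # torch.Size([1, 12, 320, 320])
```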

2.3. Improved Lightweight YOLOv5 Network Design

2.3.1. EfficientnetV2 Network

Due to the constrained computational resources and memory capacity of mobile devices, deploying convolutional neural networks on them is challenging. Therefore, a lightweight model is needed to reduce computational and memory overheads. The lightweight EfficientNetV2 network can lower the number of parameters and the computational complexity while maintaining detection precision. It balances depth, width, and resolution to maximize network performance under limited computational resources. Compared with EfficientNetV1, EfficientNetV2 reduces model parameters and computations by fusing several residual-block operations into a single convolutional layer [16]. In addition, EfficientNetV2 introduces a progressive learning strategy and an adaptive regularization strength adjustment mechanism to improve training accuracy for images of different sizes [17]. Its training-aware NAS strategy yields the best combination of Fused-MBConv and MBConv blocks, with Fused-MBConv employed for shallow convolution and MBConv for deep convolution [18]. As a result, EfficientNetV2 trains faster under the same parameter and hardware conditions [16]. Regarding computational efficiency, EfficientNetV2 is more advantageous for mobile or resource-limited devices. Therefore, EfficientNetV2-s was adopted as the base backbone network in this research.
Table 1 shows the parameter settings of the EfficientNetV2-s network. The network contains 8 stages: Stage 0 performs an ordinary 3 × 3 convolution, Stages 1 to 3 use Fused-MBConv, Stages 4 to 6 use MBConv, and Stage 7 performs a 1 × 1 convolution, pooling, and fully connected operations. Because this study involves only one class (sugarcane smut) and the network is used for feature extraction rather than classification, only the first seven stages of EfficientNetV2-s were used.
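For reference, the stage settings in Table 1 can be written down as a simple Python configuration; this listing only restates the table (operator, expansion factor, kernel, stride, output channels, layer count, SE ratio) and is not the authors' code.

```python
# (operator, expansion, kernel, stride, out_channels, layers, se_ratio)
EFFICIENTNETV2_S_STAGES = [
    ("conv",         None, 3, 2,  24,  1, None),  # Stage 0: ordinary 3x3 convolution
    ("fused_mbconv", 1,    3, 1,  24,  2, None),  # Stage 1
    ("fused_mbconv", 4,    3, 2,  48,  4, None),  # Stage 2
    ("fused_mbconv", 4,    3, 2,  64,  4, None),  # Stage 3
    ("mbconv",       4,    3, 2, 128,  6, 0.25),  # Stage 4
    ("mbconv",       6,    3, 1, 160,  9, 0.25),  # Stage 5
    ("mbconv",       6,    3, 2, 272, 15, 0.25),  # Stage 6
]
# Stage 7 (1x1 conv, pooling, fully connected) is the classification head and is
# dropped when the network serves purely as a feature-extraction backbone.
```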

2.3.2. CBAM Module

When collecting image data of sugarcane smut, the disease targets may be obscured by complex backgrounds and variations in light intensity, which easily causes missed and erroneous detections. The convolutional block attention mechanism (CBAM) was incorporated into the backbone to address these problems [19]. The CBAM module can suppress irrelevant features and concentrate on the critical elements of the disease, enhancing the network's capacity to extract key traits of disease targets while effectively reducing its sensitivity to noise and irrelevant information [17].
The channel attention module and the spatial attention module were the two main parts of the CBAM module [20]. Its overall structure is shown in Figure 2 below. In the channel attention module, an H × W × C input feature was passed through maximum and average pooling layers, respectively. The pooled features were then processed by a shared fully connected layer to obtain two channel descriptors. After the two descriptors were added, the Sigmoid activation function was used to determine the weight of each channel in the input feature layer, and the weights were multiplied by the original input feature to obtain the channel attention feature. In the spatial attention module, the outputs of average and maximum pooling applied to the channel-refined feature were concatenated; the channel information was then fused using a 7 × 7 convolution kernel, and a Sigmoid function was applied to the convolved result to obtain the spatial weights. Finally, the spatial weights were multiplied by the module's input feature to obtain the final result.
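The following PyTorch sketch mirrors the description above (a shared MLP over average- and max-pooled channel descriptors, then a 7 × 7 convolution over pooled spatial maps); it is a generic CBAM implementation for illustration, not the authors' exact code.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Shared MLP applied to both the average- and max-pooled descriptors.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class CBAM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.channel = ChannelAttention(channels)
        self.spatial = SpatialAttention()

    def forward(self, x):
        # Channel attention first, then spatial attention, as in Figure 2.
        return self.spatial(self.channel(x))
```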

2.3.3. C3STR Module

Due to the varying sizes of sugarcane smut in the field, targets that were too large or too small were easily missed during detection. To improve the ability to recognize sugarcane smut, the C3 module of the large-size detection head was replaced with the C3STR module. The C3STR module is an improved version based on the Swin transformer, in which the STR module replaces the bottleneck module in the C3 structure [21]. It improves the detection performance for large global targets with only a slight increase in parameters and computation. The specific structure is given in Figure 3. The STR module of the C3STR module is a pairwise combination of two different Swin transformer blocks, one using the window multi-head self-attention module (W-MSA) and the other using the shifted window multi-head self-attention module (SW-MSA) [22]. This combination expands the receptive field for smut features and effectively transfers feature information between neighboring windows, thus enhancing the extraction of smut target features. Compared with the original C3 module, the C3STR module can gather contextual information and capture global information more effectively.
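To make the W-MSA/SW-MSA distinction concrete, the sketch below shows only the window partitioning step: W-MSA computes attention within fixed non-overlapping windows, while SW-MSA applies the same partition to a cyclically shifted feature map so that neighboring windows exchange information. This is a simplified illustration of the Swin transformer mechanism, not the full C3STR module.

```python
import torch

def window_partition(x: torch.Tensor, ws: int) -> torch.Tensor:
    """Split a (B, H, W, C) feature map into non-overlapping ws x ws windows,
    the layout on which W-MSA computes self-attention independently."""
    b, h, w, c = x.shape
    x = x.view(b, h // ws, ws, w // ws, ws, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, ws * ws, c)

# SW-MSA uses the same partition on a cyclically shifted map, so tokens near
# window borders can exchange information with neighbouring windows.
feat = torch.randn(1, 32, 32, 96)
shifted = torch.roll(feat, shifts=(-4, -4), dims=(1, 2))  # shift = ws // 2
windows = window_partition(shifted, ws=8)
print(windows.shape)  # torch.Size([16, 64, 96])
```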

2.3.4. WIoU Loss Function

The loss function measures the difference between the ground truth and the prediction; the closer the prediction is to the ground truth, the smaller the loss value. In the YOLOv5 network, the bounding box loss adopts CIoU (complete IoU loss), which takes into account the centroid distance, overlap area, and aspect ratio between the ground-truth and predicted boxes and increases the stability of target box regression [23].
However, the aspect-ratio penalty term of the CIoU loss drops to 0 when the predicted box and the ground-truth box have the same aspect ratio [24]. At this point, both high-quality and low-quality anchor boxes can adversely affect the regression loss. To increase the accuracy of sugarcane smut detection, this study selected WIoU (Wise-IoU) as the loss function instead of the original CIoU. The WIoU loss function removes the aspect-ratio penalty term of CIoU, which resolves the bounding box regression balancing problem between high-quality and low-quality anchor boxes and improves the model's overall detection performance. The WIoU loss is calculated as follows:
$$L_{WIoU} = r \cdot R_{WIoU} \cdot L_{IoU} \quad (1)$$
$$r = \frac{\beta}{\delta \alpha^{\beta - \delta}} \quad (2)$$
$$R_{WIoU} = \exp\left(\frac{(x - x_{gt})^2 + (y - y_{gt})^2}{W_g^2 + H_g^2}\right) \quad (3)$$
$$L_{IoU} = 1 - IoU \quad (4)$$
where r is the gradient gain, β is the outlier degree of the anchor box, α and δ are hyperparameters (set to 1.9 and 3 here), W_g and H_g are the width and height of the smallest enclosing box, x and y denote the center coordinates of the predicted box, and x_gt and y_gt denote the center coordinates of the ground-truth box.
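A hedged PyTorch sketch of Equations (1)-(4) is given below. The outlier degree β is computed as the ratio of the current IoU loss to a running mean of the IoU loss, which follows the original WIoU formulation but is not spelled out in the equations above; the box format, the running-mean bookkeeping, and the detach placement are assumptions of this sketch rather than the authors' exact implementation.

```python
import torch

def wiou_loss(pred, target, iou, mean_iou_loss, alpha=1.9, delta=3.0):
    """Wise-IoU loss following Equations (1)-(4). `pred` and `target` are
    (N, 4) boxes in (x1, y1, x2, y2) form, `iou` is the per-pair IoU, and
    `mean_iou_loss` is a running mean of L_IoU used for the outlier degree."""
    l_iou = 1.0 - iou                                             # Eq. (4)

    # Centre coordinates of predicted and ground-truth boxes.
    x, y = (pred[:, 0] + pred[:, 2]) / 2, (pred[:, 1] + pred[:, 3]) / 2
    xgt, ygt = (target[:, 0] + target[:, 2]) / 2, (target[:, 1] + target[:, 3]) / 2

    # Width/height of the smallest box enclosing both boxes (detached so the
    # distance penalty does not generate gradients through the enclosing box).
    wg = torch.max(pred[:, 2], target[:, 2]) - torch.min(pred[:, 0], target[:, 0])
    hg = torch.max(pred[:, 3], target[:, 3]) - torch.min(pred[:, 1], target[:, 1])
    r_wiou = torch.exp(((x - xgt) ** 2 + (y - ygt) ** 2) /
                       (wg ** 2 + hg ** 2).detach())              # Eq. (3)

    beta = l_iou.detach() / mean_iou_loss                         # outlier degree
    r = beta / (delta * alpha ** (beta - delta))                  # Eq. (2)
    return (r * r_wiou * l_iou).mean()                            # Eq. (1)
```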

2.3.5. Proposed Model

Based on the above analysis, the YOLOv5 algorithm was improved to cope with the identification and detection of sugarcane smut in complex environments. The specific improvements were as follows. The lightweight EfficientNetV2 network replaced the original backbone, reducing computational complexity and model parameters while maintaining accuracy. The CBAM module was added to the backbone network to reduce missed detections, allowing the model to locate and identify targets more accurately. To improve feature extraction and detection performance for large global targets, the C3STR module replaced the C3 module in the large-size detection head. Finally, the WIoU loss function was used in place of the CIoU loss to improve the accuracy of bounding box regression. The overall structure of the enhanced lightweight network model YOLOv5s-ECCW is shown in Figure 4.

2.4. Model Training Environment Parameter Configuration

The experiments in this paper used identical software and hardware environments, as shown in Table 2. The algorithm's training parameters were configured as follows. The input image size was set to 640 × 640 and the batch size to 16. The number of epochs was 300, the initial learning rate was 0.01, the weight decay was 0.0005, and the momentum was 0.937. The network parameters were optimized using the SGD optimizer. Training adopted a warm-up strategy: the learning rate climbed linearly to the initial learning rate over the first three training epochs and then slowly decreased. The remaining hyperparameters adopted the default values from the hyp.scratch.yaml file in YOLOv5.
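The schedule can be summarized as in the sketch below: a linear warm-up over the first three epochs followed by a gradual decay (a cosine decay is assumed here; YOLOv5 supports both linear and cosine schedules, and the exact decay used by the authors is not stated).

```python
import math

# Hyperparameters reported in Section 2.4 (illustrative names, not the authors' training script).
cfg = dict(img_size=640, batch_size=16, epochs=300,
           lr0=0.01, weight_decay=0.0005, momentum=0.937, warmup_epochs=3)

def learning_rate(epoch, cfg=cfg):
    """Linear warm-up to lr0 over the first warmup_epochs, then cosine decay."""
    if epoch < cfg["warmup_epochs"]:
        return cfg["lr0"] * (epoch + 1) / cfg["warmup_epochs"]
    progress = (epoch - cfg["warmup_epochs"]) / (cfg["epochs"] - cfg["warmup_epochs"])
    return cfg["lr0"] * 0.5 * (1 + math.cos(math.pi * progress))

print(learning_rate(0), learning_rate(3), round(learning_rate(299), 5))
```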

2.5. Model Evaluation

Specific metrics were needed to comprehensively evaluate the detection performance of the improved YOLOv5 model. FPS, mAP@0.5, precision, and recall were mainly used to assess the model's detection accuracy, while model size, GFLOPs, and the number of parameters were used to evaluate how lightweight the model is. Among them, the mean average precision (mAP) at an IoU threshold of 0.5 was the primary evaluation index for model identification accuracy. The calculation of the above metrics is shown in Equations (5)-(7).
$$Precision = \frac{TP}{TP + FP} \quad (5)$$
$$Recall = \frac{TP}{TP + FN} \quad (6)$$
$$mAP = \frac{\sum_{i=1}^{N} AP_i}{N} \quad (7)$$
where TP, FP, and FN stand for true positives, false positives, and false negatives, respectively. AP_i is the average precision of the i-th category, and N is the total number of categories.
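These definitions translate directly into code; the sketch below restates Equations (5)-(7) with illustrative counts (the numbers are placeholders, not results from this paper).

```python
def precision_recall(tp, fp, fn):
    """Equations (5) and (6)."""
    return tp / (tp + fp), tp / (tp + fn)

def mean_average_precision(ap_per_class):
    """Equation (7): the mean of per-class AP values. With a single 'smut'
    class, mAP@0.5 simply equals the AP of that class."""
    return sum(ap_per_class) / len(ap_per_class)

p, r = precision_recall(tp=94, fp=3, fn=6)   # placeholder counts
print(round(p, 3), round(r, 3), mean_average_precision([0.978]))
```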

3. Results

3.1. Comparison of Different Backbone Networks Based on YOLOv5

Performance comparisons with alternative backbone networks were conducted to demonstrate the suitability of EfficientNetV2. The training and validation curves of the models built on the various backbone networks are displayed in Figure 5. Regarding convergence speed and mAP, the Swin transformer performed better than the other backbone networks, as shown in Figure 5. EfficientNetV2 outperformed MobilenetV3 and ShufflenetV2, while having a somewhat lower mAP and convergence speed than the Swin transformer. Table 3 displays each backbone network's overall performance on the test set. Regarding model accuracy, EfficientNetV2 had lower detection performance than the Swin transformer; however, using the Swin transformer to replace the original backbone resulted in larger parameter counts, computation, and weights, making the model less lightweight. Compared with EfficientNetV2, the models using MobilenetV3 or ShufflenetV2 had fewer parameters, less computation, and smaller weights but significantly lower accuracy. The EfficientNetV2 backbone achieved higher precision, recall, and mAP values than MobilenetV3 and ShufflenetV2, at 97.7%, 93.5%, and 97.5%, respectively. The model using the EfficientNetV2 backbone had 3.90 M parameters, 5.1 GFLOPs, and an 8 MB model weight, ensuring high accuracy while keeping the model lightweight. Therefore, selecting the EfficientNetV2 network for a lightweight model was practical and feasible.

3.2. Comparison of Adding a C3STR Module in Different Positions

To validate the detection effect of placing the C3STR module in the large-size detection head, a comparison study was performed. In Table 4, "large-C3STR" and "middle-C3STR" indicate that the C3STR module replaced the C3 module in the large-size and medium-size detection heads, respectively, while "all-C3STR" denotes that the C3 modules in the small-, medium-, and large-size detection heads were all replaced by C3STR modules. As shown in Table 4, large-C3STR exhibited the highest mAP value (97.8%) of the three replacement strategies. The results show that replacing the C3 module in the large-size detection head with the C3STR module was more effective than the other two strategies and improved the recognition accuracy of sugarcane smut. After comprehensively comparing all of the indicators, the model in which the C3 module of the large-size detection head was replaced had the best detection accuracy. In comparison with the model prior to replacement, it also significantly reduced the computation load, parameter count, and weight size.

3.3. Ablation Experiment

The efficacy of each proposed improvement was verified using ablation experiments. The experiments were conducted by adding one improvement at a time, with all other network settings unchanged, and a total of five network configurations were evaluated. Table 5 presents the comparison results. When EfficientNetV2 replaced the YOLOv5 backbone, the mAP decreased by 0.1%; at the expense of this slight loss in detection accuracy, the modified model's parameters and computation were significantly reduced, and the model weight also decreased, making the network lightweight. On this basis, the bounding box loss function was changed to WIoU, and the mAP@0.5 improved by 0.2%, showing that WIoU outperformed the original CIoU. Then, the CBAM module was added to the EfficientNetV2 backbone. The attention mechanism enhanced precision and recall by 0.3% and 2.2%, respectively, which indicates its effectiveness in extracting key features of sugarcane smut. Finally, the C3 module in the large-size detection head was replaced with the C3STR module; while lowering the number of parameters, computation, and model weight, the mAP@0.5 improved by a further 0.1%.
Figure 6 shows a Min-Max standardized histogram used to visualize model performance across multiple indicators. The negatives of the three cost indicators (GFLOPs, parameters, and weight size) were used, so that a higher ordinate value always indicates better performance; as a result, a model may score 0 on some indicators, which is expected. Figure 6 illustrates that, after the model was made lightweight, precision and recall decreased by different magnitudes. Analysis showed that the lightweight design somewhat lowered the network's complexity and thus its feature extraction ability, an unavoidable trade-off of the lightweight process. Because this study recognizes only one sugarcane smut category, recall was less critical for smut recognition than the model's mAP@0.5, and since this study aimed to develop a lightweight model, a slight decrease in recall was acceptable given the improved mean average precision (mAP). The YOLOv5s-EfficientnetV2-CBAM-C3STR-WIoU model (YOLOv5s-ECCW), marked by the green block in Figure 6, shows a clear lead across these indicators. The improved model maintains high accuracy while being lightweight, and its overall performance is further enhanced.
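For clarity, the standardization used for Figure 6 can be written as below; negating the cost-type indicators before scaling is what makes the worst model score 0 on those axes (the example values are the GFLOPs column of Table 5, used here purely for illustration).

```python
def min_max(values, negate=False):
    """Min-max standardization; cost-type indicators (GFLOPs, parameters,
    weight size) are negated first so that a larger bar means better."""
    v = [-x for x in values] if negate else list(values)
    lo, hi = min(v), max(v)
    return [(x - lo) / (hi - lo) for x in v]

print(min_max([16.5, 5.1, 5.1, 5.1, 4.9], negate=True))
# [0.0, 0.983, 0.983, 0.983, 1.0] -> the baseline YOLOv5s scores 0 on this axis
```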

3.4. Analysis of YOLOv5s-ECCW

The confusion matrix is a two-dimensional matrix that compares predicted results with the actual situation, providing a clear and intuitive picture of the classification results during model training. Recall, F1 score, and precision can be calculated by counting the prediction results across categories. The confusion matrix (Figure 7) was an important tool in this experiment for analyzing the efficiency of the YOLOv5s-ECCW detection method. The experiment included a single smut category. The YOLOv5s-ECCW model classified smut with 94% accuracy and a false-negative rate of 6%, the rate at which smut was incorrectly predicted as background. Figure 8 shows the P_curve, F1_curve, R_curve, and PR_curve recorded during the YOLOv5s-ECCW model's training. These curves reflect the model's learning performance for sugarcane smut. The F1_curve indicates the model's accuracy in identifying sugarcane smut; the YOLOv5s-ECCW algorithm obtained an F1 value of 95% at a confidence threshold of 0.496. The model's precision and recall were inversely related, and the connection between them is evident in the PR_curve. The model's performance is represented by the area enclosed by the curve and the horizontal and vertical axes (the AP value). Since there was only one detection category in this study, the AP value equals the mAP@0.5 value. The larger the enclosed area, the better the model performed at detecting sugarcane smut.
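The F1 operating point quoted above follows directly from the precision and recall curves; the sketch below shows the computation with synthetic placeholder curves (the real curves are the ones YOLOv5 exports after training, not these arrays).

```python
import numpy as np

def f1_curve(precision, recall, conf):
    """F1 = 2PR/(P+R) at each confidence threshold; the operating point is
    the threshold with the highest F1 (reported as 0.95 at 0.496 above)."""
    f1 = 2 * precision * recall / (precision + recall + 1e-16)
    best = int(np.argmax(f1))
    return f1, conf[best], f1[best]

conf = np.linspace(0, 1, 101)
p = np.clip(0.60 + 0.40 * conf, 0, 1)       # synthetic, monotonically rising precision
r = np.clip(1.00 - 0.55 * conf ** 2, 0, 1)  # synthetic, slowly falling recall
_, best_conf, best_f1 = f1_curve(p, r, conf)
print(round(float(best_conf), 2), round(float(best_f1), 3))
```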
The YOLOv5 algorithm provides useful visualization outputs. After each training epoch, the test_batch*_pred.jpg files in the results directory can be used to inspect the predicted bounding boxes for that epoch. In this experiment, a batch size of 16 was used, so 16 images were displayed per batch. Figure 9 illustrates the predicted results on the validation set during model training. It can be seen that the model performed well on the validation set after training for multiple epochs. Specifically, the model learned more easily when the target object was larger and fewer targets were present in the image; when the target object was smaller and many targets were present, especially with few target features and high environmental interference, learning was more challenging. The YOLOv5s-ECCW model improved significantly across training iterations, achieving an mAP of 96.4% and an F1 score of 95%.
The training trend of each loss function is shown in Figure 10, from which the convergence of the model can be assessed. The model included four loss functions. As the number of training epochs increased, all loss functions gradually decreased and then oscillated within a narrow interval; a stabilized loss indicates that the model has likely converged and learned near-optimal results. The model achieved good convergence after 300 epochs of training. The confidence error obj_loss and the regression error box_loss converged to approximately 0.035 and 0.044, respectively, the target category loss cls_loss converged to 0, and the overall error total_loss converged to approximately 0.079. The low loss values show that the model learned the features of sugarcane smut well.

3.5. Comparative Experiments of Different Algorithms

The efficacy of the YOLOv5s-ECCW lightweight model in detecting sugarcane smut was assessed by comparing it with SSD, YOLOv4, YOLOv5, and YOLOv8; the outcomes are displayed in Table 6. The weights generated by the YOLOv4 and SSD models were large, failing to meet the lightweight requirement, and these models also had the lowest mAP@0.5, making them unsuitable for sugarcane smut detection. The YOLOv8 model generated the smallest weights and had a faster detection speed than the algorithm proposed in this study, but it was less accurate at detecting sugarcane smut. The weights generated by the YOLOv5s-ECCW model were only 46.5% of the size of those of the original YOLOv5 model, while its mAP@0.5 increased by 0.2% compared with the base model. The lightweight design effectively reduces the model's memory consumption. Although its detection speed was not the fastest, the improved model retained higher accuracy while reaching 74.1 FPS on the GPU server, which suggests that the model meets real-time detection requirements.

3.6. Visual Analysis of YOLOv5s-ECCW

To confirm the practical identification effect of the enhanced model, YOLOv5s-ECCW, YOLOv4, YOLOv5, and YOLOv8 were used to recognize sugarcane smut in the inference test images; Figure 11 displays the recognition results. In Figure 11A, YOLOv4, YOLOv5, and YOLOv8 exhibited poor detection results, detecting two, one, and one smut targets, respectively, whereas YOLOv5s-ECCW correctly and comprehensively detected four smut targets. In Figure 11B, YOLOv5s-ECCW correctly identified three smut targets. In Figure 11C, YOLOv4 and YOLOv5 were unable to detect the targets, YOLOv8 detected one but missed two smut targets, and YOLOv5s-ECCW detected two targets. In Figure 11D, YOLOv8 recognized two smut targets, one of which was a duplicate, while YOLOv5s-ECCW missed one smut target but detected three. In Figure 11E, YOLOv5, YOLOv8, and YOLOv5s-ECCW all correctly and comprehensively detected the three smut targets in the image, whereas YOLOv4 did not. Overall, YOLOv5s-ECCW can quickly and accurately detect smut targets in images.

4. Discussion

This research improved the YOLOv5 object detection algorithm by introducing a lightweight network, an attention mechanism, and a new convolutional module and by optimizing the loss function. As a result, there was a 54% reduction in parameters, a 70.3% decrease in computational complexity, and a 53.5% reduction in memory usage. Meanwhile, our YOLOv5s-ECCW model achieved a mean average precision (mAP) of 97.8% and a processing speed of 74.1 frames per second on the sugarcane smut disease dataset. This demonstrates that the model can quickly and accurately identify smut disease targets in natural environments. Currently, studies on using YOLOv5 for the automatic identification of sugarcane smut disease are limited. Most researchers have used the YOLO object detection method to identify other aspects of sugarcane production, such as sugarcane stem node detection [1,25,26,27], sugarcane aphid detection [28], and sugarcane billet classification [29]. Therefore, this study bridges a gap in identifying sugarcane smut with the YOLO object detection algorithm. Additionally, this achievement not only provides agricultural researchers with an effective tool to evaluate the resistance of sugarcane varieties but also lays a solid foundation for future applications in real-time agricultural disease detection, thereby promoting further development in related research.

5. Conclusions

In the process of selecting sugarcane varieties for resistance to smut disease, the traditional method for the manual identification of sugarcane smut is simple but inefficient. Therefore, exploring an efficient and accurate method for sugarcane smut identification is particularly important. This study proposed YOLOv5s-ECCW, a lightweight sugarcane smut detection model, to identify sugarcane smut in natural environments. YOLOv5s-ECCW is based on the YOLOv5 framework, with EfficientNetV2 as the backbone network. CBAM is added to the backbone network, the C3STR module replaces the C3 module, and the WIoU loss function replaces the CIoU loss function. After optimizing various parameters, YOLOv5s-ECCW achieves a detection accuracy of 97.8% with 4.9 GFLOPs in the identification of sugarcane smut. Compared with existing models, YOLOv5s-ECCW exhibits superior accuracy and computational efficiency. In short, the proposed YOLOv5s-ECCW model efficiently identifies sugarcane smut disease with low energy consumption.
The use of RGB images in this study was justified by their simplicity and the lower cost of acquiring visible-light images. Utilizing spectral bands such as the infrared may offer sufficient information to differentiate between diseases in certain situations; for example, Ref. [30] used hyperspectral imaging and deep convolutional networks to detect smut symptoms before they appeared. However, this approach may greatly raise the cost of image acquisition, and most mobile communication devices cannot capture images beyond the visible range, which limits its application for crop disease identification. This study therefore selected RGB cameras combined with deep learning for sugarcane smut detection, as they are less expensive and easier to use.
The results of this study showed that both the original YOLOv5 and the improved YOLOv5s-ECCW model outperformed SSD [31,32], YOLOv4 [33], and YOLOv8 [34] in detecting sugarcane smut. Similar conclusions have been reached by many researchers for different diseases of different crops. Ref. [35] utilized the YOLOv5 model to detect bacterial spot disease in bell peppers, achieving a mean average precision of 90.7%. Ref. [36] suggested a lightweight YOLOv5 model for real-time strawberry disease detection, with a mean average precision of 94.7%. Refs. [37,38] identified apple leaf diseases using improved YOLOv5 models, with final average accuracies 7.9% and 8.25% higher than those of the original YOLOv5. Ref. [39] developed a lightweight YOLOv5 model to detect wheat fusarium head blight (FHB); on the wheat FHB dataset, the proposed model achieved a mean average precision (mAP) of 97.15%.
The focus of this research was to design a model that can efficiently identify smut targets in natural environments with low energy consumption. This will help agricultural researchers quickly assess the disease resistance of sugarcane varieties and improve the efficiency of disease-resistant breeding. However, the application of this model is not limited to this purpose. When combined with mobile devices, the model could enable the real-time detection of sugarcane smut, thereby assisting farmers in promptly identifying the condition and implementing control measures. In the future, we will continue improving the model to achieve better detection performance. In addition, we intend to deploy the model on mobile devices for real-time sugarcane smut detection in natural environments, making it easier for farmers and agricultural experts to recognize sugarcane smut. Such a mobile-deployed model will improve agricultural production efficiency while reducing the impact of the disease on crop yield and quality.

Author Contributions

Conceptualization, M.Y. (Min Yu), Z.W. and M.Y. (Meixin Yan); methodology, M.Y. (Min Yu), F.L., X.S., Z.W., J.L. and W.H.; software, M.Y. (Min Yu), Q.H. and H.F.; validation, M.Y. (Min Yu), X.S., X.Z. (Xia Zhou) and Z.W.; formal analysis, M.Y. (Min Yu) and X.Z. (Xia Zhou); investigation, J.L., Q.H., W.H., H.H., X.C., Y.Y., D.H., Q.L. and M.Y. (Meixin Yan); resources, M.Y. (Meixin Yan); data curation, M.Y. (Min Yu), X.Z. (Xia Zhou), J.L., Q.H., W.H., H.H., X.C., Y.Y., D.H. and Q.L.; writing—original draft, M.Y. (Min Yu); Writing—review and editing, F.L., X.Z. (Xiaoqiu Zhang), G.Z., H.F. and M.Y. (Meixin Yan); visualization, F.L., X.S., X.Z. (Xia Zhou), Z.W. and Y.Y.; supervision, X.Z. (Xia Zhou), Z.W. and G.Z.; project administration, X.S., J.L., Q.H., W.H. and H.H.; funding acquisition, M.Y. (Meixin Yan). All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Science and Technology Major Project of Guangxi (Guike AA22117004, Guike AA22117002) and Fund of GXAAS (grant number 2021YT007).

Data Availability Statement

Data are available on request from the authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, D.; Su, R.; Xiong, Y.; Wang, Y.; Wang, W. Sugarcane-Seed-Cutting System Based on Machine Vision in Pre-Seed Mode. Sensors 2022, 22, 8430. [Google Scholar] [CrossRef] [PubMed]
  2. Yan, M.; Zhu, G.; Lin, S.; Xian, X.; Chang, C.; Xi, P.; Shen, W.; Huang, W.; Cai, E.; Jiang, Z.; et al. The Mating-Type Locus b of the Sugarcane Smut Sporisorium scitamineum Is Essential for Mating, Filamentous Growth and Pathogenicity. Fungal Genet. Biol. 2015, 86, 1–8. [Google Scholar] [CrossRef] [PubMed]
  3. Yan, M.; Cai, E.; Zhou, J.; Chang, C.; Xi, P.; Shen, W.; Li, L.; Jiang, Z.; Deng, Y.; Zhang, L. A Dual-Color Imaging System for Sugarcane Smut Fungus Sporisorium scitamineum. Plant Dis. 2016, 100, 2357–2362. [Google Scholar] [CrossRef]
  4. Barbedo, J.G.A.; Koenigkan, L.V.; Santos, T.T. Identifying Multiple Plant Diseases Using Digital Image Processing. Biosyst. Eng. 2016, 147, 104–116. [Google Scholar] [CrossRef]
  5. Padilla, D.A.; Magwili, G.V.; Marohom, A.L.A.; Co, C.M.G.; Gaño, J.C.C.; Tuazon, J.M.U. Portable Yellow Spot Disease Identifier on Sugarcane Leaf via Image Processing Using Support Vector Machine. In Proceedings of the 5th International Conference on Control, Automation and Robotics (ICCAR), Beijing, China, 19–22 April 2019; pp. 901–905. [Google Scholar]
  6. Thilagavathi, K.; Kavitha, K.; Praba, R.D.; Arina, S.V.; Sahana, R.C. Detection of Diseases in Sugarcane Using Image Processing Techniques. Biosci. Biotechnol. Res. Commun. 2020, 13, 109–115. [Google Scholar] [CrossRef]
  7. Tamilvizhi, T.; Surendran, R.; Anbazhagan, K.; Rajkumar, K. Quantum Behaved Particle Swarm Optimization-Based Deep Transfer Learning Model for Sugarcane Leaf Disease Detection and Classification. Math. Probl. Eng. 2022, 2022, 3452413. [Google Scholar] [CrossRef]
  8. Aruna, R.; Devi, M.S.; Anand, A.; Dutta, U.; Sagar, C.N.S. Inception Nesterov Momentum Adam L2 Regularized Learning Rate CNN for Sugarcane Disease Classification. In Proceedings of the 2023 Third International Conference on Advances in Electrical, Computing, Communication and Sustainable Technologies (ICAECT), Bhilai, India, 5–6 January 2023; pp. 1–4. [Google Scholar]
  9. Kumar, P.A.; Nandhini, D.; Amutha, S.; Syed Ibrahim, S.P. Detection and Identification of Healthy and Unhealthy Sugarcane Leaves Using Convolutional Neural Network System. Sādhanā 2023, 48, 251. [Google Scholar] [CrossRef]
  10. Öğrekçi, S.; Ünal, Y.; Dudak, M.N. A Comparative Study of Vision Transformers and Convolutional Neural Networks: Sugarcane Leaf Diseases Identification. Eur. Food Res. Technol. 2023, 249, 1833–1843. [Google Scholar] [CrossRef]
  11. Li, X.; Li, X.; Zhang, S.; Zhang, G.; Zhang, M.; Shang, H. SLViT: Shuffle-Convolution-Based Lightweight Vision Transformer for Effective Diagnosis of Sugarcane Leaf Diseases. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 101401. [Google Scholar] [CrossRef]
  12. Kukreja, V.; Bordoloi, D.; Mehta, S.; Choudhary, A. The Future of Crop Health: CNN-Based Smut Disease Detection in Sugarcane. In Proceedings of the 2024 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI), Gwalior, India, 14–16 March 2024; pp. 1–5. [Google Scholar]
  13. Kumpala, I.; Wichapha, N.; Prasomsab, P. Sugar Cane Red Stripe Disease Detection Using YOLO CNN Deep Learning Technique. Eng. Access 2022, 8, 192–197. [Google Scholar]
  14. Amarasingam, N.; Gonzalez, F.; Salgadoe, A.S.A.; Sandino, J.; Powell, K. Detection of White Leaf Disease in Sugarcane Crops Using UAV-Derived RGB Imagery with Existing Deep Learning Models. Remote Sens. 2022, 14, 6137. [Google Scholar] [CrossRef]
  15. Wu, X.H.; Li, X.; Kong, S.; Zhao, Y.; Peng, L. Application of EfficientNetV2 and YoloV5 for Tomato Leaf Disease Identification. In Proceedings of the 2022 Asia Conference on Algorithms, Computing and Machine Learning (CACML), Hangzhou, China, 25–27 March 2022; pp. 150–158. [Google Scholar]
  16. Qin, Y.; Kou, Z.; Han, C.; Wang, Y. Intelligent Gangue Sorting System Based on Dual-Energy X-ray and Improved YOLOv5 Algorithm. Appl. Sci. 2023, 14, 98. [Google Scholar] [CrossRef]
  17. Zhang, L.; Fan, J.; Qiu, Y.; Jiang, Z.; Hu, Q.; Xing, B.; Xu, J. Marine Zoobenthos Recognition Algorithm Based on Improved Lightweight YOLOv5. Ecol. Inform. 2024, 80, 102467. [Google Scholar] [CrossRef]
  18. Yin, T.; Chen, W.; Liu, B.; Li, C.; Du, L. Light “You Only Look Once”: An Improved Lightweight Vehicle-Detection Model for Intelligent Vehicles under Dark Conditions. Mathematics 2023, 12, 124. [Google Scholar] [CrossRef]
  19. Gao, J.; Zhang, J.; Zhang, F.; Gao, J. LACTA: A Lightweight and Accurate Algorithm for Cherry Tomato Detection in Unstructured Environments. Expert Syst. Appl. 2024, 238, 122073. [Google Scholar] [CrossRef]
  20. Chen, L.; Yao, H.; Fu, J.; Ng, C.T. The Classification and Localization of Crack Using Lightweight Convolutional Neural Network with CBAM. Eng. Struct. 2023, 275, 115291. [Google Scholar] [CrossRef]
  21. Jiang, Y.; Yang, K.; Zhu, J.; Qin, L. YOLO-Rlepose: Improved YOLO Based on Swin Transformer and Rle-Oks Loss for Multi-Person Pose Estimation. Electronics 2024, 13, 563. [Google Scholar] [CrossRef]
  22. Yang, W.; Wu, H.; Tang, C.; Lv, J. ST-CA YOLOv5: Improved YOLOv5 Based on Swin Transformer and Coordinate Attention for Surface Defect Detection. In Proceedings of the 2023 International Joint Conference on Neural Networks (IJCNN), Gold Coast, Australia, 18–23 June 2023; pp. 1–8. [Google Scholar]
  23. Zhao, Q.; Wei, H.; Zhai, X. Improving Tire Specification Character Recognition in the YOLOv5 Network. Appl. Sci. 2023, 13, 7310. [Google Scholar] [CrossRef]
  24. Wang, W.; Chen, J.; Huang, Z.; Yuan, H.; Li, P.; Jiang, X.; Wang, X.; Zhong, C.; Lin, Q. Improved YOLOv7-Based Algorithm for Detecting Foreign Objects on the Roof of a Subway Vehicle. Sensors 2023, 23, 9440. [Google Scholar] [CrossRef]
  25. Chen, W.; Ju, C.; Li, Y.; Hu, S.; Qiao, X. Sugarcane Stem Node Recognition in Field by Deep Learning Combining Data Expansion. Appl. Sci. 2021, 11, 8663. [Google Scholar] [CrossRef]
  26. Day, C.; Busch, A. Automatic Detection of Sugarcane Billet Nodes and Eyes Using Machine Vision. In Proceedings of the 2023 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Port Macquarie, Australia, 28 November–1 December 2023; pp. 320–324. [Google Scholar]
  27. Yu, K.; Tang, G.; Chen, W.; Hu, S.; Li, Y.; Gong, H. MobileNet-YOLO v5s: An Improved Lightweight Method for Real-Time Detection of Sugarcane Stem Nodes in Complex Natural Environments. IEEE Access 2023, 11, 104070–104083. [Google Scholar] [CrossRef]
  28. Xu, W.; Xu, T.; Thomasson, J.A.; Chen, W.; Karthikeyan, R.; Tian, G.; Shi, Y.; Ji, C.; Su, Q. A Lightweight SSV2-YOLO Based Model for Detection of Sugarcane Aphids in Unstructured Natural Environments. Comput. Electron. Agric. 2023, 211, 107961. [Google Scholar] [CrossRef]
  29. Busch, A.; Dawson, Z.; Dedini, J.; Scott, J. Quality Classification and Segmentation of Sugarcane Billets Using Machine Vision. In Proceedings of the 2022 International Conference on Digital Image Computing: Techniques and Applications (DICTA), Sydney, Australia, 30 November–2 December 2022; pp. 1–7. [Google Scholar]
  30. Bao, D.; Zhou, J.; Bhuiyan, S.A.; Zia, A.; Ford, R.; Gao, Y. Early Detection of Sugarcane Smut Disease in Hyperspectral Images. In Proceedings of the 2021 36th International Conference on Image and Vision Computing New Zealand (IVCNZ), Tauranga, New Zealand, 9–10 December 2021; pp. 1–6. [Google Scholar]
  31. Ramcharan, A.; McCloskey, P.; Baranowski, K.; Mbilinyi, N.; Mrisho, L.; Ndalahwa, M.; Legg, J.; Hughes, D.P. A Mobile-Based Deep Learning Model for Cassava Disease Diagnosis. Front. Plant Sci. 2019, 10, 272. [Google Scholar] [CrossRef] [PubMed]
  32. Yin, Z.B.; Liu, F.Y.; Geng, H.; Xi, Y.J.; Zeng, D.B.; Si, C.J.; Shi, M.D. A High-Precision Jujube Disease Spot Detection Based on SSD During the Sorting Process. PLoS ONE 2024, 19, e0296314. [Google Scholar] [CrossRef] [PubMed]
  33. Aldakheel, E.A.; Zakariah, M.; Alabdalall, A.H. Detection and Identification of Plant Leaf Diseases Using YOLOv4. Front. Plant Sci. 2024, 15, 1355941. [Google Scholar] [CrossRef] [PubMed]
  34. Ye, R.; Shao, G.; He, Y.; Gao, Q.; Li, T. YOLOv8-RMDA: Lightweight YOLOv8 Network for Early Detection of Small Target Diseases in Tea. Sensors 2024, 24, 2896. [Google Scholar] [CrossRef]
  35. Mathew, M.P.; Mahesh, T.Y. Leaf-Based Disease Detection in Bell Pepper Plant Using YOLO v5. Signal Image Video Process. 2022, 16, 1–7. [Google Scholar] [CrossRef]
  36. Chen, S.; Liao, Y.; Lin, F.; Huang, B. An Improved Lightweight YOLOv5 Algorithm for Detecting Strawberry Diseases. IEEE Access 2023, 11, 12345–12356. [Google Scholar] [CrossRef]
  37. Xu, W.; Wang, R. ALAD-YOLO: A Lightweight and Accurate Detector for Apple Leaf Diseases. Front. Plant Sci. 2023, 14, 1204569. [Google Scholar] [CrossRef]
  38. Lv, M.; Su, W.H. YOLOV5-CBAM-C3TR: An Optimized Model Based on Transformer Module and Attention Mechanism for Apple Leaf Disease Detection. Front. Plant Sci. 2024, 14, 1323301. [Google Scholar] [CrossRef]
  39. Gao, C.; Guo, W.; Yang, C.; Gong, Z.; Yue, J.; Fu, Y.; Feng, H. A Fast and Lightweight Detection Model for Wheat Fusarium Head Blight Spikes in Natural Environments. Comput. Electron. Agric. 2024, 216, 108484. [Google Scholar] [CrossRef]
Figure 1. Focus operation diagram. A value was obtained at each pixel interval in an image to concentrate width and height information in the channel space. The input channels were expanded by a factor of 4, resulting in a spliced image with 12 channels instead of the original 3-channel RGB image.
Figure 2. Structure of the CBAM module. The channel attention module and spatial attention module sequentially refine feature maps. ⨁ denotes the addition of two features. σ denotes the Sigmoid activation function. ⨂ denotes the multiplication of the input feature maps by the output of the corresponding attention module.
Figure 3. Structure of the C3 and C3STR module. (a) Original C3 module. (b) C3STR module. The C3STR module utilizes the Swin transformer (STR) to reduce the model’s parameters. The STR module is a pairwise combination of two different Swin transformer blocks (W-MSA, SW-MSA). The dashed box shows the detailed structure of the STR module.
Figure 4. Overall structure of YOLOv5s-ECCW network. Backbone: EfficientnetV2, the convolutional block attention mechanism (CBAM module), and SPP. Neck: FPN + PAN with C3STR. Head: three detection heads detect small, medium, and large objects, respectively. Firstly, 640 × 640 RGB images are given as the input, then the image features are extracted and fused through backbone and neck. Finally, three detection heads with three different sizes are the output.
Figure 5. Comparison of mAPs of different backbone networks. The horizontal axis represents training epochs. The vertical axis represents the mean average precision (mAP) at an IoU threshold of 0.5. The lines represent the continuous change in the mAP value as the number of training epochs increases.
Figure 6. Detection performance of different improved models. This bar chart contains a variety of evaluation indicators to describe the detection performance of different improved models. The symbol ‘-’ in the figure indicates that the negative of the indicator is plotted.
Figure 7. Confusion matrix. The horizontal axis represents the ground truth classes and the vertical axis represents the predicted classes. Each cell element represents the proportion of the number of the predicted class to the total number of the true class. The diagonal elements represent correctly classified outcomes. All other off-diagonal elements along a column are wrong predictions.
Figure 8. Confidence and accuracy metrics of the YOLOv5s-ECCW. (A) The F1 curve. The model had an F1 score of 0.95 on the training set. (B) The P-R curve. The mAP@0.5 value on the training set was 0.964. (C) The P curve. The model's accuracy consistently exceeded 80% when the confidence level reached 0.1 or higher. (D) The R curve. The recall remained high for confidence levels below 0.8 but gradually decreased beyond that threshold.
Figure 9. Validation set prediction results. This is an example of the prediction results on the validation set during model training. It contains 16 images randomly combined from the validation set. Each image contains the target prediction category with its confidence level.
Figure 10. The trend of each loss function in the training process. This model has four loss functions. The box_loss denotes a regression error. The obj_loss represents a confidence error. The cls_loss represents a target category loss function. Total_loss represents the sum of the first three losses.
Figure 11. Recognition results. A visualization of the results of detecting images outside the dataset using four object detection algorithms (YOLOv4, YOLOv5, YOLOv8, YOLOv5s-ECCW). Each predicted bounding box shows the predicted label of the detected smut and the confidence of the predicted result. To make the prediction frames in the images clearer, some of the detection images are cropped.
Table 1. Parameters of EfficientnetV2-s network structure.

| Stage | Operator | Stride | Channels | Layers |
|---|---|---|---|---|
| 0 | Conv3×3 | 2 | 24 | 1 |
| 1 | Fused-MBConv1, k3×3 | 1 | 24 | 2 |
| 2 | Fused-MBConv4, k3×3 | 2 | 48 | 4 |
| 3 | Fused-MBConv4, k3×3 | 2 | 64 | 4 |
| 4 | MBConv4, k3×3, SE0.25 | 2 | 128 | 6 |
| 5 | MBConv6, k3×3, SE0.25 | 1 | 160 | 9 |
| 6 | MBConv6, k3×3, SE0.25 | 2 | 272 | 15 |
| 7 | Conv1×1 & Pooling & FC | - | 1792 | 1 |

Note: the numbers after MBConv and Fused-MBConv denote the expansion multiples of the input feature channels; SE0.25 means that the block uses a squeeze-and-excitation (SE) module in which the number of nodes in the first fully connected layer is 0.25 times the number of feature channels of the MBConv input.
Table 2. Software and hardware environment resource configuration.

| Configuration | Parameter |
|---|---|
| Operating system | Ubuntu 20.04 |
| CPU | Intel(R) Core(TM) i5-8265U CPU @ 1.60 GHz |
| GPU | NVIDIA GeForce RTX 3090 24 G |
| Accelerated environment | CUDA 11.3 |
| Language | Python 3.8 |
| Framework | PyTorch 1.10.0 |
Table 3. Comparison of results from different backbone networks.

| Models | Parameters (M) | GFLOPs | Weight Size (MB) | Precision (%) | Recall (%) | mAP@0.5 (%) |
|---|---|---|---|---|---|---|
| YOLOv5s-MobilenetV3 | 1.39 | 2.50 | 3.00 | 95.90 | 86.10 | 95.00 |
| YOLOv5s-Swin Transformer | 13.37 | 46.70 | 27.00 | 98.10 | 96.30 | 97.90 |
| YOLOv5s-ShufflenetV2 | 0.84 | 1.90 | 1.90 | 85.90 | 91.70 | 93.60 |
| YOLOv5s-EfficientnetV2 | 3.90 | 5.10 | 8.00 | 97.70 | 93.50 | 97.50 |
Table 4. Comparison results of adding a C3STR module in different positions.

| Models | Parameters (M) | GFLOPs | Weight Size (MB) | Precision (%) | Recall (%) | mAP@0.5 (%) |
|---|---|---|---|---|---|---|
| middle-C3STR | 3.74 | 4.90 | 7.70 | 96.50 | 94.70 | 97.00 |
| all-C3STR | 3.05 | 4.70 | 6.30 | 96.60 | 91.90 | 96.10 |
| large-C3STR | 3.25 | 4.90 | 6.70 | 97.00 | 94.30 | 97.80 |
Table 5. Ablation experiment.

| YOLOv5s | EfficientnetV2 | WIoU | CBAM | C3STR | Parameters (M) | GFLOPs | Weight Size (MB) | Precision (%) | Recall (%) | mAP@0.5 (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| √ | − | − | − | − | 7.06 | 16.50 | 14.40 | 97.40 | 95.30 | 97.60 |
| √ | √ | − | − | − | 3.90 | 5.10 | 8.00 | 97.70 | 93.50 | 97.50 |
| √ | √ | √ | − | − | 3.90 | 5.10 | 8.00 | 97.20 | 93.10 | 97.70 |
| √ | √ | √ | √ | − | 3.91 | 5.10 | 8.00 | 97.50 | 95.30 | 97.70 |
| √ | √ | √ | √ | √ | 3.25 | 4.90 | 6.70 | 97.00 | 94.30 | 97.80 |

Note: “√” indicates that the module is used; “−” indicates that it is not. The check marks follow the cumulative improvement order described in Section 3.3.
Table 6. Comparison of different detection algorithm models.

| Models | Backbone Network | Weight Size (MB) | mAP@0.5 (%) | FPS |
|---|---|---|---|---|
| YOLOv4 | CSPDarkNet53 | 244.20 | 87.10 | 34.10 |
| YOLOv5 | CSPDarkNet53 | 14.40 | 97.60 | 73.50 |
| SSD | VGG16 | 90.60 | 75.60 | 54.90 |
| YOLOv8 | Darknet-53 | 6.30 | 97.30 | 93.50 |
| YOLOv5s-ECCW (Ours) | EfficientnetV2 | 6.70 | 97.80 | 74.10 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
