1. Introduction
Maize is recognized as one of the most important crops globally and is an essential raw material for both the light and chemical industries [1]. According to the Food and Agriculture Organization (FAO) of the United Nations, maize is cultivated in over 160 countries worldwide. Forecasts suggest that by 2025, the worldwide area dedicated to maize cultivation will approach 180 million hectares, with production surpassing 150 million tonnes. However, maize frequently suffers from various diseases during its growth. These diseases significantly diminish crop yields and directly threaten farmers' economic interests. Leaf diseases are among the main factors contributing to reduced maize yields. Consequently, the rapid and accurate identification of leaf diseases is critical during maize growth. Traditionally, this diagnostic process involves hiring experts for on-site diagnosis, or inviting them to lecture so that farmers can learn to identify various diseases and then make judgments based on what they have learned. This approach is time-consuming and labor-intensive, and it is prone to misdiagnosis due to human judgment errors, leading to potential crop losses.
Recently, deep learning technology has achieved significant advances in plant disease recognition, offering new perspectives and methodologies for precision agriculture and crop protection. Convolutional neural networks (CNNs) have become a key technology in several fields because of their superior ability to process and parse complex visual data. The wide application of CNNs in image classification tasks provides a reliable basis for disease and pest recognition. Fang et al. [2] introduced HCA-MFFNet, a method developed explicitly for maize leaf disease recognition that uses hard coordinate attention (HCA) to extract features from various spatial scales and depthwise separable convolutional layers to minimize the number of parameters. Ahila et al. [3] used an improved LeNet for maize leaf disease classification, recognizing three types of diseases and one healthy category. Zhang et al. [4] proposed a tomato leaf disease recognition model utilizing the Asymptotic Non-Local Means algorithm (ANLM) and a Multi-channel Automatic Orientation Recurrent Attention Network (M-AORANet), which addresses noise interference and tomato leaf feature extraction. Zhang et al. [5] proposed an improved GoogLeNet and Cifar10 model that can recognize eight types of maize leaf diseases. In practice, image classification cannot accurately localize lesion areas, making object detection an essential tool in agriculture for providing richer information and helping to enhance crop quality and productivity.
Current classical object detection methods mainly include one-stage detection networks, such as the You Only Look Once (YOLO) [6] series, the Single Shot MultiBox Detector (SSD) [7], and RetinaNet [8], as well as two-stage detection networks, including Fast Region-based Convolutional Neural Networks (Fast R-CNN) [9] and Faster R-CNN [10]. Zhang et al. [11] designed a multi-feature fusion Faster R-CNN (MF3 R-CNN) to solve the problem of soybean leaf disease detection in complex scenarios. Two-stage models, which comprise separate stages for generating and detecting candidate boxes, have a complex overall architecture and usually exhibit poor real-time performance. Therefore, single-stage object detection algorithms are gradually becoming the mainstream choice in agriculture. Sun et al. [12] introduced a Mobile End AppleNet-based SSD algorithm (MEAN-SSD) designed specifically for apple leaf disease detection. Liu et al. [13] enhanced the feature layers of the YOLOv3 model using image pyramids for multi-scale feature detection, enabling accurate and rapid detection of the location and type of diseases and pests in tomatoes. Li et al. [14] proposed an improved YOLOv4 model incorporating depthwise convolution and a hybrid attention mechanism for detecting powdery mildew on strawberry leaves. Qi et al. [15] enhanced the Squeeze-and-Excitation (SE) module of the YOLOv5 model to extract critical features and effectively detect tomato virus diseases. The YOLO series is widely used in research and practice for plant disease detection, achieving timely and accurate detection and advancing the development of intelligent agricultural technologies. However, in complex maize experimental fields, as the regional characteristics of maize leaf diseases evolve, detection becomes more challenging, and maize leaf disease detection faces limitations [16].
Currently, maize leaf disease detection tasks rely heavily on annotated data, which are time-consuming and labor-intensive to label. Semi-supervised learning offers a strategy to reduce dependence on extensive manually labeled datasets by merging a limited quantity of labeled data with a substantial volume of unlabeled data for model training. Semi-supervised learning has been widely used in technologies including image classification [17], speech recognition [18], and natural language processing [19]. It has also demonstrated its potential in practical application areas such as agriculture, where it requires only a small amount of labeled data for effective disease identification. Yang et al. [20] utilized a combination of semi-supervised learning and image processing techniques to recognize young green tea leaves. Omidi et al. [21] utilized a semi-supervised clustering technique to classify whether or not a walnut tree showed symptoms of infection. Existing research efforts have focused on semi-supervised classification techniques; however, the agricultural field urgently needs object detection techniques that can localize diseased areas on leaves. In recent research, Tseng et al. [22] proposed a semi-supervised object detection method using wheat as an example. Although their research and this work both focus on agriculture, this work focuses on maize leaf diseases.
Consequently, an analysis of the characteristics of existing object detection algorithms and maize leaf disease images reveals that complex field conditions, such as intense lighting and shaded areas, increase the difficulty of detection. The similarity among maize leaf diseases then leads to misdetection, thereby reducing the accuracy of deep learning-based maize leaf disease detection. On the other hand, because maize leaf disease images differ from images in other domains, existing semi-supervised object detection models struggle to assign pseudo-labels for maize leaf disease accurately.
To address these issues, we propose a semi-supervised one-stage object detection framework for maize leaf disease. Within this framework, the WAP strategy and the AgroYOLO detector are proposed. By leveraging a large amount of unlabeled data, the WAP strategy accurately assigns the pseudo-labels generated by a teacher model, thereby improving the quality of the pseudo-labels fed to the student model. Additionally, the AgroYOLO detector further enhances the detection accuracy of maize leaf diseases. To the best of our knowledge, semi-supervised object detection has not previously been applied to maize leaf disease detection. In plant disease detection, semi-supervised object detection technology is still nascent and requires further research and exploration. In this context, this research fills gaps in existing technologies and contributes new insights and methods to this field. The main contributions of this work are summarized as follows:
- (1) Agronomic Teacher, a semi-supervised one-stage object detection framework for maize leaf disease, is designed, reducing the dependency on extensive labeled data.
- (2) The WAP strategy is proposed to enhance the reliability of pseudo-label assignment based on the objectness scores and classification scores from the teacher model.
- (3) The AgroYOLO detector is developed, combining the Agro-Backbone network, which adequately extracts detailed features of leaf diseases, with the Agro-Neck network, which enhances the capability to fuse multi-scale maize leaf disease features.
- (4) Experimental results demonstrate that Agronomic Teacher outperforms other supervised and semi-supervised object detection algorithms on the MaizeData and PascalVOC datasets.
In Section 1, the motivation and objectives are delineated. Section 2 details the samples used in this study and Agronomic Teacher. Section 3 provides experimental results and discussion. Finally, Section 4 presents the research conclusions and future studies.
3. Results and Discussion
3.1. Model Evaluation Indicators
Key performance indicators such as precision, recall, and mean average precision (mAP) are used to evaluate the proposed semi-supervised maize leaf disease detection model comprehensively. Additionally, metrics, including the number of model parameters, floating point operations (FLOPs), and frames per second (FPS), are introduced to assess the model’s practicality in the experimental field. These metrics evaluate the model’s processing speed, resource consumption, and computational efficiency, providing a crucial basis for further optimization.
Precision measures the accuracy of a detector's positive predictions relative to all the samples it predicted as positive. Recall measures the model's ability to correctly identify positive samples; it represents the proportion of actual positive samples correctly identified as positive. Precision and recall can be described in the following manner:

$$\text{Precision} = \frac{TP}{TP + FP} \tag{11}$$

$$\text{Recall} = \frac{TP}{TP + FN} \tag{12}$$

In Equations (11) and (12), true positives ($TP$) refer to the number of instances in which the model correctly identifies positive samples as positive. False positives ($FP$) are the number of instances where the model incorrectly predicts negative samples as positive. False negatives ($FN$) represent the number of positive samples mistakenly judged as negative by the model.
Average precision ($AP$) is obtained by calculating the area under the precision–recall curve:

$$AP = \int_{0}^{1} P(R)\, dR \tag{13}$$

mAP calculates the area under the precision–recall curve for each category and averages it over all categories. Equation (14) can be used to determine mAP:

$$mAP = \frac{1}{N} \sum_{i=1}^{N} AP_i \tag{14}$$

where $N$ is the number of categories and $AP_i$ is the average precision of the $i$-th category.
mAP (0.5) is a metric for evaluating the average precision of a model with the intersection over union (IoU) threshold set at 0.5. The IoU is determined as described in Equation (15):

$$IoU = \frac{|A \cap B|}{|A \cup B|} \tag{15}$$

where $A \cap B$ represents the overlapping region between the predicted and actual bounding boxes, and $A \cup B$ denotes the combined area of the predicted and actual bounding boxes. This implies that a prediction is considered accurate only if the overlapping area $A \cap B$ is greater than 50% of the combined area $A \cup B$.
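As a concrete illustration of Equations (11)–(15), the following minimal Python sketch computes IoU between axis-aligned boxes and derives precision, recall, and a simple AP from a set of scored predictions. The function names and the greedy confidence-ranked matching scheme are illustrative assumptions, not the exact evaluation code used in this work.

```python
import numpy as np

def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); IoU = |A ∩ B| / |A ∪ B| as in Equation (15).
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def precision_recall_ap(preds, gts, iou_thr=0.5):
    # preds: list of (box, confidence); gts: list of ground-truth boxes.
    # Predictions are matched greedily by confidence; each GT matches at most once.
    preds = sorted(preds, key=lambda p: -p[1])
    matched = set()
    tp = np.zeros(len(preds))
    for i, (box, _) in enumerate(preds):
        best_j = max(range(len(gts)), key=lambda j: iou(box, gts[j]), default=None)
        if best_j is not None and best_j not in matched and iou(box, gts[best_j]) >= iou_thr:
            tp[i] = 1
            matched.add(best_j)
    fp = 1 - tp
    cum_tp, cum_fp = np.cumsum(tp), np.cumsum(fp)
    recall = cum_tp / max(len(gts), 1)             # Equation (12)
    precision = cum_tp / (cum_tp + cum_fp + 1e-9)  # Equation (11)
    # AP: area under the precision-recall curve (Equation (13)), trapezoid rule.
    ap = np.trapz(precision, recall)
    return precision, recall, ap

# mAP (Equation (14)) is then the mean of per-category AP values.
```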
FPS refers to the number of image frames the algorithm can process per second. The number of model parameters refers to the total number of parameters constituting a machine learning model. FLOPs represent the total number of floating-point operations needed to complete a given task.
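For reference, the practicality metrics can be estimated as in the hedged PyTorch sketch below; the warm-up count, batch shape, and timing loop are placeholders rather than the benchmarking protocol of this study.

```python
import time
import torch

def count_parameters(model):
    # Total number of learnable parameters in the model.
    return sum(p.numel() for p in model.parameters())

@torch.no_grad()
def measure_fps(model, input_size=(1, 3, 640, 640), n_iters=100, device="cuda"):
    # FPS = frames processed per second, averaged over n_iters forward passes.
    model = model.eval().to(device)
    x = torch.randn(*input_size, device=device)
    for _ in range(10):  # warm-up so one-time initialization is excluded
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(n_iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    return n_iters / (time.time() - start)

# FLOPs are typically obtained with a profiling tool such as fvcore or thop.
```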
3.2. Preparation for Experiments
The experimental setup in this research is conducted within the Ubuntu 20.04 operating system environment, utilizing the Agronomic Teacher model trained on the PyTorch 1.9.0 framework. The CUDA version employed is 11.1, and Python 3.8.13 is used for scripting. The relevant hyperparameter settings for this experiment are shown in Table 3.
3.3. Version Selection of YOLO Series
A comprehensive evaluation was conducted of the performance of YOLO series models on the 30% annotated PascalVOC dataset, covering models of different scales and versions, including YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x, YOLOv7l, and YOLOv8s. Table 4 displays their specific details. In the YOLOv5 series, YOLOv5n has the lowest number of parameters and FLOPs, indicating that it is the smallest in model size and computational complexity. Due to its simpler structure, it has an advantage in processing speed, exhibiting the highest FPS. However, the trade-off for this design is the lowest mAP (0.5), indicating that while it maintains high-speed detection, its accuracy is relatively low. Conversely, the YOLOv5x model has the highest number of parameters and the largest FLOPs, meaning it possesses more robust capabilities for understanding and processing images. The mAP (0.5) value of YOLOv5x is the highest, reaching 53%. However, this high accuracy comes at the cost of greater computational expense and more parameters, resulting in a comparatively lower FPS. Although YOLOv7l and YOLOv8s show higher mAP (0.5) values, YOLOv7l has a relatively high number of parameters and FLOPs, and YOLOv8s reaches only 68.97 FPS; the higher parameter counts and computational complexity mean that these models require more computational resources and processing time, leading to relatively fewer frames processed per second.
Therefore, to achieve a balance between detection accuracy and processing speed, making it more suitable for rapid and efficient detection in the experimental field, YOLOv5s is chosen as the base model for the maize leaf disease detection task.
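As a practical note, the YOLOv5s baseline can be obtained through the official Ultralytics hub interface, as in the hedged snippet below; the exact training entry point used in this work may differ.

```python
import torch

# Load the YOLOv5s baseline via the official Ultralytics hub entry point.
model = torch.hub.load('ultralytics/yolov5', 'yolov5s', pretrained=True)

# Rough sanity check of the size/speed trade-off discussed above.
n_params = sum(p.numel() for p in model.parameters())
print(f"YOLOv5s parameters: {n_params / 1e6:.1f} M")
```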
3.4. Parameter Selection Experiment for the WAP Strategy
An analysis was conducted of the impact of the weight of the objectness scores, $w_{obj}$, and the weight of the classification scores, $w_{cls}$, on model performance within the WAP strategy. To ensure the fairness of the experiment, the adjustment of the weights was the only variable, so as to accurately assess the specific impact of different weight ratios on the model's objectness and classification score performance. The experiment was conducted on the MaizeData dataset, where $w_{obj}$ decreased from 1.0 to 0.0 and, correspondingly, $w_{cls}$ increased from 0.0 to 1.0. The results are shown in Figure 10.

The experimental results showed that with the weight combination $w_{obj} = 0.2$ and $w_{cls} = 0.8$, the model achieved the best mAP (0.5) value of 42.3%. Specifically, performance improved slightly as $w_{obj}$ increased from 0.0 to 0.2 but began declining as it increased to 0.4 and 0.6. Performance recovered slightly at $w_{obj} = 0.8$ and 1.0 but did not exceed the highest value. This result emphasizes the importance of finding the right balance between the weights of the objectness scores and the classification scores: weights that are too high or too low for either score can degrade the model's performance. Therefore, in the WAP strategy, $w_{obj}$ is set to 0.2 and $w_{cls}$ to 0.8.
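A minimal sketch of how such a weighted score could gate pseudo-labels is given below; the score combination follows the weighting described above, while the function name, keep threshold, and box format are illustrative assumptions rather than the exact WAP implementation.

```python
import torch

def wap_filter(pred_boxes, obj_scores, cls_scores,
               w_obj=0.2, w_cls=0.8, keep_thr=0.5):
    """Keep teacher predictions whose weighted score passes a threshold.

    pred_boxes: (N, 4) candidate boxes from the teacher model.
    obj_scores: (N,) objectness scores in [0, 1].
    cls_scores: (N,) max class probability per box in [0, 1].
    """
    # Weighted combination of objectness and classification confidence,
    # using the w_obj = 0.2 / w_cls = 0.8 balance found in this experiment.
    weighted = w_obj * obj_scores + w_cls * cls_scores
    keep = weighted >= keep_thr
    return pred_boxes[keep], weighted[keep]

# Illustrative usage with random tensors standing in for teacher outputs.
boxes = torch.rand(100, 4)
obj = torch.rand(100)
cls_p = torch.rand(100)
pseudo_boxes, pseudo_scores = wap_filter(boxes, obj, cls_p)
```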
3.5. Experimental Comparison before and after Model Improvement
YOLOv5s and Agronomic Teacher are trained on a 30% labeled training set and evaluated on a testing set. Figure 11 and Figure 12 display the original manually annotated images and the detection results before and after the algorithm improvement on the testing set. Figure 11 illustrates that under the complex conditions of shadow and intense light, YOLOv5s fails to capture maize disease information accurately and may even fail to recognize the disease. Meanwhile, as seen in Figure 12, YOLOv5s makes errors when identifying similar lesions, including missed detections and misclassifications of disease types. In contrast, Agronomic Teacher, by integrating the WAP strategy and the AgroYOLO detector, effectively mitigates these issues, making the detection results align more closely with human annotations.
Specifically, the proposed model utilizes 30% annotated data and leverages the remaining 70% unlabeled data, thus expanding the training dataset. By employing the WAP strategy to allocate pseudo-labels precisely, the model can better learn the diversity and complexity of the samples. Utilizing more unlabeled data helps the AgroYOLO detector better understand these complex variations and capture the features of local changes more effectively, thereby enhancing detection accuracy. Furthermore, the Agro-Backbone network is employed for feature extraction in the Agronomic Teacher algorithm. This backbone integrates the SPPSEDC module and the CC2f module, enabling it to capture rich semantic information from maize leaf disease images. Additionally, the Agro-Neck network enhances the capability of multi-scale feature fusion using the GD mechanism and the CC2f module, finely combining semantic-rich deep features with high-resolution shallow features. This feature fusion strategy effectively retains details that are frequently lost in deep network layers, achieving precise detection of disease spots.
Consequently, Agronomic Teacher demonstrates higher accuracy and reliability in maize leaf disease detection, which is of significant value for precise disease monitoring and management.
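To make the teacher-student flow concrete, the following hedged PyTorch sketch shows one semi-supervised iteration in the style of mean-teacher frameworks: the teacher pseudo-labels unlabeled images, the student trains on labeled plus pseudo-labeled batches, and the teacher is updated as an exponential moving average (EMA) of the student. The teacher output format, the loss functions, the `wap_filter` gate from the earlier sketch, and the EMA decay are assumptions for illustration, not the paper's exact training code.

```python
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    # Teacher weights track the student as an exponential moving average.
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)

def semi_supervised_step(student, teacher, labeled_batch, unlabeled_images,
                         sup_loss_fn, unsup_loss_fn, optimizer):
    """One training iteration: labeled batch plus pseudo-labeled unlabeled batch."""
    images_l, targets_l = labeled_batch
    # 1) The teacher predicts on unlabeled images without gradients.
    with torch.no_grad():
        boxes, obj, cls_p = teacher(unlabeled_images)
    # 2) A WAP-style weighted gate keeps only reliable pseudo-labels
    #    (wap_filter is the sketch from Section 3.4).
    pseudo_boxes, _ = wap_filter(boxes, obj, cls_p)
    # 3) The student learns from labeled data and pseudo-labels jointly.
    loss = sup_loss_fn(student(images_l), targets_l) \
         + unsup_loss_fn(student(unlabeled_images), pseudo_boxes)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    # 4) The teacher is refreshed from the student via EMA.
    ema_update(teacher, student)
    return loss.item()
```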
3.6. Ablation Experiment Results
To verify that all the improvements in Agronomic Teacher enhance the model's performance, a series of ablation experiments was conducted on two datasets: the 30% labeled MaizeData and the 30% labeled PascalVOC datasets. Each individual and combined innovation is incorporated into the baseline model. The results of the ablation experiments can be seen in Table 5.
On the 30% labeled MaizeData, using the WAP strategy alone improves precision and mAP (0.5) over the baseline model by 1.8% and 1.6%, respectively. However, recall decreases slightly, by 0.5%. This may be due to insufficient training on the unlabeled data, causing the model to miss some targets and thus affecting recall. After adopting the Agro-Backbone network, precision improves by 4.8%, recall by 1.7%, and mAP (0.5) by 2.8%. The detection performance of the model is improved by employing the SPPSEDC and CC2f modules to efficiently extract vital high-level features from the input image, including lesions and texture on the leaf. Furthermore, using the Agro-Neck network alone results in improvements of 2.0% in precision, 2.2% in recall, and 2.2% in mAP (0.5). In conjunction with the GD mechanism and the C2f module, the fusion of features extracted by the improved backbone network helps the model capture more comprehensive semantic information and contextual relationships. This further enhances the model's ability to perform precise detection under varying lighting and angle conditions.
In the pairwise-combination ablation experiments for the WAP strategy, the Agro-Backbone network, and the Agro-Neck network, improvements in precision, recall, and mAP (0.5) are observed across all three sets of experiments. Using the WAP strategy alone may increase model precision and mAP (0.5) but decrease recall; however, combining the WAP strategy with the Agro-Backbone network or the Agro-Neck network maintains the gains in precision and mAP (0.5) while also compensating for the decrease in recall. Compared to the baseline model, using the WAP strategy with the Agro-Backbone network yields improvements of 0.7% in precision, 5.8% in recall, and 5.0% in mAP (0.5). The WAP strategy provides a robust mechanism for pseudo-label assignment by weighing the relative importance of objectness scores and classification scores, while Agro-Backbone effectively helps the model extract critical features from images. Combining the two fully leverages the advantages of unlabeled data, demonstrating enhanced feature extraction capabilities and improved recognition of targets. When combining the WAP strategy with the Agro-Neck network, compared to the YOLOv5s model, precision increases by 0.3%, recall by 3.1%, and mAP (0.5) by 3.4%. This combination fully leverages the advantages of unlabeled data in the WAP strategy and the feature fusion capability of Agro-Neck, enhancing the model's detection performance. When combining the Agro-Backbone network and the Agro-Neck network, there is an improvement of 5.9% in precision, 0.6% in recall, and 3.1% in mAP (0.5); the strengths of Agro-Backbone in feature extraction and Agro-Neck in feature fusion are fully utilized to improve the detection performance of the model. Using the WAP strategy, the Agro-Backbone network, and the Agro-Neck network simultaneously better meets the requirements of the maize experimental field, ensuring high-precision disease identification and ultimately improving crop quality and yield: the cumulative enhancements are 3.8% in precision, 3.8% in recall, and 6.5% in mAP (0.5). In the context of maize leaf disease detection, all the proposed methods employed in this work have yielded positive outcomes. As observed from Table 5, the results of the ablation experiment conducted on the PascalVOC dataset exhibit a trend similar to that of the MaizeData dataset, further validating the effectiveness of the proposed methods in practical applications.
3.7. Comparison Experiment Results
We conducted a series of comparative experiments to validate the effectiveness of Agronomic Teacher. The supervised experiments were conducted on the MaizeData and PascalVOC datasets utilizing different annotation ratios: 10%, 15%, 20%, 25%, and 30%. Correspondingly, the semi-supervised experiments additionally used the remaining 90%, 85%, 80%, 75%, and 70% of the data as unlabeled data. The results of the supervised experiments are presented in Table 6 and Table 7. The results of the semi-supervised comparative experiments are depicted in Figure 13 and Figure 14.
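The labeled/unlabeled partitions can be produced with a simple random split, as sketched below; the fixed seed and the id-list interface are assumptions for reproducibility, not the exact partitioning protocol of this study.

```python
import random

def split_labeled_unlabeled(image_ids, labeled_ratio=0.3, seed=0):
    """Randomly split image ids into labeled and unlabeled subsets.

    labeled_ratio = 0.1 ... 0.3 reproduces the annotation ratios above;
    the remainder (0.9 ... 0.7) serves as the unlabeled pool.
    """
    rng = random.Random(seed)
    ids = list(image_ids)
    rng.shuffle(ids)
    n_labeled = int(len(ids) * labeled_ratio)
    return ids[:n_labeled], ids[n_labeled:]

# Example: a 30%/70% split over 1000 image ids.
labeled, unlabeled = split_labeled_unlabeled(range(1000), labeled_ratio=0.3)
```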
3.7.1. Comparison Experiment Results in Supervised Learning
On the MaizeData dataset, the Agronomic Teacher algorithm outperforms supervised algorithms such as YOLOv5s, YOLOv8s, YOLOv7l, and Gold-YOLO-s at different annotation ratios when the mAP threshold is set to 0.5. Specifically, taking the MaizeData dataset with a 30% annotation ratio as an example, as shown in Table 7, YOLOv5s exhibits a lower parameter count and fewer FLOPs than Agronomic Teacher, yet the proposed technique outperforms it by 6.5% in mAP (0.5). Despite YOLOv5x having five times the parameters and seven times the FLOPs of the proposed model, the proposed model's mAP (0.5) is still 2.7% higher. This further emphasizes the outstanding performance of the proposed approach in maize leaf disease detection. YOLOv7l has 3.8 times the FLOPs of the proposed technique, while the proposed method's parameter count is only half that of YOLOv7l; nevertheless, the proposed method's mAP (0.5) is 5.8% higher, indicating an impressive balance between model accuracy and computational efficiency. Despite having a parameter count similar to Agronomic Teacher, the Gold-YOLO-s model has approximately 1.6 times the FLOPs of the proposed method; under the mAP (0.5) condition, Agronomic Teacher achieves a performance improvement of 4.9%. Compared to YOLOv8s, the proposed algorithm exhibits slightly lower FLOPs with a marginally higher parameter count, and it notably outperforms YOLOv8s in mAP (0.5), improving by 4.2%.
A comprehensive analysis of the datasets annotated at 10%, 15%, 20%, and 25%, as shown in Table 6, reveals that Agronomic Teacher consistently outperforms models such as YOLOv5s, YOLOv7l, Gold-YOLO-s, and YOLOv8s across the various annotation ratios. Specifically, on the 10% annotated dataset, Agronomic Teacher achieved performance improvements of 1.3%, 2.8%, 0.4%, and 0.5% over the above models, respectively. On the 15% dataset, these increments were 4.4%, 4.7%, 3.8%, and 2.3%, respectively. For the 20% dataset, the improvements were 5.3%, 4.5%, 4.4%, and 2.3%, respectively. Finally, on the 25% dataset, increases of 6.4%, 7.3%, 5.8%, and 4.9% were observed. In comparison with YOLOv5x, on the MaizeData dataset with annotation rates of 10% and 15%, YOLOv5x achieved mAP (0.5) values surpassing the proposed algorithm by 0.3% and 2.9%, respectively. However, as the annotation rate increased to 20%, 25%, and 30%, the proposed algorithm exhibited its advantages, outperforming YOLOv5x by margins of 1.5%, 1.7%, and 2.7%, respectively.
To demonstrate the applicability of the proposed method, it was compared with other supervised algorithms on the widely recognized PascalVOC dataset, and a detailed comparative analysis was conducted. The experimental results are presented in Table 6 and Table 7. Below is a detailed explanation of the experiments conducted on the PascalVOC dataset with a 30% annotation rate, as shown in Table 7. Compared to YOLOv5s, the proposed model demonstrates substantial improvements, achieving a 6.9% increase in precision, a 6.8% boost in recall, and an 8.2% enhancement in mAP (0.5). However, it is noteworthy that Agronomic Teacher has a relatively higher parameter count and FLOPs, approximately 2.4 times and 1.7 times those of YOLOv5s, respectively; the significant performance improvements thus come at the cost of more computational resources. The YOLOv5x model's parameter count and FLOPs are roughly 5 times and 10 times those of the proposed model, respectively. Despite this, Agronomic Teacher improves recall by 2.5% and mAP (0.5) by 0.6%. Although there is a slight decrease of 3.5% in precision, overall, the proposed model achieves a good balance between performance and resource consumption. Compared to YOLOv7l, which has approximately five times the FLOPs and twice the parameter count of the proposed model, the proposed model significantly reduces the demand for computational resources while maintaining a similar level of performance; relative to existing detection methods, it demonstrates advantages in achieving efficient, high-performance object detection with higher detection accuracy. Compared to the recently developed YOLOv8s, Agronomic Teacher possesses a marginally higher parameter count and slightly lower FLOPs, and it demonstrates a notable enhancement in performance metrics, with a 2.6% increase in precision, a 2.9% rise in recall, and a significant 2.5% improvement in mAP (0.5).
On the PascalVOC dataset annotated at 10%, the proposed model demonstrates a notable improvement in mAP (0.5) compared to several existing models, as shown in Table 6. Specifically, there is an increase of 2.4% compared to YOLOv5s and of 2.2% relative to Gold-YOLO-s. However, compared to YOLOv8s, performance declines slightly, with a reduction of 4.7% in mAP (0.5). This performance difference stems from the limitation in dataset scale: despite the PascalVOC dataset containing more categories, the relatively small 10% data volume results in a smaller dataset. Additionally, the random partitioning of training, testing, and validation sets may introduce class imbalance issues, which are particularly pronounced in scenarios with limited data. Consequently, the proposed model might not have fully captured the characteristics of each category within the PascalVOC dataset, leading to its inferior performance compared to YOLOv8s on the 10% PascalVOC dataset. Compared to high-parameter, high-FLOPs models such as YOLOv5x and YOLOv7l, the proposed model does not exhibit an advantage in accuracy at this annotation ratio, falling short in mAP (0.5) by 2.9% compared to YOLOv5x and by 4.7% compared to YOLOv7l. On the PascalVOC dataset annotated at 15%, the proposed algorithm demonstrates considerable advantages, with improvements in mAP (0.5) of 8.3%, 3.0%, 0.2%, 9.0%, and 2.7% over the YOLOv5s, YOLOv5x, YOLOv7l, Gold-YOLO-s, and YOLOv8s models, respectively. On the PascalVOC datasets annotated at 20% and 25%, the proposed model demonstrates significant performance improvements compared to YOLOv5s, YOLOv5x, Gold-YOLO-s, and YOLOv8s. Specifically, on the 20% annotated dataset, mAP (0.5) increases by 7.2%, 2.8%, 8.7%, and 0.5% compared to these models, respectively; on the 25% annotated dataset, the increases are 8.9%, 1.8%, 8.4%, and 2.3%, respectively. YOLOv7l possesses twice the parameter count and 3.8 times the FLOPs of the proposed model, and here the proposed model does not achieve higher mAP (0.5), lagging by 4.1% and 2.3% on the PascalVOC datasets annotated at 20% and 25%, respectively.
3.7.2. Comparison Experiment Results in Semi-Supervised Learning
A detailed comparison and analysis were conducted between Agronomic Teacher and Efficient Teacher [25] to further investigate and validate the proposed method's effectiveness. Efficient Teacher is also a one-stage semi-supervised object detection model. Regarding pseudo-label allocation, its Pseudo Label Assigner (PLA) assigns the pseudo-labels generated by a dense detector: for pseudo-labels with high classification scores, only the objectness loss is calculated, while the regression loss is computed for pseudo-labels whose objectness score exceeds 0.99. Extensive experiments were conducted on multiple maize leaf disease detection datasets to assess performance and adaptability, and the method was also tested on public datasets to verify its effectiveness under different environments and conditions.
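For intuition, the following sketch contrasts a PLA-style hard gate, as described above, with the WAP-style weighted gate from Section 3.4; apart from the 0.99 objectness cutoff taken from the description, all thresholds, function names, and variable names are illustrative assumptions.

```python
def pla_style_gate(obj_score, cls_score, cls_thr=0.5):
    """Hard gating in the spirit of Efficient Teacher's PLA (sketch).

    Returns which loss terms a pseudo-label contributes to. cls_thr is an
    assumed threshold; only the 0.99 objectness cutoff comes from the text.
    """
    use_obj_loss = cls_score >= cls_thr   # high classification score
    use_reg_loss = obj_score > 0.99       # very confident objectness
    return use_obj_loss, use_reg_loss

def wap_style_gate(obj_score, cls_score, w_obj=0.2, w_cls=0.8, thr=0.5):
    # WAP instead blends the two scores before thresholding, so a single
    # weighted confidence decides whether the pseudo-label is kept.
    return w_obj * obj_score + w_cls * cls_score >= thr
```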
The results of the comparison experiments on the MaizeData dataset are shown in Figure 13. Our model outperforms the Efficient Teacher model by 0.4%, 1.7%, 2.8%, 5.6%, and 6.7% in mAP (0.5) on the datasets with annotation ratios of 10%, 15%, 20%, 25%, and 30%, respectively. This progress is primarily attributed to the WAP strategy, which judiciously allocates different weights to the objectness scores and classification scores, resulting in the generation of more accurate pseudo-labels. Additionally, the specially designed Agro-Backbone network demonstrates enhanced capabilities in capturing the subtle features of maize leaf diseases, and the complementary Agro-Neck network effectively integrates features across scales. These improvements demonstrate the effectiveness of the proposed approach for semi-supervised object detection, and in particular the superior ability of the proposed model to identify and detect maize leaf diseases accurately. Similarly, for the PascalVOC dataset, the experimental results are shown in Figure 14: the proposed method improves over Efficient Teacher in mAP (0.5) by 1.8%, 4.2%, 4.4%, 1.2%, and 2.4%, respectively. This further proves its generalizability and efficiency in handling various image detection tasks.
4. Conclusions
This paper proposes a semi-supervised object detection method based on a single-stage detector, which effectively utilizes limited labeled data and abundant unlabeled data to improve maize leaf disease recognition accuracy. For the large amount of unlabeled data, the proposed WAP strategy accurately and reasonably allocates the pseudo-labels generated by the teacher model; it makes full use of the weighted objectness and classification scores to effectively allocate pseudo-labels for maize leaf disease. Additionally, the proposed AgroYOLO detector further improves detection performance. In the Agro-Backbone network of this detector, the proposed SPPSEDC module replaces the SPPF module and, together with the CC2f module, enhances the extraction of local lesion information. In the Agro-Neck network, the GD mechanism is utilized instead of the traditional neck network and is combined with the C2f module to improve feature fusion capability, thereby enhancing maize leaf disease detection accuracy. The experimental results show that, compared to the baseline model, Agronomic Teacher improves the mAP (0.5) metric by 1.3%, 4.4%, 5.3%, 6.4%, and 6.5% on the MaizeData dataset at the 10%, 15%, 20%, 25%, and 30% annotation ratios, respectively, and by 2.4%, 8.3%, 7.2%, 8.9%, and 8.2% on the PascalVOC dataset. These results demonstrate that the proposed algorithm effectively improves maize leaf disease detection performance by utilizing a large amount of unlabeled data, and that it generalizes to some extent.
Although the proposed methods effectively leverage abundant unlabeled data in scenarios where labeled data are limited, providing advantages over other supervised and semi-supervised detection algorithms, there is still substantial untapped potential to explore. The future aim is to integrate this method with practical robotic applications, further extending the proposed algorithm to various agricultural tasks such as weed detection, crop growth monitoring, fruit harvesting, and sorting, among others. Hopefully, this work will contribute to enhancing the efficiency of disease recognition for farmers.