Article

Semi-Supervised One-Stage Object Detection for Maize Leaf Disease

1 School of Computer Science and Engineering, Changchun University of Technology, Changchun 130102, China
2 Institute of Plant Protection, Jilin Academy of Agricultural Sciences (Northeast Agricultural Research Center of China), Changchun 130033, China
3 Jilin Province Data Service Industry Public Technology Research Centre, Changchun 130102, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(7), 1140; https://doi.org/10.3390/agriculture14071140
Submission received: 13 June 2024 / Revised: 8 July 2024 / Accepted: 12 July 2024 / Published: 14 July 2024
(This article belongs to the Special Issue Advanced Image Processing in Agricultural Applications)

Abstract

Maize is one of the most important crops globally, and accurate diagnosis of leaf diseases is crucial for ensuring increased yields. Despite continuous progress in computer vision, deep learning-based detection of maize leaf diseases still relies on large amounts of manually labeled data, and the labeling process is time-consuming and labor-intensive. Moreover, the detectors currently used for identifying maize leaf diseases have relatively low accuracy in complex experimental fields. Therefore, Agronomic Teacher, an object detection algorithm that utilizes limited labeled data and abundant unlabeled data, is proposed and applied to maize leaf disease recognition. In this work, a semi-supervised object detection framework is built on a single-stage detector, integrating the Weighted Average Pseudo-labeling Assignment (WAP) strategy and the AgroYOLO detector, which combines the Agro-Backbone network with the Agro-Neck network. The WAP strategy uses weight adjustments to set objectness and classification scores as the evaluation criteria for assigning pseudo-labels by reliability. The Agro-Backbone network accurately extracts features of maize leaf diseases and obtains richer semantic information. The Agro-Neck network enhances feature fusion by combining multi-layer features collaboratively. The effectiveness of the proposed method is validated on the MaizeData and PascalVOC datasets at different annotation ratios. Compared to the baseline model, Agronomic Teacher leverages abundant unlabeled data to achieve a 6.5% increase in mAP (0.5) on the 30% labeled MaizeData. On the 30% labeled PascalVOC dataset, mAP (0.5) improved by 8.2%, demonstrating the method’s potential for generalization.

1. Introduction

Maize is recognized as one of the most important crops globally and is an essential raw material for both the light and chemical industries [1]. According to the Food and Agriculture Organization (FAO) of the United Nations, maize is cultivated in over 160 countries worldwide. Forecasts suggest that by 2025, the worldwide area dedicated to maize cultivation will approach 180 million hectares, with production surpassing 150 million tonnes. However, maize frequently suffers from various diseases during its growth. These diseases significantly diminish crop yields and directly threaten farmers’ economic interests. Leaf diseases are one of the main factors contributing to reduced maize yields. Consequently, rapid and accurate identification of leaf diseases is critical during maize growth. Traditionally, this diagnosis involves either hiring experts for on-site inspection or training farmers through expert lectures so that they can judge diseases themselves based on what they have learned. This approach is time-consuming and labor-intensive, and it is prone to misdiagnosis due to human judgment errors, leading to potential crop losses.
Recently, deep learning has achieved significant advances in plant disease recognition, offering new perspectives and methodologies for precision agriculture and crop protection. Convolutional neural networks (CNNs) have become a key technology in several fields because of their superior ability to process and parse complex visual data. The wide application of CNNs to image classification tasks provides a reliable basis for disease and pest recognition. Fang et al. [2] introduced HCA-MFFNet, a multi-channel feature fusion network developed explicitly for maize leaf disease recognition, which uses hard coordinate attention (HCA) to extract features from various spatial scales and depthwise separable convolutional layers to minimize the number of parameters. Ahila et al. [3] used an improved LeNet for maize leaf disease classification, recognizing three types of diseases and one healthy category. Zhang et al. [4] proposed a tomato leaf disease recognition model utilizing the Asymptotic Non-Local Means algorithm (ANLM) and a Multi-channel Automatic Orientation Recurrent Attention Network (M-AORANet), which addresses noise interference and tomato leaf feature extraction. Zhang et al. [5] proposed an improved GoogLeNet and Cifar10 model that can recognize eight types of maize leaf diseases. In practice, image classification cannot locate lesion areas accurately, making object detection an essential tool in agriculture for providing richer information and helping to enhance crop quality and productivity.
Classical object detection methods mainly comprise one-stage detection networks, such as the You Only Look Once (YOLO) [6] series, the Single Shot MultiBox Detector (SSD) [7], and RetinaNet [8], as well as two-stage detection networks, including Fast Region-based Convolutional Neural Networks (Fast R-CNN) [9] and Faster R-CNN [10]. Zhang et al. [11] designed a multi-feature fusion Faster R-CNN (MF3 R-CNN) to solve the problem of soybean leaf disease detection in complex scenarios. Two-stage models, which use separate stages for generating and detecting candidate boxes, have a complex overall architecture and usually exhibit poor real-time performance. Therefore, one-stage object detection algorithms are gradually becoming the mainstream choice in agriculture. Sun et al. [12] introduced the Mobile End AppleNet-based SSD (MEAN-SSD) model designed specifically for apple leaf disease detection. Liu et al. [13] enhanced the feature layers of the YOLOv3 model with image pyramids for multi-scale feature detection, enabling accurate and rapid detection of the location and type of diseases and pests in tomatoes. Li et al. [14] proposed an improved YOLOv4 model incorporating depthwise convolution and a hybrid attention mechanism for detecting powdery mildew on strawberry leaves. Qi et al. [15] enhanced the Squeeze Excitation (SE) module of the YOLOv5 model, extracting critical features and effectively detecting tomato virus diseases. The YOLO series is widely used in research and practice for plant disease detection, achieving timely and accurate results and advancing intelligent agricultural technologies. However, in complex maize experimental fields, the regional characteristics of maize leaf diseases evolve, increasing the detection challenges, so maize leaf disease detection still faces limitations [16].
Currently, maize leaf disease detection relies heavily on annotated data, which are time-consuming and labor-intensive to produce. Semi-supervised learning offers a strategy for reducing dependence on extensive manually labeled datasets by merging a limited quantity of labeled data with a substantial volume of unlabeled data for model training. Semi-supervised learning has been widely applied in image classification [17], speech recognition [18], and natural language processing [19], and has also demonstrated its potential in practical application areas such as agriculture, where it requires only a small amount of labeled data for effective disease identification. Yang et al. [20] combined semi-supervised learning with image processing techniques to recognize young green tea leaves. Omidi et al. [21] utilized semi-supervised clustering to distinguish symptomatic from asymptomatic leaves on infected walnut trees. Existing research efforts have focused on semi-supervised classification techniques; however, in agriculture, there is an urgent need for object detection techniques that localize diseased areas on leaves. In recent research, Tseng et al. [22] proposed a semi-supervised object detection method using wheat as an example. Although their research and this work both address agriculture, this work focuses on maize leaf diseases.
Consequently, when analyzing the characteristics of existing object detection algorithms and maize leaf disease images, it was found that complex conditions in the experimental field, such as intense lighting and shaded areas, increase the difficulty of detection. The visual similarity among maize leaf diseases then leads to misdetection, thereby reducing the accuracy of deep learning-based maize leaf disease detection. On the other hand, owing to the differences between maize leaf disease images and images from other domains, existing semi-supervised object detection models struggle to assign pseudo-labels for maize leaf diseases accurately.
To address these issues, we propose a semi-supervised one-stage object detection framework for maize leaf disease, comprising the WAP strategy and the AgroYOLO detector. By leveraging a large amount of unlabeled data, the WAP strategy accurately assigns the pseudo-labels generated by a teacher model, thereby improving the quality of the pseudo-labels fed to the student model. Additionally, the AgroYOLO detector further enhances the detection accuracy of maize leaf diseases. To the best of our knowledge, semi-supervised object detection algorithms have not previously been applied to maize leaf disease detection. In plant disease detection, semi-supervised object detection technology is still nascent, requiring further research and exploration. In this context, this research fills gaps in existing technologies and contributes new insights and methods to this field. The main contributions of this work are summarized as follows:
(1) Agronomic Teacher is designed, a semi-supervised one-stage object detection framework for maize leaf disease that reduces the dependency on extensive labeled data.
(2) The WAP strategy is proposed to enhance the reliability of pseudo-label assignment based on objectness scores and classification scores from the teacher model.
(3) The AgroYOLO detector is developed, combining the Agro-Backbone network, which adequately extracts detailed features of leaf diseases, with the Agro-Neck network, which enhances the capability to fuse multi-scale maize leaf disease features.
(4) The experimental results demonstrate that Agronomic Teacher outperforms other supervised and semi-supervised object detection algorithms on the MaizeData and PascalVOC datasets.
In Section 1, the motivation and objectives are delineated. Section 2 details the samples used in this study and Agronomic Teacher. Section 3 provides experimental results and discussion. Finally, Section 4 presents the research conclusions and future studies.

2. Materials and Methods

2.1. Experimental Samples

The data were collected from several maize experimental fields in Changchun City, Dehui City, and Baicheng City in Jilin Province, China. Figure 1 shows the experimental field. Images of maize plants with various leaf diseases were captured using a Canon 5D Mark III. The images have a maximum size of 19.9 megapixels, at a resolution of 5408 × 3680.
Due to constraints such as environmental conditions, lighting, and shooting angles in the experimental field, the collected data may not cover all situations. By integrating maize leaf disease datasets from different sources, deep learning methods can learn more features during training, thereby improving detection capabilities. Therefore, the maize leaf disease dataset has two sources: a self-collected dataset from the experimental fields and a maize leaf disease image dataset from the PlantVillage repository [23]. These two datasets are merged to form a comprehensive dataset named MaizeData, used for further research and analysis. MaizeData consists of five common maize leaf diseases: Cercospora zeae-maydis Tehon and Daniels (CD), Puccinia polysora (Pp), Common Rust (CR), Blight (Bt), and Mycosphaerella maydis Lindau (ML), in JPG format, as shown in Figure 2.
In addition to MaizeData, the widely used public PascalVOC object detection dataset [24] is utilized to further validate the generalizability of the proposed method. This dataset, released by the Computer Vision Group at the University of Oxford, includes 20 categories, such as airplanes and bicycles. Table 1 displays comprehensive labeled data information for the MaizeData and PascalVOC datasets.

2.2. Samples Preprocessing

For annotation, the LabelImg 1.8.0 tool was used to label leaf diseases on maize plants, add bounding boxes, and generate YOLO-format annotation files. The image annotation is shown in Figure 3.
To further enhance the adaptability of MaizeData, image augmentation techniques, including mirroring, brightness adjustment, cropping, cutout, noise injection, translation, and rotation, are adopted. Generating diversified training samples through randomly transforming original images helps the deep learning methods capture a broader range of features and scenes, thereby improving generalization and robustness. Introducing randomness and diversity reduces the deep learning methods’ reliance on specific training samples, thus lowering the risk of overfitting. These techniques significantly expand the scale of the dataset, and examples of enhanced images can be found in Figure 4.
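As an illustration, the listed transformations could be sketched as a torchvision pipeline. The parameter values below are illustrative rather than the exact settings used to build MaizeData, and the box transforms that object detection additionally requires are omitted here.

import torch
from torchvision import transforms

def add_gaussian_noise(img: torch.Tensor, std: float = 0.02) -> torch.Tensor:
    """Noise injection: add zero-mean Gaussian noise and clamp to [0, 1]."""
    return (img + torch.randn_like(img) * std).clamp(0.0, 1.0)

# Sketch of the augmentations listed above; values are illustrative.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),                     # mirroring
    transforms.ColorJitter(brightness=0.3),                     # brightness adjustment
    transforms.RandomResizedCrop(640, scale=(0.8, 1.0)),        # cropping
    transforms.RandomAffine(degrees=15, translate=(0.1, 0.1)),  # rotation + translation
    transforms.ToTensor(),
    transforms.Lambda(add_gaussian_noise),                      # noise injection
    transforms.RandomErasing(p=0.5, scale=(0.02, 0.1)),         # cutout-style occlusion
])

For detection data, each geometric transform must also be applied to the bounding boxes; libraries such as Albumentations support such joint image-and-box pipelines.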
The MaizeData and PascalVOC datasets are divided into training set, validation set, and testing set following a 7:2:1 ratio. To comprehensively assess the performance of the proposed model across datasets of varying scales, the training dataset was further divided into ten distinct sub-datasets based on label proportions of 10%, 15%, 20%, 25%, and 30%. Table 2 shows detailed information about the data.
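For reference, one way to reproduce such a split is sketched below; the function name and seed are illustrative, not part of the published pipeline.

import random

def split_dataset(paths, seed=0):
    """7:2:1 train/val/test split, then labeled subsets of the training data."""
    rng = random.Random(seed)
    paths = list(paths)
    rng.shuffle(paths)
    n = len(paths)
    train = paths[: int(0.7 * n)]
    val = paths[int(0.7 * n): int(0.9 * n)]
    test = paths[int(0.9 * n):]
    # Labeled sub-datasets at the five annotation ratios used in this study.
    subsets = {r: train[: int(r * len(train))] for r in (0.10, 0.15, 0.20, 0.25, 0.30)}
    return train, val, test, subsets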

2.3. Methods

This section details Agronomic Teacher, a semi-supervised maize leaf disease detection framework whose structure is shown in Figure 5. The framework is based on mutual learning between a teacher model (M_teacher) and a student model (M_student). After a series of experiments, the YOLOv5s model was chosen as the baseline model, with the specific experimental details described in Section 3.3; its structure is shown in Figure 6. The semi-supervised learning approach augments the training set with a large amount of unlabeled data alongside a small amount of labeled data. The WAP strategy uses objectness scores and classification scores to allocate pseudo-labels, thereby improving the accuracy of pseudo-label assignment. To enhance the accuracy of maize leaf disease detection in complex experimental fields, the AgroYOLO detector is proposed, structured as shown in Figure 7. The AgroYOLO detector consists of the Agro-Backbone network and the Agro-Neck network. The Agro-Backbone network enhances the feature extraction capability for maize leaf disease lesions, while the Agro-Neck network improves the feature fusion capability for maize leaf diseases. The improvements to the backbone and neck networks complement each other and enhance the overall performance of the AgroYOLO detector in maize leaf disease detection tasks.

2.3.1. Weighted Average Pseudo-Labeling Assignment Strategy

For the small amount of labeled data, mosaic data augmentation generates the data D_l. Training is conducted in a supervised manner, and the labeled loss L_s is calculated:
L_s = \sum_{h,w} L_s^{cls}(h,w) + \sum_{h,w} L_s^{reg}(h,w) + \sum_{h,w} L_s^{obj}(h,w)   (1)

L_s^{cls}(h,w) = \mathrm{CE}(X_{(h,w)}^{cls}, Y_{(h,w)}^{cls})   (2)

L_s^{reg}(h,w) = \mathrm{CIoU}(X_{(h,w)}^{reg}, Y_{(h,w)}^{reg})   (3)

L_s^{obj}(h,w) = \mathrm{CE}(X_{(h,w)}^{obj}, Y_{(h,w)}^{obj})   (4)
Here, L_s^{cls}(h,w) denotes the classification loss, L_s^{reg}(h,w) the regression loss, and L_s^{obj}(h,w) the objectness loss. L_s^{cls}(h,w) and L_s^{obj}(h,w) are quantified using cross-entropy (CE), while L_s^{reg}(h,w) employs the complete intersection over union (CIoU) measure. X represents the predicted values at spatial location (h, w), and Y the corresponding ground-truth values.
For the large amount of unlabeled data, mosaic data augmentation and strong data augmentation are applied to obtain the data D_u. During training, M_teacher receives both D_l and D_u. The pseudo-labels generated by M_teacher are used to guide the training of M_student and play a crucial role throughout the self-training procedure. Additionally, M_teacher is updated using the exponential moving average (EMA) technique to generate more accurate pseudo-labels. The pseudo-labels obtained after non-maximum suppression (NMS) are categorized to minimize the number of misleading pseudo-labels: based on a high and a low threshold, they are divided into two categories. Pseudo-labels exceeding the high threshold are considered reliable and used for default supervised training. Pseudo-labels between the high and low thresholds are considered uncertain. For uncertain pseudo-labels, the Pseudo Label Assigner [25] further distinguishes two cases: high classification scores and high objectness scores.
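As a concrete reference, the EMA update of the teacher from the student can be written in a few lines. The sketch below assumes the two models share the same architecture; the decay value is illustrative.

import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, decay: float = 0.999):
    """Exponential moving average: teacher <- decay * teacher + (1 - decay) * student."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(decay).add_(s_param.detach(), alpha=1.0 - decay)
    for t_buf, s_buf in zip(teacher.buffers(), student.buffers()):
        t_buf.copy_(s_buf)  # keep running statistics (e.g., BatchNorm) in sync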
Due to the significant differences between maize leaf disease images and images from other domains, for the second category of pseudo-labels the proposed WAP strategy makes full use of the objectness and classification scores to compute the confidence and relevance of each unlabeled sample, so that the pseudo-labels generated by the detector can be assigned more accurately:
\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{w_1 x_1 + w_2 x_2}{w_1 + w_2}   (5)
Here, x_1 represents the objectness score and x_2 the classification score, balanced by the weights w_1 and w_2; the weight selection experiments are described in Section 3.4. When the weighted score exceeds 0.65, the pseudo-label is considered to have good regressivity, and WAP calculates the regression loss for it. By carefully considering the relative importance of the different scores, the WAP strategy allows flexible adjustment of the weights assigned to the objectness and classification scores within the model. While maintaining the reliability of pseudo-labels, this weighting mitigates the uncertainty of unreliable bounding box predictions. It provides a robust mechanism for pseudo-label assignment, enhancing the accuracy and reliability of semi-supervised object detection. The unlabeled loss L_u is calculated as:
L_u = \sum_{h,w} L_u^{cls}(h,w) + \sum_{h,w} L_u^{reg}(h,w) + \sum_{h,w} L_u^{obj}(h,w)   (6)

L_u^{cls}(h,w) = H(S(h,w), T_2) \cdot \mathrm{CE}(X_{(h,w)}^{cls}, \hat{Y}_{(h,w)}^{cls})   (7)

L_u^{reg}(h,w) = H(S(h,w), T_2) \cdot \mathrm{CIoU}(X_{(h,w)}^{reg}, \hat{Y}_{(h,w)}^{reg}) + (\bar{x} > 0.65) \cdot \mathrm{CIoU}(X_{(h,w)}^{reg}, \hat{Y}_{(h,w)}^{reg})   (8)

L_u^{obj}(h,w) = H(S(h,w), T_2) \cdot \mathrm{CE}(X_{(h,w)}^{obj}, \hat{Y}_{(h,w)}^{obj}) + L(T_1, S(h,w), T_2) \cdot \mathrm{CE}(X_{(h,w)}^{obj}, \hat{obj}_{(h,w)})   (9)
The function H(x, y) outputs 1 if x exceeds y and 0 otherwise. Similarly, the function L(x, y, z) returns 1 if x < y < z and 0 otherwise. S(h,w) is the classification score of the pseudo-label, and \hat{obj}_{(h,w)} is the objectness score of the pseudo-label at position (h, w). \hat{Y}_{(h,w)}^{cls}, \hat{Y}_{(h,w)}^{reg}, and \hat{Y}_{(h,w)}^{obj} represent the classification, regression, and objectness targets at position (h, w) obtained through the WAP strategy. T_1 denotes the low threshold and T_2 the high threshold. When a pseudo-label score exceeds T_2, the classification, regression, and objectness losses are all calculated, as for reliable labels. When the score lies between T_1 and T_2, only the objectness loss is calculated. When \bar{x} exceeds 0.65, the pseudo-label is considered to have a good regression target and additionally participates in the regression loss. In semi-supervised object detection, λ is used to balance the supervised and unsupervised losses. The total loss L_total for the AgroYOLO detector is
L_{total} = L_s + \lambda L_u   (10)
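To make the gating above concrete, the following is a minimal sketch of the WAP score of Equation (5) and the masks implied by Equations (6)–(9), computed per pseudo-box. The threshold values T_1 and T_2 here are illustrative assumptions, not the tuned settings.

import torch

W1, W2 = 0.2, 0.8    # objectness / classification weights (see Section 3.4)
T1, T2 = 0.3, 0.6    # low / high pseudo-label thresholds (illustrative values)
TAU_REG = 0.65       # weighted-score threshold for the extra regression term

def wap_masks(obj_scores: torch.Tensor, cls_scores: torch.Tensor):
    """Boolean masks over N pseudo-boxes implementing the WAP gating."""
    # Eq. (5): weighted average of objectness and classification scores.
    x_bar = (W1 * obj_scores + W2 * cls_scores) / (W1 + W2)
    reliable = cls_scores > T2                           # H(S, T2): full supervision
    uncertain = (cls_scores > T1) & (cls_scores <= T2)   # L(T1, S, T2): obj loss only
    reg_extra = x_bar > TAU_REG                          # extra regression supervision
    return reliable, uncertain, reg_extra

The masks then select which pseudo-boxes contribute to each term of L_u.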

2.3.2. Agro-Backbone Network

For the maize leaf disease detection task, the Agro-Backbone structure is proposed, as shown in Figure 8a. Figure 8b details the working principle of the Spatial Pyramid Pooling with Squeeze Excitation and Depthwise Separable Convolution (SPPSEDC) module, and the CC2f module is shown in Figure 8c.
In complex maize disease detection tasks, the original Spatial Pyramid Pooling-Fast (SPPF) module cannot adequately capture detailed features and struggles to distinguish similar leaf disease features. The proposed SPPSEDC module builds on the original SPPF module by incorporating the Squeeze Excitation (SE) block [26] and a Depthwise Separable Convolution mechanism [27] to improve multi-scale feature extraction. With the precise channel calibration of the SE block, the module can highlight key leaf disease signatures while suppressing background noise, thus improving the quality of the feature representation. Whereas conventional convolution processes the spatial and depth dimensions jointly, Depthwise Separable Convolution significantly reduces model parameters and computation by applying a filter to each input channel independently and subsequently integrating the outputs with a 1 × 1 convolution. The SPPSEDC module implements spatial pyramid pooling using MaxPooling layers of different sizes, effectively extending the receptive range of the model and facilitating the integration of a broader range of spatial features. This enhances the model's ability to detect lesions of various sizes and shapes on maize leaves.
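As a rough illustration of how such a module could be composed, the PyTorch sketch below combines pyramid max-pooling, an SE block, and a depthwise separable fusion convolution; the exact layout, channel widths, and pooling sizes of the paper's SPPSEDC may differ.

import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention [26]."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pooling
        return x * w[:, :, None, None]    # excite: per-channel rescaling

class DWSeparableConv(nn.Module):
    """Depthwise convolution per channel, then a 1 x 1 pointwise convolution [27]."""
    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.dw = nn.Conv2d(c_in, c_in, 3, padding=1, groups=c_in, bias=False)
        self.pw = nn.Conv2d(c_in, c_out, 1, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))

class SPPSEDC(nn.Module):
    """Sketch: pyramid max-pooling, SE recalibration, separable fusion."""
    def __init__(self, c_in: int, c_out: int, pool_sizes=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in pool_sizes)
        self.se = SEBlock(c_in * (len(pool_sizes) + 1))
        self.fuse = DWSeparableConv(c_in * (len(pool_sizes) + 1), c_out)

    def forward(self, x):
        feats = torch.cat([x] + [p(x) for p in self.pools], dim=1)
        return self.fuse(self.se(feats))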
In the maize leaf disease detection task, especially under varied lighting conditions, the diverse quality of input images poses challenges to feature extraction, and the CSP module performs poorly at extracting local features. The C2f module [28] is therefore adopted to replace all the CSP modules of the backbone network and is combined with a Conv layer to form the Conv-C2f (CC2f) module, which enables the network to comprehend image features at different levels, from coarse to fine.
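A minimal sketch of this pairing is shown below, assuming a YOLOv8-style C2f block preceded by a strided Conv; block depths and channel widths are illustrative (c2 is assumed even).

import torch
import torch.nn as nn

class Conv(nn.Module):
    """Conv-BN-SiLU block, as used throughout YOLO-style networks."""
    def __init__(self, c1, c2, k=1, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = nn.SiLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class Bottleneck(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.cv1, self.cv2 = Conv(c, c, 3), Conv(c, c, 3)

    def forward(self, x):
        return x + self.cv2(self.cv1(x))  # residual connection

class CC2f(nn.Module):
    """Sketch of Conv-C2f: a strided Conv followed by a C2f-style split block."""
    def __init__(self, c1, c2, n=2):
        super().__init__()
        self.down = Conv(c1, c2, 3, 2)            # leading Conv (downsampling)
        self.cv1 = Conv(c2, c2)                   # C2f entry 1 x 1 conv
        self.m = nn.ModuleList(Bottleneck(c2 // 2) for _ in range(n))
        self.cv2 = Conv(c2 // 2 * (n + 2), c2)    # fuse all split branches

    def forward(self, x):
        y = list(self.cv1(self.down(x)).chunk(2, dim=1))
        y.extend(m(y[-1]) for m in self.m)        # cascade fine-grained branches
        return self.cv2(torch.cat(y, dim=1))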

2.3.3. Agro-Neck Network

The neck network adopted by YOLOv5 consists of an FPN [29] and PANet [30]. The FPN fuses features through upsampling, and PANet transmits features in a bottom-up pyramid to improve feature extraction. However, this fusion scheme can only combine features from neighboring layers; information from other layers is obtained only indirectly, and this transmission mode may lead to information loss. The Gather-and-Distribute (GD) mechanism [31] is therefore adopted to efficiently utilize the multi-level features extracted by the Agro-Backbone network and conduct feature fusion. Although the GD mechanism can effectively integrate the characteristics of maize leaf diseases, it also increases the parameter count. Therefore, the GD mechanism is combined with the C2f module to form a novel Agro-Neck network structure, as depicted in Figure 9. This structure retains the advantages of the GD mechanism in feature processing while the C2f module effectively reduces the overall number of model parameters and enhances the capability to process multi-scale features. The GD mechanism systematically collects and fuses features from different layers through the Feature Alignment Module (FAM) and Information Fusion Module (IFM) and then redistributes the information to each layer through the Information Injection Module (Inject). The C2f module further refines the features, reduces the parameters, and strengthens the model's capability to identify the five maize leaf diseases.
By fully utilizing a large amount of unlabeled data and reasonably assigning pseudo-labels with the WAP strategy, the AgroYOLO detector improves its maize leaf disease recognition capability. Based on these improvements, a novel semi-supervised algorithm for maize leaf disease detection is proposed. The pseudocode for the algorithm is displayed in Algorithm 1.
Algorithm 1 Pseudocode for the Agronomic Teacher.
Input: labeled data D_l, unlabeled data D_u, AgroYOLO model weights
Output: class probabilities P_c and predicted bounding box coordinates
1:  Initialize:
2:    D_total = D_train (70%) + D_validation (20%) + D_test (10%);
3:    D_train = D_l (30%) + D_u (70%);
4:    M_teacher and M_student (AgroYOLO, as shown in Figure 7);
5:    N_teacher = total epochs for M_teacher training;
6:    N_student = total epochs for M_student training;
7:    batch_size = 16;
8:  Training the M_teacher stage:
9:  for epoch = 1 to N_teacher do
10:     Apply the AgroYOLO algorithm to each batch;
11:     Load AgroYOLO model weights;
12:     Calculate the loss function;
13:     Evaluate M_teacher on D_validation;
14: end for
15: Generating pseudo-labels stage:
16: L_pseudo ← ∅;
17: for each image in D_u do
18:     Generate pseudo-labels using M_teacher;
19: end for
20: Training the M_student stage:
21: for epoch = 1 to N_student do
22:     for each batch in D_train ∪ L_pseudo do
23:         Update M_student;
24:         Update M_student by jointly minimizing L_s and L_u;
25:     end for
26:     Evaluate M_student on D_validation;
27: end for
28: Save optimal checkpoint bestweight.pt;
29: Testing stage:
30: for each test image in D_test do
31:     Predict y = f(P_c, B_w, B_h, B_x, B_y);
32:     Display class and P_c;
33: end for
34: return M_student;

3. Results and Discussion

3.1. Model Evaluation Indicators

Key performance indicators such as precision, recall, and mean average precision (mAP) are used to comprehensively evaluate the proposed semi-supervised maize leaf disease detection model. Additionally, metrics including the number of model parameters, floating point operations (FLOPs), and frames per second (FPS) are introduced to assess the model's practicality in the experimental field. These metrics evaluate the model's processing speed, resource consumption, and computational efficiency, providing a crucial basis for further optimization.
Precision measures the accuracy of a detector's positive predictions relative to all samples it predicted as positive. Recall measures the model's ability to identify positive samples correctly; it represents the proportion of actual positive samples correctly identified as positive. Precision and recall are defined as follows:
\mathrm{Precision} = \frac{TP}{TP + FP}   (11)

\mathrm{Recall} = \frac{TP}{TP + FN}   (12)
In Equations (11) and (12), true positives (TP) are the instances in which the model correctly identifies positive samples as positive. False positives (FP) are the instances in which the model incorrectly predicts negative samples as positive. False negatives (FN) are the positive samples mistakenly judged as negative by the model.
Average precision (AP) is obtained by calculating the area under the precision–recall curve and is computed as follows:
AP = \frac{1}{m}\sum_{i=1}^{m} P_i = \frac{1}{m} P_1 + \frac{1}{m} P_2 + \cdots + \frac{1}{m} P_m   (13)
mAP calculates the area under the precision–recall curve for each category and averages it over all categories. Equation (14) can be used to determine mAP.
mAP = \frac{1}{n}\sum_{j=1}^{n} AP_j   (14)
mAP (0.5) is a metric for evaluating the average precision of a model with the intersection over union (IoU) threshold set at 0.5. The IoU is determined as described in Equation (15), where S_1 represents the overlapping region between the predicted and actual bounding boxes and S_2 denotes their combined area. A prediction is considered accurate only if the overlapping area S_1 is greater than 50% of the combined area S_2.
IoU = \frac{S_1}{S_2}   (15)
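For clarity, Equation (15) corresponds to the following computation for axis-aligned boxes given as (x1, y1, x2, y2); the function is a small sketch with illustrative names.

def iou(box_a, box_b):
    """IoU of two (x1, y1, x2, y2) boxes; S1 = intersection, S2 = union."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    s1 = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)           # overlap area S1
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    s2 = area_a + area_b - s1                                 # combined area S2
    return s1 / s2 if s2 > 0 else 0.0

# Under mAP (0.5), a prediction counts as correct only when iou(pred, gt) > 0.5.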
FPS refers to the number of image frames the algorithm can process per second. The number of model parameters is the total number of parameters constituting a machine learning model. FLOPs represent the total number of floating-point operations needed to complete a given task.

3.2. Preparation for Experiments

The experimental setup in this research is conducted within the Ubuntu 20.04 operating system environment, utilizing the Agronomic Teacher model trained on the PyTorch 1.9.0 framework. The CUDA version employed is 11.1, and Python version 3.8.13 is utilized for scripting. The relevant hyperparameter settings for this experiment are shown in Table 3.

3.3. Version Selection of YOLO Series

A comprehensive evaluation was conducted of the performance of YOLO series models on the 30% annotated PascalVOC dataset, including models of different scales and versions: YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, YOLOv5x, YOLOv7l, and YOLOv8s. Table 4 displays their specific details. In the YOLOv5 series, YOLOv5n has the lowest number of parameters and FLOPs, making it the smallest in model size and computational complexity. Its simpler structure gives it an advantage in processing speed, exhibiting the highest FPS; the trade-off is the lowest mAP (0.5), indicating that while it maintains high-speed detection, its accuracy is relatively lower. Conversely, the YOLOv5x model has the highest number of parameters and the largest FLOPs, giving it more robust capabilities for understanding and processing images. Its mAP (0.5) value is the highest, reaching 53%; however, this accuracy comes at the cost of higher computational expense and more parameters, resulting in a comparatively lower FPS. While YOLOv7l and YOLOv8s show higher mAP (0.5) values, YOLOv7l has a relatively high number of parameters and FLOPs, and YOLOv8s reaches 68.97 FPS. This parameter count and computational complexity mean that these models require more computational resources and processing time, leading to relatively fewer frames processed per second.
Therefore, to achieve a balance between detection accuracy and processing speed, making it more suitable for rapid and efficient detection in the experimental field, YOLOv5s is chosen as the base model for the task of maize leaf diseases.

3.4. Experiment of Parameters Selection for the WAP Strategy

An analysis was conducted of the impact of the objectness score weight w_1 and the classification score weight w_2 on model performance within the WAP strategy. To ensure the fairness of the experiment, the weight adjustment was the only variable, so as to accurately assess the impact of different weight ratios on model performance. The experiment was conducted on the MaizeData dataset, with w_1 decreased from 1.0 to 0.0 and, correspondingly, w_2 increased from 0.0 to 1.0. The results are shown in Figure 10.
The experimental results showed that with the weight combination w_1 = 0.2 and w_2 = 0.8, the model achieved the best mAP (0.5) value of 42.3%. Specifically, performance improved slightly when w_1 increased from 0.0 to 0.2 but began declining as it increased to 0.4 and 0.6. Performance recovered slightly at w_1 = 0.8 and w_1 = 1.0 but did not exceed the maximum. This result emphasizes the importance of finding the right balance between the weights of the objectness and classification scores: excessively high or low weights for either score can degrade the model's performance. Therefore, in the WAP strategy, w_1 is set to 0.2 and w_2 to 0.8.

3.5. Experimental Comparison before and after Model Improvement

YOLOv5s and Agronomic Teacher are trained on a 30% labeled training set and evaluated on a testing set. Figure 11 and Figure 12 display the original manually annotated images and the detection results before and after algorithm improvement on the testing set.
Figure 11 illustrates that under complex conditions of shadow and intense light, YOLOv5s fails to capture maize disease information accurately and may even fail to recognize the disease. Meanwhile, as seen in Figure 12, YOLOv5s makes errors when identifying similar lesions, including missed detections and misclassifications of disease types. In contrast, Agronomic Teacher, by integrating the WAP strategy and the AgroYOLO detector, effectively mitigates these issues, making the detection results more closely aligned with the human annotations.
Specifically, the proposed model utilizes 30% annotated data and leverages the remaining 70% unlabeled data, thus expanding the training dataset. By employing the WAP strategy to allocate pseudo-labels precisely, the model can better learn the diversity and complexity of the samples. Utilizing more unlabeled data helps the AgroYOLO detector better understand these complex variations and more effectively capture the features of local changes, thereby enhancing detection accuracy. Furthermore, Agro-Backbone is employed for feature extraction in the Agronomic Teacher algorithm. This backbone integrates the SPPSEDC and CC2f modules, enabling it to capture rich semantic information from maize leaf disease images. Additionally, Agro-Neck enhances multi-scale feature fusion using the GD mechanism and the CC2f module, finely combining semantic-rich deep features with high-resolution shallow features. This feature fusion strategy effectively retains details frequently lost in deep network layers, achieving precise detection of disease spots.
Consequently, Agronomic Teacher demonstrates higher accuracy and reliability in maize leaf disease detection, which is of significant value for precise disease monitoring and management.

3.6. Ablation Experiment Results

To verify whether all the improvements of the Agronomic Teacher enhance the model's performance, a series of ablation experiments was conducted on two datasets: the 30% labeled MaizeData and the 30% labeled PascalVOC datasets. Each individual and combined innovation was incorporated into the baseline model. The results of the ablation experiments are shown in Table 5.
On the 30% labeled MaizeData, using the WAP strategy alone improves precision and mAP (0.5) over the baseline model by 1.8% and 1.6%, while recall decreases slightly by 0.5%. This may be due to insufficient training on the unlabeled data, causing the model to miss some targets and thus affecting recall. After adopting the Agro-Backbone network, precision improved by 4.8%, recall by 1.7%, and mAP (0.5) by 2.8%. The detection performance of the model is improved by employing the SPPSEDC and CC2f modules to efficiently extract vital high-level features, including lesions and leaf texture, from the input image. Furthermore, using the Agro-Neck network alone yields improvements of 2.0% in precision, 2.2% in recall, and 2.2% in mAP (0.5). In conjunction with the GD mechanism and the C2f module, the fusion of features extracted by the improved backbone network contributes to the model's extraction of more comprehensive semantic information and contextual relationships. This further enhances the model's ability to perform precise detection under varying lighting and angle conditions.
In ablation experiments on pairwise combinations of the WAP strategy, the Agro-Backbone network, and the Agro-Neck network, improvements in precision, recall, and mAP (0.5) are observed across all three sets of experiments. Using the WAP strategy alone may increase precision and mAP (0.5) but decrease recall; however, combining the WAP strategy with the Agro-Backbone and Agro-Neck networks maintains the gains in precision and mAP (0.5) while also compensating for the drop in recall. Compared to the baseline model, the WAP strategy combined with the Agro-Backbone network yields improvements of 0.7% in precision, 5.8% in recall, and 5.0% in mAP (0.5). The WAP strategy provides a robust mechanism for pseudo-label assignment by weighing the relative importance of objectness and classification scores, while Agro-Backbone effectively assists the model in extracting critical features from images; combining the two fully leverages the advantages of unlabeled data, demonstrating enhanced feature extraction and improved target recognition. Combining the WAP strategy with the Agro-Neck network increases precision by 0.3%, recall by 3.1%, and mAP (0.5) by 3.4% over the YOLOv5s model. This combination fully exploits the unlabeled data used by the WAP strategy and the feature fusion capability of Agro-Neck, enhancing the model's detection performance. Combining the Agro-Backbone and Agro-Neck networks improves precision by 5.9%, recall by 0.6%, and mAP (0.5) by 3.1%, fully utilizing the advantages of Agro-Backbone in feature extraction and Agro-Neck in feature fusion. Using the WAP strategy, Agro-Backbone network, and Agro-Neck network simultaneously better meets the requirements of the maize experimental field, ensuring high-precision disease identification and ultimately improving crop quality and yield: cumulatively, precision increased by 3.8%, recall by 3.8%, and mAP (0.5) by 6.5%. In the context of maize leaf disease detection, all proposed methods in this work yielded positive outcomes. As observed in Table 5, the ablation results on the PascalVOC dataset exhibit a similar trend to those on MaizeData, further validating the effectiveness of the proposed methods in practical applications.

3.7. Comparison Experiment Results

We conducted a series of comparative experiments to validate the effectiveness of Agronomic Teacher. The supervised experiments were conducted on the MaizeData and PascalVOC datasets at different annotation ratios: 10%, 15%, 20%, 25%, and 30%. The semi-supervised experiments additionally used the remaining 90%, 85%, 80%, 75%, and 70% of the data as unlabeled data. The supervised experiment results are presented in Table 6 and Table 7, and the semi-supervised comparative results are depicted in Figure 13 and Figure 14.

3.7.1. Comparison Experiment Results in Supervised Learning

On the MaizeData dataset, the Agronomic Teacher algorithm outperformed supervised algorithms such as YOLOv5s, YOLOv8s, YOLOv7l, and Gold-YOLO-s at different annotation ratios with the mAP threshold set to 0.5. Taking the MaizeData dataset annotated at 30% as an example, as shown in Table 7, YOLOv5s has a lower parameter count and fewer FLOPs than Agronomic Teacher, but the proposed technique outperforms it by 6.5% in mAP (0.5). Despite YOLOv5x having five times the number of parameters and seven times the FLOPs of the proposed method, the proposed method's mAP (0.5) is still 2.7% higher, further emphasizing its outstanding performance in maize leaf disease detection. YOLOv7l has 3.8 times the FLOPs of the proposed method, while the proposed method's parameter count is only half that of YOLOv7l; nevertheless, the proposed method's mAP (0.5) is 5.8% higher, indicating an impressive balance between accuracy and computational efficiency. Despite having a parameter count similar to Agronomic Teacher, the Gold-YOLO-s model has approximately 1.6 times the FLOPs of the proposed method, yet Agronomic Teacher achieves a 4.9% improvement in mAP (0.5). Compared to YOLOv8s, the proposed algorithm exhibits slightly lower FLOPs with a marginally higher parameter count and notably outperforms it in mAP (0.5) by 4.2%.
In comparison with YOLOv5s, YOLOv7l, Gold-YOLO-s, and YOLOv8s on datasets annotated at 10%, 15%, 20%, and 25%, as shown in Table 6, Agronomic Teacher consistently exhibits superior performance across annotation ratios. Specifically, on the 10% annotated dataset, Agronomic Teacher achieved performance improvements of 1.3%, 2.8%, 0.4%, and 0.5% over these models, respectively. On the 15% dataset, the increments were 4.4%, 4.7%, 3.8%, and 2.3%; on the 20% dataset, 5.3%, 4.5%, 4.4%, and 2.3%; and on the 25% dataset, 6.4%, 7.3%, 5.8%, and 4.9%, respectively. In comparison with YOLOv5x on the MaizeData dataset, at annotation rates of 10% and 15%, YOLOv5x achieved mAP (0.5) values surpassing the proposed algorithm by 0.3% and 2.9%, respectively. However, as the annotation rate increased to 20%, 25%, and 30%, the proposed algorithm exhibited its advantages, outperforming YOLOv5x by margins of 1.5%, 1.7%, and 2.7%, respectively.
To demonstrate the applicability of the proposed method, it was compared in detail with other supervised algorithms on the widely recognized PascalVOC dataset. The experimental results are presented in Table 6 and Table 7. The following explains the experiments conducted on the PascalVOC dataset at a 30% annotation rate, as shown in Table 7. Compared to YOLOv5s, the proposed model achieved a 6.9% increase in precision, a 6.8% boost in recall, and an 8.2% enhancement in mAP (0.5). It is noteworthy, however, that Agronomic Teacher has a relatively higher parameter count and FLOPs, approximately 2.4 times and 1.7 times those of YOLOv5s; the significant performance improvements thus come with greater computational demands. Compared with the proposed model, the YOLOv5x model's parameter count and FLOPs are roughly 5 times and 10 times greater, respectively. Despite this, Agronomic Teacher improved recall by 2.5% and mAP (0.5) by 0.6%. Although precision decreased slightly by 3.5%, overall, the proposed model achieved a good balance between performance and resource consumption. Compared to YOLOv7l, which has approximately five times the FLOPs and twice the parameter count of the proposed model, the proposed model significantly reduces the demand for computational resources while maintaining a similar level of performance, demonstrating advantages in achieving efficient, high-performance object detection with higher detection accuracy. Compared to the recently developed YOLOv8s, Agronomic Teacher possesses a marginally higher parameter count and slightly lower FLOPs, and demonstrates a notable enhancement in performance: a 2.6% increase in precision, a 2.9% rise in recall, and a significant 2.5% improvement in mAP (0.5).
On the PascalVOC dataset annotated at 10%, the proposed model demonstrated a clear improvement in mAP (0.5) over several existing models, as shown in Table 6: an increase of 2.4% over YOLOv5s and 2.2% over Gold-YOLO-s. Compared to YOLOv8s, however, performance slightly declined, with a 4.7% reduction in mAP (0.5). This difference stems from the limitation in dataset scale: despite PascalVOC containing more categories, the 10% data volume yields a relatively small dataset, and the random partitioning of training, testing, and validation sets may introduce class imbalance, which is particularly pronounced with limited data. Consequently, the proposed model might not have fully captured the characteristics of each category within the PascalVOC dataset, explaining its inferior performance relative to YOLOv8s at 10%. Compared to high-parameter, high-FLOPs models such as YOLOv5x and YOLOv7l, the proposed model does not exhibit an advantage in accuracy at this ratio, falling short in mAP (0.5) by 2.9% relative to YOLOv5x and 4.7% relative to YOLOv7l. On the PascalVOC dataset annotated at 15%, the proposed algorithm demonstrated considerable advantages, improving mAP (0.5) over the YOLOv5s, YOLOv5x, YOLOv7l, Gold-YOLO-s, and YOLOv8s models by 8.3%, 3.0%, 0.2%, 9.0%, and 2.7%, respectively. On the PascalVOC datasets annotated at 20% and 25%, the proposed model showed significant improvements over YOLOv5s, YOLOv5x, Gold-YOLO-s, and YOLOv8s: at 20%, increases in mAP (0.5) of 7.2%, 2.8%, 8.7%, and 0.5%, respectively; at 25%, increases of 8.9%, 1.8%, 8.4%, and 2.3%, respectively. YOLOv7l possesses twice the parameter count and 3.8 times the FLOPs of the proposed model; against it, the proposed model does not achieve higher mAP (0.5), lagging by 4.1% and 2.3% on the PascalVOC datasets annotated at 20% and 25%, respectively.

3.7.2. Comparison Experiment Results in Semi-Supervised Learning

A detailed comparison and analysis were conducted between Agronomic Teacher and Efficient Teacher [25] to further investigate and validate the proposed method's effectiveness. Efficient Teacher is also a one-stage semi-supervised object detection model. Regarding pseudo-label allocation, its Pseudo Label Assigner (PLA) assigns the pseudo-labels generated by the dense detector: for pseudo-labels with high classification scores, only the objectness loss is calculated, and for those with high objectness scores, the PLA computes the regression loss when the objectness score exceeds 0.99. Extensive experiments were conducted on multiple maize leaf disease detection datasets to assess performance and adaptability, and the method was also tested on public datasets to verify its effectiveness under different environments and conditions.
The results of the comparison experiments on the MaizeData dataset are shown in Figure 13. Our model outperformed the Efficient Teacher model in mAP (0.5) by 0.4%, 1.7%, 2.8%, 5.6%, and 6.7% on the datasets with annotation ratios of 10%, 15%, 20%, 25%, and 30%, respectively. This progress is primarily attributed to the WAP strategy, which judiciously allocates different weight ratios to objectness and classification scores, generating more accurate pseudo-labels. Additionally, the specially designed Agro-Backbone network demonstrates enhanced capability in capturing the subtle features of maize leaf diseases, and, complementing it, the Agro-Neck network effectively integrates features across scales. These improvements demonstrate the effectiveness of the proposed approach for semi-supervised object detection, especially its superior ability to identify and detect maize leaf diseases accurately. Similarly, for the PascalVOC dataset, the experimental results are shown in Figure 14: the proposed method improved over Efficient Teacher in mAP (0.5) by 1.8%, 4.2%, 4.4%, 1.2%, and 2.4%, further proving its generalizability and efficiency in handling various image detection tasks.

4. Conclusions

This paper proposes a semi-supervised object detection method based on a single-stage detector, which effectively utilizes limited labeled data and abundant unlabeled data to improve maize leaf disease recognition accuracy. For the large amount of unlabeled data, the proposed WAP strategy accurately and reasonably allocates the pseudo-labels generated by the teacher model, making full use of the weighted objectness and classification scores to assign pseudo-labels for maize leaf disease effectively. Additionally, the proposed AgroYOLO detector further improves detection performance. In the detector's Agro-Backbone network, the proposed SPPSEDC module replaces the SPPF module, and the CC2f module enhances the extraction of local lesion information. In the detector's Agro-Neck network, the GD mechanism replaces the traditional neck network and is combined with the C2f module to improve feature fusion, thereby enhancing maize leaf disease detection accuracy. The experimental results show that, compared to the baseline model, Agronomic Teacher improves the mAP (0.5) metric on the MaizeData dataset by 1.3%, 4.4%, 5.3%, 6.4%, and 6.5% at the 10%, 15%, 20%, 25%, and 30% annotation ratios, respectively, and on the PascalVOC dataset by 2.4%, 8.3%, 7.2%, 8.9%, and 8.2%, respectively. These results demonstrate that the proposed algorithm effectively improves maize leaf disease detection by utilizing a large amount of unlabeled data and exhibits a degree of generalization.
Although the proposed methods effectively leverage abundant unlabeled data in scenarios where labeled data are limited, providing advantages over other supervised and semi-supervised detection algorithms, there is still substantial untapped potential to explore. The future aim is to integrate this method with practical robotic applications, further extending the proposed algorithm to various agricultural tasks such as weed detection, crop growth monitoring, fruit harvesting, and sorting, among others. Hopefully, this work will contribute to enhancing the efficiency of disease recognition for farmers.

Author Contributions

Conceptualization, Y.H. and Z.C.; methodology, Y.H.; software, Y.H.; validation, J.L., Y.H. and J.G.; formal analysis, Z.C.; investigation, Z.C.; resources, J.G.; data curation, Q.S.; writing—original draft preparation, J.L.; writing—review and editing, J.L.; visualization, G.L.; supervision, G.L. and Z.C.; project administration, G.L.; funding acquisition, J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Scientific Research Project of Jilin Provincial Education Department (JJKH20230764KJ).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ranum, P.; Peña-Rosas, J.P.; Garcia-Casal, M.N. Global maize production, utilization, and consumption. Ann. N. Y. Acad. Sci. 2014, 1312, 105–112. [Google Scholar] [CrossRef] [PubMed]
  2. Fang, S.; Wang, Y.; Zhou, G.; Chen, A.; Cai, W.; Wang, Q.; Hu, Y.; Li, L. Multi-channel feature fusion networks with hard coordinate attention mechanism for maize disease identification under complex backgrounds. Comput. Electron. Agric. 2022, 203, 107486. [Google Scholar] [CrossRef]
  3. Ahila Priyadharshini, R.; Arivazhagan, S.; Arun, M.; Mirnalini, A. Maize leaf disease classification using deep convolutional neural networks. Neural Comput. Appl. 2019, 31, 8887–8895. [Google Scholar] [CrossRef]
  4. Zhang, Y.; Huang, S.; Zhou, G.; Hu, Y.; Li, L. Identification of tomato leaf diseases based on multi-channel automatic orientation recurrent attention network. Comput. Electron. Agric. 2023, 205, 107605. [Google Scholar] [CrossRef]
  5. Zhang, X.; Qiao, Y.; Meng, F.; Fan, C.; Zhang, M. Identification of maize leaf diseases using improved deep convolutional neural networks. IEEE Access 2018, 6, 30370–30377. [Google Scholar] [CrossRef]
  6. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  7. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Proceedings, Part I 14. Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
  8. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
  9. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28; Curran Associates, Inc.: Red Hook, NY, USA, 2015. [Google Scholar]
  11. Zhang, K.; Wu, Q.; Chen, Y. Detecting soybean leaf disease from synthetic image using multi-feature fusion faster R-CNN. Comput. Electron. Agric. 2021, 183, 106064. [Google Scholar] [CrossRef]
  12. Sun, H.; Xu, H.; Liu, B.; He, D.; He, J.; Zhang, H.; Geng, N. MEAN-SSD: A novel real-time detector for apple leaf diseases using improved light-weight convolutional neural networks. Comput. Electron. Agric. 2021, 189, 106379. [Google Scholar] [CrossRef]
  13. Liu, J.; Wang, X. Tomato diseases and pests detection based on improved Yolo V3 convolutional neural network. Front. Plant Sci. 2020, 11, 898. [Google Scholar] [CrossRef] [PubMed]
  14. Li, Y.; Wang, J.; Wu, H.; Yu, Y.; Sun, H.; Zhang, H. Detection of powdery mildew on strawberry leaves based on DAC-YOLOv4 model. Comput. Electron. Agric. 2022, 202, 107418. [Google Scholar] [CrossRef]
  15. Qi, J.; Liu, X.; Liu, K.; Xu, F.; Guo, H.; Tian, X.; Li, M.; Bao, Z.; Li, Y. An improved YOLOv5 model based on visual attention mechanism: Application to recognition of tomato virus disease. Comput. Electron. Agric. 2022, 194, 106780. [Google Scholar] [CrossRef]
  16. Diao, Z.; Guo, P.; Zhang, B.; Zhang, D.; Yan, J.; He, Z.; Zhao, S.; Zhao, C.; Zhang, J. Navigation line extraction algorithm for corn spraying robot based on improved YOLOv8s network. Comput. Electron. Agric. 2023, 212, 108049. [Google Scholar] [CrossRef]
  17. Xu, H.; Xiao, H.; Hao, H.; Dong, L.; Qiu, X.; Peng, C. Semi-supervised learning with pseudo-negative labels for image classification. Knowl.-Based Syst. 2023, 260, 110166. [Google Scholar] [CrossRef]
  18. Zhu, H.; Gao, D.; Cheng, G.; Povey, D.; Zhang, P.; Yan, Y. Alternative pseudo-labeling for semi-supervised automatic speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 31, 3320–3330. [Google Scholar] [CrossRef]
  19. Søgaard, A. Semi-Supervised Learning and Domain Adaptation in Natural Language Processing; Springer Nature: Cham, Switzerland, 2022. [Google Scholar]
  20. Yang, J.; Chen, Y. Tender Leaf Identification for Early-Spring Green Tea Based on Semi-Supervised Learning and Image Processing. Agronomy 2022, 12, 1958. [Google Scholar] [CrossRef]
  21. Omidi, R.; Pourreza, A.; Moghimi, A.; Zuniga-Ramirez, G.; Jafarbiglu, H.; Maung, Z.; Westphal, A. A Semi-supervised approach to cluster symptomatic and asymptomatic leaves in root lesion nematode infected walnut trees. Comput. Electron. Agric. 2022, 194, 106761. [Google Scholar] [CrossRef]
  22. Tseng, G.; Sinkovics, K.; Watsham, T.; Rolnick, D.; Walters, T.C. Semi-Supervised Object Detection for Agriculture. In Proceedings of the 2nd AAAI Workshop on AI for Agriculture and Food Systems, Washington, DC, USA, 13–14 February 2023. [Google Scholar]
  23. Hughes, D.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv 2015, arXiv:1511.08060. [Google Scholar]
24. Everingham, M.; Winn, J. The PASCAL visual object classes challenge 2012 (VOC2012) development kit. Pattern Anal. Stat. Model. Comput. Learn. Tech. Rep. 2012, 2007, 5. [Google Scholar]
  25. Xu, B.; Chen, M.; Guan, W.; Hu, L. Efficient Teacher: Semi-Supervised Object Detection for YOLOv5. arXiv 2023, arXiv:2302.07577. [Google Scholar]
  26. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  27. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  28. Ultralytics. YOLOv8. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 10 January 2023).
  29. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  30. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path aggregation network for instance segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8759–8768. [Google Scholar]
  31. Wang, C.; He, W.; Nie, Y.; Guo, J.; Liu, C.; Wang, Y.; Han, K. Gold-YOLO: Efficient object detector via gather-and-distribute mechanism. In Advances in Neural Information Processing Systems 36; Curran Associates, Inc.: Red Hook, NY, USA, 2024. [Google Scholar]
  32. Ultralytics. YOLOv5. 2021. Available online: https://github.com/ultralytics/yolov5 (accessed on 9 May 2020).
  33. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475. [Google Scholar]
Figure 1. Experimental field environment of the collected data.
Figure 2. Examples of maize leaf disease images.
Figure 3. Label annotation on MaizeData.
Figure 4. The original image alongside its augmented counterparts.
Figure 5. Agronomic Teacher framework. The teacher model (AgroYOLO) receives unlabeled data, while the student model (AgroYOLO) receives both labeled and unlabeled data. The teacher model uses the WAP strategy to weigh the different scores in the pseudo-labels and flexibly assign weighting ratios, and its predictions guide the training of the student model. In turn, the teacher model is obtained by updating the student model weights with an exponential moving average (EMA).
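The teacher update in Figure 5 can be summarized in a few lines. The following is a minimal sketch assuming PyTorch-style models; the function name, the decay value, and the commented training loop are illustrative, not taken from the authors' released code.

```python
import copy
import torch

@torch.no_grad()
def ema_update(teacher, student, decay=0.999):
    """Exponential moving average update of the teacher from the student.

    The decay value is a common choice, not one stated in the paper.
    """
    for t_p, s_p in zip(teacher.parameters(), student.parameters()):
        t_p.mul_(decay).add_(s_p, alpha=1.0 - decay)

# Hypothetical usage in a semi-supervised loop:
# teacher = copy.deepcopy(student)   # teacher starts as a copy of the student
# for batch in loader:
#     ...                            # student trains on labeled + pseudo-labeled data
#     ema_update(teacher, student)   # teacher drifts slowly toward the student
```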
Figure 6. The YOLOv5 network consists of backbone, neck, and head. The CBS module comprises Convolution (Conv), Batch Normalization (BN), and the SiLU activation function. The CSP1_n part comprises the CBS module, Residual unit (Resunit), and Concatenation (Concat). The CSP2_n part omits the Resunit and comprises only the CBS module and Concat. The SPP module performs multi-scale fusion.
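Since the CBS module is simply Conv + BN + SiLU, it can be expressed directly; below is a minimal PyTorch sketch in which the kernel and stride defaults are illustrative, not values prescribed by the figure.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Conv + BatchNorm + SiLU, the CBS module described in Figure 6."""

    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# Example: a 640x640 RGB input downsampled by a stride-2 CBS block.
x = torch.randn(1, 3, 640, 640)
print(CBS(3, 32, k=3, s=2)(x).shape)  # torch.Size([1, 32, 320, 320])
```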
Figure 7. The AgroYOLO network consists of the Agro-Backbone network, the Agro-Neck network, and the detection head.
Figure 8. The Agro-Backbone network consists of the Conv module, the CC2f module, and the SPPSEDC module. Panel (a) shows the Agro-Backbone network structure, (b) shows the SPPSEDC module, and (c) depicts the CC2f module.
Figure 9. Agro-Neck network. Low-FAM and Low-IFM are the feature alignment and information fusion modules in the low-stage branch; High-FAM and High-IFM are their counterparts in the high-stage branch.
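The alignment step in Figure 9 gathers feature maps from several stages into one tensor that the information fusion module can mix. The sketch below illustrates that idea only; the nearest-neighbor resizing and the example channel counts are assumptions, not the module's exact implementation.

```python
import torch
import torch.nn.functional as F

def feature_align_and_gather(features, target_hw):
    """Resize multi-stage feature maps to one spatial size and concatenate.

    A sketch of the FAM idea: every stage is brought to `target_hw` so a
    following fusion module can combine them along the channel axis.
    """
    aligned = [F.interpolate(f, size=target_hw, mode="nearest") for f in features]
    return torch.cat(aligned, dim=1)

# Example: three stages (80x80, 40x40, 20x20) gathered at 40x40.
feats = [torch.randn(1, c, s, s) for c, s in [(64, 80), (128, 40), (256, 20)]]
print(feature_align_and_gather(feats, (40, 40)).shape)  # torch.Size([1, 448, 40, 40])
```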
Figure 10. Choosing the w1 and w2 thresholds on MaizeData.
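The WAP strategy scores each teacher prediction as a weighted combination of its objectness and classification scores, with w1 and w2 selected empirically as in Figure 10. The sketch below is a hypothetical rendering of that filtering step, not the authors' implementation; the tuple layout of `predictions` and the threshold parameter are assumptions.

```python
def wap_reliability(objectness, class_score, w1, w2):
    # Weighted reliability of one pseudo-label (the WAP idea).
    return w1 * objectness + w2 * class_score

def filter_pseudo_labels(predictions, w1, w2, threshold):
    """Keep teacher predictions whose weighted score clears the threshold.

    `predictions` is assumed to be a list of (box, objectness, class_score)
    tuples produced by the teacher model on unlabeled images.
    """
    return [
        (box, obj, cls)
        for box, obj, cls in predictions
        if wap_reliability(obj, cls, w1, w2) >= threshold
    ]
```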
Figure 11. Detection results before and after model improvement in a complex experimental field with both well-lit and shaded areas.
Figure 12. Comparison of detection performance in the experimental field before and after model improvements.
Figure 13. Comparison of Efficient Teacher and Agronomic Teacher on the MaizeData dataset using labeled data at 10%, 15%, 20%, 25%, and 30%.
Figure 14. Comparison of Efficient Teacher and Agronomic Teacher on the PascalVOC dataset using labeled data at 10%, 15%, 20%, 25%, and 30%.
Table 1. MaizeData and PascalVOC datasets label information.

Dataset   | Label Name  | Number of Labels
MaizeData | CD          | 11,658
          | Pp          | 10,287
          | CR          | 8442
          | Bt          | 8316
          | ML          | 12,835
          | total       | 51,538
PascalVOC | aeroplane   | 642
          | bicycle     | 807
          | bird        | 1175
          | boat        | 791
          | bottle      | 1291
          | bus         | 526
          | car         | 3185
          | cat         | 759
          | chair       | 2806
          | cow         | 685
          | diningtable | 109
          | dog         | 1068
          | horse       | 801
          | motorbike   | 759
          | person      | 10,674
          | pottedplant | 1217
          | sheep       | 664
          | sofa        | 821
          | train       | 630
          | tvmonitor   | 728
          | total       | 30,138
Table 2. Image counts for the labeled/unlabeled training, validation, and testing sets on the MaizeData and PascalVOC datasets.

Dataset   | Ratio | Labeled Training Count | Unlabeled Training Count | Validation Count | Testing Count
MaizeData | 10%   | 925  | 8326 | 2644 | 1321
          | 15%   | 1387 | 7864 | 2644 | 1321
          | 20%   | 1850 | 7401 | 2644 | 1321
          | 25%   | 2312 | 6939 | 2644 | 1321
          | 30%   | 2775 | 6476 | 2644 | 1321
PascalVOC | 10%   | 697  | 6276 | 1992 | 997
          | 15%   | 1046 | 5927 | 1992 | 997
          | 20%   | 1393 | 5580 | 1992 | 997
          | 25%   | 1743 | 5230 | 1992 | 997
          | 30%   | 2091 | 4882 | 1992 | 997
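The counts in Table 2 follow from fixed training pools (9251 MaizeData images, 6973 PascalVOC images): each ratio takes that fraction of the pool as labeled and leaves the remainder unlabeled. A minimal sketch of such a split is below; the random seed and flooring rule are assumptions, and exact per-split counts can differ slightly depending on how the pool is enumerated.

```python
import random

def split_training_pool(image_ids, labeled_ratio, seed=0):
    """Partition a training pool into labeled and unlabeled subsets."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_labeled = int(len(ids) * labeled_ratio)  # floor of ratio * pool size
    return ids[:n_labeled], ids[n_labeled:]

# 30% of MaizeData's 9251 training images -> 2775 labeled, 6476 unlabeled,
# matching the last MaizeData row of Table 2.
labeled, unlabeled = split_training_pool(range(9251), 0.30)
print(len(labeled), len(unlabeled))  # 2775 6476
```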
Table 3. Experimental parameters information.

Experimental Parameter Name | Setting
input image size            | 640 × 640
learning rate               | 0.01
optimiser weight decay      | 0.0005
momentum factor             | 0.937
epoch                       | 300
batch_size                  | 16
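For reference, these settings map naturally onto a YOLOv5-style training configuration. The dictionary below is a hypothetical sketch mirroring Table 3; the key names follow common YOLOv5 hyperparameter conventions and are not taken from the authors' actual config file.

```python
# Hypothetical training configuration mirroring Table 3; key names follow
# common YOLOv5-style hyperparameter files and are assumptions.
train_cfg = {
    "imgsz": 640,            # input image size (640 x 640)
    "lr0": 0.01,             # initial learning rate
    "weight_decay": 0.0005,  # optimizer weight decay
    "momentum": 0.937,       # SGD momentum factor
    "epochs": 300,
    "batch_size": 16,
}
```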
Table 4. Performance comparison of the different models of YOLO on 30% PascalVOC dataset.

Model   | Precision | Recall | mAP (0.5) | Parameters | FLOPs (G) | FPS
YOLOv5n | 49.1% | 39.7% | 39.1% | 1,790,977  | 4.2   | 208.33
YOLOv5s | 59.2% | 42.6% | 45.4% | 7,073,569  | 16.0  | 147.06
YOLOv5m | 64.3% | 45.9% | 49.9% | 20,948,097 | 48.2  | 112.36
YOLOv5l | 68.5% | 47.5% | 52.0% | 46,240,609 | 108.1 | 92.59
YOLOv5x | 69.6% | 46.9% | 53.0% | 86,345,665 | 204.4 | 65.36
YOLOv7l | 68.6% | 51.9% | 55.8% | 37,299,042 | 105.4 | 70.42
YOLOv8s | 63.5% | 46.5% | 51.1% | 11,133,324 | 28.5  | 68.97
Table 5. Results of ablation experiments on the 30% MaizeData and PascalVOC datasets. “+WAP” indicates improved semi-supervised learning; “+Agro-Backbone” indicates the improved backbone network; “+Agro-Neck” indicates the improved neck network.

Ratio | Dataset   | +WAP | +Agro-Backbone | +Agro-Neck | Precision | Recall | mAP (0.5) | Parameters | FLOPs (G)
30%   | MaizeData |      |                |            | 50.0% | 39.0% | 35.8% | 7,033,114  | 15.8
      |           | ✓    |                |            | 51.8% | 38.5% | 37.4% | 7,373,690  | 16.5
      |           |      | ✓              |            | 54.8% | 40.7% | 38.6% | 11,956,090 | 22.4
      |           |      |                | ✓          | 52.0% | 41.2% | 38.0% | 12,182,394 | 20.4
      |           | ✓    | ✓              |            | 50.7% | 44.8% | 40.8% | 12,306,170 | 23.0
      |           | ✓    |                | ✓          | 50.3% | 42.1% | 39.2% | 12,542,202 | 21.1
      |           |      | ✓              | ✓          | 55.9% | 39.6% | 38.9% | 17,114,874 | 26.9
      |           | ✓    | ✓              | ✓          | 53.8% | 42.8% | 42.3% | 17,474,682 | 27.6
      | PascalVOC |      |                |            | 59.2% | 42.6% | 45.4% | 7,073,569  | 16.0
      |           | ✓    |                |            | 61.8% | 47.7% | 50.7% | 7,419,425  | 16.6
      |           |      | ✓              |            | 58.3% | 43.6% | 46.7% | 12,006,817 | 22.5
      |           |      |                | ✓          | 61.3% | 44.0% | 47.0% | 12,242,081 | 20.5
      |           | ✓    | ✓              |            | 64.5% | 47.9% | 52.6% | 12,352,673 | 23.1
      |           | ✓    |                | ✓          | 64.2% | 49.1% | 53.1% | 12,582,657 | 21.2
      |           |      | ✓              | ✓          | 66.7% | 43.1% | 47.9% | 17,155,329 | 27.0
      |           | ✓    | ✓              | ✓          | 66.1% | 49.4% | 53.6% | 17,515,137 | 27.7
Table 6. Results of comparison experiments on the MaizeData and PascalVOC datasets using labeled data at 10%, 15%, 20%, and 25%.

Ratio | Dataset   | Model            | Precision | Recall | mAP (0.5) | Parameters | FLOPs (G)
10%   | MaizeData | YOLOv5s [32]     | 39.3% | 31.5% | 24.8% | 7,023,610  | 15.8
      |           | YOLOv5x [32]     | 43.0% | 28.9% | 26.4% | 86,200,330 | 204.1
      |           | YOLOv7l [33]     | 32.5% | 32.6% | 23.3% | 37,218,132 | 105.2
      |           | Gold-YOLO-s [31] | 35.7% | 32.0% | 25.7% | 23,169,132 | 50.9
      |           | YOLOv8s [28]     | 34.8% | 29.6% | 25.6% | 11,127,519 | 28.4
      |           | ours             | 33.5% | 33.9% | 26.1% | 17,474,682 | 27.6
      | PascalVOC | YOLOv5s          | 36.2% | 29.0% | 25.1% | 7,073,569  | 16.0
      |           | YOLOv5x          | 46.3% | 30.3% | 30.4% | 86,345,665 | 204.4
      |           | YOLOv7l          | 40.6% | 36.1% | 32.2% | 37,299,042 | 105.4
      |           | Gold-YOLO-s      | 36.5% | 34.7% | 25.3% | 21,515,353 | 46.1
      |           | YOLOv8s          | 43.8% | 31.4% | 32.2% | 11,133,324 | 28.5
      |           | ours             | 38.5% | 29.7% | 27.5% | 17,515,137 | 27.7
15%   | MaizeData | YOLOv5s          | 40.4% | 32.0% | 26.4% | 7,023,610  | 15.8
      |           | YOLOv5x          | 46.4% | 34.3% | 32.0% | 86,200,330 | 204.1
      |           | YOLOv7l          | 41.5% | 31.2% | 26.1% | 37,218,132 | 105.2
      |           | Gold-YOLO-s      | 38.0% | 33.0% | 27.0% | 23,169,132 | 50.9
      |           | YOLOv8s          | 42.2% | 30.3% | 28.5% | 11,127,519 | 28.4
      |           | ours             | 42.6% | 33.9% | 30.8% | 17,474,682 | 27.6
      | PascalVOC | YOLOv5s          | 50.7% | 32.4% | 32.8% | 7,073,569  | 16.0
      |           | YOLOv5x          | 51.2% | 38.5% | 38.1% | 86,345,665 | 204.4
      |           | YOLOv7l          | 56.0% | 39.8% | 40.9% | 37,299,042 | 105.4
      |           | Gold-YOLO-s      | 42.7% | 39.0% | 32.1% | 21,515,353 | 46.1
      |           | YOLOv8s          | 51.7% | 36.5% | 38.4% | 11,133,324 | 28.5
      |           | ours             | 49.4% | 40.9% | 41.1% | 17,515,137 | 27.7
20%   | MaizeData | YOLOv5s          | 41.9% | 35.4% | 29.6% | 7,023,610  | 15.8
      |           | YOLOv5x          | 43.4% | 36.3% | 33.4% | 86,200,330 | 204.1
      |           | YOLOv7l          | 42.4% | 35.7% | 30.4% | 37,218,132 | 105.2
      |           | Gold-YOLO-s      | 47.7% | 33.0% | 30.5% | 23,169,132 | 50.9
      |           | YOLOv8s          | 44.3% | 34.5% | 32.6% | 11,127,519 | 28.4
      |           | ours             | 47.3% | 38.6% | 34.9% | 17,474,682 | 27.6
      | PascalVOC | YOLOv5s          | 52.6% | 36.3% | 37.9% | 7,073,569  | 16.0
      |           | YOLOv5x          | 54.3% | 41.2% | 42.3% | 86,345,665 | 204.4
      |           | YOLOv7l          | 61.0% | 47.4% | 49.2% | 37,299,042 | 105.4
      |           | Gold-YOLO-s      | 42.6% | 42.0% | 36.4% | 21,515,353 | 46.1
      |           | YOLOv8s          | 57.3% | 42.5% | 44.6% | 11,133,324 | 28.5
      |           | ours             | 55.8% | 43.1% | 45.1% | 17,515,137 | 27.7
25%   | MaizeData | YOLOv5s          | 44.7% | 38.0% | 32.1% | 7,023,610  | 15.8
      |           | YOLOv5x          | 49.1% | 37.9% | 36.8% | 86,200,330 | 204.1
      |           | YOLOv7l          | 36.7% | 40.7% | 31.2% | 37,218,132 | 105.2
      |           | Gold-YOLO-s      | 42.1% | 37.0% | 32.7% | 23,169,132 | 50.9
      |           | YOLOv8s          | 46.3% | 35.6% | 33.6% | 11,127,519 | 28.4
      |           | ours             | 51.4% | 39.9% | 38.5% | 17,474,682 | 27.6
      | PascalVOC | YOLOv5s          | 54.7% | 42.1% | 41.7% | 7,073,569  | 16.0
      |           | YOLOv5x          | 67.7% | 43.8% | 48.8% | 86,345,665 | 204.4
      |           | YOLOv7l          | 65.1% | 50.2% | 52.9% | 37,299,042 | 105.4
      |           | Gold-YOLO-s      | 51.8% | 44.0% | 42.2% | 21,515,353 | 46.1
      |           | YOLOv8s          | 61.8% | 44.7% | 48.3% | 11,133,324 | 28.5
      |           | ours             | 58.5% | 50.3% | 50.6% | 17,515,137 | 27.7
Table 7. Results of comparison experiments on the MaizeData and PascalVOC datasets using labeled data at 30%.

Ratio | Dataset   | Model       | Precision | Recall | mAP (0.5) | Parameters | FLOPs (G)
30%   | MaizeData | YOLOv5s     | 50.0% | 39.0% | 35.8% | 7,023,610  | 15.8
      |           | YOLOv5x     | 53.7% | 38.7% | 39.6% | 86,200,330 | 204.1
      |           | YOLOv7l     | 46.5% | 41.7% | 36.5% | 37,218,132 | 105.2
      |           | Gold-YOLO-s | 46.9% | 41.0% | 37.4% | 23,169,132 | 50.9
      |           | YOLOv8s     | 50.0% | 38.1% | 38.1% | 11,127,519 | 28.4
      |           | ours        | 53.8% | 42.8% | 42.3% | 17,474,682 | 27.6
      | PascalVOC | YOLOv5s     | 59.2% | 42.6% | 45.4% | 7,073,569  | 16.0
      |           | YOLOv5x     | 69.6% | 46.9% | 53.0% | 86,345,665 | 204.4
      |           | YOLOv7l     | 68.6% | 51.9% | 55.8% | 37,299,042 | 105.4
      |           | Gold-YOLO-s | 55.8% | 47.0% | 46.8% | 21,515,353 | 46.1
      |           | YOLOv8s     | 63.5% | 46.5% | 51.1% | 11,133,324 | 28.5
      |           | ours        | 66.1% | 49.4% | 53.6% | 17,515,137 | 27.7