1. Introduction
Rice blast, a fungal disease and one of the most common diseases of rice, has a significant impact on yield and quality. Traditional methods for detecting rice blast rely primarily on manual observation and diagnosis, so their accuracy and efficiency are low. With the development of deep learning technology, more and more researchers have begun to apply deep learning techniques to improve the efficiency and accuracy of rice blast detection. However, because annotated data are difficult to obtain, most deep learning methods rely on supervised learning with labeled data and leave unlabeled data unused. Semi-supervised and unsupervised learning methods have therefore become a new research hotspot in the early detection of rice blast.
In recent years, researchers have proposed numerous deep-learning-based methods for rice blast detection [1,2,3]. The application of deep learning techniques has significantly enhanced the accuracy, automation, and efficiency of rice blast detection [4]. Among these methods, convolutional neural networks (CNNs) and transfer learning are the most commonly used [5]. A CNN automatically learns features from images without manual feature definition and adapts to a wide range of input image variations, which is why CNNs have been widely used in rice blast detection. To improve CNN performance, researchers have proposed various enhancements, such as transfer learning, data augmentation, and adaptive dilated convolutions, which effectively improve the accuracy and robustness of CNNs [6]. In 2020, Sethy et al. used 11 CNN models to evaluate 5932 field images of rice blast, bacterial leaf blight, brown spot disease, and black spot disease [7]. In 2016, Xie et al. proposed a CNN-based rice blast detection method that used a network with 16 convolutional layers and three fully connected layers [8]. After training and optimization, the model achieved an accuracy of up to 92.8%.
In addition to CNNs, other deep learning methods have also been applied to rice blast detection. For example, auto-encoders and variational auto-encoders can be used for unsupervised learning and feature extraction. In 2018, Ma et al. proposed an unsupervised rice blast detection method based on variational auto-encoders [9]. They used variational auto-encoders to extract features from unlabeled images and achieved good detection results without labeled data. Because purely unsupervised methods have their limitations, semi-supervised learning methods have also been widely used. Semi-supervised methods train models with a small amount of labeled data and a large amount of unlabeled data, thereby improving their accuracy.
Apart from deep learning methods, some traditional image processing techniques have also been applied to rice blast detection. For example, chromaticity-based and morphological methods can be used for image segmentation and feature extraction. In 2017, Majumdar et al. proposed a chromaticity-based method that identifies rice blast by computing a color histogram of the affected areas [10]. The method can also classify different stages of the disease and achieved good experimental results. Moreover, recurrent neural network (RNN) technology has been applied to rice blast detection. An RNN is a neural network with memory that can handle sequential data. Taking image sequences as inputs, Lipton et al. used an RNN to extract features and classify rice blast in 2015; compared with CNNs, RNNs perform better on sequential data [11]. In 2021, Verma et al. proposed the Long Short-Term Memory-Simple Recurrent Neural Network (LSTM-SRNN) method, which has dynamic learning capabilities, to predict diseased or healthy rice plants [12]. Kim et al. used Long Short-Term Memory networks (LSTMs) to predict the occurrence of rice blast one year in advance [13]. They evaluated the predictive performance of the LSTM model by varying input variables such as rice blast scores, temperature, relative humidity, and sunlight duration. Deep learning thus provides a variety of new solutions for the early detection of rice blast [14,15].
Beyond supervised learning, researchers have also paid attention to unsupervised and semi-supervised learning for rice blast detection. Unsupervised methods mainly include auto-encoders and generative adversarial networks (GANs) [16]. Auto-encoders automatically learn features from unlabeled data, which can then be used for rice blast detection [17]. A GAN is a generative model that can synthesize images from noise; in rice blast detection, researchers have used GANs to generate rice blast images and then used these images to train classifiers. Semi-supervised methods exploit the information in labeled and unlabeled data simultaneously, so that model performance can be improved even when a large amount of the data is unlabeled. Semi-supervised learning has already been applied to rice blast detection to some extent [18].
Although the methods proposed in the literature have achieved certain results in rice blast detection, semi-supervised learning based on unmanned aerial vehicle (UAV) images for the early detection of rice blast is still lacking [19]. This paper proposes an early detection method for rice blast based on UAV imaging and a semi-supervised contrastive unsupervised transformation iterative network [20]. The method combines the advantages of semi-supervised learning and unsupervised transformation networks, effectively exploiting unlabeled data to improve detection performance. It first trains a classifier on labeled data, then uses this classifier to classify unlabeled data, and finally uses the classification results to train an unsupervised transformation network [21]. The unsupervised transformation network learns a high-quality feature representation from unlabeled data, further enhancing the accuracy of rice blast detection. As a result, labeled and unlabeled data are used together to train the classifier, further improving the performance of the model [22].
The main contributions of this study are as follows. (1) A method based on a semi-supervised contrastive unsupervised transformation iterative network is proposed for the early detection of rice blast. (2) Semi-supervised learning (in which only part of the data is labeled) is combined with an unsupervised transformation network (an unsupervised learning method), and this combination helps improve the model's precision. (3) A classifier trained on labeled data is used to classify the unlabeled data, which improves rice blast detection performance. (4) The methodology provides a novel approach to applying unmanned aerial vehicle (UAV) images to the early detection of rice blast.
2. Materials and Methods
2.1. Experimental Site
The experimental site is located at the Rice Breeding Demonstration Base in Shanghang Chadi Town, Longyan City, China (longitude 116.575° E, latitude 25.02° N), as shown in Figure 1. The drill seeding method was used to sow seeds, with a sowing area of 0.02 square meters for each variety. In this study, more than 1000 varieties with different blast resistance were sown. Additionally, a protective row of the inducer variety “Minghui 86” was planted around each plot. During cultivation, rice seedlings were kept under moist conditions with a water depth of approximately 10–15 cm to ensure sufficient water for seedling growth; at the yellow ripeness stage, the water depth was reduced to a shallow level. Fertilization followed local field standards: the amounts of N, P, and K fertilizers were 162.6 kg/ha, 90.6 kg/ha, and 225.0 kg/ha, respectively. To obtain different degrees of disease for model training, natural field induction mainly included the following: (1) rice varieties with different blast resistance were selected; (2) the environmental conditions (humidity and temperature) at the experimental site were conducive to the growth and spread of rice blast; and (3) rice plants were monitored during specific time periods to observe the appearance and development of rice blast. Rice blast typically occurs and develops during the rice growing season, especially in the early stages of growth; the specific time periods here were the jointing stage, the grain filling stage, and the periods just before and after maturity.
2.2. UAV Images Collection
A commercial unmanned aerial vehicle (DJI Mavic 2 Pro) was used to collect high-resolution RGB images (5472 × 3648 pixels). UAV operations were carried out between 2 p.m. and 4 p.m. during the yellow ripening period, from early July to mid-October 2022. The weather was sunny, with a temperature of 30 °C and a relative humidity of 72%, and there was little wind during image collection. The flying altitude of the UAV was 5 m, the camera exposure time was 0.2 ms, and the ground resolution of the RGB images was 1 mm/pixel. The forward and lateral overlaps of the UAV flight route were 60% and 75%, respectively. A total of 1702 high-resolution UAV images were collected in this study.
2.3. Grading Standard for Rice Blast Levels
The disease level of rice blast is determined according to the standards of the International Rice Research Institute (IRRI) [23], as shown in Table 1. Typical levels of rice blast are shown in Figure 2. Generally, a disease level of 1 indicates a healthy plant that does not need to be labeled. When the disease level reaches 9, the leaf surface turns yellow, and the plant dies when the disease level reaches 10.
2.4. Network Model
The semi-supervised contrastive unsupervised transformation iterative network model for the early detection of rice blast (Figure 3) combines the advantages of semi-supervised learning and utilizes unlabeled image data for training. It also incorporates the contrastive unsupervised transformation technique to perform image transformation and augmentation across different domains, thereby improving the generalization capability and precision of the model. Precision here refers to the ratio of the number of samples correctly classified by the model to the total number of samples, usually expressed as a percentage; it measures the proportion of samples the model classifies correctly over the entire dataset. An efficient single-stage object detection model was initially used for rice blast detection [24]. This model, called RiceblastYolo, consists of a backbone, a feature pyramid network (FPN), a detection head, anchor boxes, activation functions, and loss functions [25,26,27,28]. RiceblastYolo is an adaptation of the YOLO framework tailored to the requirements of early rice blast detection. It excels at the rapid and precise detection of rice blast, making it particularly suitable for large-scale image monitoring tasks. Moreover, the method employs semi-supervised learning to enhance model performance: it not only utilizes labeled data but also exploits unlabeled data by generating synthetic labels to augment the training dataset, which significantly improves the accuracy and automation of rice blast detection. Furthermore, RiceblastYolo incorporates generative adversarial network (GAN) technology to better adapt the model to image variations, thereby increasing detection accuracy and robustness.
Rice blast images are generated by an optimized contrastive unsupervised transformation network, and the optimized images, including real labeled images and fake labeled images, are used to train the model. An iterative learning strategy continuously enlarges the training data and adjusts the model parameters, so a highly accurate and robust early detection model for rice blast is ultimately established. The advantage of this method is that it saves the time and cost of manual labeling. It also enhances the diversity and generalization ability of the data through contrastive unsupervised transformation, thereby improving the accuracy and robustness of the model. The specific steps of the proposed semi-supervised contrastive unsupervised transformation iterative network for early detection of rice blast are as follows:
- (1)
Construct a semi-supervised rice blast detection model, named RiceblastYolo, with labeled and unlabeled data.
- (2)
Use this basic model to perform object detection on unlabeled data, generate soft-labeled data, and use a contrastive-unpaired-translation method based on generative adversarial networks (GANs) to generate more realistic fake labeled data. Prior knowledge is used to filter out the unreliable fake labeled data, where multiple models detect the same image and only the intersection of their detection results is retained.
- (3)
Merge these fake labeled data with the existing labeled data to create a new training dataset. Retrain the object detection model with the merged dataset.
- (4)
Repeat steps 2–3 until a strongly generalized rice blast detection model is obtained.
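To make the iteration concrete, the four steps above can be sketched as a simple self-training loop. The sketch below is purely illustrative: `train_detector` and `pseudo_label` are hypothetical stand-ins (operating on plain numbers) for RiceblastYolo training and soft-label generation, not the authors' code.

```python
def train_detector(labeled):
    # Stand-in for step 1: "training" here just memorises the mean value;
    # in the real pipeline this trains the RiceblastYolo detector.
    return sum(labeled) / len(labeled)

def pseudo_label(model, unlabeled, tol=0.5):
    # Stand-in for step 2: keep only unlabeled samples the current model
    # is confident about (here: values close to the model's mean).
    return [x for x in unlabeled if abs(x - model) < tol]

def self_training(labeled, unlabeled, rounds=3):
    # Steps 3-4: merge confident pseudo-labels into the training set,
    # retrain, and repeat for a fixed number of rounds.
    for _ in range(rounds):
        model = train_detector(labeled)
        for x in pseudo_label(model, unlabeled):
            if x not in labeled:
                labeled.append(x)
    return train_detector(labeled), labeled
```

For example, `self_training([1.0, 1.2], [1.1, 5.0])` absorbs the confident sample 1.1 into the training set while rejecting the outlier 5.0.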
2.5. Semi-Supervised Learning
Rice blast images are first collected by drones and manually annotated, and RiceblastYolo, an initial detection model known for its speed, efficiency, and accuracy, is trained. To exploit the unlabeled data, semi-supervised learning with contrastive unpaired translation (CUT) is used to generate fake labels [29]. CUT is an image translation technique that converts images between different domains while preserving their semantics. In semi-supervised learning, CUT can convert unlabeled data into images similar to the labeled data, and fake labels are then generated from the transformed images. In this approach, the labeled data serve as the target domain and the unlabeled data as the source domain. The generator converts unlabeled data into fake images that resemble the labeled data; these fake images are then used alongside the labeled data to train the rice blast detection model and generate fake labels (Algorithm 1).
Algorithm 1: Generating soft labels
Input: training dataset D, detection model M, confidence threshold th_conf, IOU threshold th_iou
Output: soft labels L
1.  for each image I in D do
2.      B_I ← ground-truth bounding boxes in image I
3.      O_I ← detection output from model M on image I
4.      O_I ← NMS(O_I, th_nms)
5.      for each b in B_I do
6.          c_max ← 0
7.          for each o in O_I do
8.              IOU ← compute IOU between b and o
9.              if IOU > th_iou and o_conf > c_max then
10.                 c_max ← o_conf
11.                 c_cls ← o_cls
12.         if c_max > th_conf then
13.             L_I ← L_I ∪ {(b, c_cls, c_max)}
14. return L
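A minimal, runnable version of the matching step in Algorithm 1 can be sketched in Python. This is an illustrative reconstruction, not the authors' implementation; NMS is assumed to have already been applied to the detections, and the `(x1, y1, x2, y2)` box format is an assumption.

```python
def iou(a, b):
    # Intersection-over-union of two boxes given as (x1, y1, x2, y2).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    if inter == 0.0:
        return 0.0
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_labels(gt_boxes, detections, th_iou=0.5, th_conf=0.25):
    # For each ground-truth box b, find the best-overlapping detection
    # (lines 5-11 of Algorithm 1) and keep (b, class, confidence) only
    # if the best confidence exceeds th_conf (lines 12-13).
    labels = []
    for b in gt_boxes:
        c_max, c_cls = 0.0, None
        for box, cls, conf in detections:
            if iou(b, box) > th_iou and conf > c_max:
                c_max, c_cls = conf, cls
        if c_max > th_conf:
            labels.append((b, c_cls, c_max))
    return labels
```

A ground-truth box is thus labeled with the class of its most confident overlapping detection, and unmatched or low-confidence boxes are discarded.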
2.6. Optimized Contrastive Unpaired Network
In the traditional CUT, there is only one generator and one discriminator: a single generator transforms rice blast images of two different severity levels into target-domain images, and a single discriminator distinguishes real target-domain images from generated ones. Such a network may not fully fuse the features of rice blast images of different severity levels, so the features of the fused images are not rich enough to highlight early disease characteristics. With an additional generator and discriminator, rice blast images of different severity levels can be transformed into target-domain images individually and then fused, making better use of their features. In the optimized contrastive unpaired translation (CUT) network, there are therefore two generators (G1 and G2) and two discriminators (D1 and D2), as shown in Figure 4.
Here, G1 is responsible for transforming the input image and generating the translated image, while G2 is responsible for reconstructing the translated image. D1 is used to discriminate the authenticity of the input image, while D2 is used to discriminate the authenticity of the reconstructed image. Specifically, G1 consists of an encoder and a decoder, which encode the input image to extract image features and then decode it to generate the translated image. Similarly, G2 also consists of an encoder and a decoder, which encode the translated image to extract image features and then decode it to generate the reconstructed image. Both D1 and D2 include convolutional layers and fully connected layers, so that they can discriminate the features of the input or reconstructed images and output the probability results.
The loss functions of generators G1 and G2 are the same, comprising both the GAN loss and the NCE loss. The loss functions of D1 and D2, however, differ. During training, both discriminators aim to minimize the difference between real data and generated data, but because the two generators produce different sample distributions, the optimal decision boundary of each discriminator may differ. Therefore, although the two discriminators share the same goal, their specific loss functions may differ slightly. On this basis, the loss function of the discriminators is designed and derived:
In the given context, $\mathbb{E}_{x \sim p_{\mathrm{data}}}[f(x)]$ represents the expected value of a function $f$ applied to samples $x$ drawn from the data distribution $p_{\mathrm{data}}$. This expected value can be estimated by the sample average $\frac{1}{n}\sum_{i=1}^{n} f(x_i)$ over $n$ samples $x_1, x_2, \ldots, x_n$ drawn from $p_{\mathrm{data}}$. Such expectations are commonly used to measure the generative capability of a generative model, since the goal of the generative model is to generate samples similar to those from $p_{\mathrm{data}}$.
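The sample-average estimate of an expectation can be illustrated with a short Monte Carlo sketch (an illustrative example, not from the paper): for x drawn uniformly from [0, 1], the expectation of f(x) = x² is 1/3, and the average of f over many samples converges to it.

```python
import random

random.seed(0)  # fixed seed so the estimate is reproducible

# Estimate E[f(x)] with the sample average (1/n) * sum_i f(x_i),
# here f(x) = x**2 and x ~ Uniform(0, 1); the true value is 1/3.
n = 100_000
samples = [random.random() for _ in range(n)]
estimate = sum(x * x for x in samples) / n
print(estimate)  # close to 0.3333...
```

With n = 100,000 samples the estimate lands within about ±0.01 of the true expectation, and the error shrinks roughly as 1/√n.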
In addition, the noise contrastive estimation (NCE) loss is used to prevent the generated images from falling into dead zones of the representation space, i.e., regions without training data points. The generator is forced to map source-domain and target-domain images into different regions of the representation space, thereby achieving the image translation task. Specifically, in contrastive unpaired translation, the role of the NCE loss is to keep the generated images similar to those in the source and target domains. The generator loss function $L_G$ is derived as follows:

In the equation, $L_{\mathrm{GAN}}$ corresponds to the GAN loss, while $L_{\mathrm{NCE}}$ represents the NCE loss. The hyperparameter $\lambda$ is introduced to balance the weight of these two loss terms. The generator outputs $G(z)$ and $G_2(z)$ represent the generated results obtained from netG and netG2, respectively.
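Since the text states that λ balances the GAN and NCE terms and that both generators share the same loss form, the weighting can be sketched as below. This is a sketch under that assumption (the exact form of the paper's equation for L_G is not reproduced here), with our own variable names; the per-term losses would come from the discriminator and the patch-wise NCE head.

```python
def generator_loss(l_gan, l_nce, lam=1.0):
    # L_G = L_GAN + lambda * L_NCE: lambda trades off realism (GAN term)
    # against content preservation (NCE term).
    return l_gan + lam * l_nce

def total_generator_loss(g1_terms, g2_terms, lam=1.0):
    # G1 and G2 share the same loss form, so the combined objective is
    # simply the sum of the two per-generator losses.
    return generator_loss(*g1_terms, lam) + generator_loss(*g2_terms, lam)
```

With λ = 1 the two terms contribute equally; raising λ pushes each generator toward preserving source-image content.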
2.7. Optimization of Early Rice Blast Detection Models
Because early rice blast lesions usually occupy small pixel areas in the input images, increasing the resolution of the input images is a straightforward way to address this issue. However, recognizing the multi-scale features of images captured by unmanned aerial vehicles remains a challenge. Therefore, four kinds of FPN structures are used here: the normal FPN, the bidirectional FPN (BiFPN) [30], the Swin Transformer [31], and the combination of path aggregation (PAFPN) [32] with the YOLO head detector (Figure 5). It is observed that each input layer only accepts feature fusion from one output layer, indicating that the input layer has little effect on the feature fusion of the output layer; the effect of this feature fusion structure changes little before and after the connection. Since both BiFPN and the Swin Transformer are relatively complex models, they typically require a large amount of training data to fully exploit their advantages; with small datasets or challenging annotations, problems such as overfitting or performance degradation may arise. Additionally, multi-feature fusion methods, which operate on feature maps of different scales, may be less sensitive to small objects than single-feature approaches, because the information of small objects is usually concentrated in higher-resolution feature maps, and multi-feature fusion may lose or blur the fine details of small objects. To meet the detection requirements for early rice blast lesions, smaller target sizes and proportions are considered. First, the sizes and aspect ratios of the anchor boxes, as well as the parameters of the prediction branches, are adjusted for small-target detection. Then, the depth and width of the model are increased to enlarge the receptive field and improve detection accuracy. Specifically, the anchor box sizes are set to (10, 13), (16, 30), and (33, 23), and the aspect ratios are set to (0.65, 1.0, 1.5), making them more suitable for detecting small targets.
As shown in Figure 5a,b, the normal FPN and BiFPN cannot meet the feature detection requirements for early rice blast. As shown in Figure 5c, the Swin Transformer can integrate features from shallow to deep layers through multi-size feature receptive fields. However, it is only suitable for fusing features of different scales, and features of different sizes are added directly. Such feature fusion still has an obvious shortcoming: the upsampling layer loses the features of the lower layers. Therefore, a conventional idea is to introduce additional weight parameters, and the feature fusion can be enhanced as follows. In the PAFPN framework, a multi-scale feature list P = {P3, P4, P5, P6, P7} is obtained. In the improved PAFPN, shown in Figure 5d, an additional weight is added for each input. Using the improved PAFPN as the backbone network effectively enhances the fine-grained recognition of target features, which is meaningful for the subsequent segmentation of target region instances. Formulas (6)–(11) describe the fused features of each level P3–P7.
Here, $P^{td}$ is the intermediate feature of each level, $\varepsilon$ is a small value (set to 0.0001) that avoids numerical instability, and each weight $w_i$ is kept non-negative by applying a ReLU after it.
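The weighted fusion used in Formulas (6)–(11) follows the fast normalized fusion pattern: each input feature receives a scalar weight, a ReLU keeps the weight non-negative, and ε stabilizes the denominator. A sketch on plain Python lists (real implementations operate on feature tensors):

```python
def fused_feature(feature_maps, weights, eps=1e-4):
    # P_out = sum_i(w_i * P_i) / (sum_j(w_j) + eps), where each w_i is
    # clipped with max(w_i, 0) -- the ReLU applied after each weight --
    # and eps = 1e-4 guards against a zero denominator.
    w = [max(wi, 0.0) for wi in weights]
    denom = sum(w) + eps
    return [sum(wi * p for wi, p in zip(w, col)) / denom
            for col in zip(*feature_maps)]
```

With equal weights the fusion reduces to an element-wise average (up to the small ε), while a negative weight is clipped to zero so its input is ignored.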
2.8. Evaluation of Model Performance
Eight levels of rice blast infection are tested with each model. The data are divided into a training set, a validation set, and a test set. The training set does not contain the test set; the validation set is taken from the training set but is not involved in training. Such a data division allows the model to be evaluated objectively [33]. One hundred images are randomly selected from the dataset as the test set, and the results of the improved method are compared with those of other techniques. According to Formulas (12)–(14), three indicators, namely precision, recall, and F1, are obtained through multiple controlled experiments to measure the effectiveness of the model in rice blast detection.
Here, TP is the number of positive samples predicted as positive, FN is the number of positive samples predicted as negative, and FP is the number of negative samples predicted as positive. F1 balances precision and recall through their harmonic mean; when both precision and recall are high, F1 approaches 1, indicating that the model is accurate in predicting both positive and negative cases.
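Formulas (12)–(14) reduce to a few lines of Python (a generic sketch of the standard definitions, not tied to the paper's code):

```python
def precision_recall_f1(tp, fp, fn):
    # Precision = TP / (TP + FP): fraction of predicted positives that are correct.
    precision = tp / (tp + fp)
    # Recall = TP / (TP + FN): fraction of actual positives that were found.
    recall = tp / (tp + fn)
    # F1 = harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, with 8 true positives, 2 false positives, and 2 false negatives, precision, recall, and F1 all equal 0.8.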
4. Discussion
YOLO is a popular object detection model that achieves real-time detection by dividing the image into a grid and predicting bounding boxes and class probabilities within each grid cell. It is known for its efficiency and speed in object detection tasks [34,35,36]. YOLACT is a real-time instance segmentation model that combines the advantages of semantic segmentation and object detection; it introduces a mask branch to predict instance masks along with class labels, resulting in more precise segmentation of objects [37]. Mask R-CNN is a widely used instance segmentation model that extends the Faster R-CNN framework with a mask prediction branch; in addition to bounding boxes and class labels, it generates high-quality instance masks, enabling pixel-level segmentation of objects [38]. Faster R-CNN is a popular object detection model that uses a region proposal network (RPN) to generate potential object proposals and a subsequent detection network to classify and refine them; it is known for its accuracy and robustness in object detection tasks [39]. These models have been widely used in computer vision and have demonstrated strong performance in various detection and segmentation tasks, and they are applied here to the early detection of rice blast. As shown in Table 4, RiceBlastYolo achieved a precision of 99.51% at an IOU of 0.5 and an average precision of 98.75% over the IOU range of 0.5 to 0.9. This indicates that RiceBlastYolo performs better in detecting and categorizing the severity of rice blast infection, particularly for disease levels 3 and 4. The precision rate of RiceBlastYolo is 98.23% and its recall rate is 99.99%, both higher than those of YOLO, YOLACT, YOLACT++, Mask R-CNN, and Faster R-CNN. Because RiceBlastYolo is specifically designed for early-stage rice blast detection, it outperforms the models mentioned above in accuracy and precision, making it a promising solution for effectively detecting and categorizing rice blast infections at an early stage.
Although the Yolov5m model performs excellently, it requires a large amount of annotated data for training. In contrast, the RiceBlastYolo model adopts a semi-supervised learning approach and improves its performance by iteratively generating and utilizing labels for unlabeled data. Consequently, RiceBlastYolo takes advantage of semi-supervised learning, using unlabeled data to enhance its performance and generalization ability; in rice blast detection, it is superior to Yolov5m.
5. Conclusions
By introducing a cycle-consistent adversarial network with dual discriminators and dual generators, this study improves both model performance and the quality of generated images. The source domain is set as rice blast images of high disease level, and the generator is trained to produce fake images similar to actual high-level rice blast images; the target domain is set as rice blast images of high or middle disease level, which helps the generator create fake images similar to actual rice blast images. In this way, high-quality fake images are successfully used for model training. In the context of semi-supervised learning, the training parameters of four feature pyramid networks (FPNs) are compared. The improved PAFPN presents the smallest object loss and box loss values, indicating its proficiency in fitting target features and its good convergence. These improvements achieved the highest accuracy and provided guidance for the subsequent training. The fake images generated by CUT are combined with the soft-label generation algorithm, and the generated soft labels are merged into the model training set. Through iterative network generation, a detection network with the highest accuracy is developed. The F1 score is calculated at a threshold of 0.361; an F1 score of 0.99 indicates that the predictions are both highly accurate and well balanced. Significant progress has been made in rice blast detection by using the CUT method to generate high-quality fake images and using soft labels for model training. To validate the effectiveness of this method, external data validation is conducted and the method is compared with other existing models. Notably, the model demonstrates comprehensive detection capabilities for disease levels 2–4, validating its early-stage rice blast detection.
Compared with popular models such as YOLO, YOLACT, YOLACT++, Mask R-CNN, and Faster R-CNN, RiceBlastYolo achieves higher accuracy in early-stage rice blast detection. For instance, at an IOU of 0.5, RiceBlastYolo achieves a precision of 99.51%, and it consistently outperforms the other models within the IOU threshold range of 0.5 to 0.9, with an average precision of 98.23%. These results highlight the superior performance of RiceBlastYolo in detecting and categorizing the severity of rice blast infection. This study successfully uses the CUT method to generate high-quality fake images and uses soft labels for model training, making significant advances in rice blast detection. The research not only provides an effective approach to semi-supervised learning but also demonstrates the feasibility and effectiveness of the method through external data verification and comparison with other existing models. These findings highlight the practical significance and potential applicability of this approach in the early detection and control of rice blast. Future work should focus on refining the model's performance, extending the dataset to cover a wider range of disease levels, and exploring real-time implementation, paving the way for more efficient and accurate systems for the early detection and control of rice blast, which has significant practical value in agriculture.