Figure 1.
Recognition results based on our method. The red boxes are the bounding boxes given by our method.
Figure 2.
Samples from the Kaggle and Plantdoc datasets.
Figure 3.
Illustration of the client-server framework.
Figure 4.
Screenshot of the detector running locally on the mobile device. Red boxes are the bounding boxes given by our method.
Figure 5.
Dataset collection site (Science and Technology Park, West Campus of China Agricultural University), shown on Google Maps.
Figure 6.
Samples from our dataset. (A) macrophthalmia (maize); (B) black Sigatoka (maize); (C) tumor black powder (maize); (D) black Sigatoka (wheat); (E) green dwarf (wheat); (F) yellow leaf (wheat); (G) rice fever (rice); (H) stripe blight (rice); (I) bacterial streak (rice); (J) late blight (potato); (K) black shin (potato); (L) blight (potato).
Figure 7.
Illustration of the resizing process. Given an image, we generate five 224 × 224 crops using the following procedure. Center crop: extract a 224 × 224 image from the center of the input by removing 2 pixels from the left and right borders and 2 pixels from the top and bottom borders. Corner crops: extract four 224 × 224 images from the four corners of the input. The top-left crop removes 5 pixels from the right and bottom borders; the top-right crop removes 5 pixels from the left and bottom borders; the bottom-left crop removes 5 pixels from the right and top borders; the bottom-right crop removes 5 pixels from the left and top borders. The red boxes are the sliding windows.
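For concreteness, a minimal NumPy sketch of this five-crop procedure, assuming the input is an H × W × C array slightly larger than 224 × 224 (the exact border margins follow the figure):

```python
import numpy as np

CROP = 224  # target side length used in the paper

def five_crops(img, crop=CROP):
    """Return [center, top-left, top-right, bottom-left, bottom-right]
    crops of size crop x crop from an H x W x C image with H, W >= crop."""
    h, w = img.shape[:2]
    top, left = (h - crop) // 2, (w - crop) // 2
    center = img[top:top + crop, left:left + crop]  # trim all four borders
    tl = img[:crop, :crop]          # drop right and bottom borders
    tr = img[:crop, w - crop:]      # drop left and bottom borders
    bl = img[h - crop:, :crop]      # drop right and top borders
    br = img[h - crop:, w - crop:]  # drop left and top borders
    return [center, tl, tr, bl, br]

# usage: crops = five_crops(np.zeros((229, 229, 3), dtype=np.uint8))
```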
Figure 8.
Illustration of the AugMix method. AugMix combines multiple data augmentation operations (such as rotation, translation, and shearing) by mixing their outputs with randomly sampled weights.
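A minimal sketch of the AugMix mixing rule, assuming float images in [0, 1] and a user-supplied list of augmentation functions (the original method additionally uses a Jensen-Shannon consistency loss, omitted here):

```python
import random
import numpy as np

def augmix(img, operations, width=3, depth=2, alpha=1.0):
    """Mix `width` randomly composed augmentation chains of length `depth`,
    then blend the mixture back with the original image. `img` is a float
    array in [0, 1]; each entry of `operations` maps an image to an image."""
    ws = np.random.dirichlet([alpha] * width)  # convex per-chain weights
    m = float(np.random.beta(alpha, alpha))    # original-vs-mixture weight
    mix = np.zeros_like(img)
    for w in ws:
        chained = img.copy()
        for _ in range(depth):
            chained = random.choice(operations)(chained)
        mix += w * chained
    return (1.0 - m) * img + m * mix
```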
Figure 9.
Mosaic schematic illustration: image (A) (top left), image (B) (top right), image (C) (bottom left), and image (D) (bottom right) are combined into one mosaic image.
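A minimal sketch of the mosaic operation, assuming the four inputs are already resized to the same quadrant size (a full implementation would also shift each image's bounding-box coordinates into its quadrant):

```python
import numpy as np

def mosaic(a, b, c, d):
    """Tile four equally sized H x W x C images into one 2H x 2W image:
    A top-left, B top-right, C bottom-left, D bottom-right."""
    h, w, ch = a.shape
    out = np.zeros((2 * h, 2 * w, ch), dtype=a.dtype)
    out[:h, :w], out[:h, w:] = a, b
    out[h:, :w], out[h:, w:] = c, d
    return out
```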
Figure 10.
CutMix schematic illustration: a random patch from image (B) is pasted onto image (A) to form the CutMix result.
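A minimal sketch of CutMix for two equally sized images; it returns the mixed image and the surviving area fraction lambda, which is used to mix the two labels in the same proportion:

```python
import numpy as np

def cutmix(a, b, alpha=1.0):
    """Paste a random rectangular patch from image B onto image A."""
    h, w = a.shape[:2]
    lam = np.random.beta(alpha, alpha)
    cut = np.sqrt(1.0 - lam)                  # patch side ratio
    ph, pw = int(h * cut), int(w * cut)
    cy, cx = np.random.randint(h), np.random.randint(w)
    y1, y2 = max(cy - ph // 2, 0), min(cy + ph // 2, h)
    x1, x2 = max(cx - pw // 2, 0), min(cx + pw // 2, w)
    mixed = a.copy()
    mixed[y1:y2, x1:x2] = b[y1:y2, x1:x2]
    lam = 1.0 - (y2 - y1) * (x2 - x1) / (h * w)  # adjust for border clipping
    return mixed, lam
```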
Figure 11.
Illustration of the proposed method based on the detection network. When constructing convolutional neural networks (CNNs), some fundamental building blocks are commonly employed to enhance network performance and training stability. Two typical building blocks are used here. (1) DBL (Conv + BN + Leaky ReLU): DBL is a basic module that combines a convolutional layer (Conv), batch normalization (BN), and the Leaky ReLU activation function. The convolutional layer extracts local features from the input feature map. Batch normalization is a regularization technique that accelerates training and mitigates vanishing and exploding gradients. Leaky ReLU is a nonlinear activation function with a small negative slope, providing a nonzero gradient in the negative region and thereby alleviating the vanishing gradient problem. The DBL module integrates these techniques, making training more stable and enhancing feature extraction. (2) Res_unit (basic residual block): the residual unit (res_unit) uses skip connections to address the vanishing and exploding gradient problems in deep networks. A basic residual block consists of two stacked DBL layers, with a skip connection linking the input directly to the output of the second DBL layer. This skip connection allows gradients to propagate more easily within the network, making deep networks easier to train.
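In PyTorch, the two blocks can be sketched as follows; the 1 × 1 then 3 × 3 channel layout inside the residual unit follows the usual YOLO v3 convention and is an assumption here:

```python
import torch.nn as nn

class DBL(nn.Module):
    """Conv + BatchNorm + Leaky ReLU, as described in the caption."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class ResUnit(nn.Module):
    """Two stacked DBL layers with a skip connection from input to output."""
    def __init__(self, c):
        super().__init__()
        self.dbl1 = DBL(c, c // 2, k=1)  # 1x1 bottleneck (assumed layout)
        self.dbl2 = DBL(c // 2, c, k=3)

    def forward(self, x):
        return x + self.dbl2(self.dbl1(x))
```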
Figure 12.
Illustration of the basic block used in our neural network. The gray dashed box represents the basic block used to construct the model in this paper. The blue blocks represent the convolutional layers, while the orange blocks represent the batch normalization (BN) layers. When the output from the previous layer enters the current block, it is processed through four separate branches. After undergoing the processing illustrated in the figure, the outputs of these branches are concatenated. The resulting output is then fed into the next block.
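The caption does not specify the layers inside each branch, so the following PyTorch sketch only illustrates the four-branch-and-concatenate pattern; the kernel sizes are hypothetical:

```python
import torch
import torch.nn as nn

class FourBranchBlock(nn.Module):
    """Process the input through four parallel Conv + BN branches and
    concatenate their outputs along the channel axis."""
    def __init__(self, c_in, c_branch):
        super().__init__()
        def branch(k):  # one Conv + BN branch; kernel sizes are assumptions
            return nn.Sequential(
                nn.Conv2d(c_in, c_branch, k, padding=k // 2, bias=False),
                nn.BatchNorm2d(c_branch),
            )
        self.branches = nn.ModuleList([branch(k) for k in (1, 3, 3, 5)])

    def forward(self, x):
        return torch.cat([b(x) for b in self.branches], dim=1)
```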
Figure 13.
Illustration of the re-parameterization method. RepBlock (re-parameterization block) is a technique that improves the performance of convolutional neural networks (CNNs) by re-parameterizing the weights of convolutional layers: the weights are decomposed into fixed base weights and additional learnable parameters. This decomposition allows the network to learn more expressive and diverse feature representations during training, ultimately leading to better performance on various tasks.
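The caption does not spell out the exact decomposition; as a reference point, the common structural re-parameterization trick merges parallel train-time branches into a single inference-time convolution, sketched here for a 3 × 3 branch plus a 1 × 1 branch (both assumed bias-free and followed by BN):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def fuse_conv_bn(conv, bn):
    """Fold a BatchNorm layer into the preceding (bias-free) convolution."""
    std = torch.sqrt(bn.running_var + bn.eps)
    w = conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1)
    b = bn.bias - bn.running_mean * bn.weight / std
    return w, b

def merge_rep_branches(conv3, bn3, conv1, bn1):
    """Merge parallel 3x3 and 1x1 branches into one 3x3 kernel and bias."""
    w3, b3 = fuse_conv_bn(conv3, bn3)
    w1, b1 = fuse_conv_bn(conv1, bn1)
    w1 = F.pad(w1, [1, 1, 1, 1])  # zero-pad the 1x1 kernel to 3x3
    return w3 + w1, b3 + b1

# usage: copy the merged tensors into a plain nn.Conv2d(c_in, c_out, 3,
# padding=1) and drop the branched block at inference time.
```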
Figure 14.
Asymmetric convolution is more robust to up-and-down flips than square convolution. The red boxes indicate the paired features.
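An ACNet-style sketch of the idea: the 1 × 3 (horizontal) kernel produces the same responses, row-permuted, on an up-down flipped input, which is what makes the asymmetric branches more flip-robust than a single square kernel. The parallel-sum layout below is an assumption:

```python
import torch.nn as nn

class AsymmetricConvBlock(nn.Module):
    """Parallel 3x3, 1x3, and 3x1 convolutions whose outputs are summed;
    the asymmetric kernels reinforce the square kernel's skeleton."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.square = nn.Conv2d(c_in, c_out, (3, 3), padding=(1, 1))
        self.hor = nn.Conv2d(c_in, c_out, (1, 3), padding=(0, 1))
        self.ver = nn.Conv2d(c_in, c_out, (3, 1), padding=(1, 0))

    def forward(self, x):
        return self.square(x) + self.hor(x) + self.ver(x)
```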
Figure 15.
Illustration of the dynamic pruning gate (DPG) module. The DPG module is a technique designed to address the trade-off between accuracy and computational complexity in deep convolutional neural networks, particularly in edge-computing scenarios with limited resources, such as agricultural applications. The main goal of DPG is to improve the efficiency of convolutional networks for data feature extraction without significantly increasing computation and memory requirements.
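The caption does not give the gate's internals; the following is a hypothetical sketch of a per-sample channel gate (pool, score, keep the top-k channels). A trainable version would need a soft or straight-through gate, since hard top-k selection has no gradient:

```python
import torch
import torch.nn as nn

class DynamicPruningGate(nn.Module):
    """Score each channel from globally pooled features and zero out the
    low-scoring channels for the current input."""
    def __init__(self, channels, keep_ratio=0.5):
        super().__init__()
        self.score = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(channels, channels),
        )
        self.k = max(1, int(channels * keep_ratio))

    def forward(self, x):
        s = self.score(x)                      # (N, C) channel scores
        keep = s.topk(self.k, dim=1).indices
        mask = torch.zeros_like(s).scatter_(1, keep, 1.0)
        return x * mask[:, :, None, None]      # gate channels per sample
```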
Table 1.
Detection results of different object detection models in terms of mAP@75, mAP@50, recall, and precision. The models compared are Faster RCNN, SSD, YOLO v3, YOLO v4, and the proposed method.
| Model | mAP@75 | mAP@50 | Recall | Precision |
|---|---|---|---|---|
| Faster RCNN [42] | 0.46 | 0.61 | 0.41 | 0.58 |
| SSD [43] | 0.52 | 0.73 | 0.48 | 0.57 |
| YOLO v3 [33] | 0.67 | 0.79 | 0.65 | 0.73 |
| YOLO v4 [44] | 0.69 | 0.78 | 0.66 | 0.79 |
| Ours | 0.78 | 0.92 | 0.73 | 0.94 |
Table 2.
Speeds (frames per second) of different detection models on different platforms. Only our model could run locally on the Huawei P40, achieving an inference speed of 17 FPS; all other models were unable to perform inference within the phone's local computing power and memory limits.
| Model | RTX 3080 GPU | PC | Jetson Nano | Huawei P40 |
|---|---|---|---|---|
| Faster RCNN [42] | 12 | 8 | 5 | - |
| SSD [43] | 21 | 17 | 9 | - |
| YOLO v3 [33] | 35 | 28 | 19 | - |
| YOLO v4 [44] | 33 | 29 | 17 | - |
| Ours | 58 | 49 | 42 | 17 |
Table 3.
Detection results on other datasets using different models.
| Model | Kaggle | Plantdoc |
|---|---|---|
| Faster RCNN [42] | 0.54 [26] | 0.38 [22] |
| SSD [43] | 0.64 [26] | 0.38 [22] |
| YOLO v3 [33] | 0.58 [26] | 0.39 [22] |
| YOLO v4 [44] | 0.63 [26] | 0.38 [22] |
| Ours | 0.66 | 0.48 |
Table 4.
Detection results on more datasets.
| Research Topic | Metric | Method | Result | FPS |
|---|---|---|---|---|
| Wheat head detection | mAP | [26] | 0.6756 [26] | - |
| | | Ours | 0.6748 | 58 |
| Maize disease detection | Accuracy | [24] | 97.41% [24] | - |
| | | Ours (backbone + softmax) | 95.38% | 49 |
| Apple flower detection | mAP | [25] | 0.9743 [25] | - |
| | | Ours | 0.9438 | 63 |
| Leaf disease detection | mAP | [22] | 0.503 [22] | - |
| | | Ours | 0.528 | 58 |
Table 5.
The dataset used in this study consisted of images from four different crops: maize, wheat, rice, and potato. This table provides a detailed overview of the distribution of the dataset, including the number and proportion of images for each crop and disease. For maize, there were 1291 healthy images (8.46% of the dataset), 283 images of macrophthalmia (1.86%), 197 images of black Sigatoka (1.29%), and 84 images of tumor black powder (0.55%). In the wheat category, there were 2013 healthy images (13.19%), 397 images of black Sigatoka (2.60%), 513 images of green dwarf (3.36%), and 523 images of yellow leaf (3.43%). For rice, the dataset contained 4843 healthy images (31.73%), 731 images of rice fever (4.79%), 293 images of stripe blight (1.92%), and 423 images of bacterial streak (2.77%). Lastly, in the potato category, there were 2382 healthy images (15.61%), 472 images of late blight (3.09%), 581 images of black shin (3.81%), and 238 images of blight (1.56%).
| Crop | Disease | Number | Proportion |
|---|---|---|---|
| Maize | Healthy | 1291 | 8.46% |
| | Macrophthalmia | 283 | 1.86% |
| | Black Sigatoka | 197 | 1.29% |
| | Tumor black powder | 84 | 0.55% |
| Wheat | Healthy | 2013 | 13.19% |
| | Black Sigatoka | 397 | 2.60% |
| | Green dwarf | 513 | 3.36% |
| | Yellow leaf | 523 | 3.43% |
| Rice | Healthy | 4843 | 31.73% |
| | Rice fever | 731 | 4.79% |
| | Stripe blight | 293 | 1.92% |
| | Bacterial streak | 423 | 2.77% |
| Potato | Healthy | 2382 | 15.61% |
| | Late blight | 472 | 3.09% |
| | Black shin | 581 | 3.81% |
| | Blight | 238 | 1.56% |
Table 6.
After applying data augmentation techniques, the distribution of the dataset was balanced across all crops and diseases. This table provides a detailed overview of the distribution of the augmented dataset, including the number and proportion of images for each crop and disease. For maize, the dataset then consisted of 15,260 healthy images (6.25% of the dataset), 15,215 images of macrophthalmia (6.23%), 15,271 images of black Sigatoka (6.25%), and 15,272 images of tumor black powder (6.26%). In the wheat category, there were 15,261 healthy images (6.25%), 15,269 images of black Sigatoka (6.25%), 15,267 images of green dwarf (6.25%), and 15,247 images of yellow leaf (6.24%). For rice, the augmented dataset contained 15,263 healthy images (6.25%), 15,260 images of rice fever (6.25%), 15,260 images of stripe blight (6.25%), and 15,270 images of bacterial streak (6.25%). Lastly, in the potato category, there were 15,259 healthy images (6.25%), 15,275 images of late blight (6.26%), 15,249 images of black shin (6.25%), and 15,256 images of blight (6.25%). The data augmentation process effectively balanced the dataset by ensuring that each disease category had a similar number of images, which led to a more robust and reliable model.
| Crop | Disease | Number | Proportion |
|---|---|---|---|
| Maize | Healthy | 15,260 | 6.25% |
| | Macrophthalmia | 15,215 | 6.23% |
| | Black Sigatoka | 15,271 | 6.25% |
| | Tumor black powder | 15,272 | 6.26% |
| Wheat | Healthy | 15,261 | 6.25% |
| | Black Sigatoka | 15,269 | 6.25% |
| | Green dwarf | 15,267 | 6.25% |
| | Yellow leaf | 15,247 | 6.24% |
| Rice | Healthy | 15,263 | 6.25% |
| | Rice fever | 15,260 | 6.25% |
| | Stripe blight | 15,260 | 6.25% |
| | Bacterial streak | 15,270 | 6.25% |
| Potato | Healthy | 15,259 | 6.25% |
| | Late blight | 15,275 | 6.26% |
| | Black shin | 15,249 | 6.25% |
| | Blight | 15,256 | 6.25% |