Article

A Lightweight Real-Time Recognition Algorithm for Tomato Leaf Disease Based on Improved YOLOv8

by Wenbo Liu *, Chenhao Bai, Wei Tang, Yu Xia and Jie Kang
School of Electrical and Control Engineering, Shaanxi University of Science and Technology, Xi’an 710021, China
* Author to whom correspondence should be addressed.
Agronomy 2024, 14(9), 2069; https://doi.org/10.3390/agronomy14092069
Submission received: 15 August 2024 / Revised: 2 September 2024 / Accepted: 5 September 2024 / Published: 10 September 2024
(This article belongs to the Section Precision and Digital Agriculture)

Abstract:
To address the challenge of deploying deep learning-based tomato leaf disease detection algorithms on embedded devices in real time, an improved tomato leaf disease detection algorithm based on YOLOv8n is proposed in this paper. It achieves the efficient, real-time detection of tomato leaf diseases while keeping the model lightweight. The algorithm incorporates the LMSM (Lightweight Multi-Scale Module) and the ALSA (Attention Lightweight Subsampling Module) to improve the extraction of lightweight, multi-scale semantic information suited to the characteristics of tomato leaf disease, namely irregular spot sizes and lush foliage. The head network was redesigned using partial and group convolution along with a parameter-sharing method. Scalable auxiliary bounding box and loss function optimization strategies were introduced to further enhance performance. After pruning, computation decreased by 61.7%, the model size decreased by 55.6%, and the FPS increased by 44.8%, all while a high level of accuracy was maintained. A detection speed of 19.70 FPS on the Jetson Nano was obtained after TensorRT quantization, a 64.85% improvement over the initial detection speed. This method meets the high real-time performance and small model size requirements of embedded tomato leaf disease detection systems, indirectly reducing the energy consumption of online detection, and provides an effective solution for the online detection of tomato leaf disease.

1. Introduction

Solanum lycopersicum L., commonly known as tomato, not only serves as a dietary fruit but also functions as a functional food that protects humans against chronic degenerative illnesses such as diabetes, microvascular complications, viral diseases, and cancers; tomatoes are therefore very popular [1]. In 2023, global tomato production was approximately 44.4 million tons, and production for 2024 is projected to reach 47 million tons [2]. However, tomato diseases have always been a key problem restricting tomato yield. According to the FAO (Food and Agriculture Organization) of the United Nations, 20–40% of the worldwide tomato crop is lost to pests and diseases [3]. Since most disease occurs in the early stages, if tomato leaf diseases can be identified in advance [4], corresponding treatment and control measures can be taken, preventing the spread and exacerbation of disease. This helps protect the healthy growth of tomato plants and reduces the harm to tomato yields. Traditional methods for detecting tomato leaf diseases [5] depend on manual assessment by professionals, which suffers from low efficiency, poor scalability, subjective biases that lead to errors, and spatial limitations requiring field sampling by experts [6,7,8]. Thus, the precise, efficient, real-time, and easily deployable detection of tomato leaf diseases has emerged as a prominent research area in recent years.
Many scholars have developed and improved algorithms for tomato leaf disease detection. Zhang et al. [9] developed an enhanced Faster R-CNN [10] (Faster Region-Based Convolutional Neural Network) model that exploits the strong feature extraction and classification capabilities of the CNN to identify healthy tomato leaves and four diseases. ResNet101 is used for image feature extraction instead of VGG16 (the Visual Geometry Group 16-layer Network), and the anchor boxes are clustered using the K-Means approach. The improved method raised the accuracy of identifying crop leaf diseases by 2.71% compared to the original Faster R-CNN. It achieved 97.18% accuracy in a controlled laboratory setting, with a detection speed of 452 ms on a PC (Personal Computer), demonstrating effective performance in detecting tomato leaf diseases.
When porting to a mobile platform for online plant disease diagnosis, the complexity, size, and speed of the neural network model must be considered. Current lightweight single-stage target detection algorithms, such as YOLO (You Only Look Once) [11], SSD (the Single Shot MultiBox Detector) [12], and EfficientDet [13], process the entire image in one pass through the network, directly predicting bounding boxes and associated class probabilities, which makes them fast and accurate. In contrast, two-stage object detection algorithms first propose regions of interest with one network and then refine these proposals for accurate localization and classification, which is slower [14,15]. Many scholars have applied single-stage algorithms to tomato leaf disease detection with good results, owing to their lightweight characteristics and high detection speed.
Hu et al. [16] improved the structure of SqueezeNet from the perspectives of network lightweighting and accurate feature extraction: the Fire module is streamlined, its Expand layer is adjusted, and the model is combined with the ECA (Efficient Channel Attention) module. The final size of the improved SqueezeNet model is 3.5 MB, with a recognition accuracy of 97.29%, providing a technical approach for identifying tomato diseases on embedded devices in real production. Liu et al. [17] used an image pyramid to optimize the feature layers of the YOLOv3 model, improving its detection accuracy and speed, and applied a multi-scale training strategy with feature maps of different sizes to enhance robustness. The improved YOLOv3 reduced the PC detection time by 5.3 ms to 20.39 ms relative to the pre-improvement network and improved the recognition accuracy by 8.07% to 92.39%, meeting the need for real-time detection while maintaining a high detection rate.
At present, tomato leaf disease detection algorithms have been greatly improved in terms of model size and number of parameters. For on-device application, the CCNN (Contextual Convolutional Neural Network) model proposed by Aishwarya et al. [18], based on a custom CNN, includes three convolutional layers and three fully connected layers. It reduces the number of parameters by 80% compared to the base CNN model and has a lower processing time and computation. The model was deployed on a mobile phone for detection, and the results were compared with AlexNet, VGG16, and other networks; it achieved a shorter running time and a higher accuracy of 98.44%. However, in terms of detection speed and model size, such device-side applications in agricultural settings still cannot meet the needs of embedded devices with limited hardware resources, which remains a major obstacle to deployment and online detection. In this study, a tomato leaf disease detection method based on an improved YOLOv8n was proposed and deployed on an embedded device for real-time online detection. The primary contributions of this paper are as follows:
(1)
A lightweight model based on YOLOv8n was designed. Combined with the characteristics of tomato leaf diseases, a multi-scale lightweight module was designed for the backbone and neck network to perform feature extraction and feature fusion. In addition, a lightweight downsampling module with attention was designed for the neck network. The head network uses lightweight convolution and parameter-sharing strategies to achieve a lightweight design on YOLOv8n. Finally, the improved model was further lightweighted using the LAMP (Layer-Adaptive Magnitude-Based Pruning) algorithm;
(2)
The loss function was improved to MPDIoU (Minimum Point Distance Intersection over Union), and a scalable auxiliary bounding box was constructed for the improved loss function to accelerate convergence. Drawing on the way MPDIoU localizes the predicted and ground-truth boxes, the parameters of the label-matching strategy were redefined to effectively improve recall;
(3)
The final lightweighted model was deployed to the Jetson Nano embedded device. Inference detection was performed on images and videos captured by the connected IMX218 camera to achieve the online detection of tomato leaf diseases. The experiments showed that the model met the real-time, lightweight, and low-power requirements on embedded devices, and also provided a viable option for tomato leaf disease recognition.

2. Materials and Methods

The whole inspection process is shown in Figure 1. The dataset was labeled using LabelImg, producing ‘.txt’ annotation files, and was then split into a training set and a test set for model training, parameter adjustment, and evaluation. The model was trained on 640 × 640 input images and evaluated on precision, recall, and other metrics. The improved model was exported to the ‘.onnx’ format for porting to the embedded system. Model acceleration was performed with TensorRT (version 8.0.2) on the embedded side, where the detection work was finally carried out.
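As a minimal sketch of the export step, assuming the Ultralytics Python API (the checkpoint name is a placeholder):

```python
from ultralytics import YOLO

# Load the trained weights; "best.pt" stands in for the improved model's
# checkpoint produced by training.
model = YOLO("best.pt")

# Export to ONNX at the 640 x 640 input size used for training; the
# resulting .onnx file is then copied to the Jetson Nano for TensorRT
# engine building.
model.export(format="onnx", imgsz=640)
```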

2.1. Dataset

The data should be selected from images with appropriate brightness, contrast, and resolution for training, ensuring that the model can fully learn the image features without prematurely overfitting. In this study, 16,950 images from the PlantVillage (https://www.kaggle.com/datasets/abdallahalidev/plantvillage-dataset (accessed on 17 June 2023)) dataset and 3078 manually expanded images were used, for a total of 20,028 images, covering 9 tomato leaf disease classes and healthy tomato leaves. The disease samples include the two most impactful diseases for tomato crops: Early blight and Septoria leaf spot. The dataset was divided into a training set of 17,545 images and a test set of 2483 images. The distribution of the dataset is shown in Figure 2.

2.2. Improved YOLOv8n Method

The improvement methods involved lightweight enhancements to the backbone, neck, and head networks, optimization of loss functions and label-matching strategies, and model pruning.

2.2.1. Backbone Lightweighting

The C2f module in the original YOLOv8n is computationally intensive as a stacked module; hence, model lightweighting begins with modifying the C2f module. In real tomato leaf disease scenes, spot sizes are irregular and the foliage is lush; combined with the pathological characteristics of tomato leaf disease, a single convolution may be insufficient for extracting spot features. Therefore, from a multi-scale perspective, convolutions of different kernel sizes are used for feature extraction and their outputs are fused, while following the lightweight principle: the LMSM (Lightweight Multi-Scale Module), shown in Figure 3, was introduced to replace the C2f module in the original backbone network.
For feature maps of the same size, a larger receptive field has better contextual information, which is very important for understanding the global structure of the image. In addition, it has better feature extraction capabilities to capture larger scale features, such as the overall shape of the object, its texture, background information, etc. A smaller receptive field can more accurately capture local details, such as edges, texture, etc., and has a higher processing speed, thus reducing computation and memory overhead. Therefore, a large receptive field was used in one part of the feature map to extract the global information of the dense leaves to enhance the localization of the leaves, and a small receptive field was used in the other part to extract the local lesion features, which were ultimately fused to make it better at both identifying the disease species and relatively reducing the computational and memory overheads.
The module consists of three branches. The 1 × 1 convolutional branch, used only to adjust the number of channels of the input feature map, preserves the semantic features of the original feature map; its output has half the channels of the input. The 3 × 3 and 5 × 5 convolutional branches each perform their convolution on a different half of the input channels, and each outputs feature maps with one-fourth the channels of the input, which reduces the computation required in each convolutional part. Finally, the outputs of the 1 × 1, 3 × 3, and 5 × 5 convolutions are concatenated along the channel dimension to produce the final output feature map. The module thus retains the original semantic features of the feature map while fusing semantic features of different scales, and performing the multi-scale convolutions on separate halves of the channels reduces the computation and the number of parameters, in line with the lightweight requirements.
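Below is a hedged PyTorch sketch of the LMSM under one plausible reading of the channel arithmetic (a 1 × 1 branch over the full input yielding C/2 channels, and 3 × 3/5 × 5 branches over separate halves yielding C/4 channels each); the exact ratios are our assumption:

```python
import torch
import torch.nn as nn

class LMSM(nn.Module):
    """Lightweight multi-scale module (sketch); `channels` is assumed
    divisible by 4."""
    def __init__(self, channels: int):
        super().__init__()
        half, quarter = channels // 2, channels // 4
        # 1x1 branch sees the full input and preserves its semantics
        self.branch1 = nn.Conv2d(channels, half, kernel_size=1)
        # 3x3 and 5x5 branches each see a different half of the channels
        self.branch3 = nn.Conv2d(half, quarter, kernel_size=3, padding=1)
        self.branch5 = nn.Conv2d(half, quarter, kernel_size=5, padding=2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x3, x5 = torch.chunk(x, 2, dim=1)  # split the channels in half
        # concatenate: C/2 + C/4 + C/4 = C output channels
        return torch.cat([self.branch1(x), self.branch3(x3), self.branch5(x5)], dim=1)
```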

2.2.2. Neck Lightweighting

Using the same stacking idea as the original C2f, the LMSM was stacked as a submodule to generate a C2f-LMSM module to replace the C2f module in the neck network for feature fusion, as shown in Figure 4.
For the downsampling part, the ALSA (Attention Lightweight Subsampling Module) was introduced, drawing on the structure of the RFAConv [19] module, as shown in Figure 5. The module is divided into two branches for attention building and downsampling, which work as follows (a hedged code sketch follows the list).
a.
Average pooling is performed in the first branch and group convolution in the second branch on the input feature map;
b.
The weighted feature maps constructed after average pooling are rearranged using the Rearrange operation, and the width and height of the rearranged feature maps are halved to achieve the same width and height as the feature maps of branch two; the 1 × 1 convolution is used to adjust the number of channels and increase the nonlinearities and to enhance the network for the interactions between the different features; then, the Softmax operation is performed to constitute the final attention;
c.
A Rearrange operation is performed on the feature map of branch two in the channel dimension to rearrange the feature map to align with the channels of the feature map of branch one;
d.
The feature maps generated by branches one and two and the attention weights are multiplied and then summed to obtain the final downsampled output feature map with attention.
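A minimal PyTorch sketch of ALSA follows; as simplifications, the Rearrange-based width/height halving of branch one is folded into a stride-2 average pool and the group count is chosen as gcd(in_ch, out_ch), so the shapes and attention form here are our assumptions beyond the two-branch structure described above:

```python
import math
import torch
import torch.nn as nn

class ALSA(nn.Module):
    """Attention Lightweight Subsampling Module (sketch)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # branch one: average pooling + 1x1 conv + Softmax builds the attention
        self.pool = nn.AvgPool2d(kernel_size=2, stride=2)
        self.adjust = nn.Conv2d(in_ch, out_ch, kernel_size=1)
        # branch two: grouped 3x3 stride-2 convolution performs the downsampling
        self.down = nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=2,
                              padding=1, groups=math.gcd(in_ch, out_ch))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = torch.softmax(self.adjust(self.pool(x)), dim=1)  # channel attention
        feat = self.down(x)                                     # downsampled features
        return feat * attn                                      # attention-weighted output
```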

2.2.3. Head Lightweighting

The head network of YOLOv8n contains a large number of parameters, which increases both the model size and the inference time. Therefore, to improve real-time performance, the head network must be made lightweight. A parameter-sharing idea was adopted here, and currently popular lightweight convolutions were employed to redesign the head network, named Detect-L (Detect-Lightweight), as shown in Figure 6. This lightweighting did not significantly impact accuracy. The approach merges the convolution operations that the original two branches performed separately, sharing the convolution parameters. The lightweight convolution uses partial convolution and grouped convolution, after which the features branch into the Bbox loss and Cls loss detection heads, respectively.
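To illustrate the two ingredients, here is a hedged PyTorch sketch of a FasterNet-style partial convolution and a parameter-shared head; the split ratio, group count, and branch widths are our assumptions rather than the paper's exact Detect-L design:

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: convolve a fraction of the channels and pass
    the rest through untouched (1/4 split ratio assumed)."""
    def __init__(self, channels: int, ratio: float = 0.25):
        super().__init__()
        self.cc = max(1, int(channels * ratio))  # channels actually convolved
        self.conv = nn.Conv2d(self.cc, self.cc, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.split(x, [self.cc, x.size(1) - self.cc], dim=1)
        return torch.cat([self.conv(x1), x2], dim=1)

class DetectL(nn.Module):
    """Parameter-sharing head (sketch): one shared stem is computed once
    and feeds both branches instead of two separate convolution stacks.
    `channels` is assumed divisible by 4 for the grouped convolution."""
    def __init__(self, channels: int, num_classes: int, reg_ch: int = 64):
        super().__init__()
        self.stem = nn.Sequential(
            PConv(channels),
            nn.Conv2d(channels, channels, 3, padding=1, groups=4),
            nn.SiLU(),
        )
        self.bbox = nn.Conv2d(channels, reg_ch, 1)      # box regression branch
        self.cls = nn.Conv2d(channels, num_classes, 1)  # classification branch

    def forward(self, x: torch.Tensor):
        f = self.stem(x)  # shared features computed once
        return self.bbox(f), self.cls(f)
```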

2.2.4. Loss Function Improvement

CIoU is the original YOLOv8n loss function. To address the fact that DIoU does not take the aspect ratio, one of the three elements of Bbox regression, into account, CIoU adds a shape penalty term αν to accelerate shape convergence. However, when the predicted and ground-truth bounding boxes have the same aspect ratio and coincident center points but different width and height values, CIoU loses effectiveness and degenerates into the ordinary IoU. MPDIoU [20] instead minimizes the distances between the top-left and bottom-right corner points of the predicted and ground-truth bounding boxes and introduces the feature map width and height, which effectively solves the CIoU degradation problem described above. As shown in Figure 7, red represents the ground-truth boxes of the target and yellow the predicted boxes. The predicted and ground-truth boxes in (a) and (b) have the same aspect ratio, and the CIoU and MPDIoU computed for these images differ. The MPDIoU calculation formula is shown in Function (1).
$$
\begin{aligned}
d_1^2 &= (x_1^B - x_1^A)^2 + (y_1^B - y_1^A)^2 \\
d_2^2 &= (x_2^B - x_2^A)^2 + (y_2^B - y_2^A)^2 \\
\mathrm{MPDIoU} &= \mathrm{IoU} - \frac{d_1^2 + d_2^2}{w^2 + h^2} \\
L_{\mathrm{MPDIoU}} &= 1 - \mathrm{MPDIoU}
\end{aligned}
\tag{1}
$$
where the superscript $i$ in $x_j^i$ and $y_j^i$ takes the values A and B, where A denotes the predicted box and B the ground-truth box; the subscript $j$ takes the values 1 and 2, where 1 denotes the top-left point and 2 the bottom-right point; and $w$ and $h$ denote the width and height of the feature map, respectively.
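Function (1) transcribes directly into PyTorch; a sketch, assuming corner-format (x1, y1, x2, y2) boxes:

```python
import torch

def mpdiou(pred: torch.Tensor, target: torch.Tensor, w: int, h: int) -> torch.Tensor:
    """MPDIoU per Function (1); the loss is 1 - mpdiou(...)."""
    # plain IoU of the two boxes
    lt = torch.max(pred[..., :2], target[..., :2])
    rb = torch.min(pred[..., 2:], target[..., 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + 1e-7)
    # squared distances between the matching corner points
    d1 = (pred[..., 0] - target[..., 0]) ** 2 + (pred[..., 1] - target[..., 1]) ** 2
    d2 = (pred[..., 2] - target[..., 2]) ** 2 + (pred[..., 3] - target[..., 3]) ** 2
    return iou - (d1 + d2) / (w ** 2 + h ** 2)
```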
Then, to accelerate the convergence of the predicted box, we follow Inner_IoU [21]: auxiliary boxes of different scales share the same IoU trend, and for samples of different IoU levels, the absolute value of the IoU gradient of an auxiliary box with a smaller or larger scale exceeds that of the actual box, which benefits gradient back-propagation and accelerates convergence. In this study, the dense leaf data contain many low-IoU samples, so a scaling factor greater than 1 was needed. Experimental tuning selected a scaling factor of 1.4, and an auxiliary bounding box with this factor was constructed on the basis of the MPDIoU, accelerating the convergence of the loss function toward the target box; the result is named Inner_MPDIoU. The computational formulas are shown in Equations (2)–(4).
$$
\begin{aligned}
b_l^{gt} &= x_c^{gt} - \frac{w^{gt} \cdot ratio}{2}, \quad b_r^{gt} = x_c^{gt} + \frac{w^{gt} \cdot ratio}{2} \\
b_t^{gt} &= y_c^{gt} - \frac{h^{gt} \cdot ratio}{2}, \quad b_b^{gt} = y_c^{gt} + \frac{h^{gt} \cdot ratio}{2} \\
b_l &= x_c - \frac{w \cdot ratio}{2}, \quad b_r = x_c + \frac{w \cdot ratio}{2} \\
b_t &= y_c - \frac{h \cdot ratio}{2}, \quad b_b = y_c + \frac{h \cdot ratio}{2}
\end{aligned}
\tag{2}
$$
$$
\begin{aligned}
inter &= \big(\min(b_r^{gt}, b_r) - \max(b_l^{gt}, b_l)\big) \cdot \big(\min(b_b^{gt}, b_b) - \max(b_t^{gt}, b_t)\big) \\
union &= (w^{gt} \cdot h^{gt}) \cdot ratio^2 + (w \cdot h) \cdot ratio^2 - inter \\
IoU_{Inner} &= \frac{inter}{union}
\end{aligned}
\tag{3}
$$
$$
\begin{aligned}
\mathrm{Inner\_MPDIoU} &= IoU_{Inner} - \frac{d_1^2 + d_2^2}{w^2 + h^2} \\
L_{\mathrm{Inner\_MPDIoU}} &= 1 - \mathrm{Inner\_MPDIoU}
\end{aligned}
\tag{4}
$$
where the superscript $gt$ on $b_j^{gt}$ indicates the ground-truth box and its absence the predicted box; the subscript $j$ takes $l$, $r$, $t$, and $b$, denoting left, right, top, and bottom, respectively; $w$ and $h$ denote the box width and height, respectively; and $ratio$ is the scaling factor, with a ratio of 1.4 giving the best effect here.
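Equations (2)–(4) can be sketched the same way: both boxes are scaled about their centres by `ratio` before the IoU is computed, and the MPDIoU corner-distance term is kept (corner-format boxes assumed):

```python
import torch

def inner_mpdiou(pred, target, w, h, ratio: float = 1.4):
    """Inner_MPDIoU per Equations (2)-(4); the loss is 1 - inner_mpdiou(...)."""
    def scaled(box):
        # auxiliary box: same centre, width/height scaled by `ratio`
        cx, cy = (box[..., 0] + box[..., 2]) / 2, (box[..., 1] + box[..., 3]) / 2
        bw = (box[..., 2] - box[..., 0]) * ratio
        bh = (box[..., 3] - box[..., 1]) * ratio
        return cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2

    pl, pt, pr, pb = scaled(pred)
    gl, gt, gr, gb = scaled(target)
    iw = (torch.min(pr, gr) - torch.max(pl, gl)).clamp(min=0)
    ih = (torch.min(pb, gb) - torch.max(pt, gt)).clamp(min=0)
    inter = iw * ih
    union = ((pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
             + (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])) \
            * ratio ** 2 - inter
    iou_inner = inter / (union + 1e-7)
    d1 = (pred[..., 0] - target[..., 0]) ** 2 + (pred[..., 1] - target[..., 1]) ** 2
    d2 = (pred[..., 2] - target[..., 2]) ** 2 + (pred[..., 3] - target[..., 3]) ** 2
    return iou_inner - (d1 + d2) / (w ** 2 + h ** 2)
```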

2.2.5. Label-Matching Strategy Improvement

YOLOv8n utilizes the TaskAlignedAssigner [22] matching technique, which identifies positive samples using a score that weights the classification score against the regression score, as shown in Function (5).
$$
align\_metric = s^{\alpha} \cdot u^{\beta}
\tag{5}
$$
where $s$ denotes the prediction category score and $u$ denotes the CIoU value of the predicted and ground-truth boxes.
As Section 2.2.4 shows, CIoU degenerates in some cases, making the CIoU values of different predicted boxes identical and ultimately giving two different samples the same score, which is not rigorous. The problem lies in how the distance between the predicted and ground-truth boxes is measured. Therefore, based on the MPDIoU corner-point localization of the predicted and ground-truth boxes, $u$ in the above align_metric was redefined, with the calculation shown in Function (6). The model incorporating all the above modifications was named LYOLOv8n (Lightweight YOLOv8n).
$$
\begin{aligned}
d_1^2 &= (x_1^B - x_1^A)^2 + (y_1^B - y_1^A)^2 \\
d_2^2 &= (x_2^B - x_2^A)^2 + (y_2^B - y_2^A)^2 \\
u &= \mathrm{IoU} - \frac{d_1^2 + d_2^2}{c^2}
\end{aligned}
\tag{6}
$$
where the superscript $i$ in $x_j^i$ and $y_j^i$ takes the values A and B, where A denotes the predicted box and B the ground-truth box; the subscript $j$ takes the values 1 and 2, where 1 denotes the top-left point and 2 the bottom-right point; and $c$ denotes the diagonal length of the smallest box enclosing both the predicted and ground-truth boxes.
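A sketch of the redefined score: the plain IoU is computed inline, u follows Function (6), and the α/β defaults mirror the Ultralytics TaskAlignedAssigner settings (an assumption here):

```python
import torch

def align_metric_mpd(cls_score, pred, gt, alpha: float = 0.5, beta: float = 6.0):
    """align_metric = s^alpha * u^beta with u from Function (6)."""
    lt = torch.max(pred[..., :2], gt[..., :2])
    rb = torch.min(pred[..., 2:], gt[..., 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[..., 0] * wh[..., 1]
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_g = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    iou = inter / (area_p + area_g - inter + 1e-7)
    # corner distances and the diagonal of the smallest enclosing box
    d1 = (pred[..., 0] - gt[..., 0]) ** 2 + (pred[..., 1] - gt[..., 1]) ** 2
    d2 = (pred[..., 2] - gt[..., 2]) ** 2 + (pred[..., 3] - gt[..., 3]) ** 2
    cw = torch.max(pred[..., 2], gt[..., 2]) - torch.min(pred[..., 0], gt[..., 0])
    ch = torch.max(pred[..., 3], gt[..., 3]) - torch.min(pred[..., 1], gt[..., 1])
    u = iou - (d1 + d2) / (cw ** 2 + ch ** 2 + 1e-7)
    return cls_score ** alpha * u.clamp(min=0) ** beta
```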

2.2.6. LAMP Pruning

We used the LAMP [23] (Layer-Adaptive Magnitude-Based Pruning) algorithm on LYOLOv8n. Unlike other pruning algorithms, it does not require setting many parameters; instead, it measures the relative importance of the surviving connections within the same layer via a LAMP score. The weight tensors of all fully connected layers and 2D convolutions are first unfolded into one-dimensional vectors, and each unfolded vector is sorted in ascending order according to an index map such that $|W[u]| \le |W[v]|$ holds for $u < v$, where $W[u]$ denotes the weight at index $u$. The LAMP score is defined in Function (7).
$$
score(u; W) = \frac{(W[u])^2}{\sum_{v \ge u} (W[v])^2}
\tag{7}
$$
This guarantees that two connections with the same weight magnitude receive different LAMP scores, which adapts better to the hierarchical characteristics of the network and ensures a more rational and flexible choice of connections when pruning at different levels. Once the LAMP scores are computed, the connections with the minimum LAMP scores are pruned globally, and pruning is iterated until the desired global sparsity constraint is satisfied, achieving optimal pruning while preserving model performance.
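A compact sketch of the per-layer score from Function (7), using suffix sums over the magnitude-sorted squared weights:

```python
import torch

def lamp_scores(weight: torch.Tensor) -> torch.Tensor:
    """LAMP score per Function (7) for one layer's weight tensor."""
    w = weight.flatten().abs()
    order = torch.argsort(w)           # ascending |W|: |W[u]| <= |W[v]| for u < v
    sq = w[order] ** 2
    # denominator: suffix sums, i.e. squared weights at or after each sorted position
    denom = torch.flip(torch.cumsum(torch.flip(sq, dims=[0]), dim=0), dims=[0])
    scores = torch.empty_like(sq)
    scores[order] = sq / denom         # scatter back to the original indices
    return scores

# Global pruning then removes the smallest-scoring connections across all
# layers until the target sparsity is reached, e.g.:
#   all_scores = torch.cat([lamp_scores(m.weight) for m in prunable_layers])
#   threshold = torch.kthvalue(all_scores, k).values  # k set by the sparsity target
```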

2.2.7. Improved Network Model Structure

In this study, the YOLOv8n model was used as the core framework for tomato leaf disease detection. Its working principle can be summarized as follows: the backbone network extracts features from the image, the neck network fuses those features, and finally the head network uses the fused features to localize and recognize targets.
The CSPDarkNet structure, similar to that of YOLOv5, was adopted for the backbone network part, with the difference that the C3 structure of YOLOv5 was replaced by the C2f structure with a richer gradient flow, which is relatively lighter, and different numbers of channels were adjusted for different scales of the model, significantly improving the model performance.
The PAN-FPN (Path Aggregation Network for Feature Pyramid Network) structure was adopted for the neck network part, where the FPN layer passes semantic features top-down and the PAN passes localization features bottom-up, so that the location information in the bottom layer can also be passed to the deeper layer, thus enhancing the localization capability on multiple scales.
In the head network, the coupled head of YOLOv5 was changed to a form similar to a decoupled head, which extracts the location and category information of the target separately, learns them through different network branches, and finally fuses them. This not only effectively reduces the number of parameters and computation but also enhances the generalization ability of the model. The decoupled head has two branches, Bbox and Cls: Bbox localizes the target's location information and Cls performs category prediction.
The network architecture of the improved YOLOv8n model, based on the original YOLOv8 network structure diagram [24], is shown in Figure 8. Multi-scale convolutional modules were introduced to the backbone and neck networks to enhance the feature extraction capability for dense leaves and diseased spots. ALSA was used as a neck network downsampling module, achieving lightweighting while constructing channel weights and assigning different attentional weights. Finally, the head network was reconstructed using the parameter-sharing principle and introducing partial and group convolution to further optimize YOLOv8n for lightweighting.

2.3. Model Evaluation Indicators

The model evaluation metrics were as follows: Precision was used to evaluate the false detection rate, Recall to measure the missed detection rate, and mAP (mean average precision, %) to evaluate the accuracy of the model; their calculations are shown in Function (8). The number of parameters, GFLOPs (giga floating-point operations), and memory usage were used to measure model complexity, and FPS (frames per second) was used to evaluate the real-time performance and recognition efficiency of the model.
$$
\begin{aligned}
\mathrm{Precision} &= \frac{\mathrm{CorrectPredictions}}{\mathrm{TotalPredictions}} = \frac{TP}{TP + FP} \times 100\% \\
\mathrm{Recall} &= \frac{\mathrm{CorrectPredictions}}{\mathrm{TotalGroundTruth}} = \frac{TP}{TP + FN} \times 100\% \\
\mathrm{mAP} &= \frac{\sum_{i=1}^{N_{class}} AP_i}{N_{class}} \times 100\%
\end{aligned}
\tag{8}
$$
where the above is defined as follows:
True Positive (TP): the true category is positive and the predicted category is positive.
False Positive (FP): the true category is negative and the predicted category is positive.
False Negative (FN): the true category is positive and the predicted category is negative.
True Negative (TN): the true category is negative and the predicted category is negative.
$AP_i$ denotes the AP of the $i$-th tomato leaf disease category, where AP is the area under the precision/recall curve; $N_{class}$ is the number of tomato leaf disease categories.
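Function (8) transcribes directly; a literal sketch, assuming the per-class AP values have already been computed in [0, 1]:

```python
def precision(tp: int, fp: int) -> float:
    """Share of predictions that are correct, in %."""
    return tp / (tp + fp) * 100

def recall(tp: int, fn: int) -> float:
    """Share of ground-truth objects that are found, in %."""
    return tp / (tp + fn) * 100

def mean_ap(ap_per_class: list[float]) -> float:
    """mAP in %: the per-class APs (areas under the P/R curves) averaged."""
    return sum(ap_per_class) / len(ap_per_class) * 100
```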

3. Results and Discussion

3.1. Experimental Operation Platform

The model training part of this experiment was performed on a single computer with the following platform: AMD Ryzen 7 5800H CPU; NVIDIA GeForce RTX 3070 Laptop GPU with 12 GB of video memory; 16 GB of RAM; Windows operating system; programming tool: PyCharm; programming language: Python 3.9.16; deep learning framework: PyTorch 2.0.1 (CUDA 11.7).

3.2. Parameter Settings

The experiment used multi-batch training with the number of GPUs set to 1. To train without loss of accuracy while keeping training fast, the batch size was set to 16. Multiple experiments showed that 50 epochs were sufficient for model convergence, so the number of epochs was set to 50. To ensure that the model could fully learn the image features, the input image size was set to 640 × 640 pixels and the images were RGB-mode JPGs. The optimizer was SGD, the initial learning rate was 0.01, the final learning rate was 0.001, the momentum factor was 0.937, the weight decay factor was 0.0005, and the pruning magnitude was set to 1.5.
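As a hedged sketch of this configuration, assuming the Ultralytics training API (the model and dataset YAML names are placeholders; `lrf` is the final-to-initial learning-rate ratio, so 0.1 yields the 0.001 final rate):

```python
from ultralytics import YOLO

model = YOLO("lyolov8n.yaml")      # placeholder for the improved model definition
model.train(
    data="tomato_leaf.yaml",       # placeholder dataset config
    epochs=50,
    batch=16,
    imgsz=640,
    optimizer="SGD",
    lr0=0.01,                      # initial learning rate
    lrf=0.1,                       # final lr = lr0 * lrf = 0.001
    momentum=0.937,
    weight_decay=0.0005,
)
```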

3.3. Analysis and Comparison of Results

3.3.1. Confusion Matrix Comparison

To validate the improved label-matching strategy, confusion matrices were drawn based on the test set, as shown in Figure 9.
It can be seen that the recognition rates of the tomato leaf diseases other than Early blight and Yellow leaf curl virus improved, and analysis of the background class shows that the improved label-matching strategy effectively reduced the missed detection rate for the various leaf diseases.

3.3.2. Loss Function Comparison Experiment

A comparison of the loss function for the YOLOv8n-based bounding box regression is presented in Figure 10.
With the introduction of Inner_MPDIoU, the bounding box regression was accelerated and the loss was reduced, resulting in a smaller difference between the predicted box and the ground-truth box and improved target localization. The YOLOv8n-based TaskAlignedAssigner label-matching strategy confidence and recall loss are shown in Figure 11.
The original TaskAlignedAssigner label-matching strategy had a final Precision of 96.7% and Recall of 84.8% for 50 iterations, and the improved label-matching strategy based on MPDIoU had a final Precision of 96.2% and Recall of 85.9% for 50 iterations.
While Precision was almost unchanged, Recall increased by 1.1%, reducing the missed detection rate when the model is deployed on the agricultural inspection robot for actual inspection tasks. A lower missed detection rate increases the efficiency of each inspection run, which is important for practical applications.
Finally, the analysis of the validation dataset revealed that after improving the loss function and label-matching strategy, the predicted bounding boxes had very small errors compared to the ground truth bounding boxes. Additionally, the improved algorithm showed stronger generalization capabilities compared to the previous version, enabling more accurate predictions. As shown in Figure 12, the left image represents the actual labels from the validation set, the middle image shows the predictions from the original model, and the right image displays the predictions from the improved model.

3.3.3. Ablation Experiment

For each of the introduced modules, in order to determine their effectiveness, ablation experiments were used to compare the effects, as shown in Table 1.
The ablation experiments show that every introduced module reduced the number of model parameters, computation, and model size while increasing the FPS. Accuracy decreased by only 0.08% when the LMSM and C2L were introduced, the ALSA module increased accuracy by 0.17%, and lightweighting the head network reduced accuracy by only 0.02%. After LAMP pruning of the improved model, accuracy fell by 0.62%, while the number of parameters was reduced by 57.79%, computation by 61.73%, and model size by 55.56%, and the FPS increased by 44.79%. The experimental results show that the introduced modules had a beneficial impact on online tomato leaf disease detection.
We compared the improved LAMP-LYOLOv8n network with YOLOv8n, the well-known high-performance YOLOv5n network, YOLOv8n with its backbone replaced by the latest partially convolutional network FasterNet [25] from CVPR 2023, and the smallest models of YOLOv9 [26] and YOLOv10 [27], namely, YOLOv9t and YOLOv10n, as shown in Table 2.
Compared with other classic networks and the latest YOLO-series networks, the improved model shows only a slight reduction in accuracy while offering significant advantages in the number of parameters, computational complexity, model size, and FPS (frames per second), making it faster and more suitable for real-time tomato leaf disease detection. Although YOLOv9t and YOLOv10n introduce innovative modules that reduce the number of parameters and computational complexity, they increase the number of network layers, an issue we also encountered in the early stages of our model improvements. A larger number of network layers does not significantly enhance the real-time performance of an improved network and may even decrease its FPS: YOLOv9t has 658 layers and YOLOv10n 285 layers, while the original YOLOv8n has only 168.

3.4. Impact of Embedded Device Detection

Compared with other embedded edge devices, the NVIDIA Jetson Nano has a small size, low power consumption, and good real-time performance, making it very suitable for scenarios that require mobility and embedded deployment. The Jetson Nano also provides a wealth of peripheral interfaces and GPU acceleration, offering strong processing speed and computing performance, so it was selected as the main controller. The tomato leaf disease detection platform, shown in Figure 13, consists of an agricultural inspection robot and a tomato leaf disease detection system; the detection experiment was conducted in a greenhouse. During the robot's inspection rounds, the detection system identifies leaf pathology in real time from the leaf images collected by the camera to complete the identification task. In addition, the location information of detected tomato leaf diseases was retained for subsequent use.
In this study, a Jetson Nano was used as the embedded deployment platform. The software environment was TensorRT 8.0.1, CUDA 11.3, cuDNN 8.2, and OpenCV 4.5.3; all required environments were installed from the JetPack 4.6.1 image. The engine was built with the trtexec tool and the model was quantized using TensorRT. The inference speeds of the original YOLOv8n, the lightweight-improved YOLOv8n, and the pruned lightweight YOLOv8n on the Jetson Nano after TensorRT quantization were tested; the results are shown in Table 3.
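A hedged sketch of the engine-build step, assuming the trtexec tool shipped with TensorRT (file names are placeholders; dropping --fp16 gives the FP32 builds in Table 3):

```python
import subprocess

# Build a serialized TensorRT engine from the exported ONNX model on the
# Jetson Nano; trtexec also reports per-inference timings.
subprocess.run([
    "trtexec",
    "--onnx=lyolov8n.onnx",          # model exported from the training PC
    "--saveEngine=lyolov8n.engine",  # serialized engine for deployment
    "--fp16",                        # FP16 quantization (omit for FP32)
], check=True)
```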
As shown in Table 3, compared to the original YOLOv8n, LAMP-LYOLOv8n took 16.55 ms less time in the build phase after FP32-TensorRT acceleration. In addition, the build time was 31.1 ms shorter after using FP16-TensorRT acceleration for LAMP-LYOLOv8n, there was an increase in the detection speed of 64.85%, and detection could be achieved at 19.70 FPS. The experiment demonstrated that the TensorRT-accelerated LAMP-LYOLOv8n is capable of meeting the real-time requirements for detecting tomato leaf disease on embedded devices.
Ablation studies were also performed on the embedded device to demonstrate the effectiveness of the module, as shown in Table 4.
Ablation experiments on the Jetson Nano showed that after FP32-TensorRT acceleration, there was an improvement in the build time and FPS for all the introduced modules. The model build time after the lightweight improvement was reduced by 16.55 ms and FPS was improved by 3.66. The experimental results indicate that all the introduced modules had a positive impact on online tomato leaf disease detection.
The improved YOLOv8n was validated by using tomato leaves collected in the field as a test set, and the recognition results are shown in Figure 14. The results show that the model had good detection results in a realistic environment with a complex background and lush leaves and can well meet the needs of practical applications.

4. Conclusions

In this paper, an improved YOLOv8n recognition method for tomato leaf disease is proposed. To address the problem of lush leaf detection in complex environments, a lightweight multi-scale module that realizes the feature fusion capability of the network for different scales was introduced. A lightweight downsampling module with attention was then proposed to construct the weights between the channels while completing the downsampling operation, facilitating better network learning. For the head network, parameter sharing was constructed based on the introduction of partial convolution and group convolution, which lightened the head network. Scalable auxiliary bounding boxes were constructed on top of MPDIoU in terms of the loss function, accelerating the convergence of the loss function. The parameter definitions of the TaskAlignedAssigner matching strategy were modified to improve the accuracy of the network, increase the detection efficiency, and reduce the energy consumption. Finally, the modified network was pruned to further lighten the network and improve the detection speed.
After simulation experiments and physical verification, the improved YOLOv8n detection model has fewer parameters, lower computational cost, a smaller model size, and a very fast recognition speed; it can detect at 19.70 FPS in realistic situations. Moreover, its recognition accuracy of 94.60% produces good detection results, better meeting the application requirements of real agricultural scenarios and providing a new reference for identifying leaf diseases in greenhouse tomatoes or building embedded tomato leaf disease identification systems. In the future, further research can focus on enhancing the model's robustness to different environmental conditions and optimizing its performance for other crop diseases. Additionally, during online inspections, sunlight may cause images to be overexposed or underexposed; image enhancement methods can be introduced to mitigate these effects. Furthermore, integrating the detection system with IoT (Internet of Things) devices and building a cloud platform for greenhouse monitoring will enable the real-time tracking of temperature, humidity, leaf disease information, and the location of infected plants, providing a more comprehensive smart agriculture solution.

Author Contributions

Conceptualization, W.L.; methodology, W.L. and C.B.; software, C.B.; validation, C.B. and W.T.; formal analysis, W.L. and C.B.; investigation, W.T.; resources, W.L., Y.X. and J.K.; data curation, C.B.; writing—original draft preparation, W.L. and C.B.; writing—review and editing, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (62203285), the Natural Science Basic Research Program of Shaanxi Province (2022JQ-181), the Xi'an Science and Technology Plan Project (23NYGG0070), and the Young Talent Fund of Xi'an Association for Science and Technology (959202413041).

Data Availability Statement

The dataset can be found at: https://www.kaggle.com/datasets/abdallahalidev/plantvillage-dataset (accessed on 17 June 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yong, K.T.; Yong, P.H.; Ng, Z.X. Tomato and human health: A perspective from post-harvest processing, nutrient bio-accessibility, and pharmacological interaction. Food Front. 2023, 4, 1702–1719. [Google Scholar] [CrossRef]
  2. Tomato News. Global Tomato Production Estimates for 2023 and 2024. 2023. Available online: https://www.wptc.to/global-tomato-processing-in-2023/ (accessed on 22 June 2024).
  3. Food and Agriculture Organization of the United Nations. Crop Losses Due to Pests and Diseases. Available online: https://www.fao.org/statistics/highlights-archive/en (accessed on 29 August 2024).
  4. Xian, X.; Han, P.; Wang, S.; Zhang, G.; Liu, W.; Desneux, N.; Wan, F. The potential invasion risk and preventive measures against the tomato leafminer Tuta absoluta in China. Entomol. Gen. 2017, 36, 4. [Google Scholar] [CrossRef]
  5. Fu, D.; Feng, J. Tomato Leaf Disease and Pest Identification Technology Based on CNN. Comput. Sci. Appl. 2023, 13, 2509. [Google Scholar]
  6. Xie, X.; Ma, Y.; Liu, B.; He, J.; Li, S.; Wang, H. A deep-learning-based real-time detector for grape leaf diseases using improved convolutional neural networks. Front. Plant Sci. 2020, 11, 751. [Google Scholar] [CrossRef] [PubMed]
  7. Wang, C.; Zhou, J.; Wu, H.; Teng, G.; Zhao, C.; Li, J. Improved Multi-scale ResNet for Vegetable Leaf Disease Identification. Trans. Chin. Soc. Agric. Eng. 2020, 36, 20. [Google Scholar]
  8. David, H.E.; Ramalakshmi, K.; Gunasekaran, H.; Venkatesan, R. Literature review of disease detection in tomato leaf using deep learning techniques. In Proceedings of the 2021 7th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 19–20 March 2021; Volume 1, pp. 274–278. [Google Scholar]
  9. Zhang, Y.; Song, C.; Zhang, D. Deep Learning-Based Object Detection Improvement for Tomato Disease. IEEE Access 2020, 8, 56607–56614. [Google Scholar] [CrossRef]
  10. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  11. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  12. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. Ssd: Single shot multibox detector. In Proceedings of the 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37. [Google Scholar]
  13. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and Efficient Object Detection. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 10778–10787. Available online: https://api.semanticscholar.org/CorpusID:208175544 (accessed on 17 June 2023).
  14. Liu, J.; Meng, W. Review of Single-Stage Object Detection Algorithms Based on Deep Learning. Aerosp. Weapons 2020, 27, 44–53. [Google Scholar]
  15. Du, L.; Zhang, R.; Wang, X. Overview of two-stage object detection algorithms. J. Phys. Conf. Ser. 2020, 1544, 012033. [Google Scholar] [CrossRef]
  16. Hu, L.Y.; Zhou, T.; Xu, W.; Wang, Z.M.; Pei, Y.K. Improved Lightweight SqueezeNet Model for Tomato Disease Recognition. J. Zhengzhou Univ. (Nat. Sci. Ed.) 2022, 54, 71–77. [Google Scholar]
  17. Liu, J.; Wang, X. Tomato diseases and pests detection based on improved Yolo V3 convolutional neural network. Front. Plant Sci. 2020, 11, 521544. [Google Scholar] [CrossRef] [PubMed]
  18. Aishwarya, N.; Praveena, N.G.; Priyanka, S.; Pramod, J. Smart farming for detection and identification of tomato plant diseases using light weight deep neural network. Multimed. Tools Appl. 2023, 82, 18799–18810. [Google Scholar] [CrossRef]
  19. Zhang, X.; Liu, C.; Yang, D.; Song, T.; Ye, Y.; Li, K.; Song, Y. RFAConv: Innovating Spatial Attention and Standard Convolutional Operation. arXiv 2023, arXiv:2304.03198. [Google Scholar]
  20. Ma, S.; Xu, Y. MPDIoU: A Loss for Efficient and Accurate Bounding Box Regression. arXiv 2023, arXiv:2307.07662. [Google Scholar]
  21. Zhang, H.; Xu, C.; Zhang, S. Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box. arXiv 2023, arXiv:2311.02877. [Google Scholar]
  22. Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-aligned One-stage Object Detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 3490–3499. [Google Scholar]
  23. Lee, J.; Park, S.; Mo, S.; Ahn, S.; Shin, J. Layer-Adaptive Sparsity for the Magnitude-Based Pruning. 2020. Available online: https://api.semanticscholar.org/CorpusID:234358843 (accessed on 7 September 2024).
  24. Open-MMLab. MMYOLO: YOLOv8 Configurations. 2024. Available online: https://github.com/open-mmlab/mmyolo/tree/dev/configs/yolov8 (accessed on 29 August 2024).
  25. Chen, J.; Kao, S.H.; He, H.; Zhuo, W.; Wen, S.; Lee, C.H.; Chan, S.H.G. Run, don’t walk: Chasing higher FLOPS for faster neural networks. In Proceedings of the 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 12021–12031. [Google Scholar]
  26. Wang, C.-Y.; Yeh, I.-H.; Liao, H. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
  27. Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458. [Google Scholar]
Figure 1. The training and detection process of deep learning models applied to embedded platforms.
Figure 2. Example of a dataset image. (a) Bacterial leaf spot (1976); (b) Early blight (1670); (c) Healthy (1617); (d) Late blight (1875); (e) Leaf mold (1596); (f) Septoria leaf spot (1677); (g) Two-spotted spider mite (1587); (h) Target spot (1625); (i) Mosaic virus (1590); (j) Yellow leaf curl virus (4815).
Figure 3. The detailed structure of the LMSM.
Figure 4. The detailed structure of C2f-LMSM.
Figure 5. The detailed structure of ALSA.
Figure 6. Schematic diagram of the head network before and after improvement.
Figure 7. CIoU and MPDIoU loss functions when the predicted box has the same aspect ratio as the ground-truth bounding box.
Figure 8. The improved YOLOv8n model structure.
Figure 9. Confusion matrix for tomato leaf disease categories. (a) Confusion matrix for raw label-matching strategies; (b) confusion matrix for the improved label-matching strategy.
Figure 10. Bounding box regression loss function.
Figure 11. The results of confidence and recall loss.
Figure 12. The actual labels and predicted labels of the validation set.
Figure 13. Tomato leaf disease detection platform.
Figure 14. Identification results of leaf diseases in greenhouse tomatoes.
Table 1. Results of ablation experiments.
| Models | mAP50 | Parameters | FLOPs (G) | Size (MB) | FPS |
|---|---|---|---|---|---|
| YOLOv8n | 95.22% | 3,007,598 | 8.1 | 6.3 | 508.8 |
| YOLOv8n-LMSM-C2L | 95.14% | 2,696,878 | 7.4 | 5.6 | 552.3 |
| YOLOv8n-ALSA | 95.39% | 2,879,158 | 8.0 | 6.0 | 513.7 |
| YOLOv8n-Detect-L | 95.20% | 2,406,068 | 5.5 | 5.0 | 575.6 |
| LYOLOv8n | 94.68% | 1,966,708 | 4.7 | 4.2 | 606.6 |
| LAMP-LYOLOv8n | 94.60% | 1,269,372 | 3.1 | 2.8 | 736.7 |
Table 2. Comparison of different models.
| Models | mAP50 | Parameters | FLOPs (G) | Size (MB) | FPS |
|---|---|---|---|---|---|
| YOLOv5n | 94.58% | 2,504,894 | 7.1 | 5.3 | 507.1 |
| YOLOv8n | 95.22% | 3,007,598 | 8.1 | 6.3 | 508.8 |
| FasterNet-YOLOv8n | 95.10% | 4,174,130 | 10.7 | 8.6 | 404.2 |
| YOLOv9t | 93.70% | 2,620,460 | 10.7 | 6.1 | 390.3 |
| YOLOv10n | 94.70% | 2,698,316 | 8.2 | 5.8 | 421.1 |
| LAMP-LYOLOv8n | 94.60% | 1,269,372 | 3.1 | 2.8 | 736.7 |
Table 3. Performance comparison of models on embedded devices.
| Models | Time Spent in the Build Phase (ms) | FPS |
|---|---|---|
| v8n-TensorRT (FP32) | 72.07 | 11.95 |
| Lv8n-TensorRT (FP32) | 65.78 | 13.85 |
| LAMP-Lv8n-TensorRT (FP32) | 55.52 | 15.61 |
| LAMP-Lv8n-TensorRT (FP16) | 40.97 | 19.70 |
Table 4. Ablation tests conducted on embedded devices.
| Models | Time Spent in the Build Phase (ms) | FPS |
|---|---|---|
| v8n | 72.07 | 11.95 |
| v8n-LMSM-C2L | 66.05 | 12.94 |
| v8n-ALSA | 70.04 | 12.91 |
| v8n-Detect-L | 59.27 | 14.06 |
| Lv8n | 65.78 | 13.85 |
| LAMP-Lv8n | 55.52 | 15.61 |