Article

A Lightweight YOLOv8 Model for Apple Leaf Disease Detection

Lijun Gao, Xing Zhao, Xishen Yue, Yawei Yue, Xiaoqiang Wang, Huanhuan Wu and Xuedong Zhang
1 College of Information Engineering, Tarim University, Alar 843300, China
2 School of Information Science and Engineering, Xinjiang University of Science & Technology, Korla 841000, China
3 Key Laboratory of Tarim Oasis Agriculture, Ministry of Education, Tarim University, Alar 843300, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(15), 6710; https://doi.org/10.3390/app14156710
Submission received: 11 June 2024 / Revised: 23 July 2024 / Accepted: 30 July 2024 / Published: 1 August 2024

Abstract: China holds the top position globally in apple production and consumption. Detecting diseases during the planting process is crucial for increasing yields and promoting the rapid development of the apple industry. This study proposes a lightweight algorithm for apple leaf disease detection in natural environments, which is conducive to application on mobile and embedded devices. Our approach modifies the YOLOv8n framework to improve accuracy and efficiency. Key improvements include replacing conventional Conv layers with GhostConv and parts of the C2f structure with C3Ghost, reducing the model’s parameter count and enhancing performance. Additionally, we integrate a global attention mechanism (GAM) to improve lesion detection by more accurately identifying affected areas. An improved bi-directional feature pyramid network (BiFPN) is also incorporated for better feature fusion, enabling more effective detection of small lesions in complex environments. Experimental results show a 32.9% reduction in computational complexity and a 39.7% reduction in model size to 3.8 MB, with performance improving by 3.4% to a mAP@0.5 of 86.9%. Comparisons with popular models such as YOLOv7-Tiny, YOLOv6, YOLOv5s, and YOLOv3-Tiny demonstrate that our YOLOv8n–GGi model offers superior detection accuracy, the smallest size, and the best overall performance for identifying critical apple diseases. It can serve as a guide for implementing real-time crop disease detection on mobile and embedded devices.

1. Introduction

Apples are one of the most popular fruits worldwide [1]. According to statistics from the China Apple Industry Association (www.chinaapple.org.cn), in 2022, global apple production reached 81.578 million tons, with China contributing 47.5718 million tons, which accounts for over 50% of the worldwide total. Apple cultivation is widespread, with China, the United States, and Poland as major producers [2]. However, apple leaf diseases significantly impact apple quality and yield [3,4]. These diseases can cause leaves to dry and fall off, hindering fruit development and flower bud formation, weakening the tree. The consequences of these diseases lead to decreased income for agricultural workers and hamper the apple industry’s growth. Detecting apple leaf diseases in a timely and efficient manner becomes crucial in improving the remuneration of farm workers and maintaining the industry’s development [5].
Traditionally, manual detection is the primary method of detecting apple leaf diseases and pests, requiring skilled researchers or agricultural workers to assess with the naked eye. This approach is relatively straightforward for handling small fruit trees but impractical for large-scale cultivation, potentially resulting in subjective biases and inadequate assessment [6]. Therefore, a fast and accurate method of detecting apple leaf pests and diseases is necessary. Visual computing and advanced machine learning have significantly boosted agriculture, with applications such as precision agriculture management, disease detection, and crop phenotyping [7,8]. Sharma et al. [9] achieved up to 93% accuracy in identifying plant diseases by using image processing techniques to analyze plant disease images. They combined the K-means clustering algorithm with neural networks for feature extraction and used Support Vector Machines (SVMs) as classifiers for disease identification and classification. Similarly, Al Bashish et al. [10] employed a range of supervised machine learning techniques, including Naive Bayes (NB), Decision Trees (DT), K-Nearest Neighbors (KNN), Support Vector Machines (SVMs), and Random Forest (RF) to detect maize diseases through plant images. They conducted a comparative analysis to determine which model achieved the highest prediction accuracy. Among all the models compared, the Random Forest algorithm performed optimally, with 79.23% accuracy, showing its efficiency in predicting maize diseases. Panigrahi et al. [11] successfully identified apple Alternaria leaf blotch with a detection accuracy of 99.1%, using 11 image features and an MLP pattern classifier. However, despite the remarkable results of these studies regarding recognition accuracy, some limitations and challenges still need to be addressed. Light, background, and weather variations in field conditions can significantly affect image quality and representation of disease characteristics, thereby reducing the model’s accuracy and reliability.
In recent years, convolutional neural networks (CNNs) have made impressive strides in various applications [12,13], surpassing traditional machine-learning techniques. A network model proposed by Ahmed et al. [14] employed two pre-trained CNNs, EfficientNetB0 and DenseNet121, to distinguish between healthy and diseased corn plant leaves. The deep features extracted from each CNN were combined using a joining technique to create a richer feature set, resulting in a 98.56% classification accuracy. In another study, Mohanty et al. [15] employed a hybrid ResNet50 and VGG16 features model to detect corn leaf diseases; by utilizing transfer learning and image enhancement techniques, they achieved 99.65% accuracy. Li et al. [16] proposed an integrated model that merges single-stage and two-stage target detection networks. Utilizing YOLO as its foundation, the single-stage network refines its internal structure, while the two-stage network, leveraging Faster-RCNN, integrates a clustering algorithm during candidate frame generation to enhance the detection of small targets. These models are integrated for inference tasks, and transfer learning accelerates model training. The model achieved a mAP@0.5 of 85.2% for detecting 37 pests and 8 diseases. Xie et al. [17] introduced the Faster DR–IACNN model, enhancing its feature-extraction capabilities through Inception and Inception-ResNet modules. The model achieves a mAP@0.5 of 81.1%, albeit operating at only 15.01 FPS. Zhu et al. [18] utilized the AlexNet model to extract disease features from leaves with complex backgrounds using a parallel technique. Dilated convolution helped to reduce the model’s parameters, realizing a model volume of 5.87 MB and a recognition accuracy of 97.36%. These target detection models usually use complex and huge neural network structures requiring powerful computing resources, such as GPUs and TPUs [19,20]. However, due to hardware limitations, mobile and embedded devices may not perform well in terms of performance and responsiveness.
Current research into identifying apple leaf diseases is constrained by the numerous parameters in traditional target detection algorithm models and the compromised accuracy of lightweight models. To address these issues, the YOLOv8n–GGi algorithm has been proposed as follows:
(1) In the backbone network, GhostConv is substituted for the traditional Conv layer, and the C3Ghost module replaces the original C2f structure. This effectively lightens the network model and significantly reduces its parameter count and computational burden.
(2) A global attention mechanism (GAM) is implemented to enable the network to identify and localize regions of interest more precisely, improving detection accuracy.
(3) An enhanced weighted bi-directional feature pyramid network (BiFPN) is designed to better detect small target spots on apple leaves in complex environments and improve detection capability in the field. We assign different weights to features at various scales and implement cross-scale weight suppression and expression to enhance feature fusion and improve target detection.

2. Materials and Methods

2.1. Dataset

The experimental dataset comes from the AppleLeaf9 dataset (https://github.com/JasonYangCode/AppleLeaf9) on GitHub, accessed on 5 December 2023. This image dataset, which includes many common types of apple leaf diseases, is specifically used to study apple leaf disease detection [21]. Each image is stored in JPG format. Some apple leaf disease images are shown in Figure 1.
To enrich the dataset and further enhance the model’s generalization capability, we utilized various image enhancement techniques, including random rotation, color adjustment, noise addition, image sharpening, and Gaussian blurring, to simulate the effects of environmental changes such as lighting and noise found in real-world scenarios. The number of images was expanded from 1564 to 6984. Example original and enhanced images are shown in Figure 2, and the number of images per apple leaf disease category before and after data enhancement is shown in Table 1.
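For illustration, a minimal sketch of such an augmentation pipeline using OpenCV is shown below. The parameter ranges (rotation angle, brightness/contrast range, noise level, and kernel sizes) are illustrative assumptions rather than the exact settings used in this study, and the corresponding bounding-box transforms are omitted for brevity.

```python
import cv2
import numpy as np

def random_rotation(img, max_deg=30):
    """Rotate about the image centre by a random angle."""
    h, w = img.shape[:2]
    angle = np.random.uniform(-max_deg, max_deg)
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    return cv2.warpAffine(img, m, (w, h))

def color_adjustment(img):
    """Randomly scale contrast and shift brightness to mimic lighting changes."""
    alpha = np.random.uniform(0.7, 1.3)   # contrast factor (illustrative)
    beta = np.random.uniform(-20, 20)     # brightness offset (illustrative)
    return cv2.convertScaleAbs(img, alpha=alpha, beta=beta)

def add_noise(img, sigma=10):
    """Add zero-mean Gaussian noise."""
    noise = np.random.normal(0, sigma, img.shape)
    return np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

def sharpen(img):
    """Sharpen with a standard 3x3 Laplacian-style kernel."""
    kernel = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    return cv2.filter2D(img, -1, kernel)

def gaussian_blur(img, k=5):
    """Blur with a k x k Gaussian kernel."""
    return cv2.GaussianBlur(img, (k, k), 0)
```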
We used the YOLOv8n model to conduct experiments on the original and enhanced data under the same experimental environment and parameters, training for 300 epochs. The experimental results are shown in Table 2. After data enhancement, the model’s precision increased from 79.6% to 83.1%, the recall increased from 73.5% to 77.9%, and the mAP@0.5 increased from 81.3% to 83.5%. These improvements show that data enhancement effectively improves the model’s generalization ability, making it more accurate and reliable in handling the apple leaf disease detection task.

2.2. YOLOv8 Network Structures

YOLOv8 is one of the latest versions of the YOLO (You Only Look Once) series of object recognition techniques [22,23]. Ultralytics has launched five versions of YOLOv8: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x [24]. The architecture of YOLOv8 is divided into three main components: the backbone, the neck, and the head. The backbone accepts input images and extracts feature inputs into the neck through a series of convolution and pooling operations [25]. The neck is used for feature fusion and scale transformation to process the features extracted by the backbone further and perform feature fusion, scale transformation, and other operations. The head is responsible for converting the feature mapping processed by the backbone and neck networks into the final detection result.
YOLOv8 is composed of several essential modules, each contributing to the overall detection process. The backbone network incorporates spatial pyramid pooling (SPPF), cross-stage partial network (C2f), convolutional layers (Conv), batch normalization (BN), and Sigmoid linear unit (SiLU). The neck section includes an up-sampling layer (Upsample), concatenation layer (Concat), and the C2f module. The head section is equipped with the detection layer (Detect), which outputs the detection results. Additionally, YOLOv8 features several specialized modules: the spatial pyramid pooling module (SPPF Module) integrates convolutional and maximum pooling layers; the convolution module (Conv Module) combines two-dimensional convolutional layers, batch normalization, and SiLU; the bottleneck module (Bottleneck Module) utilizes convolutional layers; the C2f module merges split layers, bottleneck layers, and concatenation layers; and the detection module (Detect Module) includes convolutional layers, two-dimensional convolutional layers, bounding box loss, and classification loss. These modules work in harmony to ensure that YOLOv8 strikes a balance between computational efficiency and detection accuracy. The architecture of YOLOv8 is illustrated in Figure 3.
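As a concrete illustration of two of these building blocks, the following PyTorch sketch shows the Conv module (two-dimensional convolution, batch normalization, and SiLU) and the SPPF module (three chained 5 × 5 max-pooling layers). It is a simplified reading of the YOLOv8 design described above, not the exact Ultralytics implementation.

```python
import torch
import torch.nn as nn

class Conv(nn.Module):
    """YOLOv8 Conv module: Conv2d -> BatchNorm2d -> SiLU."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPPF(nn.Module):
    """Spatial pyramid pooling (fast): one 5x5 max-pool applied three
    times in series, with all intermediate results concatenated."""
    def __init__(self, c_in, c_out):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = Conv(c_in, c_mid, k=1)
        self.cv2 = Conv(c_mid * 4, c_out, k=1)
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        return self.cv2(torch.cat([x, y1, y2, self.pool(y2)], dim=1))
```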
Figure 4 illustrates the backbone network architecture comparison between YOLOv8 and YOLOv5. The YOLOv8 backbone shares its general structure with YOLOv5 [26] but diverges by not using YOLOv5’s Conv3 (C3) modules. Instead, it adopts YOLOv7’s Efficient Layer Aggregation Network (ELAN) concept [27], integrating C3 with ELAN on the CSPDarknet53 base to form the C2f module as its primary building block [28]. This allows feature maps with progressively increasing receptive fields to be extracted through an expanded number of convolutional layers. The Feature Pyramid Network (FPN) [29,30] utilizes multiple scales to detect targets of varying sizes, while YOLOv5 forms a bottom-up feature pyramid alongside it. The FPN provides robust semantic features from the higher levels to the lower levels, while the bottom-up pyramid delivers accurate location features from the lower levels to the higher levels, effectively integrating features from various backbone layers into the different detection layers. The detection head of YOLOv8 employs a multi-scale convolutional layer structure to detect targets of different sizes effectively; its cross-scale connections and feature-reuse mechanism facilitate the exchange and fusion of feature maps across scales, enhancing the model’s detection performance.

2.3. Design for the YOLOv8–GGi Model

This study introduces an enhanced YOLOv8n–GGi model, an improvement of the YOLOv8n model. The enhancements include replacing the traditional structure with GhostConv [31] and C3Ghost modules, integrating the global attention mechanism [32], and adding the improved BiFPN structure [33], as illustrated in Figure 5. These modifications lighten the model structure and enhance the multi-scale fusion of features. This results in improved recognition of small apple targets and irregular lesions.

2.3.1. Backbone Network Optimization

Traditional convolutional neural networks typically comprise multiple convolutional modules that produce many redundant feature maps during image processing. This redundancy results in increased computational demands and a higher parameter count. Although existing lightweight neural networks, such as SqueezeNet and MobileNet [34], use smaller convolutional kernels to build CNNs, the convolutional layers still account for an extensive share of the parameters. In response, Han et al. [35] proposed a lightweight network model, GhostNet [36,37], whose Ghost module generates as many feature maps as a conventional convolutional layer and can therefore replace such layers directly. The module first compresses the input feature layer with an ordinary (nonlinear) convolution to obtain a set of intrinsic feature maps, then applies cheap linear operations to these maps, one by one, to generate a second set of ghost feature maps, and finally concatenates the two sets to obtain the output feature map. This approach effectively reduces the computational intricacy and the model’s parameter count while maintaining good performance. The convolution process of GhostNet is shown in Figure 6.
In a conventional convolutional neural network, given an input feature layer X ∈ Rc × h × w, where c is the input channel, h is the height, and w is the input data’s width, a typical convolutional layer that generates n feature maps is shown in Equation (1).
$Y = X * f + b$ (1)
In Equation (1), $*$ is the convolution operation, b is the bias, Y ∈ R^{h′×w′×n} is the resulting feature map containing n channels, f ∈ R^{c×k×k×n} is the convolutional kernel of this layer, h′ and w′ are the height and width of the output, and k × k is the kernel size of f. Based on the above, the FLOPs of an ordinary convolutional layer can be calculated with Equation (2), from which it can be concluded that if the number of channels c and the number of convolutional kernels n are large, the FLOPs will exhaust the memory and processing capabilities of a mobile device.
$\mathrm{FLOPs} = n \cdot h' \cdot w' \cdot c \cdot k \cdot k$ (2)
The size of the input and output activation maps influences the number of optimized parameters, including the parameters in f (convolutional kernel) and b (bias). However, these output feature maps often exhibit significant redundancy, with many being similar to each other. The Ghost module effectively addresses this issue, as shown in Equations (3) and (4).
$Y' = X * f'$ (3)
$y_{ij} = \Phi_{i,j}(y'_i), \quad \forall\, i = 1, \dots, m,\; j = 1, \dots, s$ (4)
where y′_i represents the ith intrinsic feature map in Y′, and Φ_{i,j} denotes the jth linear operation that produces the jth ghost feature map y_{ij}. Y′ ∈ R^{h′×w′×m} is the feature map obtained by convolving the input feature layer X ∈ R^{c×h×w} with f′ ∈ R^{c×k×k×m}. The final output feature map Y is obtained by the concatenation operation in Equation (5). The FLOPs of the Ghost module are given in Equation (6), where d × d is the kernel size of the cheap linear operations; combined with Equation (2), this yields a model compression ratio of about s, which significantly decreases the number of parameters.
$Y = \mathrm{Concat}(Y', Y_{\mathrm{ghost}})$ (5)
$\mathrm{FLOPs} = \frac{n}{s} \cdot h' \cdot w' \cdot c \cdot k \cdot k + (s - 1) \cdot \frac{n}{s} \cdot h' \cdot w' \cdot d \cdot d$ (6)
To tackle the issues of slow inference speed and the high parameter count usually associated with apple leaf disease recognition, we have introduced the GhostConv module into the core architecture of the original YOLOv8n model. Additionally, we have integrated the C3Ghost [30] module into the existing C2f module, aiming to optimize performance and efficiency. Illustrated in Figure 7, the C3Ghost module, inspired by the Ghost Bottleneck concept, enhances the feature extraction capabilities of the standard C3 module. By leveraging multiple GhostConv modules in succession, we enhance the ability to capture finer details within images, thereby boosting the accuracy of apple leaf disease identification. This approach improves performance and notably reduces the parameter count compared to conventional network models.
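A minimal PyTorch sketch of GhostConv, following Equations (3)–(5), is given below. The ratio s = 2 and the 5 × 5 depthwise kernel for the cheap linear operation Φ are common GhostNet choices assumed here for illustration, not values stated in this paper.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: a primary convolution produces the intrinsic
    feature maps Y' (Equation (3)); a cheap depthwise convolution plays
    the role of the linear operations Phi (Equation (4)); the two sets
    are concatenated (Equation (5)). With ratio s = 2, FLOPs drop by
    roughly a factor of s relative to Equation (2)."""
    def __init__(self, c_in, c_out, k=1, stride=1):
        super().__init__()
        c_mid = c_out // 2  # intrinsic maps for ratio s = 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_mid, k, stride, k // 2, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU())
        self.cheap = nn.Sequential(  # 5x5 depthwise conv as Phi
            nn.Conv2d(c_mid, c_mid, 5, 1, 2, groups=c_mid, bias=False),
            nn.BatchNorm2d(c_mid), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)                      # Y' = X * f'
        return torch.cat([y, self.cheap(y)], 1)  # Y = Concat(Y', Y_ghost)
```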

2.3.2. Global Attention Mechanism

The attention mechanism, a fundamental data algorithmic approach in machine learning, excels in natural language processing, image processing, and speech recognition [38]. Its ability to focus on specific regions significantly mitigates the impact of background noise and irrelevant objects, enhancing detection accuracy. We leverage the Global attention mechanism (GAM) [39] in the YOLOv8n backbone network to address challenges posed by spotted leaf drop disease and rust in apples, characterized by irregular shapes and subtle features. Illustrated in Figure 8, GAM amplifies the global interactions within the network, thereby enhancing the detection and differentiation of small, easily overlooked targets. By enhancing the feature extraction capabilities and feature integration techniques, GAM ensures precise focus on relevant areas, substantially boosting overall detection performance.
The global attention mechanism (GAM) is an innovative attention module proposed after the convolutional block attention module (CBAM) [40]. GAM combines channel attention and spatial attention, reducing information dispersion while enhancing global cross-dimension interactions. By introducing GAM, the model can better capture the key information in the image, thereby achieving more accurate disease identification under complex background and lighting conditions [41]. GAM consists of channel and spatial attention submodules, applied sequentially from channel to space. To preserve channel information, the channel submodule permutes the input features from C × W × H to W × H × C and passes them through a two-layer MLP; sigmoid activation then yields the channel attention coefficients MC(F1). To extract spatial information, the spatial submodule first applies two 7 × 7 convolutions for spatial information fusion, compressing along the channel dimension, with the pooling layers removed; sigmoid activation then yields the spatial attention coefficients MS(F2). The GAM is given in Equations (7) and (8).
$F_2 = M_C(F_1) \otimes F_1$ (7)
$F_3 = M_S(F_2) \otimes F_2$ (8)
F1 is the given input feature, F2 is the intermediate feature map, F3 is the output feature, MC is the channel attention mechanism, MS denotes the spatial attention mechanism, and ⨂ is element-wise multiplication.
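The following PyTorch sketch illustrates the channel-then-spatial structure of Equations (7) and (8). The reduction ratio r = 4 is an illustrative assumption, and details may differ from the reference GAM implementation [32].

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Global attention mechanism: channel attention via a permutation
    and a two-layer MLP, then spatial attention via two 7x7 convolutions."""
    def __init__(self, c, r=4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(c, c // r), nn.ReLU(inplace=True), nn.Linear(c // r, c))
        self.spatial = nn.Sequential(
            nn.Conv2d(c, c // r, 7, padding=3), nn.BatchNorm2d(c // r),
            nn.ReLU(inplace=True),
            nn.Conv2d(c // r, c, 7, padding=3), nn.BatchNorm2d(c))

    def forward(self, f1):
        # M_C(F1): permute C,H,W -> H,W,C, apply the MLP per position,
        # permute back, and gate the input (Equation (7))
        mc = self.channel_mlp(f1.permute(0, 2, 3, 1))
        f2 = torch.sigmoid(mc.permute(0, 3, 1, 2)) * f1
        # M_S(F2): 7x7 convolution stack, then gate again (Equation (8))
        return torch.sigmoid(self.spatial(f2)) * f2
```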

2.3.3. BiFPN Feature Fusion Network

In training models to recognize apple leaf diseases, leaf surface lesions’ varying shapes and sizes result in feature representations of differing resolutions [42]. Combining these features using a traditional linear superposition approach within PANet can cause disproportionate emphasis on features from more prominent leaf spots in the aggregated output. Such an imbalance often results in the prominence of more extensive lesions, overshadowing the finer details of smaller lesions, and may lead to missed detections of these less conspicuous targets. To mitigate this risk, it is essential to safeguard against the loss of high-resolution, shallow features, as they hold the key details necessary for identifying small-scale anomalies on the leaf surface. Figure 9 shows the structures of FPN and PANet in YOLOv8.
To address the morphological diversity observed among diseases at different stages of development, as well as the size and shape variations within diseases of the same category, we replace the original PAFPN structure [43] in YOLOv8’s multiscale feature fusion network with the BiFPN structure, and further improve the BiFPN by adding the large-size P2 feature layer to strengthen feature fusion for tiny apple lesion targets. The improved BiFPN structure is shown in Figure 10.
During the feature fusion process, the input features have different resolutions, so their influence on the output features varies. Apple leaf spots usually have a variety of irregular shapes. BiFPN balances the weights of different features by using a fast normalized fusion module, which helps to dig deeper into the information of small target spots on apple leaves, thus reducing the phenomena of misdetection and omission caused by the complexity of the environment. Equation (9) gives the connection between the inputs and the outputs.
$O = \sum_{i} \frac{\omega_i}{\epsilon + \sum_{j} \omega_j} \cdot I_i$ (9)
where ωi denotes the learnable weight of the input feature Ii. The weights are passed through a ReLU activation to ensure that each ωi is non-negative, the small constant ε = 0.0001 prevents numerical instability, and the normalized weights fall in the range 0 to 1.
$P_4^{td} = \mathrm{Conv}\!\left( \frac{\omega_1 \cdot P_4^{in} + \omega_2 \cdot \mathrm{Resize}(P_5^{in})}{\omega_1 + \omega_2 + \epsilon} \right)$ (10)
$P_4^{out} = \mathrm{Conv}\!\left( \frac{\omega_1 \cdot P_4^{in} + \omega_2 \cdot P_4^{td} + \omega_3 \cdot \mathrm{Resize}(P_3^{out})}{\omega_1 + \omega_2 + \omega_3 + \epsilon} \right)$ (11)
Using the feature layer P4 as an example, Resize denotes an up-sampling or down-sampling operation that brings the inputs to a common resolution. This method ensures that the different input features are weighted according to their significance during fusion, effectively extracting deep information about small target lesions on apple leaves and minimizing misdetections or omissions caused by environmental factors.
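A minimal sketch of the fast normalized fusion in Equation (9) might look as follows; the inputs are assumed to have already been resized to a common shape, and ε follows the value given above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Fast normalized fusion (Equation (9)): one learnable weight per
    input, kept non-negative by ReLU and normalized to sum to ~1."""
    def __init__(self, n_inputs, eps=1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))
        self.eps = eps

    def forward(self, inputs):
        w = F.relu(self.w)              # keep weights non-negative
        w = w / (w.sum() + self.eps)    # normalize into [0, 1]
        return sum(wi * x for wi, x in zip(w, inputs))
```

In a BiFPN node such as Equation (10), one such module (instantiated per fusion node) would combine P4_in with the resized P5_in before the node’s convolution.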

2.4. Comprehensive Test Platform and Training Evaluation

The experimental phase of this study was conducted from October 2023 to April 2024 at the Oasis Laboratory of Tarim University, with the experimental environment configuration detailed in Table 3. Optimization during the training phase utilized stochastic gradient descent (SGD), with an initial learning rate of 0.01, learning rate momentum of 0.937, weight decay coefficient of 0.0005, batch size of 32, and was carried out over 300 training epochs.
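Assuming the standard Ultralytics training API, the configuration above corresponds to a call of roughly the following form; the model and dataset YAML file names are hypothetical placeholders, not files shipped with this paper.

```python
from ultralytics import YOLO

# Hypothetical file names: a YAML describing the modified YOLOv8n-GGi
# architecture and a dataset YAML listing image paths and the 4 classes.
model = YOLO("yolov8n-ggi.yaml")
model.train(
    data="appleleaf9.yaml",
    epochs=300,           # 300 training epochs
    batch=32,             # batch size
    optimizer="SGD",      # stochastic gradient descent
    lr0=0.01,             # initial learning rate
    momentum=0.937,       # learning rate momentum
    weight_decay=0.0005,  # weight decay coefficient
)
```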
To support model selection, this study utilizes a comprehensive suite of evaluation metrics: precision (P), recall (R), mean average precision (mAP), floating-point operations (FLOPs), and the number of parameters (Parameters).
P stands for Precision, representing the ratio of correctly identified positive outcomes to all positive classifications. TP refers to the number of accurately detected apple leaf diseases, while FP indicates the number of false detections.
$P = \frac{TP}{TP + FP} \times 100\%$ (12)
R stands for Recall, which indicates the fraction of correctly identified positive results out of all actual positive instances. FN refers to instances where the model incorrectly predicts positive samples as negative.
$R = \frac{TP}{TP + FN} \times 100\%$ (13)
M represents the number of categories detected by the model (in this study, four apple leaf diseases were selected, so M = 4). Average precision (AP) is an important metric to evaluate the effectiveness of a trained neural network model on a single category, reflecting the model’s ability to identify that category. Mean average precision (mAP) is an important indicator of the overall performance of the trained network model across all categories: it is the mean of the average precision values of all categories. mAP@0.5 is the mean average precision calculated at an IoU threshold of 0.5, a common benchmark for evaluating the performance of object detection models.
$AP = \int_{0}^{1} P(R)\,\mathrm{d}R$ (14)
$mAP = \frac{1}{M} \sum_{i=1}^{M} AP_i$ (15)
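As a small worked sketch of Equations (12)–(15), precision and recall follow directly from the TP/FP/FN counts, and AP can be approximated by numerically integrating sampled points of the precision–recall curve:

```python
import numpy as np

def precision(tp, fp):
    """Equation (12): share of detections that are correct, in %."""
    return tp / (tp + fp) * 100

def recall(tp, fn):
    """Equation (13): share of ground-truth lesions found, in %."""
    return tp / (tp + fn) * 100

def average_precision(p_points, r_points):
    """Equation (14): area under the P(R) curve, approximated by
    trapezoidal integration over sampled (recall, precision) points."""
    order = np.argsort(r_points)
    return np.trapz(np.asarray(p_points)[order], np.asarray(r_points)[order])

def mean_average_precision(ap_per_class):
    """Equation (15): mean of the per-class APs (M = 4 diseases here)."""
    return sum(ap_per_class) / len(ap_per_class)
```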
Floating-point operations (FLOPs) describe the computational complexity of the neural network model, with higher values indicating a greater amount of computation. N represents the count of input feature maps, H and W represent the height and width of the feature maps, respectively, C indicates the number of channels in the feature maps, K denotes the number of convolutional kernels, R stands for the height of the convolutional kernels, and S represents the width of the convolutional kernels.
$\mathrm{FLOPs} = N \times H \times W \times C \times K \times R \times S$ (16)
The number of parameters (Parameters) indicates the number of learnable parameters in the neural network model, including weights and biases.
$\mathrm{Parameters} = K \times (C \times R \times S + 1)$ (17)
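As a worked example of Equations (16) and (17), the snippet below computes the FLOPs and parameter count of a single convolutional layer; the layer dimensions are chosen purely for demonstration.

```python
def conv_flops(n, h, w, c, k, r, s):
    """Equation (16): multiply-accumulate count of one conv layer."""
    return n * h * w * c * k * r * s

def conv_params(k, c, r, s):
    """Equation (17): weights plus one bias per output kernel."""
    return k * (c * r * s + 1)

# One 3x3 convolution, 64 -> 128 channels, on a single 80x80 feature map
# (illustrative values only).
print(conv_flops(1, 80, 80, 64, 128, 3, 3))  # 471,859,200 (~0.47 GFLOPs)
print(conv_params(128, 64, 3, 3))            # 73,856 parameters
```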

3. Results

3.1. Enhancing Model Detection Efficiency

To assess the effectiveness of the proposed YOLOv8n–GGi method in identifying different apple leaf diseases (Alternaria leaf spot, rust, grey spot, and frogeye leaf spot), a comparative analysis was performed against the original YOLOv8n model. The specific results can be found in Table 4.
The findings in the table indicate that our method improved significantly in detecting the various apple leaf diseases. Specifically, the mAP@0.5 scores for identifying Alternaria leaf spot, rust, grey spot, and frogeye leaf spot increased by 2.4%, 3.1%, 6.2%, and 1.1%, respectively, reaching 88.5%, 86.8%, 84.5%, and 87.7% and outperforming the YOLOv8n model. Overall, mAP@0.5 improved by 3.5% compared to the original model. These results underscore the effectiveness of our method in accurately recognizing various apple leaf diseases.
The confusion matrix serves as a method to assess the effectiveness of classification models. It displays the relationship between the predicted classes by the model and the actual classes. In the confusion matrix, each column corresponds to the class predicted by the model, with the total count indicating the number of predictions for that class by the model. Each row corresponds to the actual class of the data, with the sum in each row indicating how many data instances belong to that class. A comparison of the confusion matrix before and after the improvement is shown in Figure 11.
The recognition of grey and frog eye leaf spots was improved by 4.0% and 2.0%, respectively, over the original model. Notably, the recognition of spotted leaf drop disease was enhanced by 11.0%. These improvements can be attributed to the implementation of the GAM attention mechanism and the improved BiFPN, which collectively enhanced the model’s ability to predict different classes of diseases accurately. The experiments demonstrate that the improved model correctly predicts most samples.
Figure 12 shows the model’s performance before and after the improvement. Before the improvement, the bounding box loss (box_loss) for training and validation dropped to about 1.5, and after the improvement it dropped to about 1.2; the classification loss (cls_loss) dropped from 1.5 to 1.2; and the distribution focal loss (dfl_loss) dropped from 1.2 to 1.0. The improved model shows clear gains in precision, recall, and average precision, indicating better convergence and generalization capabilities.
Figure 13 illustrates the comparison plots for precision, recall, mean average precision at IoU 0.5 (mAP@0.5), and mean average precision at IoU from 0.5 to 0.95 (mAP@0.5:0.95) before and after the method enhancement. The YOLOv8n–GGi model outperforms the YOLOv8n model on all four indicators: precision, recall, mAP@0.5, and mAP@0.5:0.95. The YOLOv8n–GGi model performed well in recognition accuracy and disease sample detection capability. The effectiveness and superiority of the improved method were verified, and it is suitable for practical application in apple leaf disease detection.
Figure 14 shows our proposed model’s detection performance on a test set to highlight its benefits. The first through fourth columns show the results of tests for Alternaria leaf spot, rust, grey spot, and frog eye leaf spot. In real-world environments, most apple leaf spots are very dense and small, making it crucial to accurately detect diseased leaves in such dense areas. In experiments, the model showed relatively high confidence in predictions due to the obvious characteristics of Alternaria leaf spot. Grey leaf spots, rust, and frog eye leaf spots usually appear as smaller and denser lesions. The lesion was still successfully identified even in the presence of multiple predicted bounding boxes with low confidence. From the detection effect, the improved model reduced missed detections and improved the detection of apple leaf edge diseases, successfully identifying various diseases.

3.2. Model and Algorithm Test

To further assess the YOLOv8n–GGi model’s effectiveness, ablation experiments were performed on modified versions of YOLOv8n, as detailed in Table 5.
The outcomes of the experiments demonstrate that the proposed enhancements significantly improve detection precision while reducing both the number and size of model parameters. After integrating the GhostNet module, there was a 6.1% reduction in FLOPs, a 12.5% decrease in the model’s parameter count, and a 14.3% reduction in weight size, with a slight decline in precision and recall. Further inclusion of the C3Ghost module led to additional decreases in precision and recall, accompanied by further reductions in computational complexity and parameter count. The subsequent addition of the GAM module raised precision from 80.0% to 84.9%, with minimal changes in computational complexity and parameter count. Finally, introducing the improved BiFPN module raised precision from 84.9% to 86.3%, a 1.4% increase, and recall from 77.9% to 80.2%, a 2.3% increase; mAP@0.5 surged from 83.4% to 86.9%, a 3.5% increase. Additionally, FLOPs decreased by 17.9%, the parameter count dropped by 29.2%, and the model size shrank to just 3.8 MB, showcasing significant enhancements in both detection capability and model lightness.
To visually display the main regional differences that the model focuses on during detection before and after the improvement, we randomly selected four apple leaf disease images for heat map visualization, as shown in Figure 15. Darker colors in the heat map indicate regions more crucial to the model. Comparing the heat maps of YOLOv8n and YOLOv8n–GGi reveals that the enhanced YOLOv8n–GGi model significantly boosts detection capability: its heat map exhibits more precise and focused regions of interest, highlighting superior performance in identifying lesion areas. While the YOLOv8n heat map points out the same regions, its markings are less distinct and blurrier than those of the YOLOv8n–GGi model. This indicates that our method can accurately identify apple lesions and effectively reduce interference from intricate background noise.

3.3. Evaluation of Various Target Detection Algorithms

Table 6 compares various target detection models on several performance metrics: precision, recall, mean average precision at an IoU threshold of 0.5 (mAP@0.5, %), floating-point operations (FLOPs), parameter count, and model weight size. Our model demonstrates substantial improvements in precision and recall over YOLOv7-Tiny, YOLOv6, YOLOv5s, and YOLOv3-Tiny: precision increases of 11.9%, 13.1%, 8.4%, and 14.2%, and recall increases of 5.8%, 4.3%, 6.4%, and 8.9%, respectively. Moreover, our model achieves these gains while significantly reducing computational demands. It operates with only 5.5 G FLOPs, compared with 13.2 G for YOLOv7-Tiny, 11.8 G for YOLOv6, 23.8 G for YOLOv5s, and 18.9 G for YOLOv3-Tiny. This efficiency makes our model highly suitable for real-time applications and resource-constrained environments. YOLOv10 improves significantly over YOLOv8 in accuracy, inference speed, and model lightness. Although our model is 1.8% lower in precision and 5.0% lower in recall than YOLOv10, it requires about one-fourth of YOLOv10’s floating-point operations, and its model size is much smaller than that of the other models. These results show that, by introducing GhostConv and C3Ghost into the backbone network, adding the global attention mechanism (GAM), and optimizing feature fusion with the improved bi-directional feature pyramid network (BiFPN), our model substantially reduces computational cost and model complexity while maintaining high accuracy, making it suitable for efficient real-time target detection on resource-limited hardware.

4. Discussion

The primary motivations for enhancing the model to boost detection performance include the following:
(1) Lightweight improvement: This study introduces GhostConv to replace Conv in the original YOLOv8n structure, forming the main part of the backbone, and uses the C3Ghost module to replace the original C2f structure. This replacement results in a 22.2% reduction in model size, a 3.1% decrease in precision, a 24.4% decrease in floating-point computation, a 28.1% reduction in the number of network model parameters, and a 4.8% reduction in recall, while the mean average precision at an IoU threshold of 0.5 (mAP@0.5) drops by only 2.8%. The experiments indicate that the Ghost-module-based network architecture significantly reduces the parameter count and complexity of the model. This optimization trades some precision and recall for speed, yet the refined model retains relatively efficient detection performance within a streamlined framework.
(2) Introducing the attention mechanism: The lightweight improvements significantly reduce the size of the algorithmic model; however, to address the decline in detection performance caused by insufficient feature extraction, we introduce the global attention mechanism (GAM) into the backbone network. This addition yields noteworthy gains in precision, recall, and mAP@0.5 of 4.9%, 4.8%, and 2.7%, respectively, compared to YOLOv8n–GhostNet–C3Ghost. Despite a 3.9% increase in model size, the computational overhead remains minimal. This efficiency is attributed to the global attention mechanism, which captures cross-channel relationships along with direction-aware and location-aware information, improving the model’s emphasis on pertinent features and, in turn, its performance across various tasks. By dynamically adjusting attention weights, GAM adapts to different scenarios, enhancing robustness and performance while maintaining low computational costs.
(3) Adding the improved BiFPN bi-directional feature pyramid network: This study addresses the challenge posed by the morphological diversity of diseases at various developmental stages and the variations in shape and size within the same disease class, and investigates the necessity of feature fusion at different scales. To this end, we replaced the original PAFPN with an improved BiFPN in the multi-scale feature fusion network of YOLOv8 while augmenting it with the large-scale P2 feature layer. Our approach yielded notable improvements over YOLOv8n–GhostNet–C3Ghost–GAM, including a 1.4% increase in precision, a 2.3% increase in recall, and a 3.5% increase in mAP@0.5. We also reduced the model size by 25.5% and FLOPs by 17.9%. Notably, our method demonstrated significant enhancements in fusing features of distant, small targets, such as tiny lesions on apple leaves, as evidenced by the results observed on the test images.
In the future, we will apply the improved YOLOv8–GGi model to experiments on crops such as pear trees and cotton. Techniques such as pruning and distillation will further reduce the model’s parameters and computational overhead, and a lightweight accelerated apple leaf disease detection system will be deployed on embedded devices such as the Raspberry Pi and realized in practice.

5. Conclusions

The proposed improved YOLOv8n–GGi lightweight algorithm for apple leaf disease recognition introduces GhostConv and C3Ghost modules to replace the traditional structures, incorporates the GAM attention mechanism, and adds an improved BiFPN structure. These enhancements result in a lighter model with better performance. The improved model achieves a precision of 86.3%, a recall of 80.2%, and a mAP@0.5 of 86.9% on the test dataset. Compared to the original model, the improved YOLOv8n–GGi demonstrates notable improvements, with a 3.3% increase in precision, a 2.3% increase in recall, and a 3.4% increase in mAP@0.5. Moreover, the weight of the improved model is 3.8 MB, a 39.7% reduction compared to the original model. These enhancements deliver higher detection accuracy, lower computational complexity, and smaller model weights than other mainstream algorithms. Therefore, deploying the improved algorithm in agricultural production, particularly on resource-constrained devices, holds significant potential.

Author Contributions

Conceptualization, H.W., X.Z. (Xuedong Zhang) and L.G.; data curation, X.Z. (Xing Zhao); formal analysis, L.G. and X.Z. (Xing Zhao); funding acquisition, H.W. and X.Z. (Xuedong Zhang); investigation, X.Y., Y.Y. and X.W.; methodology, L.G. and H.W.; project administration, L.G., Y.Y., X.W. and H.W.; software, X.Z. (Xing Zhao); supervision, X.Y. and X.W.; validation, L.G., X.Z. (Xing Zhao), X.Y. and Y.Y.; visualization, X.Y.; writing—original draft, L.G.; writing—review and editing, L.G. and H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received funding from the Corps Financial Science and Technology Program Project South Xinjiang Key Industry Innovation Development Support Program (No. 2022DB005).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original survey data presented in the study is publicly available on GitHub at https://github.com/Lijun-Gao/An-Improved-Lightweight-YOLOv8-Model-for-Apple-Leaf-Disease-Detection.git, accessed on 22 July 2024. The data presented in this study are available upon request from the first author.

Acknowledgments

The authors acknowledge their mentors’ excellent guidance and colleagues’ support.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Jiang, P.; Chen, Y.; Liu, B.; He, D.; Liang, C. Real-time detection of apple leaf diseases using deep learning approach based on improved convolutional neural networks. IEEE Access 2019, 7, 59069–59080.
  2. O’Rourke, D. Economic importance of the world apple industry. Apple Genome 2021, 1–18.
  3. Bonkra, A.; Pathak, S.; Kaur, A.; Shah, M.A. Exploring the trend of recognizing apple leaf disease detection through machine learning: A comprehensive analysis using bibliometric techniques. Artif. Intell. Rev. 2024, 57, 21.
  4. Bansal, P.; Kumar, R.; Kumar, S. Disease detection in apple leaves using deep convolutional neural network. Agriculture 2021, 11, 617.
  5. Dhaka, V.S.; Meena, S.V.; Rani, G.; Sinwar, D.; Ijaz, M.F.; Woźniak, M. A survey of deep convolutional neural networks applied for prediction of plant leaf diseases. Sensors 2021, 21, 4749.
  6. Yağ, İ.; Altan, A. Artificial intelligence-based robust hybrid algorithm design and implementation for real-time detection of plant diseases in agricultural environments. Biology 2022, 11, 1732.
  7. Kaur, A.; Kukreja, V.; Aggarwal, P.; Thapliyal, S.; Sharma, R. Amplifying Apple Mosaic Illness Detection: Combining CNN and Random Forest Models. In Proceedings of the 2024 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI), Gwalior, India, 14–16 March 2024; pp. 1–5.
  8. Zambon, I.; Cecchini, M.; Egidi, G.; Saporito, M.G.; Colantoni, A. Revolution 4.0: Industry vs. agriculture in a future development for SMEs. Processes 2019, 7, 36.
  9. Sharma, A.; Jain, A.; Gupta, P.; Chowdary, V. Machine learning applications for precision agriculture: A comprehensive review. IEEE Access 2020, 9, 4843–4873.
  10. Al Bashish, D.; Braik, M.; Bani-Ahmad, S. Detection and classification of leaf diseases using K-means-based segmentation and neural-networks-based classification. Inf. Technol. J. 2011, 10, 267–275.
  11. Panigrahi, K.P.; Das, H.; Sahoo, A.K.; Moharana, S.C. Maize Leaf Disease Detection and Classification Using Machine Learning Algorithms. In Progress in Computing, Analytics and Networking: Proceedings of ICCAN 2019; Springer: Singapore, 2020.
  12. Jan, M.; Ahmad, H. Image features based intelligent apple disease prediction system: Machine learning based apple disease prediction system. Int. J. Agric. Environ. Inf. Syst. (IJAEIS) 2020, 11, 31–47.
  13. Zhang, K.; Ying, H.; Dai, H.-N.; Li, L.; Peng, Y.; Guo, K.; Yu, H. Compacting deep neural networks for Internet of Things: Methods and applications. IEEE Internet Things J. 2021, 8, 11935–11959.
  14. Ahmed, S.R.; Sonuç, E.; Ahmed, M.R.; Duru, A.D. Analysis survey on deepfake detection and recognition with convolutional neural networks. In Proceedings of the 2022 International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Ankara, Turkey, 9–11 June 2022; pp. 1–7.
  15. Mohanty, S.N.; Ghosh, H.; Rahat, I.S.; Reddy, C.V.R. Advanced Deep Learning Models for Corn Leaf Disease Classification: A Field Study in Bangladesh. Eng. Proc. 2023, 59, 69.
  16. Li, M.; Cheng, S.; Cui, J.; Li, C.; Li, Z.; Zhou, C.; Lv, C. High-performance plant pest and disease detection based on model ensemble with inception module and cluster algorithm. Plants 2023, 12, 200.
  17. Xie, X.; Ma, Y.; Liu, B.; He, J.; Li, S.; Wang, H. A deep-learning-based real-time detector for grape leaf diseases using improved convolutional neural networks. Front. Plant Sci. 2020, 11, 751.
  18. Zhu, L.; Li, Z.; Li, C.; Wu, J.; Yue, J. High performance vegetable classification from images based on alexnet deep learning model. Int. J. Agric. Biol. Eng. 2018, 11, 217–223.
  19. Wu, H.; Hua, Y.; Zou, H.; Ke, G. A lightweight network for vehicle detection based on embedded system. J. Supercomput. 2022, 78, 18209–18224.
  20. Chen, Y.; Xie, Y.; Song, L.; Chen, F.; Tang, T. A survey of accelerator architectures for deep neural networks. Engineering 2020, 6, 264–274.
  21. Yang, Q.; Duan, S.; Wang, L. Efficient identification of apple leaf diseases in the wild using convolutional neural networks. Agronomy 2022, 12, 2784.
  22. Yue, X.; Qi, K.; Na, X.; Zhang, Y.; Liu, Y.; Liu, C. Improved YOLOv8-Seg Network for Instance Segmentation of Healthy and Diseased Tomato Plants in the Growth Stage. Agriculture 2023, 13, 1643.
  23. Lou, H.; Duan, X.; Guo, J.; Liu, H.; Gu, J.; Bi, L.; Chen, H. DC-YOLOv8: Small-size object detection algorithm based on camera sensor. Electronics 2023, 12, 2323.
  24. Terven, J.; Córdova-Esparza, D.-M.; Romero-González, J.-A. A comprehensive review of yolo architectures in computer vision: From yolov1 to yolov8 and yolo-nas. Mach. Learn. Knowl. Extr. 2023, 5, 1680–1716.
  25. Hussain, M. YOLO-v1 to YOLO-v8, the rise of YOLO and its complementary nature toward digital manufacturing and industrial defect detection. Machines 2023, 11, 677.
  26. Kim, J.-H.; Kim, N.; Park, Y.W.; Won, C.S. Object detection and classification based on YOLO-V5 with improved maritime dataset. J. Mar. Sci. Eng. 2022, 10, 377.
  27. Jiang, K.; Xie, T.; Yan, R.; Wen, X.; Li, D.; Jiang, H.; Jiang, N.; Feng, L.; Duan, X.; Wang, J. An attention mechanism-improved YOLOv7 object detection algorithm for hemp duck count estimation. Agriculture 2022, 12, 1659.
  28. Liu, C.; Wen, J.; Huang, J.; Lin, W.; Wu, B.; Xie, N.; Zou, T. Lightweight Underwater Object Detection Algorithm for Embedded Deployment Using Higher-Order Information and Image Enhancement. J. Mar. Sci. Eng. 2024, 12, 506.
  29. Guo, C.; Fan, B.; Zhang, Q.; Xiang, S.; Pan, C. Augfpn: Improving multi-scale feature learning for object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 12595–12604.
  30. Wang, G.; Chen, Y.; An, P.; Hong, H.; Hu, J.; Huang, T. UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios. Sensors 2023, 23, 7190.
  31. Cao, J.; Bao, W.; Shang, H.; Yuan, M.; Cheng, Q. GCL-YOLO: A GhostConv-based lightweight yolo network for UAV small object detection. Remote Sens. 2023, 15, 4932.
  32. Liu, Y.; Shao, Z.; Hoffmann, N. Global attention mechanism: Retain information to enhance channel-spatial interactions. arXiv 2021, arXiv:2112.05561.
  33. He, L.; Wei, H.; Wang, Q. A New Target Detection Method of Ferrography Wear Particle Images Based on ECAM-YOLOv5-BiFPN Network. Sensors 2023, 23, 6477.
  34. Chen, Y.; Dai, X.; Chen, D.; Liu, M.; Dong, X.; Yuan, L.; Liu, Z. Mobile-former: Bridging mobilenet and transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5270–5279.
  35. Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. Ghostnet: More features from cheap operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1580–1589.
  36. Xiao, X.; Feng, X. Multi-object pedestrian tracking using improved YOLOv8 and OC-SORT. Sensors 2023, 23, 8439.
  37. Han, K.; Wang, Y.; Xu, C.; Guo, J.; Xu, C.; Wu, E.; Tian, Q. GhostNets on heterogeneous devices via cheap operations. Int. J. Comput. Vis. 2022, 130, 1050–1069.
  38. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62.
  39. Liu, Z.; Li, L.; Fang, X.; Qi, W.; Shen, J.; Zhou, H.; Zhang, Y. Hard-rock tunnel lithology prediction with TBM construction big data using a global-attention-mechanism-based LSTM network. Autom. Constr. 2021, 125, 103647.
  40. Ni, Y.-H.; Wang, H.; Mao, J.-X.; Xi, Z.; Chen, Z.-Y. Quantitative detection of typical bridge surface damages based on global attention mechanism and YOLOv7 network. Struct. Health Monit. 2024, 14759217241246953.
  41. Yang, Y.; Jiao, G.; Liu, J.; Zhao, W.; Zheng, J. A lightweight rice disease identification network based on attention mechanism and dynamic convolution. Ecol. Inform. 2023, 78, 102320.
  42. Chao, X.; Sun, G.; Zhao, H.; Li, M.; He, D. Identification of apple tree leaf diseases based on deep learning models. Symmetry 2020, 12, 1065.
  43. Zhou, L.; Liu, Z.; Zhao, H.; Hou, Y.-E.; Liu, Y.; Zuo, X.; Dang, L. A Multi-Scale Object Detector Based on Coordinate and Global Information Aggregation for UAV Aerial Images. Remote Sens. 2023, 15, 3468.
Figure 1. Examples of apple leaf disease images: (a) Alternaria leaf spot, (b) Rust, (c) Grey spot, (d) Frogeye leaf spot.
Figure 2. Example of an image following enhancement of raw data and imagery. (a) Original figure, (b) random rotation, (c) color adjustment, (d) noise addition, (e) image sharpening, (f) Gaussian blurring.
Figure 3. Original YOLOv8 network architecture.
Figure 4. Comparison of YOLOv5 and YOLOv8 neck structures: (a) YOLOv5 and (b) YOLOv8.
Figure 5. The architecture of the proposed YOLOv8n–GGi model.
Figure 6. Convolution process of GhostNet.
Figure 7. GhostNet in the YOLOv8 architecture: (a) GhostConv and (b) C3Ghost.
Figure 8. Global Attention Mechanism (GAM); MC is channel attention, and MS is spatial attention.
Figure 9. FPN and PANet structure in YOLOv8; P3~P7 are the input feature maps of different layers: (a) FPN and (b) PANet.
Figure 10. Improved BiFPN structure: (a) original BiFPN structure and (b) improved BiFPN structure.
Figure 11. Comparison of confusion matrices before and after improvement: (a) YOLOv8n and (b) YOLOv8n–GGi.
Figure 12. The performance outcomes of the YOLOv8n model and YOLOv8n–GGi model: (a) YOLOv8n and (b) YOLOv8n–GGi.
Figure 13. Comparison of indices before and after improvement: (a) precision, (b) recall, (c) mAP@0.5, and (d) mAP@0.5:0.95.
Figure 14. Test results of YOLOv8n and YOLOv8n–GGi: (a) original image, (b) YOLOv8n detection results, and (c) YOLOv8n–GGi detection results.
Figure 15. Heat map of YOLOv8n and YOLOv8n–GGi: (a) original image, (b) YOLOv8n, and (c) YOLOv8n–GGi.
Table 1. Number of apple leaf disease categories before and after data augmentation.

| Apple Disease Category | Original Images (Sheets) | Enhanced Images (Sheets) | Training Set | Test Set | Validation Set |
|---|---|---|---|---|---|
| Alternaria leaf spot | 391 | 1746 | 1222 | 349 | 175 |
| Rust | 390 | 1740 | 1218 | 348 | 174 |
| Grey spot | 391 | 1746 | 1222 | 349 | 175 |
| Frogeye leaf spot | 392 | 1752 | 1226 | 350 | 176 |
| All diseases | 1564 | 6984 | 4888 | 1396 | 700 |
Table 2. Comparison of generalization performance before and after data enhancement using YOLOv8n.

| Apple Leaf Disease Dataset | Precision (%) | Recall (%) | mAP@0.5 (%) |
|---|---|---|---|
| Original dataset | 79.6 | 73.5 | 81.3 |
| Enhanced dataset | 83.1 | 77.9 | 83.5 |
Table 3. Configuration of experimental parameters.

| Experimental Environment | Experimental Configuration |
|---|---|
| Operating System | Windows 10.0.17134.1 |
| CPU | Intel Xeon W-2223 CPU @ 3.60 GHz (Intel Corporation, Santa Clara, CA, USA) |
| GPU | NVIDIA GeForce RTX 3080 (NVIDIA Corporation, Santa Clara, CA, USA) |
| CUDA version | CUDA 11.7 |
| Deep learning framework | PyTorch 1.10.0 |
| Compilation Language | Python 3.8 |
Table 4. Comparison of the mean average precision results for different target classes.

| Apple Leaf Disease Category | YOLOv8n–GGi mAP@0.5 (%) | YOLOv8n mAP@0.5 (%) |
|---|---|---|
| Alternaria leaf spot | 88.5 | 86.1 |
| Rust | 86.8 | 83.1 |
| Grey spot | 84.5 | 78.0 |
| Frogeye leaf spot | 87.7 | 86.1 |
| All diseases | 86.9 | 83.4 |
Table 5. Ablation experiment results.

| Model | Precision (%) | Recall (%) | mAP@0.5 (%) | FLOPs (G) | Parameters (M) | Weight Size (MB) |
|---|---|---|---|---|---|---|
| YOLOv8n | 83.1 | 77.9 | 83.5 | 8.2 | 3.2 | 6.3 |
| YOLOv8n–GhostNet | 80.5 | 75.0 | 82.1 | 7.7 | 2.8 | 5.4 |
| YOLOv8n–GhostNet–C3Ghost | 80.0 | 73.1 | 80.7 | 6.2 | 2.3 | 4.9 |
| YOLOv8n–GhostNet–GAM | 84.9 | 77.9 | 83.4 | 6.7 | 2.4 | 5.1 |
| YOLOv8n–GhostNet–GAM–improved BiFPN | 86.3 | 80.2 | 86.9 | 5.5 | 1.7 | 3.8 |
Table 6. Comparative experiments with different models.

| Model | Precision (%) | Recall (%) | mAP@0.5 (%) | FLOPs (G) | Parameters (M) | Weight Size (MB) |
|---|---|---|---|---|---|---|
| YOLOv3-Tiny | 72.1 | 71.3 | 76.4 | 18.9 | 12.1 | 24.4 |
| YOLOv5s | 77.9 | 73.8 | 78.8 | 23.8 | 9.1 | 18.5 |
| YOLOv6 | 73.2 | 75.9 | 79.7 | 11.8 | 4.2 | 8.7 |
| YOLOv7-Tiny | 74.4 | 74.4 | 78.0 | 13.2 | 6.0 | 12.3 |
| YOLOv8n | 83.1 | 77.9 | 83.5 | 8.2 | 3.2 | 6.3 |
| YOLOv10s | 88.1 | 85.2 | 88.7 | 21.3 | 7.1 | 14.1 |
| Ours | 86.3 | 80.2 | 86.9 | 5.5 | 1.7 | 3.8 |

