Article

EF yolov8s: A Human–Computer Collaborative Sugarcane Disease Detection Model in Complex Environment

1 College of Agronomy and Biotechnology, Yunnan Agricultural University, Kunming 650201, China
2 The Key Laboratory for Crop Production and Smart Agriculture of Yunnan Province, Yunnan Agricultural University, Kunming 650201, China
3 Big Data School, Yunnan Agricultural University, Kunming 650201, China
4 International Cooperation Office, Yunnan Provincial Academy of Science and Technology, Kunming 650201, China
* Authors to whom correspondence should be addressed.
Agronomy 2024, 14(9), 2099; https://doi.org/10.3390/agronomy14092099
Submission received: 25 July 2024 / Revised: 1 September 2024 / Accepted: 13 September 2024 / Published: 14 September 2024
(This article belongs to the Section Precision and Digital Agriculture)

Abstract

The precise identification of disease traits in the complex sugarcane planting environment not only effectively prevents the spread and outbreak of common diseases but also allows for the real-time monitoring of nutrient deficiency symptoms at the top of sugarcane, facilitating the supplementation of relevant nutrients to ensure sugarcane quality and yield. This paper proposes a human–machine collaborative sugarcane disease detection method for complex environments. Initially, data on five common sugarcane diseases (brown stripe, rust, ring spot, brown spot, and red rot) as well as two nutrient deficiency conditions (sulfur deficiency and phosphorus deficiency) were collected, totaling 11,364 images and 10 high-definition videos captured by a 4K drone. The data sets were augmented threefold using techniques such as flipping and gamma adjustment to construct a disease data set. Building upon the YOLOv8 framework, the EMA attention mechanism and Focal loss function were added to optimize the model, addressing the complex backgrounds and imbalanced positive and negative samples present in the sugarcane data set. The disease detection models EF-yolov8s, EF-yolov8m, EF-yolov8n, EF-yolov7, and EF-yolov5n were constructed and compared. Subsequently, the five basic instance segmentation models of YOLOv8 were compared, validated on nutrient deficiency videos, and used to construct a human–machine integrated detection model for nutrient deficiency symptoms at the top of sugarcane. The experimental results demonstrate that our improved EF-yolov8s model outperforms the other models, achieving mAP_0.5, precision, recall, and F1 scores of 89.70%, 88.70%, 86.00%, and 88.00%, respectively, highlighting the effectiveness of EF-yolov8s for sugarcane disease detection. Additionally, yolov8s-seg achieves an average precision of 80.30% with a smaller number of parameters, exceeding the mAP_0.5 of the other models by 5.2%, 1.9%, 2.02%, and 0.92%, respectively. It effectively detects nutrient deficiency symptoms, addressing the challenges of sugarcane growth monitoring and disease detection in complex environments with computer vision technology.

1. Introduction

Sugarcane is one of the most important sugar crops and the fifth largest crop in the world, supplying 80% of the world's sugar and 40% of its fuel ethanol feedstock. In recent years, however, with continued climate warming and the uneven distribution of precipitation, sugarcane diseases have occurred irregularly, significantly impacting both yield and quality. Real-time monitoring of sugarcane diseases is therefore the most effective means of prevention.
At present, effective monitoring of sugarcane growth is one of the most important ways to reduce and alleviate diseases. The traditional approach relies mainly on manual monitoring: crop experts issue early warnings of pests and diseases based on climate knowledge and crop characteristics, growers make predictions from their planting experience, and the occurrence, development, and outbreak of pests and diseases are observed visually. The most prevalent approach relies on human visual inspection of leaf characteristics, such as color variations, size, shape, texture, lesion patterns, and the overall appearance of the affected leaf area, to identify specific diseases and their severity [1]. However, this method, based on subjective experience, often leads to erroneous diagnoses and potentially to declines in crop yield [2]. Because sugarcane typically reaches a height of 3 m during its maturation phase, it is difficult for humans to observe the color of the topmost leaves, impeding accurate determination of which primary nutrients the sugarcane may lack during growth. Moreover, manual detection is expensive, time-consuming, and inefficient, and can no longer meet the needs of sugarcane leaf inspection in complex field environments. It is therefore imperative to develop an accurate and efficient method for sugarcane disease detection.
In recent years, with the rapid development of artificial intelligence, machine learning techniques have been progressively applied to crop disease identification, primarily by constructing classification and detection models that encompass traditional machine learning techniques, visualization technologies, and improved deep learning methods. Among these, traditional machine learning models, as classic artificial intelligence models [3,4], focus mainly on decision trees, random forests, and artificial neural networks. Their defining characteristic is the absence of visualization technology: they rely on manually designed feature extraction and classification for detecting crop pests and diseases [5,6,7]. Techniques such as boosting, support vector machines, fuzzy logic, K-means clustering, and naive Bayes are commonly employed for crop disease detection [8]. However, owing to the variability of manual feature design and poor model generalizability, many of these models are difficult to deploy in production practice [9,10,11]. Yigit et al. [12] used a support vector machine classifier to distinguish diseased from healthy sugarcane leaves in a single environment with an accuracy of 92%, but the results are limited to binary classification, which is difficult to apply in the actual planting process. Ratnasari et al. [13] used the gray level co-occurrence matrix to extract disease features under laboratory conditions with an accuracy of 80%, but this method cannot automatically update its algorithm parameters. Zhang et al. [14] used a genetic algorithm to automatically optimize the kernel function parameters of a support vector machine, which solved the problem of automatic parameter tuning but increased the time cost. Hossain et al. [15] proposed a KNN-based model that extracts texture features from diseased areas of sugarcane leaf images for classification, realizing the detection of plant leaf diseases.
With the advancement of deep learning, the emergence of deep learning models/architectures and visualization techniques has allowed plant diseases to be identified more clearly. Deep learning's profound ability to learn and extract features plays a pivotal role in fields such as crop quality inspection and disease recognition [16,17,18,19]. In particular, convolutional neural networks (CNNs), as a typical deep learning algorithm, have attracted the attention of many researchers. This differs from traditional image processing technology, which requires manual extraction of target features, often entailing complex processes and low recognition accuracy and making it challenging to apply in complex practical production scenarios [20,21]. Deep learning based on CNNs can realize end-to-end detection by learning characteristics across different fields, scenes, and scales, and has good feature extraction and generalization abilities [22]. Specifically, training on large data sets has improved deep-level image feature mining and produced a series of models such as YOLO (You Only Look Once) [23], Fast R-CNN (Fast Region CNN) [24,25], and SSD (Single Shot MultiBox Detector) [26,27] for image classification [28,29], recognition, and detection [30,31], among other fields. These models have been widely used in object recognition and detection tasks such as target classification, weed recognition, pest stress, and agricultural product quality analysis [32], demonstrating high recognition accuracy and strong robustness. However, many researchers collect data only indoors, which limits the diversity of crop image resource databases and, to a certain extent, the practicability of the research results [33]. Moreover, in complex scenes, factors such as image resolution, light intensity, and gaps between crossing crop leaves affect target detection (classification) performance [34]. It is therefore imperative to apply deep learning methods with strong feature learning and extraction capabilities to complex agricultural scenes [35,36,37].
With the deep integration of crop disease identification and deep learning technology, some researchers have introduced new or improved deep learning architectures to achieve better disease detection performance. Zhang et al. [38] proposed enhanced GoogLeNet and Cifar-10 models and compared their performance with AlexNet and VGG; the improved model achieved an accuracy of 98.7%. However, this method is only applicable to identification in the laboratory and exhibits poor practicality. With the further development of deep learning, target detection [39], instance segmentation [40], and related techniques have gradually been applied to crop pest detection. For example, Hirani et al. [41] proposed a new hybrid model, GET, which uses the lightweight GhostNet [42] as a feature extraction network and feeds the features into a transformer encoder to identify grape leaf diseases and pests. The accuracy of this model reached 98.14%, and it was 1.7 times faster and 3.6 times lighter than MobilenetV3_Large_100 [43].
Currently, researchers primarily focus on detecting common sugarcane diseases without investigating the adverse effects on sugarcane growth and development caused by the absence of various nutrients, which manifest as distinct symptoms at the leaf tips. Studies predominantly concentrate on disease detection in a single environment and do not employ a variety of algorithms to construct disease models for comparative analysis, select the most effective models for algorithmic improvement, and then build detection models that achieve precise detection of sugarcane diseases in realistically complex planting environments.
The novelty of this study is primarily manifested in the following two aspects.
(1) A method for detecting nutrient deficiency disorders at the top of sugarcane through human–machine collaboration in complex environments is proposed. Different from conventional disease detection models, data collection primarily involves the use of drones to capture the video data of abnormal symptoms (such as sulfur and phosphorus deficiencies) at the top of sugarcane. In terms of modeling, five fundamental instance segmentation models from YOLOv8 are employed to construct individual detection models for sugarcane nutrient deficiency symptoms. These models are validated through drone-captured video data, enabling intelligent growth monitoring and precise disease detection for sugarcane. This approach addresses challenges such as the impact of deficiencies in nitrogen, phosphorus, and potassium on sugarcane growth and the need for accurate disease detection during the planting process.
(2) A method for the precise detection of sugarcane diseases in complex environments is proposed. The experimental data are sourced from the Yunnan Provincial Sugarcane Germplasm Resource Garden in Kunming; this research targets six major sugarcane diseases prevalent in the Yunnan Plateau region. The experimental approach employs a progressive modeling strategy. Initially, seven distinct sugarcane disease detection models are constructed using common deep learning algorithms. After comparing the performance of these models, the five superior algorithms are selected and enhanced with the EMA attention mechanism and Focal loss function to optimize the models. Subsequently, these optimized models are utilized to detect sugarcane diseases, resulting in five different detection models. A further comparison of these models’ performance is conducted to identify the optimal model for precise detection of sugarcane diseases in realistically complex planting environments.
In this study, six common conditions of sugarcane growth (five common diseases and nutrient deficiency symptoms), namely red rot, ring spot, rust, brown spot, brown stripe, and nutrient deficiencies (sulfur deficiency and phosphorus deficiency), were taken as the research objects. UAVs, patrol robots, and other equipment were used to monitor abnormal characteristics of the leaf surface and the plant top during sugarcane growth, focusing on the color of the top leaves and the characteristics of leaf surface disease, and intelligent detection methods for common diseases and identification methods for top nutrient deficiency were proposed, respectively. First, the Yolov8s, Yolov5n, Yolov7, Yolov8n, Yolov8m, Faster R-CNN, and Libra R-CNN algorithms were used to construct sugarcane disease detection models and conduct comparative tests to select the best model. The model was then optimized by adding the EMA attention mechanism and the Focal loss function to further improve detection accuracy, forming a sugarcane disease target detection model based on EF-yolov8s. At the same time, the five basic instance segmentation models of yolov8 were used to build detection models for sugarcane nutrient deficiency symptoms, and inference was verified on UAV aerial videos, enabling intelligent sugarcane growth monitoring and accurate disease detection, and addressing problems such as deficiencies of nitrogen, phosphorus, and potassium affecting sugarcane growth during planting and the accurate detection of diseases.

2. Materials and Methods

2.1. Image Acquisition and Data Enhancement

2.1.1. Image Acquisition

Since the experiment focuses on detecting the diversity of sugarcane diseases in the Yunnan Plateau, current open-source data are a poor match, making it difficult to source the data needed for the experiment. The experimental base of this study is the Kunming Sugarcane Germplasm Resources Nursery of Yunnan Province. It mainly carries out growth monitoring of five common diseases, namely red rot, ring spot, rust, brown spot, and brown stripe, as well as nutrient deficiencies (sulfur deficiency and phosphorus deficiency). Initially, using the Tyrannosaurus 03A-X inspection robot (produced in Yunnan Province, China by Cloud AgroServices (Hunan) Intelligent Technology Co.), iPhone 13 smartphones (produced by Apple Inc.), and the DJI Mavic 3 drone (produced by Shenzhen DJI Technology Co.), a total of 11,364 images and 10 4K high-definition aerial videos were collected from the bottom, middle, and top of sugarcane plantations. These images and videos depict leaf diseases of over ten sugarcane varieties, including YAU 01-58, YAU 01-104, YAU 01-106, YAU 09-38, ROC 20, ROC 22, YC 89-9, GT 11, YT 93-159, and YT 86-368. Disease photos were taken with mobile devices at a distance of 15 cm to 30 cm from the diseased parts of the leaves, at a resolution of 3000 × 4000 pixels or higher. Images of nutrient deficiency and aerial videos of sugarcane fields were captured by drone at a distance of 15–20 m from the fields. As shown in Figure 1, after image classification, sorting, and abnormal image screening, a total of 6270 images of the five major sugarcane diseases in Yunnan (red rot, ring spot, rust, brown spot, and brown stripe) were retained, along with 1686 images and 10 4K high-definition aerial videos of the nutrient deficiencies.

2.1.2. Image Enhancement

Gamma correction is a nonlinear transformation employed to rectify luminance discrepancies and contrast in images, facilitating a more uniform distribution of brightness and enhancing the visual quality of the image. Color tone adjustment can alter the overall tint of an image, aligning it more closely with detection requirements. In the context of pest and disease detection, images are sourced from diverse environments and conditions. By adjusting the gamma value and color tone of the image, optimizing its brightness and contrast, and thereby enhancing image quality, it is possible to adapt to varying environmental conditions and consequently improve detection accuracy.
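As a concrete illustration of the gamma adjustment described above, the following Python sketch applies a pixel-wise power transform with OpenCV (the input file name is a placeholder; this is an illustrative preprocessing step, not the exact augmentation code used in this study):

```python
import cv2
import numpy as np

def adjust_gamma(image: np.ndarray, gamma: float) -> np.ndarray:
    """Gamma correction: out = 255 * (in / 255) ** (1 / gamma)."""
    inv = 1.0 / gamma
    table = ((np.arange(256) / 255.0) ** inv * 255).astype(np.uint8)
    return cv2.LUT(image, table)  # 8-bit look-up table application

img = cv2.imread("sugarcane_leaf.jpg")   # placeholder input image
brighter = adjust_gamma(img, gamma=1.5)  # gamma > 1 brightens shadows
darker = adjust_gamma(img, gamma=0.7)    # gamma < 1 darkens highlights
```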
To build a sugarcane disease detection model, an image data set must first be constructed and then augmented. For this purpose, this experiment uses preprocessing methods such as flipping and adjustments to brightness/contrast, gamma value, and hue to expand the data set threefold and form a data resource database, as shown in Figure 2 and Table 1. After expansion, there are 18,810 images in total. LabelImg software (Version: windows_v1.8.1) was used to annotate the 18,810 sugarcane leaf disease images, and the data set was then divided into training, test, and validation sets at a ratio of 7:2:1 to form a complete data set for training and testing.
The nutrient deficiency image data set was expanded threefold by horizontal flipping, vertical flipping, Gaussian blur, and brightness adjustment, yielding 5058 images after expansion, as shown in Figure 3 and Table 1. Labelme software (Version: windows_v1.8.1) was used to annotate the 5058 sugarcane leaf nutrient deficiency images, and the data set was divided into training, test, and validation sets at a ratio of 7:2:1 to form a complete data set for training and testing.
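A minimal sketch of the 7:2:1 split described above, assuming one flat directory of annotated images (the directory layout and file names are hypothetical):

```python
import random
from pathlib import Path

random.seed(42)  # fixed seed for a reproducible split
images = sorted(Path("dataset/images").glob("*.jpg"))  # hypothetical layout
random.shuffle(images)

n = len(images)
n_train, n_test = int(0.7 * n), int(0.2 * n)
splits = {
    "train": images[:n_train],
    "test": images[n_train:n_train + n_test],
    "val": images[n_train + n_test:],  # remaining ~10%
}
for name, files in splits.items():
    # One image path per line, the list format YOLO-style loaders accept.
    Path(f"dataset/{name}.txt").write_text("\n".join(str(f) for f in files))
```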

2.2. Experimental Design

YOLOv8, released in January 2023 by Ultralytics (Frederick, MD, USA), the company behind YOLOv5, incorporates advanced backbone and neck architectures along with an anchor-free split Ultralytics head, thereby enhancing feature extraction and object detection performance. The backbone architecture adopted is Darknet53, which includes the fundamental convolutional unit Conv, the Spatial Pyramid Pooling module SPPF that fuses local and global features at the feature map level, and the C2F module that augments the network's depth and receptive field to improve feature extraction capabilities. YOLOv8's detection and segmentation models are available in five model sizes for pre-training, with parameters ranging from small to large: YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, and YOLOv8x. In this study, the detection model employs the n, s, and m sizes.
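For reference, training a YOLOv8 detector of a given size with the Ultralytics Python API follows the pattern below (a sketch: the data-set YAML path is a placeholder, and the hyperparameters mirror those reported in Section 3.1.1):

```python
from ultralytics import YOLO

# Load a pre-trained YOLOv8s checkpoint; substitute "yolov8n.pt" or
# "yolov8m.pt" for the other model sizes used in this study.
model = YOLO("yolov8s.pt")

# Train on the sugarcane data set (placeholder YAML listing class names
# and train/val image paths).
model.train(data="sugarcane.yaml", imgsz=640, batch=32, epochs=200)

metrics = model.val()  # reports precision, recall, and mAP_0.5
```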
YOLOv5 was released by Glenn Jocher in 2020, and its most prominent feature is the inclusion of the Focus structure and CSP Darknet-53 structure within the backbone network. The Focus structure is a crucial component of YOLOv5, utilized for extracting high-resolution features. It employs a lightweight convolutional operation that helps the model maintain a high receptive field while reducing computational burden. CSP (Cross Stage Partial) Darknet-53, the backbone network structure in YOLOv5, introduces the concept of cross-stage partial connections. By dividing the feature map into two parts in the channel dimension, it maintains high feature representation capability, contributing to the enhancement of object detection accuracy and speed. There are five versions of YOLOv5: YOLOv5n, YOLOv5s, YOLOv5m, YOLOv5l, and YOLOv5x. Among them, YOLOv5n is the network with the smallest depth and the narrowest feature map width in the series.
Traditional real-time object detectors have focused on structural optimization, whereas YOLOv7 diverges, emphasizing the optimization of the training process. This includes several modules and optimization methods aimed at enhancing object detection accuracy without increasing inference costs. YOLOv7 introduces a methodical reparameterization model, a strategy applicable to various network layers, featuring the concept of gradient propagation paths. Subsequently, it introduces a novel label assignment method, the coarse-to-fine guided label assignment. Additionally, it proposes “expansion” and “compound scaling” methods for real-time object detectors, enabling efficient utilization of parameters and computations. The methods proposed by YOLOv7 can effectively reduce the parameters and computational demands of state-of-the-art real-time object detectors by approximately 40% and 50%, respectively, resulting in faster inference speeds and higher detection accuracy [44].
Faster R-CNN is an efficient deep neural network architecture for object detection, which optimizes upon R-CNN and Fast R-CNN, significantly enhancing the speed and accuracy of target detection. The core idea of Faster R-CNN lies in the rapid generation of candidate regions through the introduction of the Region Proposal Network (RPN), and the sharing of convolutional features with the detection network, thereby greatly improving detection speed and accuracy. Its architecture primarily consists of the following components: feature extractor, Region Proposal Network (RPN), RoI Pooling layer, classifier, and regressor [45].
Libra R-CNN enhances detection performance by addressing imbalances during training, building upon existing object detection frameworks such as Faster R-CNN. These imbalances primarily include those at the sample level, feature level, and training objective level. It achieves this by introducing three components: IoU-balanced sampling, balanced feature pyramid, and balanced L1 loss, thereby improving the performance of object detection [46].
Considering the complex environment of sugarcane growth, data on common diseases and on the characteristics of top-of-plant nutrient deficiency were collected as training and test data sets, respectively. Seven commonly used target detection algorithms were used to construct detection models for the five diseases, including brown stripe and rust, and the models were optimized by introducing an attention mechanism and an improved loss function. The five network structures provided by yolov8 were used to build sugarcane growth monitoring models that detect nutrient deficiency during sugarcane growth, so that computer vision technology can solve the problem of disease detection during sugarcane planting in complex environments. The detailed experimental design is as follows.
First, the 11,364 collected images were classified, sorted, and screened to remove abnormal images, leaving a total of 6270 disease images, 1686 nutrient deficiency images, and 10 4K high-definition nutrient deficiency videos. Then, data enhancement was performed through data cleaning, random image flipping, and adjustments to brightness/contrast, gamma value, and hue, forming a modeling data set. Disease detection models were constructed using the yolov5n, yolov7, yolov8s, yolov8n, yolov8m, Faster R-CNN, and Libra R-CNN algorithms, respectively. The best-performing model was selected to build the detection model for Yunnan sugarcane leaf diseases in complex environments, and the model was then optimized by introducing an attention mechanism and an improved loss function.
Because the characteristics of sugarcane nutrient deficiency appear mainly across whole leaves, the symptoms are difficult to detect or identify through close observation and photographing with handheld devices, patrol robots, and other conventional image acquisition equipment; only remote observation can assess the affected plants. Moreover, deficiency symptoms usually involve multiple interlacing leaves that form a complex detection area, which conventional target detection image processing struggles to handle. Therefore, after studying the characteristics of nutrient deficiency symptoms, this paper uses a UAV to take aerial photos, observes the color of the sugarcane top leaf area, identifies deficiencies through image segmentation, and monitors them through video. At the same time, the segmentation performance of yolov8n-seg, yolov8s-seg, yolov8m-seg, yolov8l-seg, and yolov8x-seg on the growth defect data set was compared, and the best segmentation model for growth defects at the top of sugarcane was selected.
The specific process is shown in Figure 4.

2.3. Construction of Sugarcane Disease Detection Model

2.3.1. Construction of Common Disease Detection Model and Disease Segmentation Model Based on yolov8s

The network structure of yolov8s is shown in Figure 5. First, the features of the imported sugarcane disease pictures are gradually extracted through the backbone part to obtain feature maps of different scales. Then, the feature maps of different scales are fused by the neck module. Finally, the head module adjusts the size of the feature map according to the task needs and outputs the prediction results.
The backbone part contains the C2F module and Conv module. The C2F module includes one slicing operation, two 1 × 1 convolution operations, n bottleneck operations, and one splicing operation. The first 1 × 1 convolution realizes cross-channel information interaction on the input feature map. The slicing operation divides the feature map in two along the channel dimension. Half of the feature maps are stacked through n bottleneck operations in turn, and the result of each bottleneck operation is spliced with the feature map from the first convolution. Finally, the second 1 × 1 convolution compresses the number of channels of the spliced feature map to match the input feature map, which keeps the model lightweight while providing richer gradient information.
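A simplified PyTorch sketch of the C2F pattern as described above (batch normalization and the SiLU activations of the production implementation are omitted for brevity; this is an illustration, not the Ultralytics source):

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """Two 3 x 3 convolutions with a residual connection."""
    def __init__(self, c: int):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c, 3, padding=1)
        self.conv2 = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.conv2(self.conv1(x).relu()).relu()

class C2f(nn.Module):
    """1 x 1 conv, channel split, n stacked bottlenecks, splice, 1 x 1 conv."""
    def __init__(self, c_in: int, c_out: int, n: int = 2):
        super().__init__()
        self.cv1 = nn.Conv2d(c_in, c_out, 1)  # cross-channel interaction
        c = c_out // 2
        self.blocks = nn.ModuleList(Bottleneck(c) for _ in range(n))
        self.cv2 = nn.Conv2d((2 + n) * c, c_out, 1)  # compress channels back

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        a, b = self.cv1(x).chunk(2, dim=1)  # slice along the channel axis
        ys = [a, b]
        for blk in self.blocks:             # stack bottlenecks on one half
            ys.append(blk(ys[-1]))
        return self.cv2(torch.cat(ys, dim=1))  # splice all intermediate maps
```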
In the head part, the yolov8 model uses the current mainstream decoupled head to separate the classification and localization prediction heads while removing the objectness branch. The model accordingly changes from an anchor-based to an anchor-free prediction mode, which omits the design of prior boxes and saves the time required to remove redundant boxes during prediction.
The backbone and neck networks of the instance segmentation model are identical to those used for target detection, but there are two differences at the head layer: first, in addition to generating the feature maps for three prediction boxes and three class predictions, the head also generates three feature maps with 32 channels to serve as mask coefficients; second, a Prototype Mask feature map of size (1, 32, 80, 80) is generated from the 80 × 80 scale feature map as the base feature map for segmentation.
Yolov8-seg, as shown in Figure 6, extracts features from the input UAV images through the backbone network. The FPN feature pyramid then fuses feature maps of different sizes, P3 (the feature map with the largest resolution) is taken as the input of the Protonet, and the Protonet outputs the Prototypes, which serve as native masks for network prediction. First, the detection branch runs: for each target object it outputs the category, bounding box information (x, y, w, h), and k mask coefficients (mask confidences, with values between −1 and 1). Then, the segmentation branch outputs k Prototypes (mask prototype images) for the current input image. The output Prototypes differ between images, but their number is always k. For each target object, the k mask coefficients and the k Prototypes are multiplied correspondingly, and the results are summed to obtain the instance segmentation of nutrient deficiency.
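A minimal numpy sketch of this coefficient–prototype combination, assuming k = 32 prototypes at 80 × 80 resolution (a simplified illustration of the YOLACT-style mask assembly described above, not the library's internal code):

```python
import numpy as np

def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))

k = 32
prototypes = np.random.randn(k, 80, 80)  # stand-in Protonet output
coeffs = np.tanh(np.random.randn(k))     # per-instance coefficients in [-1, 1]

# Weight each prototype by its coefficient, sum, then apply sigmoid and a
# threshold to obtain the binary instance mask.
mask_logits = np.tensordot(coeffs, prototypes, axes=1)  # shape (80, 80)
instance_mask = sigmoid(mask_logits) > 0.5
```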

2.3.2. Construction of Improved EF-yolov8s Sugarcane Disease Detection Model

In view of the complex background and imbalance between positive and negative samples in the sugarcane data set collected for this study, we used yolov8s as the base model and added the EMA attention mechanism and Focal loss function to optimize the model. The improvement is shown in Figure 7.
The EMA attention mechanism uses multi-scale parallel sub-networks to establish short- and long-range dependencies and reshapes part of the channel dimension into the batch dimension, avoiding the dimensionality reduction that generic convolutions would impose. The output feature maps of the two parallel sub-networks are then fused through cross-space learning, with cross-space information aggregation in different spatial dimensions enriching the feature aggregation. Specifically, the output of the 1 × 1 branch encodes global spatial information through two-dimensional global average pooling, while the output of the 3 × 3 branch is directly reshaped to the corresponding dimensions. These outputs are aggregated through matrix dot-product operations to generate the first spatial attention map. Finally, the output feature maps within each group are aggregated through two Sigmoid functions that generate spatial attention weights, capturing pixel-level pairwise relationships and highlighting the global context of all pixels. The final output of EMA has the same size as the input and can be effectively stacked into modern architectures.
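For concreteness, the following PyTorch sketch reflects our reading of the published EMA design (grouped channels, a 1 × 1 coordinate-pooling branch, a 3 × 3 branch, and cross-spatial fusion); it is an illustrative reimplementation, not the exact module wired into EF-yolov8s:

```python
import torch
import torch.nn as nn

class EMA(nn.Module):
    """Efficient Multi-scale Attention, sketched after the published design."""
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        self.groups = groups
        cg = channels // groups                        # channels per group
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # pool across width
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # pool across height
        self.gap = nn.AdaptiveAvgPool2d(1)             # 2D global average pool
        self.gn = nn.GroupNorm(cg, cg)
        self.conv1x1 = nn.Conv2d(cg, cg, 1)
        self.conv3x3 = nn.Conv2d(cg, cg, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        cg = c // self.groups
        g = x.reshape(b * self.groups, cg, h, w)  # channel dims -> batch dim
        # 1 x 1 branch: directional pooling encodes global spatial context.
        xh = self.pool_h(g)                              # (bg, cg, h, 1)
        xw = self.pool_w(g).permute(0, 1, 3, 2)          # (bg, cg, w, 1)
        hw = self.conv1x1(torch.cat([xh, xw], dim=2))
        xh, xw = torch.split(hw, [h, w], dim=2)
        x1 = self.gn(g * xh.sigmoid() * xw.permute(0, 1, 3, 2).sigmoid())
        # 3 x 3 branch: local context.
        x2 = self.conv3x3(g)
        # Cross-spatial fusion: each branch's pooled channel descriptor
        # attends to the other branch's spatial map (matrix dot-product).
        a1 = torch.softmax(self.gap(x1).reshape(b * self.groups, 1, cg), -1)
        a2 = torch.softmax(self.gap(x2).reshape(b * self.groups, 1, cg), -1)
        m1 = x2.reshape(b * self.groups, cg, h * w)
        m2 = x1.reshape(b * self.groups, cg, h * w)
        weights = (a1 @ m1 + a2 @ m2).reshape(b * self.groups, 1, h, w)
        return (g * weights.sigmoid()).reshape(b, c, h, w)
```

A quick sanity check: `EMA(64)(torch.randn(2, 64, 80, 80))` returns a tensor of shape (2, 64, 80, 80), matching the "same size as the input" property noted above.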
The problem caused by the imbalance between positive and negative samples is that the samples contain a large number of easy examples, most of which belong to the background. The loss value is then dominated by these easy samples, which in turn dominate the gradient update direction, so the network learns little useful information and cannot accurately classify the diseases. The core idea of Focal loss is to rescale the loss so that easily classified samples are down-weighted far more than hard samples; the weight of hard samples is thus highlighted in the loss function, making the model focus on hard samples during training.
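A compact PyTorch sketch of the binary focal loss, FL(p_t) = −α_t (1 − p_t)^γ log(p_t), with the commonly used defaults γ = 2 and α = 0.25 (a reference implementation for illustration, not the exact loss wiring inside EF-yolov8s):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits: torch.Tensor, targets: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """Binary focal loss: down-weights easy examples by (1 - p_t) ** gamma."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)        # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Example: mostly-background targets; easy negatives contribute little loss.
logits = torch.randn(8, 5)   # 8 candidate boxes, 5 disease classes
targets = torch.zeros(8, 5)
targets[0, 2] = 1.0          # a single positive label
loss = focal_loss(logits, targets)
```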

2.4. Evaluation Metrics

Accuracy represents the proportion of all samples that are correctly predicted by the model.
$$\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$$
mAP_0.5: mAP_0.5 represents the mean of the average precision across all categories when the IoU threshold is set at 0.5. The mean average precision (mAP) is the average of the average precision (AP) across all detected target categories. AP measures the detection performance for a particular class, specifically the recognition accuracy for a given sugarcane disease, while mAP is the metric used to evaluate the overall performance of the detection system.
We evaluated the detection performance across multiple categories, specifically the identification efficacy of all sugarcane disease classes, as follows:
$$AP = \sum_{i=0}^{n-1} (r_{i+1} - r_i)\, p_{\text{interp}}(r_{i+1})$$
where r_1, r_2, …, r_n represent the recall values at the first interpolation point of each precision interpolation segment, arranged in ascending order, and p_interp(r_{i+1}) is the interpolated precision at recall r_{i+1}.
$$mAP = \frac{\sum AP}{N_{\text{classes}}}$$
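As an illustration, AP under this interpolated-sum definition can be computed from a sorted precision–recall curve as follows (a generic sketch for a single class, not the evaluation code used in the experiments):

```python
import numpy as np

def average_precision(recall: np.ndarray, precision: np.ndarray) -> float:
    """AP = sum over recall increments of (r_{i+1} - r_i) * p_interp(r_{i+1})."""
    # Add sentinels at recall 0 and 1, then make precision monotonically
    # decreasing (the usual interpolation step).
    r = np.concatenate(([0.0], recall, [1.0]))
    p = np.concatenate(([1.0], precision, [0.0]))
    p = np.maximum.accumulate(p[::-1])[::-1]
    return float(np.sum((r[1:] - r[:-1]) * p[1:]))

ap = average_precision(np.array([0.2, 0.5, 0.8]), np.array([1.0, 0.8, 0.6]))
print(round(ap, 3))  # AP for this toy precision-recall curve: 0.62
```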
Precision, the ratio of the number of correctly detected instances of a particular class to the total number of detected instances, is used to evaluate whether the model can accurately identify the target.
$$\text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} = \frac{TP}{TP + FP}$$
In this context, True Positive (TP) refers to positive samples that the network model correctly predicts as diseased, False Positive (FP) denotes negative samples incorrectly predicted as diseased, and False Negative (FN) represents positive samples that the network model fails to identify as diseased, i.e., false negatives.
Recall: the ratio of the number of detected instances belonging to a class to the total number of instances of that class in the data set; a measure of the completeness of the model's detection.
$$\text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} = \frac{TP}{TP + FN}$$
Here, TP, FP, and FN are as defined above.
F1: The harmonic mean of precision and recall. F1 score comprehensively evaluates both precision and recall.
$$F = \frac{(1 + \alpha^2) \cdot P \cdot R}{\alpha^2 \cdot P + R}$$
In this context, P denotes precision, R denotes recall, and α is a weighting factor. When α equals 1, precision and recall are weighted equally, and F reduces to F1. The formula is as follows.
$$F1 = \frac{2 P R}{P + R}$$
Generally, a higher F1 score indicates a more effective model.
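A toy computation tying these definitions together (the confusion counts are arbitrary and purely illustrative):

```python
def detection_metrics(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall, and F1 from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g., 86 correctly detected lesions, 11 false alarms, 14 missed lesions
p, r, f1 = detection_metrics(tp=86, fp=11, fn=14)
print(f"precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```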
Gradient: In machine learning models, the gradient refers to the partial derivative of the objective function with respect to the model parameters. It indicates the rate and direction of change in the objective function at the current point.
Confidence denotes the degree of certainty the model has regarding the existence of the predicted target (the object within the bounding box), while also taking into account the accuracy of the bounding box’s positioning. A high confidence level implies that the model is highly confident that the target indeed exists within the bounding box, and that the position and dimensions of the bounding box are relatively accurate.
FLOPs: A metric used to assess the computational complexity of a model. FLOPs stands for floating-point operations, representing the number of floating-point calculations required for a single forward pass through the model.
Convolutional layers are as follows.
$$\text{FLOPs} = H_{out} \cdot W_{out} \cdot \left( \frac{2 \cdot C_{in} \cdot K^2 - 1}{g} + 1 \right) \cdot C_{out}$$
Hout and Wout denote the height and width of the convolutional layer output, respectively. Cin represents the number of input channels, K the size of the convolutional kernel, Cout the number of output channels, and g the number of groups in grouped convolution. The +1 is for convolutions with bias. The fully connected layer is as follows.
$$\text{FLOPs} = (2 \cdot C_{in} - 1 + 1) \cdot C_{out} = 2 \cdot C_{in} \cdot C_{out}$$
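A worked example of the two formulas above (the layer shapes are hypothetical):

```python
def conv_flops(h_out: int, w_out: int, c_in: int, c_out: int,
               k: int, groups: int = 1) -> int:
    """FLOPs for a convolutional layer with bias, per the formula above."""
    return h_out * w_out * ((2 * c_in * k * k - 1) // groups + 1) * c_out

def fc_flops(c_in: int, c_out: int) -> int:
    """FLOPs for a fully connected layer with bias: 2 * C_in * C_out."""
    return 2 * c_in * c_out

# e.g., a 3x3 conv producing an 80x80x64 map from 32 input channels
print(conv_flops(h_out=80, w_out=80, c_in=32, c_out=64, k=3))
print(fc_flops(c_in=1024, c_out=5))  # a 5-class classification head
```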

3. Results

3.1. Analysis of Experimental Results of Different Deep Learning Models

3.1.1. Performance Analysis of Different Deep Learning Algorithms in Sugarcane Disease Detection

Yolov8s, Yolov5n, Yolov7, Yolov8n, Yolov8m, Faster R-CNN, and Libra R-CNN were compared on the same data set. The parameters were set to an image size of 640 × 640, a batch size of 32, and 200 epochs. The evaluation indicators were precision, recall, mAP_0.5, F1 score, depth, parameter quantity, gradient, and FLOPs (G). All values in the table are averages. The experimental results are shown in Table 2, which summarizes the relative performance of the seven target detection architectures and gives the evaluation indicators of the models, including mAP_0.5, precision, recall, F1, depth, parameter count, gradient, and FLOPs (G). Among them, yolov8s achieves the highest mAP_0.5, precision, recall, and F1, at 85.7%, 84.56%, 81.12%, and 83%, respectively, followed by yolov8m and yolov8n; the other four models all remain below 70%. Therefore, yolov8s has a better detection rate and generalization ability in the sugarcane disease detection task.

3.1.2. Performance Comparison and Analysis of Different Deep Learning Algorithms in Detecting Different Sugarcane Diseases

Table 3 compares the precision, recall, mAP_0.5, and mAP_0.5–0.95 of each algorithm for the detection of the five diseases. The overall performance of yolov8s and yolov8m is better than that of the other five algorithms. In terms of precision, yolov8s is particularly strong on brown stripe and red rot, at 90.60% and 88.60%, respectively, showing high accuracy in identifying these two diseases. In contrast, yolov8m reaches a precision of only 81.2% on red rot and remains below 80% for the other four diseases. In terms of recall, yolov8s also performs well on brown stripe and red rot, at 90.30% and 81.10%, respectively, indicating that the model covers actual disease cases well. Yolov8m, however, achieves a high recall only on red rot and a low recall on the other four diseases, meaning that the model misses detections when identifying diseases.
mAP is an important index for measuring the performance of a target detection algorithm. The table shows that yolov8s performs well in mAP_0.5 and mAP_0.5–0.95 across a variety of diseases, indicating high diagnostic performance. In particular, on brown stripe and red rot, the mAP_0.5–0.95 of yolov8s reaches 73.10% and 53.70%, respectively, further demonstrating the stability of the model under strict evaluation criteria.

3.2. Analysis of Experimental Results of Improved Deep Learning Model

3.2.1. Overall Analysis of Sugarcane Disease Detection Results after Model Improvement

In order to verify the effectiveness of the proposed improvements, the five network models used earlier in this study (yolov8m, yolov8n, yolov7, yolov5n, and yolov8s) were selected for improvement. The five improved network models were compared on the self-built data sets under the same training environment. The experimental results are shown in Table 4. The improved detection models substantially improve sugarcane disease target detection. The mAP_0.5, precision, recall, and F1 of EF-yolov8s are 4%, 4.2%, 4.9%, and 5% higher than the original model, reaching 89.70%, 88.70%, 86.00%, and 88.00%. EF-yolov8m improves by up to about 15% and records the highest values of the five models: 90.65%, 90.37%, 86.00%, and 88.00%. However, its number of network layers, parameters, gradients, and FLOPs are also the highest, implying high model computation and low operating efficiency. Compared with its original model, the performance of EF-yolov7 improves markedly, by about 20%, and EF-yolov8m is nearly 15% higher than its original model. The EF-yolov8n and EF-yolov5n models improve only slightly over their originals, and their mAP_0.5 and recall values even decrease.

3.2.2. Analysis of Detection Results of Different Sugarcane Diseases after Model Improvement

Table 5 compares the precision, recall, mAP_0.5, and mAP_0.5–0.95 of the improved Yolo algorithms for detecting the five diseases. The table shows that introducing the EMA attention mechanism and Focal loss function improves the performance of the sugarcane disease detection models, with EF-yolov8s and EF-yolov8m performing best. For brown stripe detection, the improved EF-yolov8s shows the smallest improvement, because the precision and recall of the original model were already high, yet it still reaches the highest values: 93.30%, 92.30%, 91.70%, and 78.40%. Red rot improves by 4.30%, 9.20%, 7.90%, and 13.50%, the largest gains. In terms of precision on the other three diseases, rust increases by 8.3%, while ring spot and brown spot increase by about 4%; recall increases by about 7%, mAP_0.5 by about 6%, and mAP_0.5–0.95 by about 10%. EF-yolov8m improves brown stripe detection markedly, by 22.10%, 26.40%, 18.30%, and 37.90%, respectively; red rot improves by 10.60%, 6.50%, 6.00%, and 19.00%, respectively. For the other three diseases, precision, recall, and mAP_0.5 increase by more than 10%, and mAP_0.5–0.95 by about 20%.
The target detection results of EF-yolov8s are shown in Figure 8. For the sugarcane disease detection task of this study, the model identified diseases effectively in complex environments.

3.2.3. Performance Comparison before and after Model Improvement

Figure 9 compares the Precision–Confidence, Recall–Confidence, and F1–Confidence curves of yolov8s and the improved EF-yolov8s. The abscissa represents confidence, from 0 (low) to 1 (high); the ordinate represents the corresponding precision, recall, and F1 score. On the Precision–Confidence curve, both yolov8s and EF-yolov8s reach good precision at a confidence of 0.8. The improved EF-yolov8s significantly improves the detection precision for rust; the all-category precision at a confidence of 1 increases from 0.946 to 0.957, and high precision is maintained even at low confidence. On the Recall–Confidence curve, the all-category recall of EF-yolov8s at a confidence of 0 increases from 0.91 to 0.93, and the area enclosed by the curve increases relative to yolov8s, indicating that the improved model maintains high precision while keeping a high recall. On the F1–Confidence curve, yolov8s obtains good F1 scores in the confidence interval 0.2–0.7, while EF-yolov8s does so over 0.1–0.8, showing that within this wider range the improved model balances precision and recall better. At the F1 peak, EF-yolov8s outperforms yolov8s.

3.3. Analysis of Experimental Results for Monitoring Nutrient Deficiency Symptoms in Sugarcane Tip Growth

The yolov8s-seg model was used to train on the symptoms of sugarcane nutrient deficiency. By model scale, yolov8 provides five network structures: yolov8n, yolov8s, yolov8m, yolov8l, and yolov8x; yolov8n is the smallest and fastest model, while yolov8x is the most accurate but slowest. According to the requirements of the nutrient deficiency instance segmentation task, this study compared the five basic instance segmentation models of yolov8. The training parameters were set as follows: image size 640 × 640, batch size 32, and 150 epochs, with mask overlap enabled during training and a mask downsampling ratio of 16. The results are shown in Table 6.
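For reference, such a segmentation training run with the Ultralytics API might look like the following (a sketch: the data YAML is a placeholder, and `overlap_mask`/`mask_ratio` correspond to the mask overlap and downsampling settings mentioned above):

```python
from ultralytics import YOLO

# Segmentation checkpoints range from yolov8n-seg.pt to yolov8x-seg.pt.
model = YOLO("yolov8s-seg.pt")

model.train(
    data="sugarcane_deficiency.yaml",  # placeholder data-set description
    imgsz=640,
    batch=32,
    epochs=150,
    overlap_mask=True,  # allow masks to overlap during training
    mask_ratio=16,      # mask downsampling ratio
)
```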
Table 6 summarizes the relative performance of the five instance segmentation models on sugarcane nutrient deficiency symptoms. According to the evaluation indicators, the yolov8 series of segmentation models all achieve a mAP_0.5 above 75% on the nutrient deficiency segmentation task. Yolov8s-seg is more efficient and lightweight, and its mAP_0.5, precision, recall, and F1 are higher than those of the other yolov8 segmentation models.
The trained weights were exported, and inference experiments were carried out on the UAV aerial videos; the results are shown in Figure 10, which presents the segmentation results for sugarcane sulfur deficiency and sugarcane phosphorus deficiency. Video frames with obvious nutrient deficiency are segmented and detected well, but the detection effect on other video frames is unsatisfactory.
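A minimal inference sketch on drone video with the trained weights (the file paths are placeholders):

```python
from ultralytics import YOLO

model = YOLO("runs/segment/train/weights/best.pt")  # placeholder weights path

# Stream frames from a 4K aerial video and save annotated output.
results = model.predict(source="uav_field.mp4", stream=True, save=True)
for r in results:
    if r.masks is not None:
        print(f"frame: {len(r.masks)} deficiency regions segmented")
```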

4. Discussion

4.1. Analysis of the Practicality of a Sugarcane Disease Detection Model Based on Human–Machine Collaboration in Complex Environments

This study introduces a method for sugarcane disease detection utilizing human–machine collaboration in complex environments. Specifically, the EF-yolov8s model and yolov8s-seg were constructed. The EF-yolov8s model is designed to detect five common sugarcane diseases: brown stripe, rust, ring spot, brown spot, and red rot. The model achieved mAP_0.5, precision, recall, and F1 scores of 89.70%, 88.70%, 86.00%, and 88.00%, respectively, demonstrating its practicality for sugarcane disease detection. The yolov8s-seg model, with a smaller number of parameters, achieved an average precision of 80.30%, effectively detecting symptoms of nutrient deficiencies.
After reviewing relevant papers on sugarcane disease identification, the achievements of researchers in this field are summarized as follows. Lakshmikanth Paleti employed k-NN and SVM for ANN pre-training and constructed a CNN-based disease recognition model, developing a novel method for sugarcane disease identification with an accuracy of 88% [47]. Isabela Ordine Pires da Silva Simões utilized the radial support vector machine (SVM) algorithm to build a classification model for sugarcane orange and brown rust. The experimental results showed that the segmentation produced by Object-Based Image Analysis (OBIA) of RGB images of infected leaves exhibited high accuracy (>0.88), laying the foundation for an application that automatically identifies these two rust diseases in sugarcane leaf RGB images [48]. Muhammad Hammad Saleem adopted various visualization techniques with traditional DL architectures to detect and classify plant disease symptoms, achieving a recognition accuracy of 88% [49]. Cuimin Sun integrated the SE attention module into ResNet-18 (CNN), enhancing the learning of inter-channel weights, introduced Multi-Head Self-Attention (MHSA), and added 2D relative position encoding to boost model accuracy, ultimately achieving an accuracy of 89.57% for sugarcane disease recognition [2]. Swapnil Dadabhau Daphal proposed an attention-based multi-level deep learning architecture for reliable classification of plant diseases, achieving an accuracy of 86.53% [50].
The findings of the aforementioned researchers reveal that some focused solely on classifying four types of sugarcane diseases; others constructed classification models solely using images from public data sets; and some employed machine learning algorithms to build sugarcane disease classification models. In contrast, this study targets five common sugarcane diseases and two nutrient deficiencies in the Yunnan Plateau region. The data for modeling were exclusively sourced from the Yunnan Provincial Sugarcane Germplasm Resource Garden in Kunming. Furthermore, the data types are diverse, including images and video data, and the collection methods are varied, using the Bolong 03A-X inspection robot, iPhone 13 smartphones, and DJI Mavic 3 drones to capture disease images from the base, middle, and top of sugarcane plants, respectively. The modeling approach primarily involves integrating an EMA attention mechanism and a Focal loss function to optimize the model, thereby constructing a disease detection model for precise disease identification. Additionally, it can pinpoint the location and number of lesions in the images. This disease detection model is scalable, allowing for further in-depth research in the future.

4.2. A New Method for Constructing a Sugarcane Disease Detection Model Based on the Improved EF-yolov8s

In this study, we compared the performance of six target detection models (yolov5n, yolov7, yolov8n, yolov8m, Faster R-CNN, and Libra R-CNN) with yolov8s on the sugarcane leaf disease detection task. Although the average performance of yolov8s is good in all aspects, it still has limitations. The overall accuracy of the model is not high enough: in complex background environments and in areas where multiple diseases are concentrated, false or missed detections occur to a large extent. Its parameter count and gradient values are also not optimal among the compared models. Furthermore, the backbone network may not extract disease features effectively under complex backgrounds; especially for small lesions or dense disease areas, an attention mechanism or a more powerful feature fusion strategy needs to be introduced.
Therefore, we added the EMA attention mechanism and Focal loss function to the yolov8s base model to optimize it. Compared with the base model, the improved EF-yolov8s model gains accuracy, although its accuracy is still not high enough in general, and the depth and parameter count of the model increase. The improved EF-yolov8s was compared with four optimized models fitted with the same EF modules, namely EF-yolov8m, EF-yolov8n, EF-yolov7, and EF-yolov5n. The results show that, when lightweight design is considered comprehensively, the overall performance of the EF-yolov8s model is the best. The data for this study were collected from the Yunnan Provincial Sugarcane Germplasm Resource Garden in Kunming, encompassing over ten sugarcane varieties such as YAU 01-58, YAU 01-104, YAU 01-106, YAU 09-38, ROC 20, ROC 22, YC 89-9, GT 11, YT 93-159, and YT 86-368. The experimental design considered the variability of diseases among different sugarcane varieties, with data collected from over 10 distinct sugarcane disease types; consequently, the EF-yolov8s sugarcane disease detection model exhibits broad applicability. In terms of disease types, this study primarily focuses on identifying five significant diseases of the Yunnan Plateau region, as well as two nutrient deficiency symptoms in sugarcane (sulfur deficiency and phosphorus deficiency). Moving forward, data on other diseases will be collected to continuously refine the model. Additionally, this study detects diseases mainly based on the shape and size of lesions, without yet considering the impact of growth conditions on disease detection.
In future research, the focus will be on expanding data sets and exploring growth conditions as breakthroughs for disease detection, aiming to enhance the model’s generalization capability for categories with limited data. Additionally, model fusion methods can be incorporated to leverage the complementarity among models to improve overall accuracy, particularly when dealing with complex backgrounds and multiple diseases. Different models may excel in detecting specific types of diseases, and integrating the results from multiple models can help reduce false positives and missed detections. This will optimize the EF-Yolov8s model, which has shown promising results, in terms of accuracy and lightweight design.

4.3. Application Potential of the Intelligent Sugarcane Tip Growth Monitoring Model for Nutrient Deficiency Symptoms

In this study, we used the yolov8s-seg model to achieve instance segmentation of sugarcane nutrient deficiency images and compared it with four other instance segmentation models: yolov8n-seg, yolov8m-seg, yolov8l-seg, and yolov8x-seg. The results show that, when the yolov8s-seg model is used for training, its mAP_0.5 and recall values are the highest, indicating high accuracy and recall in segmenting sugarcane deficiency instances, although its precision and F1 values are relatively low. Across the comparative experiment, the evaluation indicators of all compared models are relatively low. This may be caused by the following reasons.
  • Model underfitting: the model capacity is insufficient to fully capture the complex patterns in the data, which usually manifests as uniformly low evaluation indicators; the model complexity may need to be increased (e.g., more network layers or more filters).
  • Training data problems: the sample distribution among target categories is uneven, with too many samples of common categories and too few of rare categories. This may bias the model toward common categories during training and weaken its detection of rare categories; when expanding the data set in the future, attention should be paid to the sample distribution.
  • Multi-scale detection: YOLO models detect objects of different sizes at different scales. If multi-scale feature fusion or the scale prediction module is not designed properly, detection performance may be uneven across scales; optimizing the multi-scale detection mechanism or introducing more effective cross-scale information transfer can help improve overall detection performance.
In addition, although the training performance of the segmentation model reached 80%, the video inference results are unsatisfactory. We speculate that there are several reasons for this.
  • Mismatch between training data and video data: the training data may differ from the sugarcane disease instances in the actual videos in lighting, angle, occlusion, etc., and the data may not be diverse enough to cover all situations that arise in the video.
  • Video preprocessing: video frames and training images may differ in pre-processing, such as scaling, cropping, and normalization.
  • Post-processing strategy: video inference may require more complex post-processing to reconcile detections across consecutive frames, such as tracking algorithms that stabilize the detection boxes.
Compared with the other models, the yolov8s-seg model has fewer parameters and a faster gradient update speed, meaning it can be trained faster in practical applications with relatively low hardware requirements. However, although the yolov8s-seg model performs well, its depth is relatively shallow, which may limit its ability to recognize complex scenes and multiple targets. In future research, performance can be improved further by increasing the network depth or introducing more feature extraction layers.

5. Conclusions

This study collected images of common sugarcane diseases and sugarcane nutrient deficiency symptoms against complex field backgrounds and used the lightweight yolov8s as the multi-objective sugarcane disease detection model. On this basis, the EMA attention mechanism and Focal loss function were added to optimize the model, improving its detection rate and feature fusion ability. For nutrient deficiency images, the yolov8s-seg model was used for instance segmentation. The following conclusions can be drawn from the comparative analysis of the experimental results.
  • Under the same model hyperparameters, yolov8s was compared with yolov5n, yolov7, yolov8n, yolov8m, Faster R-CNN, Libra R-CNN, and other models; the yolov8s model is a relatively effective algorithm for sugarcane disease detection. Compared with yolov8m, its mAP_0.5, precision, recall, and F1 increased by 10.7%, 12.8%, 12.8%, and 13.7%, respectively, while the depth, parameter quantity, gradient, and floating-point operation count of the model decreased by 23.7%, 56.9%, 56.9%, and 63.7%, respectively. The improved EF-yolov8s was compared with the EF-yolov8m, EF-yolov8n, EF-yolov7, and EF-yolov5n models carrying the same EF module, and the improved model achieved better detection results on the sugarcane disease image set. Although its mAP_0.5, precision, and recall are 0.95%, 1.67%, and 0.62% lower than those of EF-yolov8m, respectively, the depth of the model is reduced by 70 layers, the parameter and gradient counts are more than halved, and the FLOPs are reduced severalfold, laying a foundation for lightweight deployment of subsequent models. Therefore, yolov8s and the improved model are more accurate under comprehensive evaluation than the comparison algorithms, providing model support for rapid multi-objective sugarcane disease detection in complex environments.
  • The yolov8s-seg model was trained for instance segmentation of sugarcane deficiency images. Under the same experimental environment, four instance segmentation models (yolov8n-seg, yolov8m-seg, yolov8l-seg, and yolov8x-seg) were compared. The yolov8s-seg model is superior to the other algorithms in mAP_0.5, recall, model depth, parameter count, and gradient count. Overall, yolov8s-seg combines high detection accuracy with a lighter model, making it an effective tool for instance segmentation of sugarcane deficiency symptoms. It can also serve as a reference for deploying intelligent sugarcane deficiency detection on mobile terminal devices such as unmanned aerial vehicles.
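For reference, the sketch below is a minimal PyTorch implementation of the binary Focal loss on which the EF module builds. The alpha and gamma values are the commonly used defaults from the original Focal loss formulation, not necessarily those used in EF-yolov8s.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: down-weights easy examples so that rare or
    hard disease samples dominate the gradient signal."""
    prob = torch.sigmoid(logits)
    # per-element cross-entropy, -log(p_t)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = prob * targets + (1 - prob) * (1 - targets)      # prob of true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# toy usage: one confident correct and one confident wrong prediction
logits = torch.tensor([4.0, 4.0])
targets = torch.tensor([1.0, 0.0])
print(focal_loss(logits, targets))  # the misclassified example dominates
```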

Author Contributions

Conceptualization, J.S., Z.L. and T.L.; methodology, J.S., Z.L. and Y.Q.; validation, J.S. and Z.L.; formal analysis, F.L. and T.L.; investigation, J.S., Z.L. and F.L.; resources, F.L. and T.L.; data curation, J.S., Z.L., Y.Q. and F.L.; writing—original draft preparation, J.S., Z.L., Y.S. and Y.Q.; writing—review and editing, J.S., F.L., Y.S. and T.L.; supervision, Y.Q. and F.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Open Fund of Yunnan Agricultural University and the Yunnan Provincial Key Laboratory of Computer Technology Application, the Yunnan Provincial Science and Technology Major Project (Grant Nos. 202302AE090020, 202202AE090021, and 202002AE090010), the Yunnan Provincial Academic Leader Scholarship (Grant No. 202405AC350108), the Yunnan Provincial Key Laboratory of Crop Production and Intelligent Agriculture Open Fund (Grant No. 2021ZHNY02), the Basic Research Special Project (Approval No. 202401AT070253), and the Open Fund Project of the Key Laboratory of Computer Technology Application in Yunnan Province, Kunming University of Science and Technology (Approval No. 2022105).

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Image display of five diseases and two deficiencies. (a) Sugarcane brown stripe. (b) Sugarcane rust. (c) Sugarcane brown spot. (d) Sugarcane ring spot. (e) Sugarcane red rot. (f) Sugarcane sulfur deficiency. (g) Sugarcane phosphorus deficiency.
Figure 2. Data augmentation techniques for disease image analysis.
Figure 3. Data augmentation processing for images of mineral deficiency.
Figure 4. Technology roadmap.
Figure 5. Yolov8 network structure diagram.
Figure 6. Yolov8-seg network structure diagram.
Figure 7. Structural diagram of the improved EF-yolov8s model.
Figure 8. EF-yolov8s target detection results.
Figure 9. Comparison of the curves before and after the improvement of the Yolov8s model: (a) precision; (b) recall; (c) F1 score.
Figure 10. The Yolov8s-seg image segmentation model demonstrates its inference capabilities on videos of sugarcane nutrient deficiency disorders.
Table 1. Sugarcane disease image data set.

| Disease Name | Number of Pictures | Number of Images after Data Enhancement | Number of Category Tags | Number of Labels after Data Enhancement |
|---|---|---|---|---|
| Sugarcane brown stripe | 970 | 2910 | 4106 | 12,318 |
| Sugarcane rust | 852 | 2556 | 3434 | 10,302 |
| Sugarcane brown spot | 1193 | 3579 | 17,823 | 53,469 |
| Sugarcane ring spot | 1130 | 3390 | 5376 | 16,128 |
| Sugarcane red rot | 2125 | 6375 | 5012 | 15,036 |
| Aggregate | 6270 | 18,810 | 35,751 | 107,253 |
| Sugarcane sulfur deficiency | 1019 | 3057 | 18,685 | 56,056 |
| Sugarcane phosphorus deficiency | 667 | 2001 | 6111 | 18,333 |
| Aggregate | 1686 | 5058 | 24,796 | 74,389 |
Table 2. Performance analysis table of seven sugarcane disease detection models.

| Model | mAP_0.5 | Precision | Recall | F1 | Depths | Total Parameters | Gradients | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|
| Yolov5n | 67.80% | 66.40% | 63.50% | 64.00% | 193 | 2,503,919 | 2,509,423 | 7.1 |
| Yolov7 | 64.10% | 62.40% | 63.70% | 63.00% | 407 | 37,216,250 | 37,216,250 | 105.2 |
| Yolov8n | 70.60% | 67.60% | 66.40% | 67.00% | 168 | 3,006,623 | 3,011,807 | 8.1 |
| Yolov8s | 85.70% | 84.56% | 81.12% | 83.00% | 225 | 11,137,535 | 11,137,519 | 28.7 |
| Yolov8m | 77.40% | 75.00% | 71.90% | 73.00% | 295 | 25,859,215 | 25,859,199 | 79.1 |
| Faster R-CNN | 62.59% | 45.31% | 73.60% | 51.00% | 191 | 36,770,964 | | 401.788 |
| Libra R-CNN | 65.30% | 51.40% | 69.30% | 59.00% | 50 | 25,564,732 | | |
Table 3. Performance analysis of various deep learning algorithms in the detection of five sugarcane diseases.

| Model | Disease | Precision | Recall | mAP_0.5 | mAP_0.5–0.95 |
|---|---|---|---|---|---|
| Yolov5n | red rot disease | 71.40% | 73.20% | 77.00% | 39.30% |
| | ring spot disease | 66.80% | 64.10% | 66.80% | 29.20% |
| | rust disease | 58.70% | 65.30% | 63.60% | 32.20% |
| | brown spot disease | 66.10% | 70.60% | 70.70% | 30.80% |
| | brown stripe disease | 69.30% | 44.30% | 61.10% | 29.90% |
| Yolov7 | red rot disease | 54.30% | 68.30% | 64.10% | 25.70% |
| | ring spot disease | 52.90% | 63.30% | 57.90% | 21.30% |
| | rust disease | 45.10% | 49.00% | 44.70% | 19.20% |
| | brown spot disease | 50.80% | 71.60% | 60.40% | 22.20% |
| | brown stripe disease | 40.50% | 39.70% | 36.90% | 11.80% |
| Yolov8n | red rot disease | 72.80% | 76.30% | 80.40% | 42.40% |
| | ring spot disease | 69.50% | 64.60% | 69.00% | 30.80% |
| | rust disease | 61.90% | 68.10% | 67.70% | 34.80% |
| | brown spot disease | 67.00% | 71.50% | 72.00% | 31.80% |
| | brown stripe disease | 66.80% | 51.70% | 64.00% | 32.60% |
| Yolov8s | red rot disease | 88.60% | 81.10% | 85.70% | 53.70% |
| | ring spot disease | 82.90% | 74.70% | 81.80% | 44.00% |
| | rust disease | 79.70% | 75.80% | 81.70% | 50.70% |
| | brown spot disease | 81.20% | 77.30% | 82.90% | 42.50% |
| | brown stripe disease | 90.60% | 90.30% | 90.70% | 73.10% |
| Yolov8m | red rot disease | 81.20% | 82.20% | 86.80% | 50.50% |
| | ring spot disease | 77.10% | 70.50% | 76.40% | 37.70% |
| | rust disease | 70.40% | 70.40% | 74.10% | 41.40% |
| | brown spot disease | 74.90% | 72.60% | 77.40% | 35.20% |
| | brown stripe disease | 71.30% | 64.50% | 72.50% | 42.30% |
| Faster R-CNN | red rot disease | 53.30% | 78.40% | 67.60% | 34.40% |
| | ring spot disease | 44.50% | 68.70% | 55.30% | 30.60% |
| | rust disease | 34.80% | 65.80% | 48.00% | 28.50% |
| | brown spot disease | 32.40% | 74.50% | 63.80% | 27.30% |
| | brown stripe disease | 61.90% | 77.90% | 77.80% | 37.20% |
| Libra R-CNN | red rot disease | 54.80% | 74.00% | 71.30% | 36.70% |
| | ring spot disease | 47.90% | 65.90% | 62.80% | 29.40% |
| | rust disease | 45.60% | 58.80% | 49.50% | 27.50% |
| | brown spot disease | 51.90% | 68.30% | 69.30% | 29.10% |
| | brown stripe disease | 56.90% | 79.70% | 72.10% | 38.10% |
Table 4. Performance analysis table of the improved EF-sugarcane disease detection models.

| Model | mAP_0.5 | Precision | Recall | F1 | Depths | Total Parameters | Gradients | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|
| EF-yolov8s | 89.70% | 88.70% | 86.00% | 88.00% | 249 | 11,191,743 | 11,191,727 | 29.5 |
| EF-yolov8m | 90.65% | 90.37% | 86.62% | 88.00% | 319 | 25,940,431 | 25,940,415 | 80.6 |
| EF-yolov8n | 69.30% | 68.80% | 65.70% | 67.00% | 192 | 3,007,519 | 3,012,703 | 8.1 |
| EF-yolov7 | 84.67% | 82.50% | 80.19% | 81.00% | 295 | 6,025,868 | 6,025,868 | 13.3 |
| EF-yolov5n | 67.20% | 67.10% | 63.70% | 67.20% | 217 | 2,504,815 | 2,510,319 | 7.1 |
Table 5. Performance analysis of various EF-yolo series algorithms in the detection of five sugarcane diseases.

| Model | Disease | Precision | Recall | mAP_0.5 | mAP_0.5–0.95 |
|---|---|---|---|---|---|
| EF-yolov8s | red rot disease | 92.90% | 90.30% | 93.60% | 67.20% |
| | ring spot disease | 87.30% | 81.90% | 88.00% | 54.40% |
| | rust disease | 88.00% | 81.70% | 88.30% | 61.00% |
| | brown spot disease | 84.90% | 84.30% | 88.00% | 52.30% |
| | brown stripe disease | 93.30% | 92.30% | 91.70% | 78.40% |
| EF-yolov8m | red rot disease | 91.80% | 88.70% | 92.80% | 69.50% |
| | ring spot disease | 88.90% | 84.10% | 89.20% | 60.10% |
| | rust disease | 89.20% | 83.90% | 90.40% | 67.00% |
| | brown spot disease | 88.50% | 85.40% | 90.00% | 58.80% |
| | brown stripe disease | 93.40% | 90.90% | 90.80% | 80.20% |
| EF-yolov8n | red rot disease | 74.10% | 70.80% | 75.90% | 42.60% |
| | ring spot disease | 67.40% | 55.20% | 60.00% | 27.20% |
| | rust disease | 57.70% | 55.30% | 57.80% | 31.50% |
| | brown spot disease | 66.90% | 63.80% | 68.00% | 30.50% |
| | brown stripe disease | 77.80% | 83.60% | 84.70% | 65.40% |
| EF-yolov7 | red rot disease | 85.60% | 83.50% | 89.10% | 48.80% |
| | ring spot disease | 81.10% | 72.80% | 80.20% | 37.40% |
| | rust disease | 80.80% | 75.40% | 82.70% | 44.10% |
| | brown spot disease | 76.70% | 80.50% | 82.10% | 36.80% |
| | brown stripe disease | 88.40% | 88.80% | 89.30% | 63.00% |
| EF-yolov5n | red rot disease | 71.50% | 67.10% | 72.40% | 39.60% |
| | ring spot disease | 66.90% | 53.40% | 58.80% | 25.90% |
| | rust disease | 57.60% | 53.20% | 55.90% | 29.60% |
| | brown spot disease | 63.70% | 65.90% | 66.80% | 29.40% |
| | brown stripe disease | 75.80% | 78.90% | 82.00% | 61.70% |
Table 6. Comparison table of experimental results for Yolov8s-seg instance segmentation of deficiency symptom images.

| Model | mAP_0.5 | Precision | Recall | F1 | Depths | Total Parameters | Gradients | FLOPs (G) |
|---|---|---|---|---|---|---|---|---|
| yolov8n-seg | 75.10% | 71.90% | 73.10% | 72.00% | 261 | 3,264,006 | 3,263,990 | 12.1 |
| yolov8s-seg | 80.30% | 74.90% | 79.00% | 72.00% | 261 | 11,790,870 | 11,790,854 | 42.7 |
| yolov8m-seg | 78.40% | 74.62% | 77.55% | 76.00% | 331 | 27,240,806 | 27,240,790 | 110.4 |
| yolov8l-seg | 78.28% | 75.69% | 77.72% | 77.00% | 401 | 45,937,590 | 45,937,574 | 220.8 |
| yolov8x-seg | 79.38% | 76.56% | 78.41% | 77.00% | 401 | 71,752,774 | 71,752,758 | 344.5 |