1. Introduction
China is the world’s largest producer of fruit, with the largest crop area and output of apples in the world. As one of the most widely consumed fruits in the country, apples play a vital role in economic development and people’s daily lives. However, during the growth of apples, a variety of diseases, such as apple rust, early defoliation, and scab, can occur on the leaves due to natural environmental factors. These diseases seriously hinder the normal growth of apples, affect their yield and quality, and cause significant economic losses to the apple industry. At present, identifying the type of apple leaf disease relies on manual inspection by experienced personnel. This approach has low accuracy, is time-consuming and laborious, and leads to heavy pesticide spraying with a low utilization rate, making it difficult to meet the needs of large-scale production.
Due to the complex symptoms of apple leaf diseases, a wrong assessment leads to the overuse of pesticides, which not only fails to prevent and control the disease but also reduces yield and quality and causes environmental problems. The automatic identification of apple leaf diseases can therefore provide an effective reference for apple disease control, and it is of great significance to realize intelligent, rapid, and accurate identification.
In recent years, many scholars have attempted to use machine learning techniques to design apple leaf disease recognition algorithms. A common approach is to build a feature vector from color, shape, and texture information and then construct a model for disease recognition [1,2]. For example, Wang et al. [3] took three kinds of common apple leaf diseases as the research object, used an improved SVM classifier to identify apple leaf diseases, and finally realized an apple leaf disease recognition system. Shi et al. [4] achieved the recognition of three kinds of apple leaf diseases by using two-dimensional subspace learning dimension reduction (2DSLDR) based on differences in the shape, color, and texture of images of disease spots corresponding to different apple leaf diseases. Song et al. [5] proposed a one-to-one voting SVM strategy, which can effectively identify three kinds of apple leaf diseases, namely mosaic disease, rust disease, and Alternaria leaf spot disease. Wang et al. [6] applied a support vector machine optimized with a genetic algorithm to the recognition of apple mosaic disease, rust disease, and black spot disease and achieved a good recognition effect. Yu et al. [7] designed a two-layer structure model to identify apple leaf diseases.
Based on the above research, we found that traditional image processing methods mainly rely on expert experience to manually extract information on the color, texture, and shape of leaf disease images. Due to the complexity and diversity of the image backgrounds and disease spots of actual diseased leaves, hand-crafted feature design and selection using traditional image processing technology is tailored to specific datasets. Therefore, the established disease recognition model is not universal, and it transfers poorly to other data. In recent years, with the rise of deep learning techniques, CNNs (convolutional neural networks) have become a research focus for the automatic identification of agricultural plant diseases and insect pests. For example, Pallagani et al. [8] took 14 crop varieties and 26 diseases from the PlantVillage dataset as research objects, for which the recognition accuracy of the ResNet50 model reached 99.24%, and developed a dCrop app based on Android Studio. Albattah et al. [9] proposed an improved one-stage detector, CenterNet, with a 99.982% average accuracy on the PlantVillage dataset.
Some scholars have focused their research on apple leaf diseases. Liu et al. [10] took four kinds of apple leaf diseases (leaf spot, rust, brown spot, and mosaic) as research objects and improved the original AlexNet network, with an overall accuracy of 97.62%. Baranwal et al. [11] proposed a new CNN based on LeNet-5 to identify black rot, scab, rust, and healthy apple leaves with an accuracy of 98.54%. Jiang et al. [12] proposed a CNN based on GoogLeNet’s Inception structure and Rainbow Connection structure, with which apple leaf diseases, including brown spot, gray spot, mosaic, and blotch, were identified, and the detection accuracy was 78.80%.
However, existing studies have only constructed classification models for several single apple leaf diseases, and few researchers have considered the simultaneous occurrence of two diseases on a single leaf. In this case, particularly when the symptoms of one disease are more pronounced than those of the other, the symptoms of the two diseases are very similar and difficult to separate. This similarity is the main challenge in establishing a classification model for apple leaf diseases [10,13].
The previously described methods do not fully cover these challenges, as they focus on late stages of the disease, cannot deal with diseases with similar symptoms, and do not support the simultaneous detection of different diseases on the same plant.
Apple trees are often affected by various diseases during their growth, so several diseases may occur on the same leaf. When the symptoms of one disease are more pronounced than those of the others, the diseases are hard to tell apart, which is the main challenge in building a multi-disease classification model for single apple leaves. It is therefore necessary to study a neural network model that can effectively extract and distinguish these fine-grained features.
A few studies have addressed multiple diseases on the same apple leaf. For example, ref. [14] studied four types of apple leaf diseases in the Plant Pathology 2020 challenge dataset using the ResNet50 network model, with high recognition accuracy for the three single-leaf, single-disease categories. However, the classification accuracy for leaves containing multiple disease symptoms was only 51%. Bansal et al. [15] proposed an ensemble of pre-trained DenseNet121, EfficientNetB7, and EfficientNet NoisyStudent models to classify four classes of apple leaves (healthy, apple scab, apple cedar rust, and multiple diseases), achieving an accuracy of 96.25% on the test set but only 90% for the multiple-diseases class.
Therefore, in order to improve the accuracy of deep neural networks for multiple diseases appearing on the same apple leaf, an optimized RegNet [16] network is proposed. This study took seven categories of apple leaves as the research objects: healthy leaves, rust leaves, scab leaves, ring rot leaves, leaves with Panonychus ulmi symptoms, leaves with both rust and scab, and leaves with both Panonychus ulmi symptoms and ring rot. The optimized RegNet model was used to identify the small-sample-size apple leaf disease dataset collected in this study.
3. Results and Analysis
3.1. Effect of Data Enhancement on Model Performance
Data enhancement technology can generate more data from limited data, increase both the number and the diversity of samples, and thus improve the generalization ability of the model. Because the dataset in this study is small, data enhancement is needed to avoid overfitting. The purpose of this section is to investigate the effect of different data enhancement methods on the generalization performance of the model.
To this end, the baseline ResNet50 model was trained with three different data enhancement methods (offline augmentation, online augmentation, and offline augmentation followed by online augmentation), and their influence on generalization performance was compared.
Figure 5 shows the validation accuracy and loss function curves of the baseline network ResNet50 model on Dataset 1 and Dataset 2 using the same training method (transfer learning training for all layers) in six sets of experiments with different data enhancement methods.
In the figure, dataset1 and dataset2 represent the original datasets without any data enhancement methods; dataset1+A and dataset2+A are datasets expanded by offline data enhancement for Dataset 1 and Dataset 2, respectively; dataset1+B and dataset2+B are datasets expanded by online data enhancement for Dataset 1 and Dataset 2, respectively; and dataset1+A+B and dataset2+A+B represent Dataset 1 and Dataset 2 enhanced and expanded with offline and online data augmentation, respectively. The solid line represents the accuracy curve and loss function curve of ResNet50 on Dataset 1, and the line with data marker points represents the accuracy curve and loss function curve of ResNet50 on Dataset 2.
Overall, it is clear from Figure 5 that the choice of augmentation method has a considerable impact on the performance of the model on the same dataset. For example, for Dataset 1, offline augmentation proved to be much better than the other two methods, whose effects did not differ greatly. For Dataset 2, the validation accuracy of the three methods varied greatly: offline augmentation was the best, and online augmentation was the worst. It can be concluded that the model achieves better classification performance with offline augmentation alone than with either of the other two methods. This may be because, although online dynamic augmentation saves the considerable storage needed for the expanded data, enriches data diversity, and can reduce overfitting, it also disturbs the sample distribution of the original dataset to some extent and increases training volatility.
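The difference between the offline and online methods compared above can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: the flip transforms, the toy 2×2 "images", and the function names are all assumptions chosen for brevity.

```python
import random


def hflip(image):
    """Mirror a 2-D image (list of rows) left to right."""
    return [row[::-1] for row in image]


def vflip(image):
    """Mirror a 2-D image top to bottom."""
    return image[::-1]


# Hypothetical transform set; the operations used in the study may differ.
TRANSFORMS = [hflip, vflip]


def offline_augment(dataset):
    """Offline: apply every transform once and store the copies with the originals."""
    augmented = list(dataset)
    for image, label in dataset:
        for transform in TRANSFORMS:
            augmented.append((transform(image), label))
    return augmented


def online_epochs(dataset, epochs, rng):
    """Online: keep the dataset size fixed and randomly perturb samples at load time."""
    for _ in range(epochs):
        epoch = []
        for image, label in dataset:
            if rng.random() < 0.5:
                image = rng.choice(TRANSFORMS)(image)
            epoch.append((image, label))
        yield epoch


# Toy data: two 2x2 "images" with class labels.
toy = [([[1, 2], [3, 4]], "rust"), ([[5, 6], [7, 8]], "scab")]
stored = offline_augment(toy)
streamed = list(online_epochs(toy, epochs=3, rng=random.Random(0)))

print(len(stored))                         # 6: two originals plus four stored copies
print([len(epoch) for epoch in streamed])  # [2, 2, 2]: the dataset never grows
```

In the offline case the enlarged set is fixed before training, so every epoch sees the same samples; in the online case the samples seen by the model change from epoch to epoch, which is one plausible source of the additional training volatility noted above.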
3.2. Effect of Image Background on Model Performance
This section aims to investigate the impact of different image backgrounds on model performance. Due to the small amount of data in the apple leaf disease dataset used in this study, in order to avoid overfitting, Dataset 1 and Dataset 2 were expanded using offline enhancement. This section describes the use of the RegNet and ResNet50 models and a series of improved models (such as ResNeXt50 and ResNeSt), as well as the analysis of the impacts of different image backgrounds on model performance.
Figure 6a,b show the validation accuracy and loss function curves on different datasets, respectively. The solid line represents the accuracy curve and loss function curve of each model on Dataset 1, and the line with data marker points represents the accuracy curve and loss function curve of each model on Dataset 2.
By comparing the accuracy curves on the two datasets (that is, the two curves of the same color), it can be seen that all four models perform better on Dataset 2 than on Dataset 1, as shown in Figure 6a. In addition, the accuracy curves on Dataset 2 converge faster and fluctuate less.
In summary, by comparing the two datasets with the same data enhancement method, it was found that the model using Dataset 2 was better than that using Dataset 1, which may be because the captured apple leaf disease images contain too much background information that is irrelevant to the target, and the clipping operation could have reduced the complexity of the image background. Therefore, the effects of different training methods and optimizers on model performance will be investigated based on Dataset 2 in subsequent work.
3.3. Comparison of Different Training Methods
Three training strategies were compared. The first initializes the model parameters randomly (training from scratch). The second initializes the model with weights pre-trained on ImageNet and fine-tunes all layers of the network (transfer learning strategy one). The third also initializes the model with weights pre-trained on ImageNet but fine-tunes only the weights of the fully connected layers (transfer learning strategy two). All models in this section were trained with the Adam optimizer.
Figure 7a–d show the accuracy curves of four network models on Dataset 2 using different training methods. In the figure, the black solid line represents transfer learning strategy one (T1), the green line represents transfer learning strategy two (T2), and the blue line represents training from scratch (T0).
In Figure 7, two main points can be observed. First, in terms of convergence speed, all four network models trained from scratch converge the slowest and fluctuate significantly. Transfer learning strategy one converges faster than training from scratch, and transfer learning strategy two converges the fastest, reaching a stable state after only a dozen rounds of training. This is because transfer learning strategy two has fewer trainable parameters than transfer learning strategy one. Second, the accuracy curves show that, for all models, transfer learning strategy one achieves higher classification accuracy than transfer learning strategy two. The reason may be that, although the convolution modules trained on the ImageNet dataset can be used to extract image features, ImageNet differs significantly (in quantity, type, and size) from the dataset used in this study. Transfer learning strategy two therefore cannot achieve ideal results, whereas the testing accuracies of the four network models using transfer learning strategy one are significantly improved. This indicates that training only the classification layer cannot make the model adapt well to the data used in this study.
In summary, transfer learning can accelerate the convergence of the network and improve the accuracy of training and testing. Therefore, owing to the small sample size of the apple leaf disease dataset, it is necessary to adopt the training method of transfer learning to fine-tune all layer parameters.
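The three strategies differ only in how the weights are initialized and which layers are updated. The sketch below makes that explicit and shows why strategy two has far fewer trainable parameters. The layer sizes are illustrative assumptions (a backbone of roughly ResNet50 scale and a 7-class fully connected head on 2048 features), not values reported in this study, and the names `LAYERS`, `STRATEGIES`, and `trainable_params` are hypothetical.

```python
# Illustrative layer sizes only (assumptions, not taken from the paper):
# a backbone of roughly ResNet50 scale and a 7-class fully connected head
# on 2048 features, i.e. 2048 * 7 weights + 7 biases.
LAYERS = [
    {"name": "backbone", "params": 23_500_000},
    {"name": "fc", "params": 2048 * 7 + 7},
]

# T0: train from scratch; T1: fine-tune all layers; T2: fine-tune the head only.
STRATEGIES = {
    "T0": {"pretrained": False, "trainable": {"backbone", "fc"}},
    "T1": {"pretrained": True, "trainable": {"backbone", "fc"}},
    "T2": {"pretrained": True, "trainable": {"fc"}},
}


def trainable_params(strategy):
    """Count the parameters that receive gradient updates under a strategy."""
    config = STRATEGIES[strategy]
    return sum(layer["params"] for layer in LAYERS
               if layer["name"] in config["trainable"])


for name, config in STRATEGIES.items():
    init = "ImageNet weights" if config["pretrained"] else "random init"
    print(f"{name}: {init}, {trainable_params(name):,} trainable parameters")
```

Under these assumed sizes, T2 updates about 14 thousand parameters while T0 and T1 update about 23.5 million, which is consistent with T2 converging fastest but adapting least to the new data.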
3.4. Comparison of Different Optimizers
Classical deep learning architectures such as ResNet, ResNeXt, ResNeSt, and RegNet are typically trained with the SGD optimizer. We therefore also tried other optimizers, namely Adam, RAdam, and Ranger, and compared the performance of the models.
(1) Comparison of the performance of the same model under different optimizers
Figure 8a–d illustrate the accuracy curves of four models, namely RegNet, ResNeSt, ResNet, and ResNeXt, using different optimizers on the validation set.
In Figure 8a and Table 2, it can be seen that the final convergence states of the RegNet network model under the four optimizers were not significantly different; the differences lay in the starting point and the convergence speed. The Ranger optimizer had the highest starting point and the fastest convergence speed, and the SGD optimizer had the second fastest convergence speed after Ranger, while the Adam optimizer had the lowest starting point and the slowest convergence speed.
The ResNeSt network model achieved the worst performance using the SGD optimizer, with the lowest starting point of the accuracy curve and the lowest convergence state. The model’s accuracy curve fluctuated the most under the RAdam optimizer, and the maximum accuracy of the model was the lowest, as presented in Figure 8b. The difference between the Adam and Ranger optimizers was small, although the final convergence state of the model under the Ranger optimizer was better than that under the Adam optimizer. The performance of the model was therefore influenced by the optimizer, and the Ranger optimizer is the best-suited optimizer for the ResNeSt network model.
According to Figure 8c,d and Table 2, the Ranger optimizer gave the highest starting point, the fastest convergence speed, and the highest accuracy on the test set for the ResNet and ResNeXt models.
In summary, across the four models, SGD generally achieved the worst performance; RAdam had the largest fluctuation; and Ranger had the best effect, with the highest accuracy, the most stable convergence, and the fastest convergence. This suggests that Ranger is the most broadly applicable of the four optimizers.
(2) Comparison of performance of different models under the same optimizer
Figure 9 illustrates the accuracy curves of the RegNet, ResNeSt, ResNet, and ResNeXt models using Ranger on the validation set.
Figure 9 illustrates that, compared with the other models, RegNet had the fastest convergence speed and achieved good convergence in the 16th round, while the other models mostly started to achieve good convergence in the 25th to 30th rounds. Additionally, the RegNet network model had a higher starting point and higher accuracy than the other three models. The accuracy curves of the other three network models showed little difference, although the accuracy curve of the ResNet network model oscillated relatively strongly. According to the above analysis, the recognition accuracy and convergence of the RegNet model on the apple leaf dataset were better than those of the other models. Therefore, the well-performing RegNet network model equipped with the Ranger optimizer was selected for subsequent studies.
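The round at which a model "achieves good convergence" can be read off a validation accuracy curve with a simple rule. The sketch below uses one possible criterion, which is an assumption and not necessarily the one applied in this study: the first round from which the accuracy stays within a tolerance of its final value. The curve values are invented for illustration.

```python
def convergence_epoch(accuracy, tol=0.01):
    """Return the first (1-indexed) epoch from which the accuracy curve
    stays within `tol` of its final value. Criterion is an assumption."""
    final = accuracy[-1]
    for start in range(len(accuracy)):
        if all(abs(value - final) <= tol for value in accuracy[start:]):
            return start + 1
    return len(accuracy)


# Hypothetical validation accuracy per epoch, not data from the paper.
toy_curve = [0.50, 0.80, 0.90, 0.95, 0.97, 0.975, 0.972, 0.974]
print(convergence_epoch(toy_curve))  # 5
```

A stricter tolerance or a requirement on the loss curve would give a later round, so any comparison between models should use the same criterion for all of them.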
3.5. Test of Model Generalization Performance
In order to verify the generalization performance of the RegNet model, a comparative analysis was conducted on the test results of the models trained on Datasets 1 and 2. Four confusion matrices were obtained, as presented in Figure 10a–d.
The confusion matrix in Figure 10a (dataset1-dataset1) represents the results obtained by the model trained on Dataset 1 and tested on Dataset 1, Figure 10b (dataset1-dataset2) represents the results obtained by the model trained on Dataset 1 and tested on Dataset 2, Figure 10c (dataset2-dataset1) represents the results obtained by the model trained on Dataset 2 and tested on Dataset 1, and Figure 10d (dataset2-dataset2) represents the results obtained by the model trained on Dataset 2 and tested on Dataset 2. The numbers on the main diagonal are the numbers of correctly predicted sample images, and the numbers in other positions are the numbers of incorrectly predicted sample images. Based on the confusion matrices, the generalization performance of the models was evaluated using four indicators, namely precision, recall, specificity, and accuracy. The specific results are detailed in Table 3.
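The four indicators can be computed from a confusion matrix using their standard one-vs-rest definitions, as in the minimal sketch below. It assumes that rows hold the true class and columns the predicted class (the orientation of Figure 10 is not stated in this section), and the 3×3 matrix is a toy example rather than data from this study.

```python
def per_class_metrics(cm):
    """One-vs-rest precision, recall, and specificity for each class.

    Assumes cm[i][j] is the number of samples of true class i
    that were predicted as class j.
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    metrics = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # class k predicted as another class
        fp = sum(cm[i][k] for i in range(n)) - tp   # other classes predicted as k
        tn = total - tp - fn - fp
        metrics.append({
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "specificity": tn / (tn + fp) if tn + fp else 0.0,
        })
    return metrics


def overall_accuracy(cm):
    """Share of samples on the main diagonal."""
    correct = sum(cm[k][k] for k in range(len(cm)))
    return correct / sum(sum(row) for row in cm)


# Toy 3-class confusion matrix (hypothetical counts).
toy_cm = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]
for index, values in enumerate(per_class_metrics(toy_cm)):
    print(index, {name: round(value, 3) for name, value in values.items()})
print("accuracy", round(overall_accuracy(toy_cm), 4))
```

If the matrix is stored with predictions in rows instead, precision and recall swap roles, so the orientation should be checked before comparing against Table 3.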
As shown in Figure 10a, the recognition accuracy of the model trained and tested on Dataset 1 was 99.74%, and only one sample was misclassified, namely a leaf with both scab and rust that was misclassified as rust. When the same model was tested on Dataset 2 (Figure 10b), the overall recognition accuracy was 90.77%, with a total of 36 samples misclassified, which is much lower than the accuracy obtained on Dataset 1 (99.74%).
In Figure 10, the confusion matrix (d) shows that the overall recognition accuracy of the model is 99.23%. Except for three samples that were misclassified, all other samples were correctly identified. One sample with Panonychus ulmi symptoms was misclassified as having both Panonychus ulmi symptoms and ring rot, one sample with both Panonychus ulmi symptoms and ring rot was misclassified as ring rot, and another sample with ring rot was misclassified as having both Panonychus ulmi symptoms and ring rot. As can be seen from the confusion matrix (c), the overall recognition accuracy of the model tested on Dataset 1 was only 93.85%, with a total of 24 samples misclassified, which is far lower than the accuracy of the model tested on Dataset 2 (99.23%).
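The four reported accuracies and error counts are mutually consistent if each test set contains 390 images. That size is an assumption inferred from the figures above (for example, one error in 390 gives 99.74%); it is not stated in this section. A quick check:

```python
TEST_SIZE = 390  # assumption inferred from the reported figures, not stated here


def accuracy_percent(misclassified, total=TEST_SIZE):
    """Overall accuracy in percent, rounded to two decimals."""
    return round(100 * (total - misclassified) / total, 2)


# (misclassified samples, reported accuracy in %) per train-test combination
REPORTED = {
    "dataset1-dataset1": (1, 99.74),
    "dataset1-dataset2": (36, 90.77),
    "dataset2-dataset1": (24, 93.85),
    "dataset2-dataset2": (3, 99.23),
}

for name, (errors, reported) in REPORTED.items():
    print(name, accuracy_percent(errors), reported)
```

Each computed value matches the reported one under this assumption, which supports reading the four percentages as overall accuracies over equally sized test sets.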
In all four train-test combinations, the test accuracy is above 90%, which shows that the model achieved good generalization performance. The model trained on Dataset 2 predicted well on both Dataset 1 and Dataset 2 (93.85% and 99.23%), whereas the model trained on Dataset 1 dropped to 90.77% on Dataset 2. Therefore, the model trained on Dataset 2 has a stronger generalization ability and can be better applied in actual production. The same conclusion can be drawn from the evaluation indicators (precision, recall, specificity, and accuracy) listed in Table 3.
4. Discussion
This paper proposes a lightweight RegNet model optimized for operation in complex environments through a series of improvements. The model effectively addresses the problem of multiple diseases on a single leaf seen in images with complex backgrounds.
The following conclusions are drawn from the experimental results. First, the two transfer learning training methods can significantly accelerate network convergence and improve classification performance. Second, background cropping can improve the model’s extraction of disease feature information. Finally, the models trained with Ranger were more robust and accurate.
More precisely, the optimized RegNet model achieves better disease identification for images with complex backgrounds, with an accuracy of 99.23% on the test set and 99.1% on images showing multiple diseases. In addition, our model achieves significant improvement in identifying multiple diseases on a single leaf compared with the models proposed in references [15,17], which reported accuracies of 51% and 90%, respectively. Classification errors occur mainly between leaves carrying two diseases and leaves carrying only one of them: when the symptoms of one disease are more pronounced, the leaf can look very similar to one affected by that disease alone. This similarity is the main challenge for the proposed classification model when considering single apple leaves with multiple diseases.
Although our model achieves better recognition of multiple diseases on single leaves in complex backgrounds, there are some undeniable shortcomings identified in this study in terms of data acquisition and processing. First, the model had strong uncertainty and poor interpretability due to the insufficient diversity in the disease dataset (such as early symptoms of diseases). Secondly, the performance of the model on other plant disease datasets was not verified. Finally, there is still room for research on applications embedded in mobile devices. Therefore, it is hoped that in the future, more in-depth research will be conducted on the above-mentioned issues using existing research methods.
5. Conclusions
In order to solve the problem of feature extraction being time-consuming and laborious when using traditional recognition methods, this paper studied images of seven kinds of apple leaves (including situations wherein multiple diseases occur on the same leaf) by using an improved RegNet deep convolutional neural network model. The effects of training mode, data expansion methods, optimizer selection, image background, and other factors were compared and analyzed. The conclusions are as follows:
(1) Using only offline augmentation gave better results than online augmentation or a combination of offline and online augmentation.
(2) The effects of training methods, image backgrounds, and the choice of optimizer on the performance of four classification models, namely ResNet50, ResNeXt50, ResNeSt, and RegNet, were studied. The experimental results show that, compared to training from scratch, the two transfer learning training methods can significantly accelerate network convergence, improve classification performance, and shorten the training time of the model. Secondly, background cropping can, to some extent, eliminate background information and improve the model’s extraction of disease feature information, thereby improving the overall performance of the model. Finally, the four models trained with the Ranger optimizer were more stable and accurate, and RegNet with Ranger performed the best.
(3) In order to analyze the generalization performance of the RegNet network model, it was trained on Dataset 1 and Dataset 2 and tested on both datasets separately. The results show that the model trained on Dataset 2 had a good predictive ability for both Dataset 1 and Dataset 2, and it was concluded that the model trained on Dataset 2 had a better generalization ability.
In summary, our work supports the identification of multiple diseases on the same apple leaf under complex field backgrounds. The results show that the approach has the potential to accurately identify apple diseases, which will help orchard managers reduce labor input and pesticide use.