1. Introduction
Crop diseases can cause irreversible damage to crop growth and are among the main limiting factors in crop cultivation, and spraying pesticides is the primary measure for controlling them. Selecting the appropriate pesticide category and regulating its dosage can ensure effective disease control while avoiding the ecological impact of pesticide residues. Therefore, accurately identifying the type and severity of crop diseases is a prerequisite for achieving precise agricultural spraying [1,2,3,4,5,6,7,8]. In traditional methods, professionals detect and identify crop diseases mainly by eye and from experience, which is time-consuming, laborious, and subjective. With the development of deep learning (DL) and visual perception technology, visual feature learning methods based on deep learning have become the mainstream approach to crop disease recognition, enabling automatic recognition by extracting and learning the pest and disease features of crop images [9,10].
Deep learning is a branch of machine learning that mainly uses deep artificial neural networks to extract multilayer visual features and fuse multigranularity features of input images, thereby achieving high-level semantic learning of images [11]. Unlike traditional machine learning methods, deep learning methods require substantial computational resources, because deep neural network models optimize a large number of parameters during this high-level semantic learning. With the rapid development of high-performance computing and image processing units, deep learning has proven remarkably effective at discovering intricate structure in high-dimensional data and has been successfully applied in many domains, including science and engineering [12,13,14], industry [15,16,17], bioinformatics [18,19,20], and agriculture [21,22,23,24,25,26]. Concretely, deep learning has produced many significant works in plant stress phenotyping and image analysis for detection [27,28,29,30], recognition [31,32,33,34], classification [35,36,37,38], quantification [39], and prediction [40] in agriculture, tackling the challenges of agricultural production [41]. Convolutional neural network (CNN)-based approaches are arguably the most commonly used [42].
Ferentinos developed a plant disease detection model with a best performance of 99.5% using 87,848 images captured under controlled conditions [43]. Liang et al. designed a deep plant disease diagnosis and severity estimation network (PD2-SE-Net) to identify plant species, diseases, and their severities with a final accuracy of 99%, using images from the artificial intelligence (AI) Challenger dataset [45] as experimental data; their approach reached an accuracy of 99.4% [44]. Zhong et al. proposed an apple disease classification method based on a dense network with 121 layers (DenseNet-121) and 2462 apple leaf images from AI Challenger, which achieved an accuracy of 93.71% [46]. He et al. proposed an approach to detect oilseed rape pests based on SSD with an Inception module, which was helpful for integrated pest management [47]. Zeng et al. introduced a self-attention mechanism into a convolutional neural network, and the accuracy of the proposed model reached 98% on 9244 diseased cucumber images [48].
Deep convolutional neural networks have a strong ability for feature learning and expression, and the CNN-based crop disease recognition methods above have achieved good accuracies or success rates. However, the accuracy and robustness of deep learning models depend on training with a large amount of image data, and two issues remain to be addressed in crop disease identification. On the one hand, diverse maize disease training datasets are lacking, as most of the crop disease images used in existing methods were created under controlled or laboratory conditions. On the other hand, the complexity of existing crop disease models is high, making it difficult to meet the actual detection needs of field scenarios, and their performance in identifying fine-grained corn diseases is insufficient. Therefore, we introduced transfer learning and designed VGNet to solve these problems. Specifically, we first collected corn disease image data from real field scenarios, covering nine types of corn diseases, which can be used for parameter optimization of fine-grained corn disease recognition models. We then designed VGNet, a relatively simple model based on VGG16 that nevertheless achieves high accuracy in identifying crop diseases and can meet the disease detection needs of actual corn planting scenarios.
VGG16 is selected as the backbone network because the VGG family has a plain, sequential (non-residual) structure whose computing resource consumption is significantly lower than that of residual architectures, satisfying the dual needs of speed and accuracy in real-time crop disease detection. In the VGNet method, the structure of VGG16 is modified by adding batch normalization (BN), replacing the two hidden fully connected layers with a global average pooling (GAP) layer, and adding L2 normalization. Through comparative experiments with different training methods, parameters, and datasets, the redesigned VGNet achieves an accuracy of 98.3% after fine-tuning and reduces testing time by 66.8% compared with the original VGG16 model. The main contributions of this paper are as follows:
A lightweight intelligent learning method, termed VGNet, is proposed for the detection of multiple categories of corn disease.
Fine-grained corn disease images are collected and can be used for the parameter optimization of corn disease recognition models.
Evaluation results show that the accuracy of the proposed method in disease detection reaches 98.3%, which can satisfy the detection requirements of practical scenarios.
The remainder of this paper is organized as follows. Section 2 describes the materials and methods. The experimental results of VGNet are detailed in Section 3. Section 4 discusses VGNet for fine-grained corn disease recognition. Finally, conclusions and further research directions are given in Section 5.
3. Results
In this study, we assessed the suitability of VGNet, with transfer learning and fine-tuning, for the task of crop disease recognition. Our focus was to pretrain the VGG16 network on different public datasets and to fine-tune the newly designed VGNet model with different training mechanisms and parameters. Large open datasets (ImageNet, PlantVillage, and AI Challenger) were used to pretrain the model; the weights and parameters of the convolutional and pooling layers were then transferred to the new model and frozen. After updating the structure of VGNet, the parameters of the GAP layer, the remaining fully connected layer, and the softmax layer were retrained and fine-tuned on the new dataset obtained from corn fields. The performance of the proposed method was analyzed using five-fold cross-validation to obtain convincing results. K-fold cross-validation is a common method for testing the accuracy of DL algorithms: the image dataset C is divided into K disjoint subsets, and, to prevent data leakage, if the number of training samples in C is M, each subset contains M/K samples. When training the network model, one subset is selected each time as the validation set and the other (K-1) subsets as the training set, yielding the classification accuracy of the model on the selected validation set. After repeating this process K times, the average classification accuracy is taken as the true classification accuracy of the model. In our research, K is set to 5, since 5-fold and 10-fold validation gave the same results in our previous experiments.
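The K-fold splitting procedure just described can be sketched as follows. This is a generic illustration of the split logic on synthetic sample indices, not the authors' experiment code; the dataset size of 920 images is taken from Section 3.2.

```python
import random

def k_fold_splits(num_samples: int, k: int = 5, seed: int = 0):
    """Split sample indices into K disjoint folds; each fold serves once
    as the validation set while the other K-1 folds form the training set."""
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    fold_size = num_samples // k                  # M/K samples per fold
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    splits = []
    for i in range(k):
        val = folds[i]
        train = [idx for j, f in enumerate(folds) if j != i for idx in f]
        splits.append((train, val))
    return splits

# e.g. 920 field images, K = 5 -> 184 validation images per fold
splits = k_fold_splits(920, k=5)
for train_idx, val_idx in splits:
    # train the model on train_idx, evaluate on val_idx,
    # then average the K validation accuracies ...
    pass
```

Because the folds are disjoint, every image is used for validation exactly once, which is what prevents the data leakage mentioned above.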
3.1. Effects of Fine-Tuning Training Mechanism
The following sections analyze the effects of different training mechanisms on model performance during the fine-tuning of VGNet, including different training methods and initial learning rates. Table 3 shows the testing loss and accuracy for the different training mechanisms in the fine-tuning process.
From Table 3, it can be seen that six different experiments were carried out; their final testing loss values and accuracies vary with the training method and initial learning rate. Figure 6 and Figure 7 show the loss and accuracy curves of the two training methods with initial learning rates of 0.01 and 0.001, respectively. As seen in Figure 6, Figure 7, and Table 3, the training method and initial learning rate have a great influence on model performance. Comparing experiments 1, 2, and 3, which use the SGD method, the loss value decreases and the accuracy increases as the learning rate declines. When the learning rate is set to 0.01, the testing loss is 0.103 and the accuracy is only 85.65%; the performance is unstable, with the loss and accuracy oscillating sharply, as shown by the green curves in Figure 6. When the initial learning rate drops to 0.001, the testing loss decreases to 0.061 and the accuracy improves to 93.04%; the testing process oscillates less, and the model converges at about 4500 iterations, as shown by the green curves in Figure 7. Rows 4, 5, and 6 in Table 3 were fine-tuned with the Adam optimizer, and their variation in loss and accuracy is consistent with experiments 1, 2, and 3. The reason is that, with the aid of transfer learning, the front layers of the network are already well trained, and the weight parameters at the start of training are close to their optimal state. If the initial learning rate is not set properly, the training process oscillates and may even diverge: with a higher learning rate (0.01) in the fine-tuning phase, the model is likely to skip over the optimal solution, resulting in larger loss, lower accuracy, or severe oscillation. With an initial learning rate of 0.001, the model is more stable and performs much better. Therefore, when transfer learning is applied to the training of a convolutional neural network, the initial learning rate in the fine-tuning stage needs to be lower than that of a model trained from scratch.
Comparing experiment 3 with experiment 6 in Table 3, where the initial learning rate was set to 0.001 with the SGD algorithm and the Adam optimizer, respectively, the final performance of the model differs with the training method. The loss of the model trained by the Adam optimizer is lower than that of the model trained by SGD. Furthermore, the Adam-trained model reaches convergence first, becoming stable after 3500 iterations, as illustrated by the red curve in Figure 7. The SGD-trained model converges more slowly, and its final loss after convergence is 0.061, higher than that of the Adam-trained model. Moreover, since SGD adjusts the weights for each data point, its performance fluctuates far more than Adam's during learning. The right part of Figure 7 shows the variation in accuracy for the two training methods: the model retrained by the Adam optimizer reached an accuracy of 98.26%, while the model retrained by SGD did not perform as well and remained consistently lower throughout fine-tuning. In general, the Adam optimizer converges faster than SGD and is more stable during testing; it is therefore better suited to fine-tuning the corn disease recognition model.
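The overshoot effect discussed above can be illustrated with a toy one-parameter example (unrelated to the paper's model): plain gradient descent on a simple quadratic loss oscillates indefinitely when the step size is too large relative to the curvature, and converges smoothly when it is reduced by an order of magnitude, mirroring the 0.01 vs. 0.001 behavior in Table 3.

```python
def gradient_descent(lr: float, steps: int = 50, w0: float = 1.0) -> float:
    """Minimize the toy loss L(w) = 100 * w**2 with plain gradient descent
    and return the final distance from the optimum at w = 0."""
    w = w0
    for _ in range(steps):
        grad = 200.0 * w          # dL/dw
        w -= lr * grad
    return abs(w)

# lr = 0.01 gives a per-step update factor of (1 - 0.01 * 200) = -1:
# w flips sign forever and never approaches the optimum.
print(gradient_descent(0.01))     # 1.0 (pure oscillation)

# lr = 0.001 gives a factor of 0.8: w shrinks geometrically toward 0.
print(gradient_descent(0.001))    # ~1.4e-05
```

Near a pretrained optimum the effective curvature is large, which is why fine-tuning tolerates only the smaller learning rate.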
3.2. Effects of Transfer Learning on Multiple Datasets
To explore the impact of training mechanisms and different pretraining datasets, four identical VGNet models were trained: one from scratch and three with transfer learning. The model trained from scratch used only the images obtained from corn fields, without pretraining; the other three models used three different large open datasets for pretraining and parameter transfer. The experimental results of the four learning configurations are listed in Table 4. From Table 4, it can be seen that the accuracy of learning from scratch is the lowest, at 69.57%. Under transfer learning with fine-tuning, the model pretrained on the PlantVillage dataset performs best, with an accuracy of 98.26%. Training the VGNet model from scratch requires more images and time to optimize the network parameters, and the training dataset of only 920 images is not enough for a deep convolutional neural network, leading to a nonideal classification effect. Pretraining and transfer learning give the VGNet model the ability to extract features and the knowledge to classify, making it easier to achieve higher accuracy than the model trained from scratch. Transfer learning is therefore a better approach than learning from scratch when the dataset is not large enough. Although the original VGG16 is an excellent model trained on ImageNet, a large public dataset, and its bottom-layer filters acquire local edge and texture information that generalizes well to any image, the feature gap between the ImageNet source domain and the corn disease images in the target domain is too large, whereas the other two datasets are much more similar to the corn disease images in color, texture, and shape. Thus, the accuracies of the models pretrained on PlantVillage and AI Challenger are higher than that of the model pretrained on ImageNet. Images from PlantVillage are very similar to those from AI Challenger, but PlantVillage contains more images; the model pretrained on PlantVillage therefore learns better, and PlantVillage is more suitable for pretraining in this research. This indicates that in transfer learning, the source domain and target domain should fit closely for better performance.
3.3. Effects of Augmentation
Data augmentation was applied based on image transformations such as geometric transformation, color changing, and noise adding, generating new training images from the originals by applying random transformations. The size of the dataset was enlarged from 1150 to 11,500 images, with the training-to-testing ratio again set at 8:2. The effects of image augmentation on fine-tuning are also shown in Table 4. The effect of data augmentation differs across training modes. When learning from scratch, data augmentation improves the accuracy by nearly 20%: because the original dataset is small and the network is deep, overfitting degrades performance, and augmentation increases both the number and the diversity of the data. Data augmentation therefore plays a larger role in avoiding overfitting and increasing accuracy when the model is trained from scratch. In transfer learning mode, the accuracy of the fine-tuned model trained with augmentation is at least 2% higher than that of the model fine-tuned on the original image data. The pretrained model has already learned a great deal from the large image dataset, which weakens the role of data augmentation; enlarging the data thus plays only a slight role in improving classification performance under transfer learning.
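As a schematic illustration of the 10x enlargement (1150 to 11,500), the three transformation families mentioned above can be sketched in pure Python on a toy grayscale pixel grid; a real pipeline would typically use a library such as torchvision or Albumentations, and the specific parameter ranges below are illustrative assumptions.

```python
import random

def hflip(img):
    """Geometric transform: horizontal flip of a 2D pixel grid."""
    return [row[::-1] for row in img]

def brightness(img, delta):
    """Color transform: shift intensities, clamped to [0, 255]."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]

def add_noise(img, sigma, rng):
    """Noise transform: additive Gaussian noise, clamped to [0, 255]."""
    return [[max(0, min(255, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in img]

def augment(img, n_variants=9, seed=0):
    """Return the original image plus n_variants randomly transformed copies."""
    rng = random.Random(seed)
    out = [img]
    for _ in range(n_variants):
        new = img
        if rng.random() < 0.5:
            new = hflip(new)
        new = brightness(new, rng.randint(-30, 30))
        new = add_noise(new, sigma=5.0, rng=rng)
        out.append(new)
    return out

toy = [[10 * (r + c) for c in range(4)] for r in range(4)]
variants = augment(toy)   # 1 original + 9 augmented = 10 images per original
print(len(variants))      # 10
```

Applying this per original image reproduces the 10x dataset growth while preserving image dimensions and label assignments.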
5. Conclusions
Data diversity and representativeness are key elements in ensuring the generalization of the model. In this paper, we devised VGNet, which takes VGG16 as its backbone, adds batch normalization, replaces the two fully connected layers with a GAP layer, and adds L2 normalization. The parameters of the convolutional and pooling layers are transferred to the newly designed VGNet, and fine-tuning of VGNet is then studied to enhance its ability to recognize corn disease images from real field conditions.
Data augmentation promotes learning from scratch more than it promotes pretrained models, because the parameters of pretrained models are already trained sufficiently on large open datasets. Compared with traditional machine learning methods and state-of-the-art deep learning methods, the proposed VGNet has a stronger ability to identify a hierarchy of corn disease features. The accuracy of VGNet is improved by 3.5% over the original VGG16, and the testing time for 230 images is reduced by 66.8%, with balanced precision, recall, and F1 scores. The parameters and memory occupation of the proposed VGNet are reduced by 83.4% and 85.1%, respectively. The comparative experiments and performance analysis illustrate the wide adaptability of the proposed method. In addition, the proposed method could provide a baseline architecture for recognizing or interpreting other types of phenotypic information with far fewer parameters and less computation time. In future work, we will focus on collecting disease images of multiple crops from real scenes and developing fine-grained disease detection methods applicable to multiple crop categories.