1. Introduction
Citrus is cultivated throughout southern China; it is one of the country's primary fruit crops and a mainstay industry in the vast rural areas of south China. However, the regions in which citrus is grown are mostly warm and humid, and the fruit trees have a long growing period, so they are often infected with many diseases. A large proportion of citrus fruits are affected by different diseases [1]. Intelligent identification of citrus diseases is an important step in building modern, intelligent agriculture systems, and it can provide more scientific and effective guidance for citrus pest and disease control and field management [2,3]. The occurrence of crop diseases is affected by seasonal and climatic factors, resulting in long image-data collection cycles and an uneven distribution of data across pest and disease categories, which degrades the performance of classification algorithms. In addition, changes in lighting and perspective during image acquisition pose further difficulties for classification. Therefore, automatically classifying citrus diseases in the field using smartphone images remains a challenge.
Various traditional computer vision methods have been applied to crop pest and disease image classification [4,5]. Traditional disease image classification usually comprises feature extraction (e.g., SIFT, shape, and color features) followed by building a disease image classifier with a machine-learning algorithm. K. Jagan Mohan et al. [6] used scale-invariant feature transform (SIFT) features, a K-nearest-neighbor classifier, and a support vector machine (SVM) to identify three rice diseases: brown spot, rice blast, and white leaf blight. The accuracy of disease identification using the SVM was 0.91. In [7], the model learned an overcomplete dictionary to sparsely represent the training images of each leaf species using a sparse representation (SR) approach; this framework was able to effectively recognize leaves on a public leaf dataset. Shanwen Zhang et al. [8] used K-means clustering to segment diseased leaf images, extracted shape and color features to provide disease information, and classified diseased cucumber leaf images using SR. A major advantage of this method is that classification in SR space can effectively reduce computational effort and improve recognition performance: in that study, the overall recognition rate was 0.86. Because hand-crafted features may not be invariant across all diseases, finding an image classifier that is robust to all diseases is difficult: traditional classification methods may be accurate for one type of disease but less so for another.
To accurately identify crop pests and diseases, a variety of classification methods based on deep learning have been developed [5,9,10,11,12,13,14,15,16,17,18]. Mohanty et al. [19] used a deep convolutional neural network architecture to train a model on plant leaf images with the aim of classifying the crop species as well as the presence and identity of the disease. Sladojevic et al. [20] used the Caffe deep-learning framework to build a convolutional neural network model for disease image recognition using plant leaves; the experimental results showed that the model achieved accuracies of more than 0.91, and up to 0.96, for the individual classes tested. However, deep learning uses large-scale data as its training basis and depends strongly on the amount of image data available [21]. Many deep neural networks based on transfer learning have been proposed to solve the problem caused by insufficient data. Thenmozhi and Reddy [22] designed a deep CNN model that can classify insect species using the NBAIR, Xie1, and Xie2 datasets. Selvaraj et al. [23] retrained ResNet50, InceptionV2, and MobileNetV1 to build disease- and pest-detection methods. The experimental results showed that ResNet50 and InceptionV2 outperformed MobileNetV1, revealing that the DCNN is a robust and easy-to-deploy digital banana disease and pest detection strategy. Coulibaly et al. [24] proposed a transfer-learning-based deep neural network for identifying pearl millet disease, which had an average recognition accuracy of 0.95 and an F1 score of 0.92. Barman et al. [25] compared MobileNet and a self-structured CNN for citrus leaf disease classification and found that the self-structured CNN was more accurate than MobileNet when classifying citrus diseases in smartphone images. Khanramaki et al. [26] proposed an integrated classifier of deep convolutional neural networks to identify citrus pests. These methods can usually produce accurate results in farming laboratories; however, they may be less effective for citrus diseases and pests in the field because of lighting changes and complex backgrounds.
In this study, based on an analysis of the characteristics of citrus image data collected by mobile devices in the field from 2019 to 2020, we designed a new deep-learning network that combines transfer learning to classify citrus disease and pest images. The original AlexNet architecture has eleven layers. Our proposed convolutional neural network has thirteen layers, with the kernel sizes of the convolutional and pooling layers chosen to obtain more nonlinear transformation features from the disease images while reducing the number of network parameters. Moreover, we combined the proposed 13-layer convolutional neural network (CNN13) with the pretrained VGG16 to address the problem caused by the uneven distribution of images among disease categories.
The main contributions of this study are twofold: First, we designed a new CNN13 to more accurately extract and fit citrus disease image features. Second, we designed a new joint network, called OplusVNet, which combines the proposed CNN13 with transfer learning to alleviate the problem caused by uneven numbers of smartphone disease images in each category.
The remainder of this paper is organized as follows: in Section 2, we describe the dataset construction; Section 3 presents the proposed OplusVNet; the citrus disease image classification experimental results are described in Section 4; finally, we provide our conclusions in Section 5.
4. Experimental Results and Analysis
To evaluate the performance of the proposed OplusVNet network for citrus disease image classification, we compared its results with those of the AlexNet network [28,29] and the transfer-learning-based VGG16 network (TL-VGG16) [27]. The data, data pre-processing, and data-enhancement operations used by all networks were the same. In this study, we set the learning rate, number of epochs, and optimizer of the OplusVNet network to 1 × 10, 50, and Nadam, respectively. We conducted the experiments using the Python programming language and the TensorFlow deep-learning framework, with Windows 10 as the operating system, an Intel Core i7-8700 CPU with 6 cores, 32 GB of RAM, and a GeForce GTX 1080 Ti GPU.
4.1. Evaluation Metrics
In this study, we used the F1 score and accuracy rate as metrics to evaluate the effectiveness of the different network models for citrus disease and pest image classification. The F1 score is defined as
F1 = 2 × Precision × Recall / (Precision + Recall),
where Precision is the ratio between the number of correctly identified disease images and the number of images predicted as that disease, and Recall is the ratio between the number of correctly identified disease images and the number of all images of that disease category in the test set.
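For concreteness, these per-class metrics can be computed directly from predicted and true label lists. The sketch below is illustrative (the label values and function name are hypothetical, not from the paper's code):

```python
def precision_recall_f1(y_true, y_pred, positive):
    """Compute Precision, Recall, and F1 for one disease class.

    tp: images of the class predicted as the class.
    fp: images of other classes predicted as the class.
    fn: images of the class predicted as something else.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical example: four test images, one canker image misclassified.
y_true = ["canker", "canker", "scab", "canker"]
y_pred = ["canker", "scab", "scab", "canker"]
p, r, f = precision_recall_f1(y_true, y_pred, "canker")
```

In practice, a library implementation such as scikit-learn's `f1_score` computes the same quantity; the explicit version above only makes the definitions in the text concrete.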
4.2. Experiments with OplusVNet with Different Frozen Mechanisms
Network-based transfer learning forms the front part of the OplusVNet network, and the feature maps it extracts are used as the input of the subsequent layers. The parameters of the subsequent layers are trained with the target-domain data, which plays an important role in those layers, enabling the network to more accurately fit the target-domain data and thus further improve the model's prediction results.
We designed different frozen mechanisms for OplusVNet, where OplusVNet_L indicates that the first L layers are frozen. The experimental results for each network model are shown in Figure 3.
Figure 3a shows the F1 scores of the OplusVNet network with different frozen mechanisms for the different disease and pest categories, indicating that the OplusVNet variants achieved accurate results across the citrus pest categories. The accuracy rates of the OplusVNet networks with different frozen mechanisms are shown in Figure 3b, from which OplusVNet_10 was the most accurate. From the above results and analysis, we found that (1) the fewer the frozen layers, the more layers must be trained, which creates a risk of network overfitting; conversely, the more frozen layers, the smaller the role played by the target-domain data in the network, so the network may not be able to accurately fit the target-domain data. (2) When the number of frozen layers of the transfer-learning network was 10, our proposed OplusVNet network achieved an F1 score of more than 0.95 for each individual pest category, and the overall classification accuracy was 0.95, showing that the risk of network overfitting was effectively reduced and the target data were better learned by the network. The training time per batch of the proposed OplusVNet network was 132 ms.
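The frozen mechanism itself is simple to express: the first L layers keep their pretrained weights fixed, and only the remaining layers are updated during training. A framework-agnostic sketch of this logic (the layer representation and names here are illustrative, not the paper's actual implementation, which would set the corresponding `trainable` flags in TensorFlow):

```python
def freeze_first_layers(layers, n_frozen):
    """Mark the first n_frozen layers as non-trainable, as in OplusVNet_L,
    so those layers retain their pretrained (source-domain) weights."""
    for i, layer in enumerate(layers):
        layer["trainable"] = i >= n_frozen
    return layers

# Toy 13-layer stack with the first 10 layers frozen, mirroring the
# OplusVNet_10 configuration that performed best in these experiments.
layers = [{"name": f"layer_{i}", "trainable": True} for i in range(13)]
freeze_first_layers(layers, 10)
trainable_names = [l["name"] for l in layers if l["trainable"]]
```

The same idea in Keras would be a loop setting `layer.trainable = False` over the first L layers of the pretrained backbone before compiling the model.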
4.3. Comparison with State-of-the-Art Networks
To further validate the performance of the OplusVNet network, we compared it with AlexNet [28,29], TL-VGG16 [27], and RepVGG [31]. In TL-VGG16, all the convolutional layers of the VGG16 network are frozen; these frozen layers retain the parameter weights obtained by training VGG16 on ImageNet, and only the fully connected layers and the output unit of the network are trained. The classification results of the different networks on the test set are shown in Table 4, where the TL-VGG16 network is a transfer-learning-based VGG16 network. For canker disease, the highest F1 score was obtained by the proposed OplusVNet (1.00), followed by RepVGG (0.99). For scab disease, the highest F1 score was obtained by the proposed OplusVNet and RepVGG (0.97), followed by TL-VGG16 (0.91). For leaf miner, the highest F1 score was obtained by the proposed OplusVNet (0.99), followed by RepVGG (0.96). For rust wall, the highest F1 score was obtained by the proposed OplusVNet (0.99), followed by RepVGG (0.94). For normal leaves, the highest F1 score was obtained by the proposed OplusVNet (0.95), followed by RepVGG (0.93). In most experiments, the proposed OplusVNet obtained the highest F1 score, followed by RepVGG. For leaf miner and normal leaves, TL-VGG16 performed substantially worse than the other methods. The overall classification accuracy of the TL-VGG16 network was lower than that of the AlexNet network because the features produced by the TL-VGG16 feature extractor, learned from the source-domain data, did not sufficiently fit the target data. For the proposed OplusVNet, RepVGG, and AlexNet, the larger the number of images of a specific type, the more accurate the performance. The accuracy rate of the proposed OplusVNet was 0.99, higher than the 0.93, 0.88, and 0.97 achieved by AlexNet, TL-VGG16, and RepVGG, respectively.
In summary, OplusVNet outperformed the other networks in terms of both the F1 score for individual disease and pest classes and overall classification accuracy, especially for classes with relatively few images. The proposed network, combined with a network-based transfer-learning network, can effectively fit the data features and overcome the problems caused by a small and unevenly distributed data volume.
4.4. OplusVNet Network Performance Analysis
From the above analyses, we found that for most classification methods, the results largely depend on the number of images used for training. However, for some diseases, obtaining enough images for the classification task may be challenging. To analyze the performance of our proposed OplusVNet network on sets with different numbers of images, we set the number of training images per class to 170, 120, and 70; the number of test images was 30 for each category. First, we randomly selected 200 images from the original dataset of each class and then randomly selected 30 of these 200 images as the test set, using the remaining 170 images as the first training set. We then randomly selected 120 images from the 170 images in the first training set to form the second training set, and randomly selected 70 images from the same 170 images to form the third training set. Because of the small number of images in the training sets, we set the batch size of the OplusVNet network to 32 in this part of the study.
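The nested subset construction described above can be sketched in a few lines; the function and file names below are illustrative assumptions, not taken from the paper's code:

```python
import random

def make_splits(class_images, seed=0):
    """Build one 30-image test set and nested 170/120/70-image training
    sets for one class, following the sampling procedure described above."""
    rng = random.Random(seed)
    pool = rng.sample(class_images, 200)    # 200 images drawn per class
    test_set = pool[:30]                    # 30 held-out test images
    train_170 = pool[30:]                   # remaining 170: first training set
    train_120 = rng.sample(train_170, 120)  # second training set (subset)
    train_70 = rng.sample(train_170, 70)    # third training set (subset)
    return test_set, train_170, train_120, train_70

# Hypothetical per-class file list.
images = [f"img_{i:04d}.jpg" for i in range(1000)]
test_set, t170, t120, t70 = make_splits(images)
```

Because both smaller training sets are drawn from the same 170-image pool, the test set stays disjoint from every training set, so results on the three training-set sizes remain directly comparable.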
4.4.1. Experimental Results of Proposed OplusVNet on Small Datasets
Table 5 shows the results of OplusVNet with different numbers of frozen layers for training sets of 170, 120, and 70 images. When the number of images in the training set was 170, the highest recognition accuracy was achieved with four, six, eight, and ten frozen layers. When the number of images was 120, the highest recognition accuracy was achieved with six and twelve frozen layers; when it was 70, with six and eight frozen layers. Thus, when the number of training images was less than 170, the network with six frozen layers had the best generalization performance, so the OplusVNet network with six frozen layers should be used when the number of images in the training set is small.
4.4.2. Experimental Results of Different Network Models on Small Datasets
Table 6 shows the experimental results of the OplusVNet_6, AlexNet [28,29], and TL-VGG16 [27] networks on the different small training sets.
Table 6 shows that OplusVNet_6 outperformed the other networks in terms of both the F1 score for individual classes of disease and pest and overall classification accuracy. Our proposed OplusVNet network effectively avoids the risk of overfitting in small datasets and can more effectively learn and fit disease and pest image data with texture features.
In summary, the performance of all methods generally decreased with the decrease in the number of images. For scab disease images, AlexNet performed substantially worse than the others. For normal citrus images, AlexNet and TL-VGG16 received low scores. For all the experiments, the proposed OplusVNet obtained the best F1 and accuracy scores. This demonstrated that the proposed OplusVNet is robust and effective for the identification of different disease and pest types and small datasets.