1. Introduction
Deep learning (DL) is a subfield of machine learning that can extract high-level features from raw data with hierarchical convolutional neural network architectures [1]. With considerable advances in algorithms and the dramatically lowered cost of computing hardware [2], it has been widely applied to numerous complex applications [3]. In agriculture, DL has become a popular technology in many areas, such as crop classification [4], fruit grading [5], pest detection [6], plant disease recognition [7], and weed detection [8].
Despite these advantages, the drawbacks and barriers of DL cannot be ignored, especially in further applications to modern agriculture. For example, the models must be trained on large, high-quality datasets for a long time to achieve an acceptable level of accuracy and relevance [9]. Among these shortcomings, the lack of “explainability” is a crucial obstacle to the widespread application of black-box models and remains an important open problem for artificial neural networks and deep learning [10]. With such neural network-based black-box models, users cannot fully grasp why a particular output is generated [11,12].
To make models more transparent and interpretable, explainable artificial intelligence (xAI), which is considered to offer the highest level of explainability, accuracy, and performance, has become increasingly important [10]. In recent years, with continuing xAI research, visualization of model internals has become one of the most intuitive ways to explore the interpretable cognitive factors of deep learning. By mapping abstract data into images, a visual representation of the model is established, which makes it easier for researchers to understand the deep learning model and its internal representations, reduces the complexity of the model to a certain extent, and improves transparency.
Many excellent visualization methods have been proposed in recent years [13,14,15]. Simonyan et al. visualized the partial derivatives of the predicted class score as pixel intensities and further refined the raw gradients through backpropagation and deconvolution, improving visualization quality [13]. Zeiler et al. visualized the neurons inside a deep neural network through activation maximization and sampling, finding the input images that maximally activate a given filter and thereby highlighting specific pixel regions [16]. By reversing the network through unpooling, rectification, and deconvolution, the interior of the convolutional network is visualized, low-level and high-level features are revealed, and target recognition ability is improved. These methods all produce fine-grained visualizations, but they cannot discriminate between categories. Zhou et al. proposed a more intuitive interpretability algorithm, class activation mapping (CAM), for localization [17]. CAM replaces the fully connected layer with a convolutional layer and global average pooling (GAP), making full use of spatial information and improving robustness; a softmax layer after GAP produces class-specific feature maps. Building on CAM, Selvaraju et al. introduced GradCAM, which combines feature maps using gradient signals; it overcomes CAM's requirement to modify the model structure and is applicable to any CNN-based model [18].
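As a concrete reference for the GradCAM computation discussed here, the following is a minimal sketch in PyTorch. It assumes a torchvision-style ResNet; the hook placement, class selection, and normalization are illustrative choices, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F
from torchvision import models

class GradCAM:
    """Minimal GradCAM: weight feature maps by pooled gradients, then ReLU."""
    def __init__(self, model, target_layer):
        self.model = model.eval()
        self.activations, self.gradients = None, None
        target_layer.register_forward_hook(self._save_activation)
        target_layer.register_full_backward_hook(self._save_gradient)

    def _save_activation(self, module, inputs, output):
        self.activations = output.detach()

    def _save_gradient(self, module, grad_input, grad_output):
        self.gradients = grad_output[0].detach()

    def __call__(self, x, class_idx=None):
        scores = self.model(x)                                   # (1, num_classes)
        if class_idx is None:
            class_idx = scores.argmax(dim=1).item()
        self.model.zero_grad()
        scores[0, class_idx].backward()
        weights = self.gradients.mean(dim=(2, 3), keepdim=True)  # GAP over gradients
        cam = F.relu((weights * self.activations).sum(dim=1, keepdim=True))
        cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear",
                            align_corners=False)
        return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

# Usage: heatmap from the last residual stage of an (untrained) ResNet-50
model = models.resnet50(weights=None)
heatmap = GradCAM(model, model.layer4)(torch.randn(1, 3, 224, 224))
```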
As the number of convolutional layers increases, the “black box” (uninterpretability) problem of network frameworks becomes more serious, which makes studies of model interpretability increasingly necessary. Applications in agriculture are complex: scenes are cluttered, plant types are diverse, and environmental and other interfering factors are numerous. It is therefore particularly important to improve the interpretability of agricultural models, yet there are few reports in this area. Ghosal et al. proposed an interpretation mechanism for images of stressed and healthy soybean leaflets in the field, making predictions based on the top-K high-resolution feature maps extracted from local activation levels; however, their work did not explain the internal mechanism of the model [19].
We explored the internal interpretability of deep learning models using a fruit leaf dataset. The specific objectives were to: (1) compare the performance of the ResNet [20], GoogLeNet [21], and VGG [22] network frameworks; (2) introduce the attention mechanism to build the ResNet-Attention model; and (3) compare three interpretability algorithms, SmoothGrad [23], LIME [24], and GradCAM [18]. To study whether the model is more inclined to identify the shape features of the leaves or the texture features of the diseased spots, the dataset was rearranged into three different experiments: Experiment I is a classification experiment on the combination of fruit type and disease or insect pest, a multi-class problem with 34 categories; Experiment II is a classification experiment on whether a leaf is diseased or not, a binary classification problem; Experiment III is based on fruit type, a multi-class problem with 11 categories.
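As an illustration of how one annotated image can be relabeled for the three experiments, the sketch below derives the three targets from a combined class name. The "fruit_condition" naming convention is an assumption made for illustration, not a documented detail of the dataset.

```python
def make_labels(class_name):
    """Derive the three experiments' targets from a combined class name (assumed format)."""
    fruit, _, condition = class_name.partition("_")        # e.g. "apple_scab"
    label_exp1 = class_name                                # Experiment I: 34 fruit+condition classes
    label_exp2 = "healthy" if condition == "healthy" else "diseased"  # Experiment II: binary
    label_exp3 = fruit                                     # Experiment III: 11 fruit classes
    return label_exp1, label_exp2, label_exp3

print(make_labels("apple_scab"))      # ('apple_scab', 'diseased', 'apple')
print(make_labels("grape_healthy"))   # ('grape_healthy', 'healthy', 'grape')
```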
3. Results
Three classification experiments were carried out in this study. Each experiment used three frameworks (VGG, GoogLeNet, and ResNet) and two extended models (ResNet34-CBAM and ResNet50-CBAM) for visual display. Experiment I is a multi-class experiment on the combination of fruit type and disease or insect pest. Experiment II is a binary classification experiment on whether a leaf is diseased or not. Experiment III is a multi-class experiment on fruit type.
3.1. Classification Experiment of Fruit Species and Pests
The dataset is divided into 34 categories according to the combination of fruit type and disease or pest type, with about 1000 images per category. The goal of Experiment I is to explore whether the model can recognize both the shape features of leaves and the texture features of lesions.
The accuracies of the three models on the test set in Experiment I are shown in Table 3. The average accuracies of VGG, GoogLeNet, and ResNet are 98.06%, 98.86%, and 99.11%, respectively. Comparing the individual categories, although the accuracies of the three models differ little overall, the classification accuracy of ResNet on apple scab, grape black rot, and guava whitefly is clearly better than that of VGG and GoogLeNet. Therefore, the ResNet model performs best on this dataset.
The attention-based CBAM module is introduced into ResNet to construct the ResNet34-CBAM and ResNet50-CBAM models.
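For reference, a minimal sketch of such a CBAM block in PyTorch is shown below: channel attention followed by spatial attention, to be inserted after a residual stage. The reduction ratio and the 7×7 spatial kernel follow the original CBAM paper and are not necessarily the settings used in this study.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Shared MLP over average- and max-pooled channel descriptors."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))     # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))      # max-pooled descriptor
        return torch.sigmoid(avg + mx).view(b, c, 1, 1)

class SpatialAttention(nn.Module):
    """7x7 conv over channel-wise average and max maps."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx, _ = x.max(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CBAM(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ca, self.sa = ChannelAttention(channels), SpatialAttention()

    def forward(self, x):
        x = x * self.ca(x)      # channel attention first
        return x * self.sa(x)   # then spatial attention
```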
Figure 4 shows the results for a diseased leaf in the dataset; the results for other images are similar. In Figure 4, each row represents the visualization results of one model, and each column represents a different visualization method or the original image, where Layer1–4 correspond to the four convolutional layers of the ResBlocks shown in Figure 3.
Comparing the four models in Figure 4, the ResNet-CBAM structures perform significantly better than the plain ResNet models. The ResNet34 model cannot filter out background information well, resulting in mediocre results. The ResNet34-CBAM model overcomes this shortcoming, maintaining high confidence on leaf shape and lesion characteristics. This indicates that introducing the attention module benefits the model's feature extraction and makes the visualizations more explanatory.
Comparing the ResNet34-CBAM and ResNet50-CBAM models, the ResNet34-CBAM model gives significantly better overall results on diseased leaves in this multi-class experiment. From the layer-by-layer results, the model first focuses on the shape of the leaf and ignores the location of the diseased spots, then attends to the diseased-spot features in the later stages, and finally combines the two sets of features to achieve better classification. In contrast, the visualization results of the ResNet50-CBAM model are not as good.
Comparing GradCAM, SmoothGrad, and LIME, GradCAM gives the best results. GradCAM is useful for class discrimination because it clearly shows what each layer of the network attends to. Compared with GradCAM, the concise, layered knowledge produced by LIME is easier to extract: it highlights superpixels, so one can see how the network's decision is explained in terms of patches of similar pixels. The SmoothGrad renderings show that this method assigns high weight to the texture features of the lesions in Experiment I (the lesion highlights are more concentrated), but the appearance characteristics of the leaves are not well represented, which does not match our experimental expectations. Therefore, the GradCAM method works best in Experiment I.
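For comparison, a minimal SmoothGrad sketch is shown below: input gradients are averaged over several noisy copies of the image. The number of samples and the noise level are illustrative values, not the settings used in this study.

```python
import torch

def smooth_grad(model, x, class_idx, n_samples=25, noise_level=0.15):
    """Average input gradients over noisy copies of x; returns a saliency map."""
    model.eval()
    sigma = noise_level * (x.max() - x.min())
    grads = torch.zeros_like(x)
    for _ in range(n_samples):
        noisy = (x + sigma * torch.randn_like(x)).requires_grad_(True)
        score = model(noisy)[0, class_idx]
        score.backward()
        grads += noisy.grad
    # Saliency = mean absolute gradient, max over color channels
    return (grads.abs() / n_samples).max(dim=1, keepdim=True)[0]
```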
Based on the above comparison, the ResNet34-CBAM model recognizes the shape features of the leaves and the texture features of the diseased spots simultaneously, achieving a better classification result.
3.2. Classification Experiment of Fruit Leaf Disease
The dataset of Experiment II is divided according to whether the fruit leaf is diseased or not. This is a binary classification experiment; the purpose is to study whether the model can achieve accurate classification only by identifying the texture features of leaf lesions.
The accuracies of the three models on the test set in Experiment II are shown in Table 4. The average accuracies of VGG, GoogLeNet, and ResNet are 98.03%, 98.45%, and 99.40%, respectively. The ResNet model has the best classification performance on the test set: for both healthy and diseased leaves, its classification accuracy exceeds that of the VGG and GoogLeNet models.
Figure 5 shows the results for a diseased leaf and a healthy leaf in the dataset; the results for other images are similar. In Figure 5, each row represents the visualization results of one model, and each column represents a different visualization method or the original image.
In this experiment, the ResNet50 and ResNet50-CBAM models perform better, and both can focus on the lesion characteristics. Compared with the ResNet50 model, the ResNet50-CBAM model extracts leaf shape features while attending to lesion features, and the features it extracts are more detailed. Comparing the GradCAM maps of each layer, the model first attends to the outline of the leaves against the background and then predicts leaf health by attending to internal details and lesion features. The ResNet50-CBAM model gives higher weight to the lesions (the lesions are darkest in Layer3), whereas the other models do not extract detailed features as well. The ResNet50-CBAM model merges all features in the CBAM2 layer to improve prediction accuracy. Therefore, we have reason to believe that the ResNet50-CBAM model performs best in Experiment II.
Comparing GradCAM, SmoothGrad, and LIME, the GradCAM method again shows the best results. The SmoothGrad method can roughly describe the appearance of healthy leaves, but it cannot accurately locate the diseased-spot features. The explanations produced by the LIME method are harder to interpret, but we can still see that LIME roughly outlines the appearance characteristics of the leaves.
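For reference, the sketch below produces a LIME superpixel explanation of this kind with the public lime_image API; model and preprocess stand in for the trained classifier and its input transform and are assumptions, not the authors' code.

```python
import numpy as np
import torch
from lime import lime_image
from skimage.segmentation import mark_boundaries

def predict_fn(images):
    # LIME passes a batch of HxWxC arrays; preprocess is assumed to map
    # one such array to a normalized CxHxW tensor.
    batch = torch.stack([preprocess(img.astype(np.uint8)) for img in images])
    with torch.no_grad():
        return torch.softmax(model(batch), dim=1).numpy()

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    leaf_image,              # HxWx3 numpy array of the leaf photo
    predict_fn,
    top_labels=1,
    num_samples=1000)        # number of perturbed samples
label = explanation.top_labels[0]
img, mask = explanation.get_image_and_mask(
    label, positive_only=True, num_features=5, hide_rest=False)
overlay = mark_boundaries(img / 255.0, mask)   # outline the explaining superpixels
```

The highlighted superpixels in the resulting overlay correspond to the image patches that the local surrogate model considers most supportive of the predicted class.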
In Experiment II, the model focuses on the texture features of the lesions and also combines the shape features of the leaves to achieve a better classification result. The GradCAM method helps us understand the model's prediction mechanism.
3.3. Classification Experiment of Fruit Types
The dataset of Experiment III was constructed according to fruit type only, yielding a multi-class dataset. The purpose of Experiment III is to explore whether the model can recognize the shape characteristics of the leaves well and to observe whether the model gives high weight to the lesions.
In Experiment III, the test accuracies of the three models after training are shown in Table 5. The average accuracies of VGG, GoogLeNet, and ResNet are 99.45%, 99.67%, and 99.89%, respectively. Although the average accuracies of the three models differ little, the ResNet model has a clear advantage in the classification of apples and peaches. Therefore, we believe that the ResNet model still performs best in Experiment III.
Figure 6 shows the results of each model in Experiment III. Each row represents the results of one model, and the columns represent the results of the different interpretability methods. Among the plain ResNet models, the ResNet50 model performs better than the ResNet34 model; at Layer3 in particular, the results of the ResNet34 model are more confused and the outline is more blurred. In contrast, the models with the attention module are more effective, and the ResNet-CBAM models can clearly describe the shape details of the leaf and the vein details inside it.
Among the ResNet-CBAM models, the ResNet50-CBAM model performs best. Compared with the ResNet34-CBAM model, it lowers the confidence assigned to the diseased spots inside the leaves and increases the weight of the leaf shape features, which is consistent with the experimental expectations. In contrast, the results of the ResNet34-CBAM model are rather vague, with no distinction between leaf contours and inner veins, and its results on diseased leaves show that it gives higher weight to the diseased-spot locations. Overall, the ResNet50-CBAM model is the best model.
Comparing GradCAM, SmoothGrad, and LIME, the GradCAM method again shows the best results. The SmoothGrad method roughly describes the appearance characteristics of the leaves and does not give higher weight to the lesions. The explanations produced by the LIME method are again harder to interpret, but we can still see that LIME roughly outlines the appearance of the leaves.
Therefore, we confirm that in Experiment III the ResNet50-CBAM model attends to the shape characteristics of the leaves and down-weights the texture characteristics of the lesions.
4. Discussion
We also analyzed the background and shadow regions. As shown in Figure 7, using the optimal weights of the Experiment II models, two images with only subtle background differences and without leaf shadows were selected for visualization; the visualization results for other leaves are similar, and we randomly selected one group of images for display. The figure shows that, with these two unfavorable factors excluded, none of the four models assigns high weight to the background during prediction; that is, the models attend to the task-relevant characteristics of the leaves themselves rather than classifying by the image background. Comparing the visualization results of the various models, the ResNet50-CBAM model has the best visualization results for healthy leaves and presents the best results across the three interpretability algorithms. When predicting healthy leaves, the ResNet50-CBAM model attends only to the venation features inside the leaves and the contour features of the leaves. The LIME and SmoothGrad results also show that the ResNet50-CBAM model pays more attention to the characteristics of the leaf itself than the other models and ignores the background, which helps us understand the operating mechanism of the model. For the classification of diseased leaves, the ResNet50-CBAM model can make good predictions from the leaf disease characteristics alone, while the other three models rely to some extent on leaf texture features to assist prediction. Therefore, we believe that the ResNet50-CBAM model performs best in Experiment II, and its results are also the most consistent with human judgment in this task. Additionally, the prediction results of the ResNet50-CBAM model further confirm that the background and shadows have little effect on the model's predictions.
Based on the above three groups of experiments, the ResNet50-CBAM model performs best. To further verify that the attention module improves the feature extraction ability of the ResNet model across the different experiments, we compared the ResNet50-CBAM and ResNet50 models on the same grape black rot leaf, as shown in Figure 8. The overall result of the ResNet50-CBAM model is better than that of the ResNet50 model: the ResNet50-CBAM model has higher confidence in the leaf shape features, and its focus differs between experiments. The ResNet50 model, however, assigns some weight to background noise, which is clearly not the result we want. Comparing the outputs of the Layer4 and CBAM2 layers across the three experiments, the CBAM module is better at grasping the focus of the image. For example, in Experiment II, the ResNet50-CBAM model integrates the appearance and lesion features of the leaves well and extracts more features to improve its predictive ability, whereas the ResNet model does not integrate the features well and discards part of the feature information. Therefore, adding the CBAM layer to the ResNet framework makes feature extraction more detailed and effectively improves the predictive ability of the model. Comparing GradCAM, SmoothGrad, and LIME, the GradCAM method gives the most intuitive and easiest-to-understand results. The SmoothGrad method shows that the pixels at the lesion locations are the most important, but it does not clearly explain the appearance of the leaves. The results of the LIME method do not meet our experimental expectations: LIME only roughly describes the appearance of the leaves and does not have good explanatory power. Neither method is as clear as GradCAM in its visualization. Therefore, the combination of the ResNet50-CBAM model and the GradCAM method provides a better interpretation.
To verify the generalization ability of the model, we chose an image of a diseased eggplant leaf from the Internet and produced visualization predictions using the Experiment II weights of the ResNet50-CBAM model, as shown in Figure 9. In all three interpretability algorithms, the model makes accurate classifications based on the locations of the leaf lesions. As can be seen in Figure 9b,d, the model assigns higher weight to the lesion locations, and the leaf can be distinguished from a healthy one by the texture features of the lesions. This further confirms that the model has good generalization ability and can therefore be applied to more leaf classification scenarios.
5. Conclusions
We studied the interpretability of different classification models on a fruit disease leaf dataset. We designed three different experiments on the dataset: Experiment I is a classification experiment combining fruit species and pest or disease species, a multi-class problem, focusing on whether the model can simultaneously recognize the texture, shape, and lesion characteristics of the leaves; Experiment II is a disease classification experiment, a binary classification problem, focusing on whether the model can recognize the texture characteristics of the lesions well; Experiment III is a multi-class experiment based on fruit type, focusing on whether the model can recognize the shape characteristics of the leaves well. In each experiment, the VGG, GoogLeNet, and ResNet models were used, and the ResNet-attention model was applied with three interpretability methods. Through the three sets of experiments, we confirmed that the ResNet model achieves the best accuracy on our classification tasks: 99.11%, 99.40%, and 99.89%, respectively. The ResNet-CBAM model, constructed by introducing the attention module, improves the model's ability to extract key features and enhances its generalization power. In addition, comparing the three visualization methods SmoothGrad, LIME, and GradCAM, the GradCAM method is the most suitable for agricultural classification tasks.
Finally, through the above series of experiments, we clarified the internal interpretability of convolution-based neural network models in dealing with common leaf diseases and insect pests and identified what the models focus on during feature extraction in the three sets of experiments. The attention module can effectively improve the feature extraction ability of the model. Combined with the three interpretability methods, the results show that the features the model extracts differ across agricultural classification tasks. This research will help practitioners in agriculture make better use of deep learning methods to deal with classification problems in the field.