1. Introduction
Agriculture is one of humanity’s most critical activities, of which plant disease control is a cornerstone. It is necessary to pay attention to the quality and wellbeing of the agricultural harvest. This will help maintain food production levels in the face of natural diseases and aid countries in coping with political and environmental challenges. Tomatoes are among the vital crops and staple food products around the world because of their rich nutritional content and their role in many recipes [
1]. The Food and Agriculture Organization (FAO) ranks tomatoes as the sixth most abundant vegetable around the world [
2]. In 2017, nearly 170.8 million tons of tomatoes were produced worldwide [
3]. However, the tomato plant is susceptible to many diseases caused by bacteria, viruses, or fungi that have a direct adverse effect on productivity [
4].
To detect plant diseases, farmers refer to plant pathologists. Alternatively, they can rely on their own experience or public resources. However, the required time, effort, and technical expertise may be prohibitive for most professional or hobby farmers [
5]. Thus, technological solutions that can aid the disease detection and identification will go a long way in reducing cost and improving the accuracy and speed of disease control. In this regard, recent advances in artificial intelligence (AI) have empowered a wide swath of applications from various disciplines. AI systems capture domain knowledge in their models through the training and validation process. They provide decision-making capabilities with nontrivial sophistication and complexity [
6,
7]. More specifically, deep learning algorithms have enabled the capture of intricate relationships and features of real-life processes. Convolutional neural networks (CNNs) are a type of deep learning algorithm that has been found particularly useful for direct image-based decision-making and object detection [
8].
Neural networks comprise three types of layers: input, hidden, and output. Deep learning, on the other hand, involves a far greater number of layers, which enables the capture of input features and details at various scales. Of the many deep learning algorithms, convolutional neural networks are the most suitable for handling images as input [
8]. Layers in a convolutional neural network perform a series of convolution operations using filters of various sizes, typically followed by a rectified linear unit (ReLU) activation function. The ReLU output is a feature map that is downsampled by a subsequent pooling layer. In general, the final layer before the output in a CNN is a fully connected layer, which combines the various features learned by the previous layers and feeds the output layer.
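To make this pipeline concrete, the following NumPy sketch runs a single-channel image through one convolution, a ReLU, and a max-pooling step (the image and filter values are illustrative and are not taken from any model discussed here):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as used in CNN layers)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling that downsamples the feature map."""
    h, w = x.shape[0] // size, x.shape[1] // size
    return x[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

# Toy 6x6 "image" and a 3x3 vertical-edge filter (made-up values).
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[-1.0, 0.0, 1.0]] * 3)
feature_map = max_pool(relu(conv2d(image, kernel)))  # shape (2, 2)
```

In a real CNN, many such filters run in parallel per layer and their weights are learned rather than hand-set.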
Building CNN models is an elaborate process, which needs to balance the computational cost with the ability to automatically extract appropriate features at various scales, orientations, colors, reflections, and spatial properties. Moreover, the models may suffer from overfitting, underfitting, or inefficiency. In addition, thorough evaluation is needed to establish the trustworthiness of the models. Fortunately, several public and well-established models exist in the literature. These models offer a wide range of reliable capabilities with great efficiency [
9]. Importantly, these models can be reused via an approach called transfer learning. This method utilizes generically pre-trained models by reusing the network structure and retraining part or all of the model, starting from the existing weights and parameters. Harnessing these robust models accelerates the development of innovative AI applications without reinventing new CNN architectures. This methodology has been successfully employed in many AI solutions for image classification [
6,
7].
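Conceptually, transfer learning keeps the pre-trained layers frozen and trains only a new task-specific head. The following self-contained NumPy sketch illustrates the idea with a stand-in random "backbone" and synthetic labels; every size and name here is hypothetical and does not describe the models used later in this paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained backbone: its weights are reused unchanged ("frozen").
W_backbone = rng.normal(size=(16, 8))  # maps 16-dim inputs to 8-dim features

def extract_features(x):
    feats = np.maximum(x @ W_backbone, 0.0)          # frozen layer + ReLU
    return np.hstack([feats, np.ones((len(x), 1))])  # append a bias term

X = rng.normal(size=(200, 16))
F = extract_features(X)
y = (F[:, 0] > np.median(F[:, 0])).astype(float)     # synthetic binary labels

# Only the new head is (re)trained: plain gradient descent on logistic loss.
W_head = np.zeros(F.shape[1])
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(F @ W_head)))
    W_head -= 0.1 * F.T @ (p - y) / len(y)

accuracy = np.mean(((F @ W_head) > 0) == (y > 0.5))
```

In practice the backbone is a deep pre-trained CNN and the head is one or more new layers, but the division of labor is the same: reused weights stay fixed (or are fine-tuned), and only the task-specific part is learned from scratch.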
In the context of technological and AI-based innovations for tomato disease diagnosis, several studies were conducted in the literature. Mim et al. [
10] developed a system that helps tomato farmers identify the type of disease using leaf images of the plant. The researchers used artificial intelligence algorithms and a CNN to develop a six-class (five diseases and one healthy) classification model with an accuracy of 96.55%. Hlaing and Zaw [
11] isolated the leaf image from the background, and used explicit feature extraction in the form of statistical properties and scale invariant feature transform of texture features. These descriptors fed a support vector machine (SVM) classifier, which distinguishes between seven input categories (six diseases and one healthy) with an accuracy of 84.7%. Kumar and Vani [
12] experimented with four deep learning models: LeNet, VGG16, ResNet, and Xception for ten-class classification (nine diseases + one healthy) of tomato leaf images, and reported a maximum accuracy of 99.25% using VGG16. Similarly, Tm et al. [
13] used the AlexNet, GoogLeNet, and LeNet models for the same classification problem and achieved an accuracy range of 94–95%. Annabel and Muthulakshmi [
14] used masking and threshold-based segmentation to identify and isolate infected areas of a leaf image. They extracted several features (e.g., dissimilarity, homogeneity, and contrast) and used a random forest classifier to categorize three diseases plus healthy leaves with an accuracy of 94.1%. Agarwal et al. [
15] developed a custom CNN model by modifying the VGG16 structure. They compared this model with traditional machine learning models (e.g., random forest and decision trees) and three deep learning ones (i.e., VGG16, Inceptionv3, and MobileNet) for ten-class classification, and achieved an accuracy of 98.4%. Ouhami et al. [
16] employed transfer learning of three models: DenseNet-161, DenseNet-121, and VGG16. The highest accuracy was achieved using DenseNet-161 (i.e., 95.65%). Similarly, Alhaj Ali et al. [
17] used Inceptionv3 and reported the highest accuracy of 99.8%. However, these high results were achieved with augmented images that included duplication.
Other approaches were employed in the literature. In one avenue, deep learning algorithms were combined with traditional machine learning to solve the classification problem. Al-gaashani et al. [
18] extracted features from leaf images using MobileNetv2 and NASNetMobile. After dimensionality reduction, the concatenation of these features fed into non-deep classification networks (i.e., random forest, SVM, and multinomial logistic regression). In another methodology, deep object detection methods were applied to plant images to detect diseases on leaves. Liu and Wang [
19] employed the You Only Look Once version 3 (YOLOv3) algorithm to detect gray leaf spot disease. They reported a mean average precision of 92.5%. Similarly, Wang et al. [
20] used Faster R-CNN and Mask R-CNN to detect eleven disease states (including healthy) in fruit images.
Traditional methods were also used in the literature recently. Gadade and Kirange [
21] extracted features using Gabor filters, the gray-level co-occurrence matrix, and speeded-up robust features. This approach involves less computational and memory overhead than deep learning, but it is less effective in solving the classification problem, as demonstrated by their reported accuracy of 74%. Similarly, Lu et al. [
22] presented spectral vegetation indices as features for classification using K-nearest neighbors (KNN), and they reported a 100% accuracy, albeit with a very small dataset (445 images).
This work is motivated by the following factors:
The adoption and implementation of technological innovations are generally lacking in the agricultural literature in comparison with other fields (e.g., medicine). This is especially true for the number of artificial intelligence applications in agriculture versus medicine.
Traditional classification methods rely upon explicit feature extraction and/or image processing techniques, which may be sensitive to changes in image quality, orientation, size, lighting, noise, etc. Furthermore, the classification performance is directly affected by the quality of the features on which it is based. Moreover, pre-processing increases delay and computational requirements and compounds errors. In addition, it may hinder the deployment of real-life applications if complicated actions are required of the user.
Previous works suffer from several deficiencies. First, some of these studies artificially increase the size of the dataset by including near-duplicate images with only subtle differences. However, deep learning models are largely invariant to such changes. This duplication artificially inflates the results by training the model to recognize similarities with the original images rather than features of the disease or health states. Second, building a customized CNN model is fraught with risks in terms of overfitting, underfitting, efficiency, and hardware requirements. Using deep transfer learning with pre-existing network architectures carries the inherent credibility of the thousands of applications based on these models and the extensive scrutiny they have undergone, albeit at the expense of perceived novelty and originality. Third, transfer learning is able to achieve competitive if not superior performance.
In this paper, deep transfer learning was used to detect and classify tomato diseases using images of infected leaves. This approach has the advantages of employing well-established, trustworthy, and robust models without the need to redesign or reinvent a custom architecture. Moreover, deep learning models can render explicit feature extraction and image preprocessing unnecessary. The contributions of this paper are as follows:
Develop deep transfer learning models for the detection and classification of tomato diseases from leaf images for nine tomato diseases: bacterial spot, early blight, late blight, leaf mold, mosaic virus, septoria leaf spot, spider mites, target spot, and yellow leaf curl virus. In addition, healthy leaves were discerned as a 10th class;
Implement transfer learning of eleven deep convolutional neural network models for the classification of leaf images into ten classes. Future implementation of such a system in smart devices will greatly help farmers perform prompt disease control;
Evaluate the performance of the various models using multiple metrics that cover many aspects of the detection and classification capabilities. Moreover, the training and validation times were reported.
The remainder of this paper is organized as follows: the data, convolutional network models, and performance evaluation metrics and setup are presented in detail in Section 2; Section 3 discusses the performance evaluation results along with a comparison to the related literature and a discussion of the models; and we conclude in Section 4.
3. Results and Discussion
The performance evaluation was performed to gauge and compare the classification capabilities of the various deep transfer learning models using well-known and reflective performance indices. Moreover, the evaluation was repeated 10 times to account for the random choice of the various data subsets. In addition, the time requirements for training/validation were reported for all models under the various setups.
Three data split strategies were used (i.e., 50/50, 70/30, and 90/10), which may reveal the abilities of the different models in learning from more data, and any underfitting/overfitting anomalies.
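Such repeated randomized splitting can be sketched as follows (the file names are placeholders, and per-class stratification is omitted for brevity):

```python
import random

def split_dataset(items, train_frac, seed):
    """Shuffle items reproducibly and split them into training/validation sets."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

images = [f"leaf_{i:04d}.jpg" for i in range(1000)]  # placeholder file names
for run in range(10):                                 # 10 randomized runs
    for frac in (0.5, 0.7, 0.9):                      # 50/50, 70/30, 90/10 strategies
        train, val = split_dataset(images, frac, seed=run)
```

Seeding per run makes each random split reproducible while still varying the image assignment across the 10 repetitions.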
Table 1 shows the mean over 10 runs for the overall F1 score, precision, recall, specificity, and MCC using 50% of the data for training. Most models performed exceptionally well with the highest mean F1 score of 98.5% using DenseNet-201. The worst performing model was SqueezeNet with a 90.9% F1 score. These performance values are corroborated by the confusion matrices for the best and worst performing models as shown in
Figure 3. The matrix for SqueezeNet shows a problematic trend of misclassifying leaves with diseases as healthy, especially for spider mites and target spot.
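For reference, the reported one-vs-rest indices can be derived directly from a multi-class confusion matrix; the sketch below uses a made-up 3-class matrix purely for illustration:

```python
import numpy as np

def ovr_metrics(cm):
    """Per-class one-vs-rest precision, recall, specificity, F1, and MCC
    from a multi-class confusion matrix (rows = true, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as this class but actually another
    fn = cm.sum(axis=1) - tp          # actually this class but predicted as another
    tn = cm.sum() - tp - fp - fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    specificity = tn / (tn + fp)
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / np.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return precision, recall, specificity, f1, mcc

# Toy 3-class confusion matrix (illustrative numbers only).
cm = [[50, 2, 1],
      [3, 45, 2],
      [0, 1, 49]]
precision, recall, specificity, f1, mcc = ovr_metrics(cm)
```

Averaging these per-class values over the ten classes yields overall scores of the kind reported in the tables.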
Further insight into the results is provided by
Figure 4, which shows the mean, minimum, and maximum accuracy for all algorithms over 10 randomized runs for the 50/50 data split. Three models (i.e., SqueezeNet, GoogLeNet, and Darknet-53) experienced high variability over the 10 random runs in comparison with the other models, which indicates their relative sensitivity to the choice of images included in the training/validation sets. The maximum standard deviation was 2.0% for SqueezeNet. The highest average accuracy was 98.8% for DenseNet-201.
Although the 50/50 split already yields acceptable results, it is worthwhile to explore the effect of increasing the size of the training dataset. Deep learning models, in comparison to traditional machine learning algorithms, are well known to achieve better performance with more data.
Table 2 shows the mean over 10 runs for the overall F1 score, precision, recall, specificity, and MCC using 70% of the data for training. All models achieved better performance, although with diminishing returns. SqueezeNet improved to a 91.8% F1 score and DenseNet-201 performed the best with an F1 score of 99.0%. The confusion matrices in
Figure 5 corroborate the performance values and reveal a drastically improved diagnosis in comparison with the matrix in
Figure 3 with relation to misclassifying spider mites and target spots as healthy.
Figure 6 shows the fluctuation of the accuracy results for the eleven models over 10 randomized runs. In comparison to
Figure 4, Darknet-53 displayed much less fluctuation with more training data, which means the model had the potential for better learning with more data. Most of the other models also fluctuated less; however, the smaller models (i.e., SqueezeNet and GoogLeNet) did not seem to benefit from more training data with respect to their sensitivity to the random choice of images included in the training set. The standard deviation of the accuracy results remained 2.0% for SqueezeNet. The highest average accuracy was 99.2% for DenseNet-201 and Darknet-53.
Pushing toward the extreme case of using 90% of the images for training reveals further insight into the models.
Table 3 shows the mean over 10 runs for the F1 score, precision, recall, specificity, and MCC using 90% of the data for training. Both SqueezeNet and GoogLeNet improved further to F1 scores of 93.3% and 95.9%, respectively. However, the other models with high performance values seemed to peak. Darknet-53 did not improve and the remaining algorithms showed small improvements (i.e., <1%). DenseNet-201 achieved the maximum mean F1 score of 99.2% and was closely followed by Inceptionv3 at 99.1%.
Figure 7 shows sample confusion matrices for the DenseNet-201 and SqueezeNet models using 90% of the data for training. The figure shows that very few images were misclassified, and DenseNet-201 classified several categories perfectly. Another observation relates to the ResNet models (18, 50, and 101), where a larger number in the model's name corresponds to a deeper network: performance improved with increased depth and number of layers.
Regarding the fluctuation of the results with different random choices,
Figure 8 shows that SqueezeNet improved to a 0.4% standard deviation for the classification accuracy, but GoogLeNet had the highest standard deviation at 1.0%. The fluctuation of the ShuffleNet model does not seem to be affected by more training data and remained almost fixed across the various data splitting strategies. The highest average classification accuracy was 99.4% using DenseNet-201.
Table 4 shows the mean training and validation times for all the models using the 50/50, 70/30, and 90/10 data splits. The SqueezeNet model trains the fastest of all models; however, it also performs the worst. On the other hand, ResNet-18 seems to represent a good compromise between classification performance and training time. The model produced F1 scores of 97.2–98.2% with a corresponding training time of 395.5–491.9 s, which is very fast in comparison to the better performing models. Nonetheless, training times may not affect the ability to deploy the models in real-life applications, especially if no live model update is performed: testing does not involve a model update and is usually very fast, and training is done once and offline with respect to the deployment. The inference times were in the range of 0.5–7 milliseconds/image, which is very small from a human user perspective. These times are independent of the data split and depend on the hardware platform and the size of the model.
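Per-image inference latency of this kind can be measured with a simple wall-clock harness; the "model" below is a trivial placeholder rather than one of the trained networks:

```python
import time

def per_image_latency_ms(model_fn, batch, repeats=100):
    """Average wall-clock inference time in milliseconds per image."""
    start = time.perf_counter()
    for _ in range(repeats):
        model_fn(batch)
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0 / (repeats * len(batch))

# Placeholder "model": doubles each value; stands in for a forward pass.
latency = per_image_latency_ms(lambda xs: [2 * x for x in xs], list(range(32)))
```

Repeating the call and averaging smooths out timer resolution and transient system load, which matters when the per-image cost is on the order of milliseconds.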
Several studies were conducted in the literature on the application of machine learning and deep learning algorithms for the identification and classification of plant diseases. Some of these studies (e.g., Hlaing and Zaw [
11] and Annabel and Muthulakshmi [
14]) used the traditional approach of employing image processing techniques to segment the input images (i.e., separating the leaf or infected area from the background) and to extract texture features that reflect the disease state of the leaf. These features form the input for non-deep traditional machine learning algorithms (e.g., SVM). However, these studies did not consider images with different backgrounds, and their classification performance was worse than that of their deep learning counterparts. On the other hand, deep learning algorithms do not require these preprocessing steps and the accompanying overhead and errors. Agarwal et al. [
15] modified the well-established structure of the VGG16 model and produced good performance. However, the original VGG16 model has shown its worth over hundreds of applications and thousands of studies and any modification will need to go through rigorous scrutiny. Tm et al. [
13] used a similar approach to ours; however, the comparison was performed for only three weaker models (i.e., AlexNet, GoogLeNet, and LeNet). Similarly, Kumar and Vani [
12] experimented with four models (i.e., LeNet, VGG16, ResNet, and Xception) and produced 99.25% accuracy. However, their results were based on 14,903 leaf images from the same dataset with no apparent reason for dropping the remaining 3257 images.
Table 5 shows a summary of the related literature to identify and classify tomato disease.
The present study has some limitations. First, tomato has two major leaf shapes (regular and potato leaf) and multiple other variations relating to leaf dimensions, color, and shades of green. However, the dataset does not include varieties of tomato leaf shapes. This will narrow the applicability and performance of any tomato disease identification system to the specific tomato variant in the dataset. Second, all the images in the dataset have a unified background. It would be worthwhile to investigate leaf images with different backgrounds taken in a non-unified manner. Third, tomatoes are susceptible to other diseases or pests (e.g., Tuta absoluta) that are not part of the dataset. Fourth, the dataset is imbalanced with varying numbers of images in each class.
4. Conclusions
Tomato is an important mass-produced agricultural product that is susceptible to diseases and the consequent yield loss. The use of deep transfer learning and well-established models has shown great potential in many applications in the literature. In this work, we targeted the identification of tomato diseases from infected leaf images. Using leaf images as input, eleven deep learning models were customized and retrained to identify nine tomato diseases in addition to healthy plants. The models (i.e., DarkNet-53, DenseNet-201, GoogLeNet, Inceptionv3, MobileNetv2, ResNet-18, ResNet-50, ResNet-101, ShuffleNet, SqueezeNet, and Xception) were compared in terms of six common metrics and training/validation times. Although all models performed well, the DenseNet-201 model produced the best results, with values larger than 99% for all metrics. However, the SqueezeNet model trained the fastest and had the shortest inference time (i.e., 0.50 milliseconds/image).
The transfer learning approach carries inherent credibility and less complexity. In addition, it requires neither explicit image processing nor feature extraction. Thus, it is suitable for implementation in standalone smartphone applications, which can aid plant pathologists and farmers in quick and effective disease recognition and control. Future work will consider evolving the models by using incremental learning (i.e., improving the model during deployment). Moreover, the same approach can be adapted to identify diseases from tomato fruit images rather than the leaves; this may require 3D deep learning models to cover all sides of the fruit. In addition, other models or an ensemble of models can be used to solve the same problem. Field testing and commercial availability in the form of ready-to-download applications are promising areas of future activity.