Article

Interpretable and Robust Ensemble Deep Learning Framework for Tea Leaf Disease Classification

1 Department of Civil Engineering, Faculty of Engineering and Architecture, Recep Tayyip Erdogan University, 53100 Rize, Türkiye
2 Department of Applied Informatics, Graduate School, Istanbul Technical University, 34469 Istanbul, Türkiye
3 Department of Geomatics Engineering, Faculty of Civil Engineering, Istanbul Technical University, 34469 Istanbul, Türkiye
* Author to whom correspondence should be addressed.
Horticulturae 2025, 11(4), 437; https://doi.org/10.3390/horticulturae11040437
Submission received: 3 March 2025 / Revised: 16 April 2025 / Accepted: 17 April 2025 / Published: 19 April 2025
(This article belongs to the Section Plant Pathology and Disease Management (PPDM))

Abstract

Tea leaf diseases are among the most critical factors affecting the yield and quality of tea harvests. Due to climate change and widespread pesticide use in tea cultivation, these diseases have become more prevalent. As the demand for high-quality tea continues to rise, tea has assumed an increasingly prominent role in the global economy, rendering the continuous monitoring of leaf diseases essential for maintaining crop quality and ensuring sustainable production. In this context, developing innovative and sustainable agricultural policies is vital. Integrating artificial intelligence (AI)-based techniques with sustainable agricultural practices presents promising solutions. Ensuring that the outputs of these techniques are interpretable would also provide significant value for decision-makers, enhancing their applicability in sustainable agricultural practices. In this study, advanced deep learning architectures, namely ResNet50, MobileNet, EfficientNetB0, and DenseNet121, were utilized to classify tea leaf diseases. Since low-resolution images and complex backgrounds posed significant challenges, an ensemble learning approach was proposed to combine the strengths of these models. The generalization performance of the ensemble model was comprehensively evaluated through statistical cross-validation. The ensemble model achieved high predictive performance, with precision, recall, and F1-score values of 95%, 94%, and 94% across folds. The overall classification accuracy reached 96%, with a maximum standard deviation of 2% across all dataset folds. Additionally, Grad-CAM visualizations demonstrated a clear correspondence between diseased regions and specific disease types on tea leaves, confirming the ability of the models to detect diseases accurately under varying conditions and highlighting their robustness.

1. Introduction

Agriculture plays a vital role in maintaining economic stability. A decline in production affects the global economy and jeopardizes farmers’ livelihoods, especially in developing countries. Insufficient food productivity leads to difficulties in accessing food, increases in food prices, and adverse effects on the global economy [1,2,3,4,5,6,7]. One of the main reasons for decreased harvest yield is disease affecting plant leaves. Various factors trigger the formation of leaf disease, such as lack of clean water, climate change, drought, pest-borne infections, ecological changes, and incorrect chemical use. At the same time, certain cultivation methods, particularly heavy chemical use, endanger human health and food safety. Leaf diseases can directly impair structural integrity, quality, and productivity, making them an especially important issue for plants whose leaves are consumed, such as tea [8].
Many crops, including tea, are susceptible to leaf diseases, which can have significant economic and cultural implications. As one of the most widely consumed beverages worldwide, tea plays a crucial role in global trade and cultural traditions. The leading countries in production and consumption include China, India, Sri Lanka, Kenya, Vietnam, Indonesia, Myanmar, Türkiye, and Bangladesh. The Food and Agriculture Organization (FAO) reports that the tea harvest has reached 2 million tons worldwide. This industry has generated 3 billion dollars, significantly impacting the global economy. However, although the harvest area has expanded from 3 million to approximately 5 million hectares, tea leaf yield has not increased at the same rate. The main reason for this is likely the plant diseases that affect tea leaves, which in turn undermine the ability to meet the demand for tea. Thus, continuous monitoring of the diseases that affect tea leaves is necessary.
Tea is cultivated in diverse climates and soil conditions, leading to the emergence of numerous diseases affecting tea leaves. These variations make the monitoring of diseases more complex and challenge the generalization of effective solutions [9,10,11]. Traditional disease detection methods are inadequate, as they rely primarily on manual observations and depend heavily on the expertise of specialists. This process demands significant labor and resources, making it impractical for large-scale operations [12]. The challenge is even greater in dense and rugged terrain, particularly during tea leaf harvest [13]. Moreover, it is nearly impossible for experts to visit all farms and gather disease information comprehensively [14]. Despite their expertise, professionals often face difficulties in identifying plant species and diseases because many symptoms are imperceptible to the human eye [15,16]. These problems obstruct the development of robust and standardized agricultural policies, as significant policy variations exist from country to country. Traditional farmers also struggle to recognize and classify these diseases, and educating them remains a considerable challenge. Thus, obtaining the reliable disease information that is crucial for innovative agricultural strategies is extremely difficult. To address these challenges, rapid and automated classification techniques that minimize human judgment are urgently needed [17,18,19]. Innovative approaches utilizing artificial intelligence, particularly deep learning, can facilitate the automation of disease classification in agriculture.
Deep learning, a cutting-edge approach in artificial intelligence, has achieved remarkable success in classifying and segmenting medical images of diseases such as Alzheimer’s [20,21], COVID-19 [22,23], and multiple sclerosis [24,25,26], as well as in the classification of plant leaf diseases. Advanced deep learning models have been developed that can accurately classify textures, patterns, and disease symptoms in images, nearly matching the expertise of professionals. Recently, Convolutional Neural Networks (CNNs), the dominant architecture in deep learning for images, have been extensively utilized to classify leaves of various plants such as tea [27], tomato [28], potato [29], coffee [30], and fruits [31]. Building on these CNN-based models, transfer learning and ensemble learning approaches have also been used for the detection of plant diseases [32,33]. Although deep learning models have shown remarkable success in classifying various plant diseases, challenges remain due to variations in symptom texture and color. Among plant diseases, tea leaf diseases are particularly challenging due to their direct impact on tea quality and the diversity of symptoms they exhibit. Diseases can affect various parts of the plant but generally progress from the roots to the leaves, with symptoms most observable on the leaves [6,12,34]. Leaves are the most crucial component of tea, directly influencing its quality. More than a dozen common diseases have been identified in tea leaves. Disease penetration in tea leaves can result in changes in color and shape and the formation of spots [11,17,35]. Consequently, the accurate classification of tea leaf diseases remains complex [10,13]. In addition to focusing on the accuracy of classification models, it is equally important to ensure that these models can effectively detect and explain disease symptoms. Therefore, the approaches should be both robust and explainable. In this study, an ensemble deep learning approach that integrates multiple pre-trained models, including ResNet [36], MobileNet [37], DenseNet [38], and EfficientNet [39], was proposed to classify tea leaf diseases. In addition to statistical results, Grad-CAM (Gradient-Weighted Class Activation Mapping) [40] visualizations highlight the ensemble model’s ability to detect and differentiate diseases more effectively than individual models.

2. Related Works

Deep learning has become a crucial tool for addressing the challenges of plant disease research. In recent years, it has gained significant attention due to the availability of comprehensive datasets, such as the PlantVillage dataset [41]. Although research on tea leaf diseases remains limited, deep learning is increasingly replacing traditional methods. Hu et al. [42] developed a deep learning method to segment disease spots using a conditional deep convolutional generative adversarial network to generate synthetic images, addressing the problem of insufficient data. These synthetic images were then used to train a VGG-16 model to identify tea leaf diseases. As a result, Red Scab and Leaf Blight were detected with 100% accuracy, whereas Red Leaf Spot was detected with 70% accuracy.
Leaf images were generally captured in the field and office environments, often without professional preparation. Accordingly, low-resolution images can limit the feature extraction capacity of deep learning models due to background defects, occlusion, or variations in disease formations. Hu et al. [43] introduced a multiscale feature extraction module to the CIFAR-10 model to improve the detection of three different textures of the disease. By comparing three CNN structures involving LeNet-5, AlexNet, and VGG16, they achieved an average identification accuracy of 92.5%. Chen et al. [44] published a comprehensive dataset containing multiple tea leaf diseases to address the problem of inadequate datasets. Thereafter, they proposed the LeafNet model, which was specifically designed to identify tea leaf diseases. The model achieved a classification accuracy of 90.16%, outperforming the support vector machine (SVM) classifier by 30% and the multilayer perceptron (MLP) classifier by 20%. Similarly, Datta and Gupta [45] published a dataset that included both healthy and diseased tea leaves. They developed a deep learning model based on CNN, resulting in an overall classification accuracy of 96.56%.
Even though CNNs perform better than traditional approaches, they are still challenged by various plant diseases. Classical CNN-based models can be inadequate in extracting the underlying features and can yield lower accuracy under challenging environmental conditions. Therefore, Bao et al. [10] proposed AX-RetinaNet, improved with advanced multiscale and attention modules. The results indicate that 95.4% of tea leaf diseases can be automatically detected and identified. However, despite the significant progress made by deep learning models, they require many parameters and substantial hardware resources for training. Moreover, these models are prone to overfitting and the vanishing gradient problem. He et al. [46] proposed the residual learning approach to overcome this challenge. Following this approach, Bhuyan et al. [47] proposed Res4Net-CBAM, an enhanced model based on the ResNet architecture that includes a convolution block attention module [48], to classify tea leaf diseases. A successful classification was obtained with an average accuracy of 98.27% and an F1-score of 98.37%. Heng et al. [49], who encountered a similar problem in the classification of tea leaf diseases, proposed integrating hybrid pooling into CNNs to increase the generalization of the model. As a result, the hybrid-pooling CNN demonstrated a diagnostic capability that exceeded standard models by at least 2.35%.
Based on these advancements, classification models that take advantage of advanced deep learning techniques have been developed. For instance, Hu et al. [50] applied a post-classification detection approach using VGG16 networks to determine the severity of tea leaf blight. Subsequently, Hu et al. [51] enhanced the severity index by calculating it based on segmented spot areas obtained from U-Net. This methodology provided accurate estimations even for occluded and damaged leaves affected by the same disease. Lanjewar and Panchbhai [11] developed a fully automated cloud-based system that covers the entire process from image capture to data processing. This approach eliminated time-consuming procedures, improving efficiency. The system supports various model configurations and accurately predicts Tea Leaf Blight, Red Leaf Spot, and Red Scab diseases. Object detection, another important domain within deep learning, has also been explored in several studies. Jiang et al. [52] introduced the lightweight and efficient LC3Net model to detect Tea Leaf Blight, achieving an average precision of 92.29%. In particular, the YOLO model, one of the most popular detection architectures, has been applied to disease detection in tea plants through its DDMA-YOLO [53], TSBA-YOLO [54], YOLOv5 [55], and YOLOv7 [56] variants. Although recent studies have shown promising performance in the classification of diseases in tea leaves, some challenges remain. One major concern is the ability of models to effectively learn disease patterns from small, imbalanced datasets [57]. Generating extensive datasets remains difficult due to labeling challenges and the reliance on expert opinions. Another critical challenge is the presence of complex backgrounds in leaf images, which restricts the generalization capabilities of deep learning models. In addition, environmental factors, such as variations in lighting and occlusions, further complicate classification [58].
In this context, ensemble learning has emerged as a promising solution. Integrating multiple classification models improves decision-making through diverse hypotheses, mitigating the overfitting risks of small and imbalanced datasets. Furthermore, ensemble learning improves the representation of phenomena, capturing patterns that individual models may miss [59,60,61]. These advantages have made ensemble learning a major focus in leaf disease classification studies [62,63,64,65]. Ensemble learning can be effectively combined with transfer learning to overcome data limitations further. Moreover, transfer learning enables models to rely on information obtained from diverse datasets by reusing pre-trained weights, which speeds up the optimization process and improves generalization to features such as shape and texture [66,67]. Therefore, combining ensemble learning with transfer learning is expected to play a critical role in advancing plant leaf disease classification.
Another significant challenge is the black-box nature of deep learning models, which considerably constrains their interpretability and transparency. Existing studies particularly emphasize statistical performance metrics, which do not necessarily confirm whether models accurately recognize disease-representative features. However, it is important to understand how these models generalize diseases in decision-making. Hence, interpreting Grad-CAM activation maps is essential to ensure transparency. A comprehensive approach integrating classification accuracy, model interpretability, and practical usability is crucial to advance leaf disease studies and their real-world applications. Considering these expectations, this study proposes an end-to-end framework for classifying tea leaf diseases that uses ensemble learning and transfer learning alongside an explainable technique.

3. Material and Methods

As illustrated in Figure 1, the tea leaf disease classification framework consists of several key steps, beginning with the dataset preparation and ending with the final classification of the images. Data augmentation is applied to enhance the variability of the training data and improve the model’s generalization capability. Feature extraction is performed using pre-trained deep learning models, which are subsequently fine-tuned to optimize performance. Finally, extracted features are integrated into an ensemble model to ensure an accurate and robust model.

3.1. Dataset

The Tea Sickness Dataset [68] used in this study consists of 885 images of tea leaves categorized into eight classes, including seven common tea leaf diseases and one healthy leaf class. The classes are Algal Leaf Spot, Anthracnose, Bird’s Eye Spot, Brown Blight, Gray Blight, Red Leaf Spot, White Spot, and Healthy. The main reasons for selecting these seven disease classes are their common occurrence in tea plantations and their distinct visual characteristics. Figure 2 presents sample images of each class in the dataset. In particular, the images in the dataset were published in their raw form, exhibiting visible variations in background color and lighting conditions. These inconsistencies are observed within individual classes and across different classes. In addition, the accurate detection of leaf lesions is critical for identifying disease symptoms. However, variation in image resolution can lead to additional difficulties, particularly in cases involving diseases such as Red Leaf Spot and Anthracnose, where lesions are less prominent and more challenging to detect.

3.2. Data Preprocessing

The dataset was divided into 90% for training and 10% for testing. Subsequently, the training set was further divided using 5-fold cross-validation, a stratified variant of k-fold that ensures a consistent class distribution across all folds. The images were resized to a standard size of 224 × 224 to ensure compatibility with the input requirements of the proposed model. Various data augmentation techniques were applied to enhance the diversity of the training data and improve model generalization, including rotation, flipping, and zooming. For this purpose, TensorFlow’s ImageDataGenerator was used to apply a series of transformations to each image: rotation (up to 20 degrees), horizontal and vertical shifts (up to 20% of the image’s width and height), shearing, zooming (up to 20%), horizontal flipping, brightness adjustment (randomly between 0.8 and 1.2 times), channel shifting (up to 20 units), and nearest-neighbor filling for any gaps. A custom advanced augmentation technique was also implemented, utilizing contrast stretching with the exposure.rescale_intensity function from the scikit-image library; a sketch of this pipeline is given below. These augmented images enhance the diversity of the dataset, contributing to the training of a more robust model with improved generalization capabilities. Table 1 summarizes the statistical details of the dataset.
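The following is a minimal sketch of this preprocessing pipeline. The augmentation ranges follow the text above; the contrast-stretch percentile bounds (2nd/98th) and the random seed for the stratified split are illustrative assumptions, not values stated in the paper.

```python
# Minimal sketch of the augmentation and stratified-fold setup (assumptions:
# percentile bounds for contrast stretching and the random seed).
import numpy as np
from sklearn.model_selection import StratifiedKFold
from skimage import exposure
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def contrast_stretch(image):
    # Custom augmentation: rescale intensities between the 2nd and 98th
    # percentiles (assumed bounds) using scikit-image's rescale_intensity.
    p2, p98 = np.percentile(image, (2, 98))
    return exposure.rescale_intensity(image, in_range=(p2, p98))

augmenter = ImageDataGenerator(
    rotation_range=20,            # rotation up to 20 degrees
    width_shift_range=0.2,        # horizontal shift up to 20% of width
    height_shift_range=0.2,       # vertical shift up to 20% of height
    shear_range=0.2,              # shearing
    zoom_range=0.2,               # zooming up to 20%
    horizontal_flip=True,         # horizontal flipping
    brightness_range=(0.8, 1.2),  # brightness between 0.8x and 1.2x
    channel_shift_range=20.0,     # channel shifting up to 20 units
    fill_mode="nearest",          # nearest-neighbor filling for gaps
    preprocessing_function=contrast_stretch,
)

# Stratified 5-fold split of the training set (class-consistent folds).
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
# for train_idx, val_idx in skf.split(images, labels): ...
```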

3.3. Models

In this study, powerful deep learning models commonly used for image classification, including ResNet50, DenseNet121, EfficientNetB0, and MobileNetV2, were utilized.

3.3.1. ResNet50

He et al. [36] proposed ResNet based on residual learning. As deep learning architectures become deeper, training becomes increasingly challenging and accuracy can degrade. This originates from the vanishing gradient problem and can cause learning saturation during training. Instead of sequentially transmitting the output of each layer to the next, residual learning employs identity mappings that skip over stacks of layers. These shortcuts ensure that the residual signal is preserved as information rather than converging to zero. The fundamental architecture of ResNet consists of plain networks augmented with shortcut connections incorporated around each pair of 3 × 3 blocks. The architecture is configurable in depth, with variants of 18, 34, 50, 101, and 152 layers; as the number of layers increases, so does the computational cost. The plain baseline into which residual learning was integrated is inspired by the VGG-16 network, and each convolutional block in the architecture contains 3 × 3 filters. During feature extraction, the dimensions of the feature maps are adjusted by downsampling and channel expansion to facilitate model modification and computation. Shortcut connections, the most important component of residual learning, are established by identity mappings between two convolution blocks. The high-level features are then filtered according to window size via average pooling. Finally, these features are combined through a 1000-way fully connected layer.
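As an illustration, a residual block with an identity shortcut can be sketched in Keras as follows. The filter count is illustrative, and a 1 × 1 projection would be needed on the shortcut whenever the input and output dimensions differ.

```python
# Minimal sketch of a residual (identity-shortcut) block; assumes the input
# already has `filters` channels, otherwise a 1x1 projection is required.
from tensorflow.keras import layers

def residual_block(x, filters=64):
    shortcut = x                                     # identity mapping
    y = layers.Conv2D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])                  # shortcut connection
    return layers.ReLU()(y)
```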

3.3.2. MobileNet

MobileNet [37] is a lightweight deep neural network architecture designed to address challenges that generally arise as models become deeper, including greater model complexity, extended training times, and increased computational costs. This framework, primarily proposed for mobile and embedded vision applications, is distinguished by its use of depthwise separable convolutions. The architecture is built on convolution blocks, in which convolution, normalization, and activation operations are performed; these blocks can be repeated in various sizes throughout the network. MobileNet introduces depthwise separable convolution layers, replacing the traditional approach of extracting and combining information from the same input through standard convolution blocks. A depthwise convolution applies a single filter to each input channel, and the resulting outputs are aggregated through a pointwise (1 × 1) convolution. This strategy significantly reduces both model complexity and computational burden.
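The factorization can be sketched in Keras as below; the output filter count is illustrative.

```python
# Minimal sketch of a depthwise separable convolution block as used by
# MobileNet (filter count is illustrative).
from tensorflow.keras import layers

def separable_block(x, filters=64):
    # Depthwise step: one 3x3 filter applied to each input channel.
    y = layers.DepthwiseConv2D(3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    # Pointwise step: a 1x1 convolution combines the per-channel outputs.
    y = layers.Conv2D(filters, 1)(y)
    y = layers.BatchNormalization()(y)
    return layers.ReLU()(y)
```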

3.3.3. DenseNet121

The vanishing gradient problem addressed by ResNet was also tackled by DenseNet, proposed by Huang et al. [38]. The primary goal of DenseNet is to achieve robust information transfer through feature maps, similar to ResNet. However, the fundamental distinction between the two architectures lies in their methods of information transfer: unlike ResNet, which sums information directly, DenseNet concatenates it at the feature level. Additionally, DenseNet performs feature filtering and avoids relearning redundant feature maps, enabling it to train with fewer parameters. Each module, called a bottleneck layer, comprises 1 × 1 and 3 × 3 convolution layers. The amount of extracted information expands as the feature maps of each block grow at a fixed growth rate. Like ResNet, the DenseNet architecture ends with a 7 × 7 global average pooling layer, a 1000-dimensional fully connected layer, and a softmax layer.
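A minimal sketch of a bottleneck layer with feature concatenation follows; the growth rate of 32 matches the DenseNet paper's default rather than anything stated here, and the 4× bottleneck width follows the DenseNet-BC design.

```python
# Minimal sketch of a DenseNet bottleneck layer (growth rate is illustrative).
from tensorflow.keras import layers

def dense_layer(x, growth_rate=32):
    y = layers.BatchNormalization()(x)
    y = layers.ReLU()(y)
    y = layers.Conv2D(4 * growth_rate, 1)(y)          # 1x1 bottleneck
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(growth_rate, 3, padding="same")(y)
    # Concatenate rather than sum: each layer receives all earlier features.
    return layers.Concatenate()([x, y])
```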

3.3.4. EfficientNetB0

As model complexity increases, researchers have favored scaling fixed Convolutional Neural Networks (ConvNets) to enhance the generalization capabilities of deep learning models. Using deeper models aims to approximate higher-level, often latent features. However, scaling operations involving depth, width, and image size are largely experimental, and developing the desired model relies on arbitrary configurations that incur considerable computational cost. To address this challenge, Tan and Le [39] proposed EfficientNet. The EfficientNet architecture was created by redesigning ConvNets to enable uniform scaling of network depth, width, and resolution at fixed rates. This approach presents a balanced and efficient structure that allows the simultaneous adjustment of these architectural dimensions to achieve higher accuracy. The baseline of the model is uniformly scaled in width, depth, and resolution: significant features can be extracted by adjusting the model’s width, while scaling the resolution improves its capability to detect patterns. Although a flexible architecture was achieved, increasing model complexity can still result in a decline in accuracy, necessitating fine-tuning during training. EfficientNet has been released in eight versions, ranging from B0 to B7, each with an increasing number of parameters.
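For reference, the compound scaling rule from the original EfficientNet paper [39] couples depth, width, and resolution through a single compound coefficient $\phi$, with the constants found by grid search (the formulation below follows Tan and Le [39]):

$$d = \alpha^{\phi}, \quad w = \beta^{\phi}, \quad r = \gamma^{\phi}, \qquad \text{s.t.}\;\; \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \;\; \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1$$

where $d$, $w$, and $r$ scale the baseline network’s depth, width, and input resolution, respectively.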

3.3.5. Proposed Ensemble Architecture

The proposed ensemble model employs a bagging technique, integrating ResNet50, DenseNet121, MobileNetV2, and EfficientNetB0 to classify tea leaf diseases. Each pre-trained model was extended with custom classification layers. First, the spatial information in the feature maps produced by the pre-trained models was condensed using Global Average Pooling. By reducing the dimensionality of the data, essential features were preserved while minimizing complexity. Subsequently, two Dense layers with ReLU activation were applied to the pooled features; Dropout was used for regularization, while batch normalization improved training stability. These layers introduced non-linearity and refined the representations learned by the underlying models. Finally, a dense layer with softmax activation generated the final predictions for the eight output classes, transforming the fine-tuned features into class-wise probabilities. The ensemble model then aggregates the per-model predictions via averaging. Thus, the architecture effectively captures collective insights from the base models, resulting in an accurate and robust classifier suitable for various image classification tasks. The proposed ensemble model is illustrated in Figure 3, and a sketch of its construction is given below.
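The following is a minimal Keras sketch of this architecture. The dense-layer widths (256 and 128) and the dropout rate are illustrative assumptions, as the text does not fix them; loading the backbones with ImageNet weights is likewise an assumption consistent with the transfer-learning setup discussed in Section 2.

```python
# Minimal sketch of the ensemble head (layer widths and dropout rate are
# illustrative assumptions).
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import (
    ResNet50, DenseNet121, MobileNetV2, EfficientNetB0)

def branch(backbone, inputs, num_classes=8):
    x = backbone(inputs)                        # pre-trained feature maps
    x = layers.GlobalAveragePooling2D()(x)      # pool spatial information
    x = layers.Dense(256, activation="relu")(x)
    x = layers.BatchNormalization()(x)          # training stability
    x = layers.Dropout(0.3)(x)                  # regularization
    x = layers.Dense(128, activation="relu")(x)
    return layers.Dense(num_classes, activation="softmax")(x)

inputs = layers.Input(shape=(224, 224, 3))
backbones = [m(include_top=False, weights="imagenet",
               input_shape=(224, 224, 3))
             for m in (ResNet50, DenseNet121, MobileNetV2, EfficientNetB0)]
# Average the class-wise probabilities of the four branches (bagging).
outputs = layers.Average()([branch(b, inputs) for b in backbones])
ensemble = Model(inputs, outputs)
```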

3.3.6. Implementation Detail

The Keras 2.15 (https://keras.io/, accessed on 19 April 2025) and TensorFlow 2.15 (https://www.tensorflow.org/, accessed on 19 April 2025) libraries were used to implement the proposed model in Python 3.10 (https://www.python.org/, accessed on 19 April 2025). An input tensor matching the image input shape was defined, and list comprehensions were used to process the input through each base model. The outputs were then averaged using Keras’s Average function, combining the predictions of all base models into a single ensemble prediction. This ensemble approach improves predictive performance and increases robustness by mitigating the risk of overfitting to the biases of any single model. The ensemble model thus combines multiple pre-trained models into a single framework, providing a powerful yet interpretable solution for tea leaf disease classification.
During training, the Keras callback functions EarlyStopping, ReduceLROnPlateau, and ModelCheckpoint were used to avoid overfitting and optimize model performance. ModelCheckpoint saved the best model weights based on the validation loss. EarlyStopping halted training after no improvement in validation loss was observed for 60 epochs, ensuring effective use of computational resources. Similarly, ReduceLROnPlateau dynamically decreased the learning rate when training plateaued. Furthermore, an exponential decay schedule was utilized to adjust the learning rate during training, with an initial rate of 0.0001 and a 10% decay every 10,000 steps. TensorFlow’s ExponentialDecay was used to implement this schedule, and the scheduled rate was extracted and stored as a non-trainable variable. The SparseCategoricalCrossentropy loss function, given in Equation (1), was used together with this learning rate to compile the model with the Adam optimizer [69].
$$\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\log\left(p_{y_i}\right) \tag{1}$$
Here, $N$ is the total number of samples, $y_i$ is the true class label of the $i$-th sample, and $p_{y_i}$ is the predicted probability of the true class for the $i$-th sample.
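A minimal sketch of this training configuration follows. The checkpoint filename, the ReduceLROnPlateau factor and patience, and the restore-best-weights choice are illustrative assumptions; the schedule values and the 60-epoch patience follow the text.

```python
# Minimal sketch of the compile/callback setup (checkpoint path, LR-plateau
# factor/patience, and restore_best_weights are illustrative assumptions).
import tensorflow as tf
from tensorflow.keras import callbacks, losses, optimizers

def compile_for_training(model):
    schedule = optimizers.schedules.ExponentialDecay(
        initial_learning_rate=1e-4,   # initial rate of 0.0001
        decay_steps=10_000,           # every 10,000 steps...
        decay_rate=0.9)               # ...decay the rate by 10%
    # Materialize the scheduled rate as a non-trainable variable so that
    # ReduceLROnPlateau can still overwrite it, mirroring the text above.
    lr = tf.Variable(schedule(0), trainable=False)
    model.compile(optimizer=optimizers.Adam(learning_rate=lr),
                  loss=losses.SparseCategoricalCrossentropy(),  # Equation (1)
                  metrics=["accuracy"])
    return [
        callbacks.ModelCheckpoint("best_fold.keras", monitor="val_loss",
                                  save_best_only=True),
        callbacks.EarlyStopping(monitor="val_loss", patience=60,
                                restore_best_weights=True),
        callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5,
                                    patience=10),
    ]
```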

3.3.7. Evaluation Metrics

Precision, recall, and the F1-score are critical metrics for evaluating the performance of classification models, providing information on the sensitivity and overall effectiveness of the model; a short computation sketch follows the list below.
  • Precision, or positive predictive value, quantifies the accuracy of the positive predictions made by the model. It is defined as the ratio of true positive predictions to the total number of predicted positives, as seen in Equation (2).
    $$\mathrm{Precision} = \frac{TP}{TP + FP} \tag{2}$$
    Here, TP, FP, and FN are True Positives, False Positives, and False Negatives, respectively.
  • Recall, also known as the sensitivity or true positive rate, measures the model’s ability to identify all relevant instances within a dataset. It is the ratio of true positive predictions to the total number of actual positives, as seen in Equation (3).
    $$\mathrm{Recall} = \frac{TP}{TP + FN} \tag{3}$$
  • F1-score is the harmonic mean of precision and recall, balancing the two metrics, especially in cases of uneven class distribution or when the costs of false positives and false negatives differ. The F1-score ranges from 0 to 1, with 1 indicating the best performance. The F1-Score is calculated in Equation (4).
    $$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2\,TP}{2\,TP + FP + FN} \tag{4}$$
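The short sketch below computes these metrics with scikit-learn on a toy set of labels; the macro averaging over classes is an assumption for the multi-class setting, and the label values are purely illustrative.

```python
# Minimal sketch of the evaluation metrics (labels and the macro-averaging
# choice are illustrative assumptions).
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1]   # illustrative ground-truth class labels
y_pred = [0, 1, 2, 1, 1]   # illustrative model predictions

precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
accuracy = accuracy_score(y_true, y_pred)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f} Acc={accuracy:.2f}")
```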

4. Results

Table 2 presents the classification metrics calculated for DenseNet121, EfficientNetB0, MobileNetV2, ResNet50, and the ensemble training approach, all trained without augmented images. Among the individual models, ResNet50 achieved the highest classification accuracy in Fold 2, with precision, recall, and F1-score values of 91%, 90%, and 90%, respectively. Considering all fold scores, the ensemble model outperformed ResNet50 by 2% to 10% in all metrics. In contrast, DenseNet121 and MobileNetV2 performed poorly. Additionally, standard deviations were calculated to assess consistency between models. Individual models exhibited standard deviations of up to 9% in their prediction scores, whereas the ensemble model demonstrated notably consistent performance, with a standard deviation of only 3% across all metrics.
The results in Table 3 demonstrate the impact of incorporating augmented images into the training dataset. The significant contribution of data augmentation techniques is evident, with performance metrics for individual models improving by approximately 20%. Precision metrics indicate that DenseNet achieved 80%, EfficientNet 92%, and ResNet50 93%. Although DenseNet and MobileNet exhibited notable performance gains, their overall prediction accuracy remained inadequate. The ensemble approach, particularly in Fold 4, achieved outstanding results, with 96% in all prediction metrics. The inter-fold standard deviation was also low, at 2% across all metrics, indicating that data augmentation improves the consistency of the ensemble model. These results highlight how the ensemble approach effectively leverages the classification capabilities of the individual models to make stronger predictions, consistently delivering robust results on both augmented and non-augmented datasets.
Figure 4 illustrates the confusion matrices, depicting the inter-class relationships of predictions generated by models for Fold 4 without data augmentation techniques (all matrices across all folds are provided in Figure S1). Misclassification rates are notably high in the MobileNet and DenseNet models. In contrast, the ResNet50 and ensemble models exhibited a significantly higher capacity to discriminate diseases. The ensemble approach consistently achieved an accurate classification across all folds, particularly for Red Leaf Spot and Healthy class samples. Although some misclassifications occurred between folds, the ensemble method demonstrated a strong generalizability.
The confusion matrices in Figure 5, representing the dataset augmented with data augmentation techniques, demonstrate a significant decrease in false negative predictions when all models were trained on the augmented dataset. Figure 5 shows the confusion matrix exclusively for Fold 4, while the confusion matrix for all other folds can be found in Figure S2. As the ResNet and ensemble models already exhibited strong generalization ability without data augmentation, this effect is particularly noticeable in the DenseNet and EfficientNet models. Although the ResNet50 and ensemble models produced some incorrect predictions for certain diseases when trained without data augmentation, the application of data augmentation further reduced these errors. Overall, the ensemble approach demonstrated superior predictive performance compared to the ResNet model, with only a few misclassifications in three classes of diseases in different folds. An analysis of the confusion matrix for the Fold 4 ensemble training reveals that the model correctly classified all samples in four classes as true positives. These classes include Gray Blight, Algal Leaf Spot, Healthy, and Red and White Spot. However, one Anthracnose sample was misclassified as Bird’s Eye Spot. Additionally, one Bird’s Eye Spot sample was misclassified as Anthracnose, and one Brown Blight sample was misclassified as White Spot.
Figure 6 and Figure 7 present the prediction results of the test images after training the model with the Fold 4 dataset in two experiments. Each leaf image is labeled with its actual class and the model’s predicted class. The results show that the ensemble learning approach achieved highly accurate predictions, even under challenging conditions. However, in some cases, incorrect predictions appear to be influenced by leaf damage or high background complexity in the images.
The class-based prediction metrics for Fold 4, which achieved the highest accuracy in this study, are presented in Table 4. An accuracy of 100% was achieved in all metrics for the Algal Leaf Spot, Healthy, and Red Leaf Spot classes. Precision exceeded 90% for all disease classes except Anthracnose, indicating that the model is generally resistant to false-positive predictions across diseases. Similarly, recall showed strong performance, reaching 100% for four disease classes. Overall, the model demonstrated an impressive accuracy of 96%.

Understanding Model Decisions for Tea Leaf Disease Classification

The reliability of models in real-world applications can be ensured by demonstrating their ability to generalize the underlying structure of the predicted phenomena. In this context, the usability of the proposed models for classifying diseases in tea leaves can be better understood through a more transparent and interpretive analysis. The ensemble training strategy consistently achieved statistically robust results in all classifications of tea leaf disease. However, examining the decision-making mechanisms of the models during the prediction process can provide a deeper understanding of their effectiveness. Figure 8 illustrates the focus regions for disease detection during classification, as identified by the ensemble, ResNet, EfficientNet, MobileNet, and DenseNet models, using the Grad-CAM method.
Grad-CAM visualizations provide valuable insight into the ensemble model’s decision-making process by identifying critical regions. The observations show that the ensemble model mainly concentrates on the leaf and diseased areas, with minimal influence from background complexity. Specifically, the model distinguishes itself by effectively focusing on the entire leaf in the healthy class. This attention is particularly crucial for identifying diseases such as Anthracnose and Algal Leaf Spot, where accurately perceiving diseased regions, which are often difficult to distinguish, is essential. In Algal Leaf Spot, brown-toned lesions develop on the leaves, whereas Anthracnose produces irregular, circular lesions in similar color tones [70]. The ensemble model effectively identifies the lesions associated with both diseases.
The ensemble model demonstrates strong predictive performance for Anthracnose and Algal Leaf Spot. Furthermore, its ability to accurately distinguish other diseases, such as Brown Blight, Gray Blight, Red Leaf Spot, and White Spot, highlights the overall robustness of the proposed approach. Brown Blight and Gray Blight present lesions with concentric rings and a yellowish-green or yellow margin; as these lesions expand, they turn brown or gray with tiny black dots and eventually lead to defoliation as dried tissue falls [71]. The ensemble model remained highly focused on the affected regions and fallen leaves, showing less sensitivity to background complexity than the other models. As with Anthracnose, spots in reddish and whitish tones generally form on the leaf in Red Leaf Spot and White Spot, and, as in Gray Blight, parts of the leaf can be shed. These diseases produce smaller, rounded structures rather than the larger, irregular lesions seen in Algal Leaf Spot [72]. Additionally, since Red Leaf Spot lesions often appear alongside the brown lesions of Algal Leaf Spot, models must be capable of distinguishing between them. The results indicate that the ensemble model shows a high predictive capacity for both diseases.
In addition to these diseases, the performance of the ensemble model was also evaluated in other challenging cases, such as Bird’s Eye Spot, which shares characteristics with multiple diseases and presents unique detection challenges. In this disease, circular gray spots with brown edges appear in the center of the leaves, resembling symptoms of various other infections. Grad-CAM visualizations reveal that detection levels for this disease are relatively low in models such as MobileNet and DenseNet, which often focus on areas outside the leaf. Despite its surprisingly high prediction accuracy, a similar issue is observed in the ResNet model. In contrast, the ensemble and EfficientNet models demonstrate better precision by focusing on the leaf and diseased regions.
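To make this analysis concrete, the following minimal sketch shows how such activation maps can be produced with TensorFlow, following the Grad-CAM formulation of Selvaraju et al. [40]. The `last_conv_name` argument is a hypothetical placeholder: it must point at the final convolutional layer of the backbone being inspected, and the sketch assumes that layer is reachable by name from the model.

```python
# Minimal Grad-CAM sketch (assumption: `last_conv_name` is a placeholder for
# the backbone's final convolutional layer, reachable via get_layer).
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_name, class_idx=None):
    # Model mapping the input to (last conv feature maps, predictions).
    grad_model = tf.keras.Model(
        model.input, [model.get_layer(last_conv_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_idx is None:
            class_idx = int(tf.argmax(preds[0]))   # explain the top class
        class_score = preds[:, class_idx]
    grads = tape.gradient(class_score, conv_out)    # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2)) # global-average gradients
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)
    cam = tf.nn.relu(cam)                           # keep positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()  # normalized heatmap
```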

5. Discussion

Studies on the classification of diseases in tea leaves have been limited by the scarcity of available datasets, although there are some notable exceptions. Table 5 presents research that exploits the same dataset used in this study. The results indicate that the ensemble approach proposed here achieved outstanding performance, surpassing 96% in all metrics and outperforming the other methods. Bhuyan and Singh [73] applied the Swin Transformer model to rice, maize, and tea leaves. As one of the most advanced transformer models, it achieved over 96% accuracy on the rice and maize datasets; however, its performance on the tea sickness dataset was only 67%, which is unsatisfactory compared to the results for other crops. This may be attributed to the insufficient and imbalanced number of images in the tea sickness dataset, which is inadequate for training large-scale models like transformers. In both this study and that of Yücel and Yıldırım [27], this issue was mitigated by employing integrated systems. Compared to Yücel and Yıldırım [27]’s study, our approach demonstrates a significant improvement in classification performance, achieving an approximately 5% increase in the F1-score and accuracy metrics. Furthermore, the ensemble model outperforms Heng et al. [49]’s approach by achieving near-perfect predictions for the Algal Leaf Spot, Red Leaf Spot, and White Spot classes in four of the dataset folds and perfect predictions for the Gray Blight class.
The effectiveness of the ensemble model is also demonstrated compared to studies on the same diseases. The diseases that overlap with our study and similar research are listed in Table 6. However, dataset structure, size, and quality differences influence model performance, making direct comparisons challenging. Additionally, multiclass classification introduces class correlations that impact performance metrics. Since previous studies employed different combinations of tea leaf diseases, our study focuses on evaluating classification performance for shared disease classes rather than comparing overall metrics. This approach offers a more precise understanding of how the ensemble model performs in individual diseases, providing insights that contribute to model optimization.
The classification performance of specific disease classes can be examined in greater detail to highlight these differences and similarities between studies. The Healthy class, one of the most accurately classified categories in our study, also demonstrated high accuracy in other research: Bhuyan et al. [47], using Res4Net-CBAM, and Datta and Gupta [45], using a deep CNN, achieved detection accuracies of approximately 99%, with only a few misclassified images. Similarly, successful results were achieved in classifying Algal Leaf Spot and Red Leaf Spot, with perfect accuracy (100%) in four folds of our study. In comparison, the Res4Net-CBAM model achieved precision scores of 99% for the detection of Algal Leaf Spot and 97% for Red Leaf Spot, as reported by Bhuyan et al. [47]. Furthermore, Chen et al. [44] found that their proposed LeafNet model achieved sensitivity scores of 93% and 86% for Algal Leaf Spot and Red Leaf Spot, respectively. In the same study, Bird’s Eye Spot and White Spot were also analyzed, with sensitivity scores of 84% and 71%, respectively. In our study, precision, used here as the comparable metric, reached 100% for the Algal Leaf Spot and Red Leaf Spot classes.
In addition to these diseases, the classification performance for other commonly studied leaf diseases, such as Brown Blight and Gray Blight, must also be considered. Brown Blight has been extensively analyzed in previous research: although LeafNet reported a relatively low detection accuracy of 63%, Soeb et al. [56] achieved a significantly higher accuracy of 97% using the YOLOv7 model. The model used in this study produced comparable results, with an F1-score of 95%. Furthermore, in the Fold 4 training, Gray Blight was predicted with a precision of 100%, a recall of 90%, and an F1-score of 95%, demonstrating an improvement in precision over the deep CNN model proposed by Datta and Gupta [45]. Anthracnose, by contrast, achieved a recall of 90% but a precision of only 82% in our study, whereas it was classified with a precision of 88.18% by LeafNet [44].
Although the developed models have achieved highly successful predictions, challenges remain in maintaining robust performance under varying conditions. Decision support systems and machine learning classifiers alone may be inadequate for accurately detecting and classifying plant diseases. Moreover, many existing studies lack insight into model decision-making processes. Notably, recent research has shown that advanced deep learning models yield superior results; in particular, integrating transfer learning with ensemble algorithms has proven effective in mitigating background noise and reducing the misclassification of diseases in leaf images [66,67]. However, relying solely on statistical results does not adequately reveal how models recognize and interpret disease characteristics. This gap in understanding increases the risk of drawing inaccurate conclusions when developing identification tools for tea leaf diseases. Consequently, this study demonstrates the advantages of ensemble learning over individual models by incorporating Grad-CAM visualizations for model interpretability. The class-wise interpretation of disease characteristics revealed that tea leaf diseases exhibit distinct structural patterns, underscoring the importance of developing more comprehensive and explainable approaches in future research. The proposed ensemble model combines lightweight architectures (EfficientNetB0, MobileNetV2) and deeper architectures (ResNet50, DenseNet121), balancing robustness and efficiency. While the deeper models provide hierarchical feature representations that enhance accuracy and generalization, the lightweight models contribute low-complexity feature extraction, reducing the computational burden. In addition, data augmentation techniques were observed to effectively improve model performance, as demonstrated by the comparison between Table 2 (non-augmented) and Table 3 (augmented), addressing the imbalance in the dataset. Moreover, the sparse categorical cross-entropy loss reduced memory usage and computational burden during training. The proposed ensemble model also adopts a bagging technique, in which multiple pre-trained models make predictions independently and their outputs are averaged. This aggregation improves the robustness of the model and mitigates the impact of class imbalance by reducing variance and improving generalization.
Selecting an optimal single model generally imposes a significant computational burden. This study demonstrates that an ensemble learning strategy can effectively address such limitations: integrating multiple models makes decision-making more robust, and the resulting performance metrics surpass those of even the best-performing single model. This approach also avoids the time-consuming optimization of single-model configurations. Combined with Grad-CAM visualizations, the ensemble strategy enables more accurate and reliable predictions of disease textures while minimizing the influence of under-performing models. The proposed framework integrates multiple pre-trained convolutional neural networks (CNNs) with a Grad-CAM-compatible architecture, an optimized data augmentation pipeline, and an adaptive averaging-based fusion strategy. The findings demonstrate that the resulting model is more reliable and easier to understand, with improved interpretability and robustness.

6. Conclusions

The presence of complex backgrounds and the similarity between different diseases pose significant challenges to classifying tea leaf diseases. Our study proposes an ensemble learning approach that integrates multiple pre-trained deep learning classifiers to address these challenges effectively. The ensemble training strategy integrates data augmentation techniques and k-fold validation, ensuring consistent and reliable prediction metrics across all models. The effectiveness of ensemble learning in monitoring leaf diseases and the usability of these models in real-world applications are further supported by Grad-CAM visualizations. As one of the most powerful explainability tools, these visualizations provide insight into how disease characteristics are detected and classified, and they play a crucial role in verifying that the models learn genuine disease patterns. Consequently, the ensemble learning strategy has significant potential for adaptation and application in diverse agricultural contexts. By facilitating the implementation of proactive strategies against economically significant plant diseases, this approach can contribute to improving harvest efficiency and food security. Future research will focus on the development of efficient and automated techniques for real-time applications.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/horticulturae11040437/s1. Figure S1: The confusion matrix for the dataset without augmented images; Figure S2: The confusion matrix for the dataset with augmented images.

Author Contributions

Conceptualization, O.O., B.S., and D.Z.S.; methodology, O.O. and B.S.; software, O.O. and B.S.; validation, O.O., B.S., and D.Z.S.; formal analysis, O.O. and B.S.; investigation, O.O. and B.S.; data curation, O.O. and B.S.; writing—original draft preparation, O.O., B.S., and D.Z.S.; writing—review and editing, D.Z.S.; visualization, O.O. and B.S.; supervision, D.Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The dataset can be accessed from the database of the Mendeley Data via https://data.mendeley.com/datasets/j32xdt2ff5/1, accessed on 19 April 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Amin, H.; Darwish, A.; Hassanien, A.E.; Soliman, M. End-to-End Deep Learning Model for Corn Leaf Disease Classification. IEEE Access 2022, 10, 31103–31115. [Google Scholar] [CrossRef]
  2. Albattah, W.; Nawaz, M.; Javed, A.; Masood, M.; Albahli, S. A novel deep learning method for detection and classification of plant diseases. Complex Intell. Syst. 2022, 8, 507–524. [Google Scholar] [CrossRef]
  3. Andrew, J.; Eunice, J.; Popescu, D.E.; Chowdary, M.K.; Hemanth, J. Deep Learning-Based Leaf Disease Detection in Crops Using Images for Agricultural Applications. Agronomy 2022, 12, 2395. [Google Scholar] [CrossRef]
  4. Umamageswari, A.; Deepa, S.; Raja, K. An enhanced approach for leaf disease identification and classification using deep learning techniques. Meas. Sens. 2022, 24, 100568. [Google Scholar] [CrossRef]
  5. Gautam, V.; Trivedi, N.K.; Singh, A.; Mohamed, H.G.; Noya, I.D.; Kaur, P.; Goyal, N. A Transfer Learning-Based Artificial Intelligence Model for Leaf Disease Assessment. Sustainability 2022, 14, 13610. [Google Scholar] [CrossRef]
  6. Lamba, S.; Saini, P.; Kaur, J.; Kukreja, V. Optimized classification model for plant diseases using generative adversarial networks. Innov. Syst. Softw. Eng. 2023, 19, 103–115. [Google Scholar] [CrossRef]
  7. Mzoughi, O.; Yahiaoui, I. Deep learning-based segmentation for disease identification. Ecol. Inform. 2023, 75, 102000. [Google Scholar] [CrossRef]
  8. Hazarika, L.K.; Bhuyan, M.; Hazarika, B.N. Insect pests of tea and their management. Annu. Rev. Entomol. 2009, 54, 267–284. [Google Scholar] [CrossRef]
  9. Hu, G.; Fang, M. Using a multi-convolutional neural network to automatically identify small-sample tea leaf diseases. Sustain. Comput. Inform. Syst. 2022, 35, 100696. [Google Scholar] [CrossRef]
  10. Bao, W.; Fan, T.; Hu, G.; Liang, D.; Li, H. Detection and identification of tea leaf diseases based on AX-RetinaNet. Sci. Rep. 2022, 12, 2183. [Google Scholar] [CrossRef]
  11. Lanjewar, M.G.; Panchbhai, K.G. Convolutional neural network based tea leaf disease prediction system on smart phone using paas cloud. Neural Comput. Appl. 2023, 35, 2755–2771. [Google Scholar] [CrossRef]
  12. Pal, A.; Kumar, V. AgriDet: Plant Leaf Disease severity classification using agriculture detection framework. Eng. Appl. Artif. Intell. 2023, 119, 105754. [Google Scholar] [CrossRef]
  13. Attallah, O. Tomato Leaf Disease Classification via Compact Convolutional Neural Networks with Transfer Learning and Feature Selection. Horticulturae 2023, 9, 149. [Google Scholar] [CrossRef]
  14. Zhang, N.; Wu, H.; Zhu, H.; Deng, Y.; Han, X. Tomato Disease Classification and Identification Method Based on Multimodal Fusion Deep Learning. Agriculture 2022, 12, 2014. [Google Scholar] [CrossRef]
  15. Reddy, B.S.; Neeraja, S. Plant leaf disease classification and damage detection system using deep learning models. Multimed. Tools Appl. 2022, 81, 24021–24040. [Google Scholar] [CrossRef]
  16. Haridasan, A.; Thomas, J.; Raj, E.D. Deep learning system for paddy plant disease detection and classification. Environ. Monit. Assess. 2023, 195, 120. [Google Scholar] [CrossRef] [PubMed]
  17. Wei, S.J.; Riza, D.F.A.; Nugroho, H. Comparative study on the performance of deep learning implementation in the edge computing: Case study on the plant leaf disease identification. J. Agric. Food Res. 2022, 10, 100389. [Google Scholar] [CrossRef]
  18. Russel, N.S.; Selvaraj, A. Leaf species and disease classification using multiscale parallel deep CNN architecture. Neural Comput. Appl. 2022, 34, 19217–19237. [Google Scholar] [CrossRef]
  19. Paul, S.G.; Biswas, A.A.; Saha, A.; Zulfiker, M.S.; Ritu, N.A.; Zahan, I.; Rahman, M.; Islam, M.A. A real-time application-based convolutional neural network approach for tomato leaf disease classification. Array 2023, 19, 100313. [Google Scholar] [CrossRef]
  20. Sharma, S.; Guleria, K.; Tiwari, S.; Kumar, S. A deep learning based convolutional neural network model with VGG16 feature extractor for the detection of Alzheimer Disease using MRI scans. Meas. Sens. 2022, 24, 100506. [Google Scholar] [CrossRef]
  21. Marwa, E.G.; Moustafa, H.E.D.; Khalifa, F.; Khater, H.; AbdElhalim, E. An MRI-based deep learning approach for accurate detection of Alzheimer’s disease. Alex. Eng. J. 2023, 63, 211–221. [Google Scholar]
  22. Jamshidi, M.; Lalbakhsh, A.; Talla, J.; Peroutka, Z.; Hadjilooei, F.; Lalbakhsh, P.; Jamshidi, M.; La Spada, L.; Mirmozafari, M.; Dehghani, M.; et al. Artificial intelligence and COVID-19: Deep learning approaches for diagnosis and treatment. IEEE Access 2020, 8, 109581–109595. [Google Scholar] [CrossRef] [PubMed]
  23. Aslani, S.; Jacob, J. Utilisation of deep learning for COVID-19 diagnosis. Clin. Radiol. 2023, 78, 150–157. [Google Scholar] [CrossRef]
  24. Sarica, B.; Seker, D.Z. New MS lesion segmentation with deep residual attention gate U-Net utilizing 2D slices of 3D MR images. Front. Neurosci. 2022, 16, 912000. [Google Scholar] [CrossRef] [PubMed]
  25. Sarica, B.; Seker, D.Z.; Bayram, B. A dense residual U-net for multiple sclerosis lesions segmentation from multi-sequence 3D MR images. Int. J. Med. Inform. 2023, 170, 104965. [Google Scholar] [CrossRef]
  26. Ghosal, P.; Roy, A.; Agarwal, R.; Purkayastha, K.; Sharma, A.L.; Kumar, A. Compound attention embedded dual channel encoder-decoder for ms lesion segmentation from brain MRI. Multimed. Tools Appl. 2024, 1–33. [Google Scholar] [CrossRef]
  27. Yücel, N.; Yıldırım, M. Classification of tea leaves diseases by developed CNN, feature fusion, and classifier based model. Int. J. Appl. Math. Electron. Comput. 2023, 11, 30–36. [Google Scholar] [CrossRef]
  28. Shoaib, M.; Hussain, T.; Shah, B.; Ullah, I.; Shah, S.M.; Ali, F.; Park, S.H. Deep learning-based segmentation and classification of leaf images for detection of tomato plant disease. Front. Plant Sci. 2022, 13, 1031748. [Google Scholar] [CrossRef]
  29. Chakraborty, K.K.; Mukherjee, R.; Chakroborty, C.; Bora, K. Automated recognition of optical image based potato leaf blight diseases using deep learning. Physiol. Mol. Plant Pathol. 2022, 117, 101781. [Google Scholar] [CrossRef]
  30. Tassis, L.M.; Krohling, R.A. Few-shot learning for biotic stress classification of coffee leaves. Artif. Intell. Agric. 2022, 6, 55–67. [Google Scholar] [CrossRef]
  31. Wei, K.; Chen, B.; Zhang, J.; Fan, S.; Wu, K.; Liu, G.; Chen, D. Explainable Deep Learning Study for Leaf Disease Classification. Agronomy 2022, 12, 1035. [Google Scholar] [CrossRef]
  32. Turkoglu, M.; Yanikoğlu, B.; Hanbay, D. PlantDiseaseNet: Convolutional neural network ensemble for plant disease and pest detection. Signal, Image Video Process. 2022, 16, 301–309. [Google Scholar] [CrossRef]
  33. Vallabhajosyula, S.; Sistla, V.; Kolli, V.K.K. Transfer learning-based deep ensemble neural network for plant leaf disease detection. J. Plant Dis. Prot. 2022, 129, 545–558. [Google Scholar] [CrossRef]
  34. Abuhayi, B.M.; Mossa, A.A. Coffee disease classification using Convolutional Neural Network based on feature concatenation. Inform. Med. Unlocked 2023, 39, 101245. [Google Scholar] [CrossRef]
  35. Lehmann-Danzinger, H. Diseases and pests of tea: Overview and possibilities of integrated pest and disease management. J. Agric. Trop. Subtrop. 2000, 101, 13–38. [Google Scholar]
36. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  37. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
  38. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  39. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114. [Google Scholar]
  40. Selvaraju, R.R.; Das, A.; Vedantam, R.; Cogswell, M.; Parikh, D.; Batra, D. Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization. arXiv 2016, arXiv:1610.02391. [Google Scholar]
  41. Hughes, D.P.; Salathé, M. An open access repository of images on plant health to enable the development of mobile disease diagnostics through machine learning and crowdsourcing. arXiv 2015, arXiv:1511.08060. [Google Scholar]
  42. Hu, G.; Wu, H.; Zhang, Y.; Wan, M. A low shot learning method for tea leaf’s disease identification. Comput. Electron. Agric. 2019, 163, 104852. [Google Scholar] [CrossRef]
  43. Hu, G.; Yang, X.; Zhang, Y.; Wan, M. Identification of tea leaf diseases by using an improved deep convolutional neural network. Sustain. Comput. Inform. Syst. 2019, 24, 100353. [Google Scholar] [CrossRef]
  44. Chen, J.; Liu, Q.; Gao, L. Visual Tea Leaf Disease Recognition Using a Convolutional Neural Network Model. Symmetry 2019, 11, 343. [Google Scholar] [CrossRef]
  45. Datta, S.; Gupta, N. A Novel Approach For the Detection of Tea Leaf Disease Using Deep Neural Network. Procedia Comput. Sci. 2023, 218, 2273–2286. [Google Scholar] [CrossRef]
  46. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. arXiv 2015, arXiv:1512.03385. [Google Scholar]
47. Bhuyan, P.; Singh, P.K.; Das, S.K. Res4net-CBAM: A deep CNN with convolution block attention module for tea leaf disease diagnosis. Multimed. Tools Appl. 2023, 83, 48925–48947. [Google Scholar] [CrossRef]
48. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. arXiv 2018, arXiv:1807.06521. [Google Scholar]
  49. Heng, Q.; Yu, S.; Zhang, Y. A new AI-based approach for automatic identification of tea leaf disease using deep neural network based on hybrid pooling. Heliyon 2024, 10, e26465. [Google Scholar] [CrossRef]
  50. Hu, G.; Wang, H.; Zhang, Y.; Wan, M. Detection and severity analysis of tea leaf blight based on deep learning. Comput. Electr. Eng. 2021, 90, 107023. [Google Scholar] [CrossRef]
  51. Hu, G.; Wei, K.; Zhang, Y.; Bao, W.; Liang, D. Estimation of tea leaf blight severity in natural scene images. Precis. Agric. 2021, 22, 1239–1262. [Google Scholar] [CrossRef]
  52. Jiang, Y.; Lu, L.; Wan, M.; Hu, G.; Zhang, Y. Detection method for tea leaf blight in natural scene images based on lightweight and efficient LC3Net model. J. Plant Dis. Prot. 2024, 131, 209–225. [Google Scholar] [CrossRef]
  53. Bao, W.; Zhu, Z.; Hu, G.; Zhou, X.; Zhang, D.; Yang, X. UAV remote sensing detection of tea leaf blight based on DDMA-YOLO. Comput. Electron. Agric. 2023, 205, 107637. [Google Scholar] [CrossRef]
  54. Lin, J.; Bai, D.; Xu, R.; Lin, H. TSBA-YOLO: An Improved Tea Diseases Detection Model Based on Attention Mechanisms and Feature Fusion. Forests 2023, 14, 619. [Google Scholar] [CrossRef]
  55. Xue, Z.; Xu, R.; Bai, D.; Lin, H. YOLO-Tea: A Tea Disease Detection Model Improved by YOLOv5. Forests 2023, 14, 415. [Google Scholar] [CrossRef]
  56. Soeb, M.J.A.; Jubayer, M.F.; Tarin, T.A.; Mamun, M.R.A.; Ruhad, F.M.; Parven, A.; Mubarak, N.M.; Karri, S.L.; Meftaul, I.M. Tea leaf disease detection and identification based on YOLOv7 (YOLO-T). Sci. Rep. 2023, 13, 6078. [Google Scholar] [CrossRef]
  57. Elaraby, A.; Hamdy, W.; Alanazi, S. Classification of Citrus Diseases Using Optimization Deep Learning Approach. Comput. Intell. Neurosci. 2022, 2022, 9153207. [Google Scholar] [CrossRef]
  58. Gangwar, A.; Dhaka, V.S.; Rani, G.; Shrey. Time and Space Efficient Multi-Model Convolution Vision Transformer for Tomato Disease Detection from Leaf Images with Varied Backgrounds. Comput. Mater. Contin. 2024, 79, 117–142. [Google Scholar] [CrossRef]
  59. Sagi, O.; Rokach, L. Ensemble learning: A survey. WIREs Data Min. Knowl. Discov. 2018, 8, e1249. [Google Scholar] [CrossRef]
60. Mohammed, A.; Kora, R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J. King Saud Univ. Comput. Inf. Sci. 2023, 35, 757–774. [Google Scholar] [CrossRef]
  61. Khan, M.A.; Alam, S.; Ahmed, W. Enhanced Skin Cancer Diagnosis via Deep Convolutional Neural Networks with Ensemble Learning. SN Comput. Sci. 2025, 6, 124. [Google Scholar] [CrossRef]
  62. Ali, A.H.; Youssef, A.; Abdelal, M.; Raja, M.A. An ensemble of deep learning architectures for accurate plant disease classification. Ecol. Inform. 2024, 81, 102618. [Google Scholar] [CrossRef]
  63. Jha, P.; Dembla, D.; Dubey, W. Deep learning models for enhancing potato leaf disease prediction: Implementation of transfer learning based stacking ensemble model. Multimed. Tools Appl. 2024, 83, 37839–37858. [Google Scholar] [CrossRef]
  64. Pandiyaraju, V.; Kumar, A.M.S.; Praveen, J.I.R.; Venkatraman, S.; Kumar, S.P.; Aravintakshan, S.A.; Abeshek, A.; Kannan, A. Improved tomato leaf disease classification through adaptive ensemble models with exponential moving average fusion and enhanced weighted gradient optimization. Front. Plant Sci. 2024, 15, 1382416. [Google Scholar] [CrossRef]
  65. Bezabh, Y.A.; Ayalew, A.M.; Abuhayi, B.M.; Demlie, T.N.; Awoke, E.A.; Mengistu, T.E. Classification of mango disease using ensemble convolutional neural network. Smart Agric. Technol. 2024, 8, 100476. [Google Scholar] [CrossRef]
  66. Zhu, H.; Wang, D.; Wei, Y.; Zhang, X.; Li, L. Combining Transfer Learning and Ensemble Algorithms for Improved Citrus Leaf Disease Classification. Agriculture 2024, 14, 1549. [Google Scholar] [CrossRef]
  67. Yao, X.; Lin, H.; Bai, D.; Zhou, H. A Small Target Tea Leaf Disease Detection Model Combined with Transfer Learning. Forests 2024, 15, 591. [Google Scholar] [CrossRef]
68. Kimutai, G.; Förster, A. Tea sickness dataset. Mendeley Data 2022, V2. [Google Scholar]
  69. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2017, arXiv:1412.6980. [Google Scholar]
  70. Keith, L.; Ko, W.H.; Sato, D.M. Identification Guide for Diseases of Tea (Camellia sinensis); Plant Disease; University of Hawaii: Honolulu, HI, USA, 2006. [Google Scholar]
  71. Pandey, A.K.; Sinniah, G.D.; Babu, A.; Tanti, A. How the Global Tea Industry Copes With Fungal Diseases—Challenges and Opportunities. Plant Dis. 2021, 105, 1868–1879. [Google Scholar] [CrossRef] [PubMed]
  72. Sinniah, G.D.; Mahadevan, N. Disease Diagnosis in Tea (Camellia sinensis (L.) Kuntze): Challenges and the Way Forward. In Challenges in Plant Disease Detection and Recent Advancements; IntechOpen: London, UK, 2024. [Google Scholar]
  73. Bhuyan, P.; Singh, P.K. Evaluating Deep CNNs and Vision Transformers for Plant Leaf Disease Classification. In Proceedings of the Distributed Computing and Intelligent Technology; Springer: Cham, Switzerland, 2024; pp. 293–306. [Google Scholar]
74. Krisnandi, D.; Pardede, H.F.; Yuwana, R.S.; Zilvan, V.; Heryana, A.; Fauziah, F.; Rahadi, V.P. Diseases Classification for Tea Plant Using Concatenated Convolution Neural Network. CommIT (Commun. Inf. Technol.) J. 2019, 13, 67–77. [Google Scholar] [CrossRef]
  75. Sun, Y.; Jiang, Z.; Zhang, L.; Dong, W.; Rao, Y. SLIC_SVM based leaf diseases saliency map extraction of tea plant. Comput. Electron. Agric. 2019, 157, 102–109. [Google Scholar] [CrossRef]
Figure 1. The framework of the study.
Figure 2. Examples of images from each class in the dataset.
Figure 3. The proposed ensemble model.
Figure 4. The confusion matrix for the dataset without augmented images. Values are normalized row-wise, so each cell shows the proportion of samples of a given true class assigned to each predicted class. The matrix is color-coded from white (0.0) to dark blue (1.0), and numerical annotations inside each cell indicate the exact values.
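For readers who wish to reproduce the row-normalized layout of Figures 4 and 5, a minimal Python sketch is given below. The label arrays are random placeholders standing in for the actual fold predictions, and the class ordering is an assumption rather than something taken from the study's released code.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import confusion_matrix

class_names = ["Algal Leaf Spot", "Anthracnose", "Bird's Eye Spot", "Brown Blight",
               "Gray Blight", "Red Leaf Spot", "White Spot", "Healthy"]

# Random placeholders standing in for the actual test labels and predictions.
rng = np.random.default_rng(42)
y_true = rng.integers(0, 8, size=89)
y_pred = np.where(rng.random(89) < 0.9, y_true, rng.integers(0, 8, size=89))

cm = confusion_matrix(y_true, y_pred, labels=list(range(8)))
cm_norm = cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)  # row-wise normalization

fig, ax = plt.subplots(figsize=(7, 6))
im = ax.imshow(cm_norm, cmap="Blues", vmin=0.0, vmax=1.0)  # white (0.0) to dark blue (1.0)
ax.set_xticks(range(8))
ax.set_xticklabels(class_names, rotation=45, ha="right")
ax.set_yticks(range(8))
ax.set_yticklabels(class_names)
for i in range(8):
    for j in range(8):
        ax.text(j, i, f"{cm_norm[i, j]:.2f}", ha="center", va="center", fontsize=7)
ax.set_xlabel("Predicted class")
ax.set_ylabel("True class")
fig.colorbar(im, ax=ax)
fig.tight_layout()
plt.show()
```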
Figure 5. The confusion matrix for the augmented images (same coloring as in Figure 4 is used).
Figure 6. Prediction samples from the non-augmented dataset for the Fold 4 ensemble model (correct predictions are highlighted in green, incorrect predictions in red).
Figure 7. Prediction samples from the augmented dataset for the Fold 4 ensemble model (correct predictions are highlighted in green, incorrect predictions in red).
Figure 8. Grad-CAM visualizations for different models and classes.
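The Grad-CAM maps in Figure 8 follow the standard formulation of Selvaraju et al. [40]: gradients of the class score with respect to the final convolutional feature maps are global-average-pooled into channel weights, and the weighted feature maps are summed and rectified. A minimal sketch of this computation for a Keras model is given below; the model and layer name in the usage comment are assumptions, not the study's configuration.

```python
import numpy as np
import tensorflow as tf

def grad_cam(model, image, conv_layer_name):
    """Compute a Grad-CAM heatmap for one preprocessed image of shape (H, W, 3)."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output],
    )
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        class_index = int(tf.argmax(preds[0]))      # explain the top-scoring class
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)    # d(score) / d(feature maps)
    weights = tf.reduce_mean(grads, axis=(1, 2))    # global-average-pool the gradients
    cam = tf.reduce_sum(conv_out * weights[:, tf.newaxis, tf.newaxis, :], axis=-1)[0]
    cam = tf.nn.relu(cam)                           # keep only positive contributions
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()

# Hypothetical usage; "conv5_block3_out" is the last convolutional block of a
# Keras ResNet50 and is an assumption, not the layer chosen in this study:
# model = tf.keras.applications.ResNet50(weights="imagenet")
# heatmap = grad_cam(model, img, "conv5_block3_out")
```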
Table 1. Dataset statistics.

| Classes | Total Images | Train (90:10) | Test (90:10) | Train (5-Fold) | Validation (5-Fold) | Augmented Train (5-Fold) | Augmented Validation (5-Fold) |
|---|---|---|---|---|---|---|---|
| Algal Leaf Spot | 113 | 102 | 11 | 81 | 21 | 324 | 21 |
| Anthracnose | 100 | 90 | 10 | 72 | 18 | 288 | 18 |
| Bird's Eye Spot | 100 | 90 | 10 | 72 | 18 | 288 | 18 |
| Brown Blight | 113 | 102 | 11 | 82 | 20 | 328 | 20 |
| Gray Blight | 100 | 90 | 10 | 72 | 18 | 288 | 18 |
| Red Leaf Spot | 143 | 128 | 15 | 102 | 26 | 408 | 26 |
| White Spot | 142 | 128 | 14 | 102 | 26 | 408 | 26 |
| Healthy | 74 | 66 | 8 | 53 | 13 | 212 | 13 |
| Total | 885 | 796 | 89 | 636 | 160 | 2544 | 160 |
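The counts in Table 1 follow a stratified 90:10 hold-out split and a stratified five-fold partition of the training portion. The sketch below reproduces this splitting scheme with scikit-learn; the label array is built from the per-class totals in Table 1, and the random seed is an arbitrary placeholder.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, train_test_split

# Per-class image counts taken from Table 1, used to build placeholder labels.
counts = {"Algal Leaf Spot": 113, "Anthracnose": 100, "Bird's Eye Spot": 100,
          "Brown Blight": 113, "Gray Blight": 100, "Red Leaf Spot": 143,
          "White Spot": 142, "Healthy": 74}
labels = np.array([name for name, n in counts.items() for _ in range(n)])

# Stratified 90:10 hold-out split, then stratified 5-fold CV on the training part.
train_idx, test_idx = train_test_split(
    np.arange(len(labels)), test_size=0.10, stratify=labels, random_state=42)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (tr, va) in enumerate(skf.split(train_idx, labels[train_idx]), start=1):
    print(f"Fold {fold}: {len(tr)} train / {len(va)} validation images")
```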
Table 2. The prediction results for the non-augmented dataset.

| Fold Number | Metrics | DenseNet121 | EfficientNetB0 | MobileNetV2 | ResNet50 | Ensemble |
|---|---|---|---|---|---|---|
| Fold 1 | Precision | 0.60 | 0.78 | 0.60 | 0.79 | 0.89 |
| | Recall | 0.66 | 0.71 | 0.51 | 0.76 | 0.88 |
| | F1-Score | 0.62 | 0.68 | 0.47 | 0.76 | 0.87 |
| Fold 2 | Precision | 0.63 | 0.82 | 0.62 | 0.91 | 0.94 |
| | Recall | 0.66 | 0.80 | 0.63 | 0.90 | 0.93 |
| | F1-Score | 0.63 | 0.80 | 0.58 | 0.90 | 0.93 |
| Fold 3 | Precision | 0.59 | 0.87 | 0.60 | 0.87 | 0.90 |
| | Recall | 0.61 | 0.85 | 0.53 | 0.87 | 0.89 |
| | F1-Score | 0.56 | 0.85 | 0.47 | 0.86 | 0.88 |
| Fold 4 | Precision | 0.79 | 0.86 | 0.69 | 0.89 | 0.95 |
| | Recall | 0.69 | 0.82 | 0.62 | 0.88 | 0.94 |
| | F1-Score | 0.67 | 0.81 | 0.58 | 0.88 | 0.94 |
| Fold 5 | Precision | 0.74 | 0.81 | 0.60 | 0.88 | 0.92 |
| | Recall | 0.69 | 0.74 | 0.52 | 0.85 | 0.91 |
| | F1-Score | 0.64 | 0.71 | 0.45 | 0.83 | 0.91 |
| Average | Precision | 0.67 ± 0.09 | 0.83 ± 0.04 | 0.62 ± 0.04 | 0.87 ± 0.05 | 0.92 ± 0.03 |
| | Recall | 0.66 ± 0.03 | 0.78 ± 0.06 | 0.56 ± 0.06 | 0.85 ± 0.05 | 0.91 ± 0.03 |
| | F1-Score | 0.62 ± 0.04 | 0.77 ± 0.07 | 0.51 ± 0.06 | 0.85 ± 0.05 | 0.91 ± 0.03 |
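In every fold, the ensemble outperforms its constituent backbones. As an illustration of how such a combination can be realized, the sketch below implements soft voting, i.e., averaging the per-model softmax outputs before taking the argmax; this fusion rule and the random placeholder probabilities are assumptions for illustration, not a restatement of the study's exact implementation.

```python
import numpy as np

def soft_vote(prob_list):
    """Average per-model softmax outputs; each entry is an (N, C) array."""
    return np.mean(np.stack(prob_list, axis=0), axis=0)

# Random placeholder softmax outputs for the four backbones
# (89 test images, 8 classes); real arrays come from the trained models.
rng = np.random.default_rng(0)
model_probs = [rng.dirichlet(np.ones(8), size=89) for _ in range(4)]

ensemble_probs = soft_vote(model_probs)
y_pred = ensemble_probs.argmax(axis=1)  # final ensemble prediction per image
```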
Table 3. The prediction results for the augmented dataset.

| Fold Number | Metrics | DenseNet121 | EfficientNetB0 | MobileNetV2 | ResNet50 | Ensemble |
|---|---|---|---|---|---|---|
| Fold 1 | Precision | 0.80 | 0.92 | 0.76 | 0.94 | 0.96 |
| | Recall | 0.76 | 0.91 | 0.67 | 0.93 | 0.96 |
| | F1-Score | 0.76 | 0.90 | 0.65 | 0.93 | 0.95 |
| Fold 2 | Precision | 0.81 | 0.91 | 0.75 | 0.91 | 0.92 |
| | Recall | 0.73 | 0.91 | 0.69 | 0.90 | 0.91 |
| | F1-Score | 0.71 | 0.91 | 0.67 | 0.90 | 0.91 |
| Fold 3 | Precision | 0.80 | 0.92 | 0.50 | 0.92 | 0.94 |
| | Recall | 0.78 | 0.91 | 0.62 | 0.90 | 0.92 |
| | F1-Score | 0.77 | 0.91 | 0.53 | 0.90 | 0.92 |
| Fold 4 | Precision | 0.83 | 0.92 | 0.71 | 0.93 | 0.96 |
| | Recall | 0.74 | 0.91 | 0.66 | 0.93 | 0.96 |
| | F1-Score | 0.71 | 0.91 | 0.65 | 0.93 | 0.96 |
| Fold 5 | Precision | 0.73 | 0.90 | 0.74 | 0.90 | 0.95 |
| | Recall | 0.74 | 0.88 | 0.67 | 0.90 | 0.94 |
| | F1-Score | 0.68 | 0.88 | 0.67 | 0.90 | 0.94 |
| Average | Precision | 0.79 ± 0.04 | 0.91 ± 0.01 | 0.69 ± 0.11 | 0.92 ± 0.02 | 0.95 ± 0.02 |
| | Recall | 0.75 ± 0.02 | 0.90 ± 0.01 | 0.66 ± 0.03 | 0.91 ± 0.02 | 0.94 ± 0.02 |
| | F1-Score | 0.73 ± 0.04 | 0.90 ± 0.01 | 0.63 ± 0.06 | 0.91 ± 0.02 | 0.94 ± 0.02 |
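The average rows in Tables 2 and 3 report the mean and standard deviation across the five folds. As a quick check, the ensemble F1-scores on the augmented dataset reproduce the reported 0.94 ± 0.02 (NumPy's default population standard deviation is assumed here; the sample formula rounds to the same value for these numbers).

```python
import numpy as np

# Ensemble F1-scores per fold on the augmented dataset (Table 3).
f1_folds = np.array([0.95, 0.91, 0.92, 0.96, 0.94])
print(f"F1-score: {f1_folds.mean():.2f} ± {f1_folds.std():.2f}")  # -> 0.94 ± 0.02
```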
Table 4. The results of class-based prediction metrics.

| Leaf Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Anthracnose | 0.82 | 0.90 | 0.86 |
| Gray Blight | 1.00 | 0.90 | 0.95 |
| Algal Leaf Spot | 1.00 | 1.00 | 1.00 |
| Healthy | 1.00 | 1.00 | 1.00 |
| Bird's Eye Spot | 0.90 | 0.90 | 0.90 |
| Brown Blight | 1.00 | 0.91 | 0.95 |
| Red Leaf Spot | 1.00 | 1.00 | 1.00 |
| White Spot | 0.93 | 1.00 | 0.97 |
| Weighted Average | 0.96 | 0.95 | 0.95 |
| Overall Accuracy | 0.96 | | |
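Class-based metrics such as those in Table 4 can be generated directly with scikit-learn's classification report. In the sketch below, the ground-truth and prediction arrays are random placeholders standing in for the Fold 4 test outputs.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report

class_names = ["Anthracnose", "Gray Blight", "Algal Leaf Spot", "Healthy",
               "Bird's Eye Spot", "Brown Blight", "Red Leaf Spot", "White Spot"]

# Random placeholders for the Fold 4 test labels and ensemble predictions.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 8, size=89)
y_pred = np.where(rng.random(89) < 0.95, y_true, rng.integers(0, 8, size=89))

print(classification_report(y_true, y_pred, labels=list(range(8)),
                            target_names=class_names, digits=2))
print(f"Overall accuracy: {accuracy_score(y_true, y_pred):.2f}")
```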
Table 5. Comparison of test results of studies using the same dataset as this study.

| Study | Model | Precision | Recall | F1-Score | Accuracy |
|---|---|---|---|---|---|
| Bhuyan and Singh [73] | Swin Transformer | 0.67 | 0.67 | 0.67 | 0.67 |
| Yücel and Yıldırım [27] | Hybrid Model | - | - | 0.91 | 0.91 |
| Heng et al. [49] | Combination of CNN and Weighted Random Forest (WRF) | 0.92 | 0.92 | 0.92 | 0.92 |
| Our study | Ensemble (Augmented, Fold 4) | 0.96 | 0.96 | 0.95 | 0.96 |
Table 6. Comparison of the diseases used in our study and other studies.

| Study | Healthy | Algal Leaf Spot | Anthracnose | Bird's Eye Spot | Brown Blight | Gray Blight | Red Leaf Spot | White Spot |
|---|---|---|---|---|---|---|---|---|
| Hu et al. [42] | | X | | | | | | |
| Hu et al. [43] | X | | | | | | | |
| Krisnandi et al. [74] | X | | | | | | | |
| Chen et al. [44] | | X | X | X | X | X | X | X |
| Sun et al. [75] | | X | | X | | | | |
| Bhuyan et al. [47] | X | | X | | X | | | |
| Lanjewar and Panchbhai [11] | | X | | | | | | |
| Soeb et al. [56] | | X | | | | | | |
| Datta and Gupta [45] | X | X | | X | X | X | | |
| Our study | X | X | X | X | X | X | X | X |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
