GLD-Det: Guava Leaf Disease Detection in Real-Time Using Lightweight Deep Learning Approach Based on MobileNet

Mustak Un Nobi, Md.; Rifat, Md.; Mridha, M. F.; Alfarhood, Sultan; Safran, Mejdl; Che, Dunren

doi:10.3390/agronomy13092240

by Md. Mustak Un Nobi ¹, Md. Rifat ¹, M. F. Mridha ¹, Sultan Alfarhood ²,*, Mejdl Safran ² and Dunren Che ³

¹ Department of Computer Science, American International University-Bangladesh, Dhaka 1229, Bangladesh
² Department of Computer Science, College of Computer and Information Sciences, King Saud University, P.O. Box 51178, Riyadh 11543, Saudi Arabia
³ School of Computing, Southern Illinois University, Carbondale, IL 62901, USA
* Author to whom correspondence should be addressed.

Agronomy 2023, 13(9), 2240; https://doi.org/10.3390/agronomy13092240

Submission received: 2 August 2023 / Revised: 16 August 2023 / Accepted: 24 August 2023 / Published: 26 August 2023

(This article belongs to the Section Precision and Digital Agriculture)

Abstract:

The guava plant is widely cultivated in various regions of the subcontinent and other Asian countries, including Bangladesh, owing to its adaptability to different soils and climates. The fruit plays a crucial role in food security and human nutrition. However, guava plants are susceptible to various infectious leaf diseases, which lead to significant crop losses. Several heavyweight deep learning models have been developed in precision agriculture to address this issue. This research proposes a transfer learning-based model named GLD-Det, designed to be both lightweight and robust, for real-time detection of guava leaf disease, and evaluates it on two benchmark datasets. GLD-Det is a modified version of MobileNet with additional components: two pooling layers (max and global average), three batch normalisation layers, three dropout layers, four dense layers with ReLU as the activation function, and SoftMax as the classification function on the last, lighter dense layer. The proposed GLD-Det model outperforms all the existing models it is compared against, with accuracy, precision, recall, and AUC scores of 0.98, 0.98, 0.97, and 0.99 on one dataset and 0.97, 0.97, 0.96, and 0.99 on the other. Furthermore, to enhance trust and transparency, the model’s predictions are explained using Grad-CAM, a class-discriminative localisation technique.

Keywords: guava leaf disease; deep learning; agriculture; modified MobileNet; Grad-CAM

1. Introduction

The guava is a popular tropical fruit widely consumed in both urban and rural areas of Bangladesh. Belonging to the Myrtaceae family, this fruit originated in the American tropics and found its way to Portugal during the 17th century. Guava is rich in essential nutrients such as Vitamin C, Calcium, Iron, Nicotinic Acid, Vitamin B6, Magnesium, and Phosphorus, and is notably free of cholesterol. The fruit is known for its beneficial effects on various health conditions including diarrhoea, dysentery, blood pressure, diabetes, and immune system support [1,2]. Bangladesh is recognised as one of the largest guava-producing countries globally, with extensive cultivation covering approximately 0.3 million hectares and yielding around 3.7 million metric tons annually [2]. The guava plant is highly adaptable to diverse climates and soil conditions, making it commercially significant in both subtropical and tropical regions. However, the quality and quantity of guava production are significantly impacted by the vulnerability of guava plants to leaf diseases. These diseases not only increase production costs but also hinder food supply in the national and global markets [3]. Currently, the prevailing method for detecting guava leaf diseases in Bangladesh relies on manual labour and visual observation based on farmers’ experiences. This conventional approach is time-consuming, expensive, and unreliable. Moreover, incorrect disease detection can lead to misinterpretation of the situation [1,3]. Alternatively, modern techniques, such as spectrometers [4] and molecular methods like real-time polymerase chain reaction [5], have been introduced for the analysis and detection of plant leaf diseases. However, these methods require highly skilled personnel and involve significant time and cost investment, including the use of crop protection chemicals [3].

In the year 2050, the world’s population is expected to surpass 9 billion, resulting in a twofold rise in food consumption. To meet the future demand, food production will need to increase by a minimum of 70%. Ensuring local and global food security is a crucial and challenging issue for sustainable human development [1,6]. Therefore, it is imperative to have a conducive environment, disease-free conditions, high-quality produce, and high-yielding crops to boost food production and meet future demands [6]. Bangladesh has been ranked 10th in tropical fruit production and 8th in guava fruit production worldwide, as reported by the Food and Agricultural Organisation (FAO) [7]. Due to the presence of various guava plant varieties and diverse environmental factors, such as tropical, sub-tropical, and non-tropical regions [2], guava leaf diseases differ across different regions, making their detection challenging and sensitive. Detecting various guava leaf diseases, such as pest attacks, canker, nutrient deficiency, dot, wilt diseases, mummification, white fungus, and anthracnose, presents a challenging undertaking [3]. To address this challenge, the integration of Computer Vision (CV) [6] with automatic classification tools and pattern recognition algorithms has shown exceptional performance in detecting guava leaf diseases [3]. Artificial intelligence (AI), machine learning (ML), CV, and deep learning (DL) techniques have revolutionised the development of computerised models for the analysis, diagnosis, and accurate and timely detection of plant leaf diseases [3].

CV using DL techniques offers a sustainable mechanism with which to detect leaf diseases from orchards. DL, a subset of CV, enables efficient semantic segmentation and classification of image or video data in precision agriculture. This approach [3] has gained significant traction in various applications such as disease detection, ripeness inspection, determination of harvesting time, fruit quality assessment, counting, sorting, yield prediction, and more. By implementing CV-based monitoring for early prevention, the agricultural sector can ensure the production of high-quality fruits, automate field maintenance on a large scale, achieve successful yields, and prevent substantial output losses [8]. However, it is important to note that the natural environment is complex and dynamic, and characterised by factors like varying illumination conditions, leaf obstructions, changes in brightness, overlapping fruits or leaves, similarity in plant backgrounds, and other challenges. These factors pose additional difficulties in accurately detecting leaf diseases [1]. Therefore, it is crucial to employ a state-of-the-art method that is precise, accurate, efficient, cost-effective, robust, lightweight, and suitable for embedding in automated devices.

Thus, an automated image-based framework for real-time, large-scale leaf disease detection is a more cost-effective, time-saving, convenient, and practical solution than conventional and other existing methods. Such a framework combines unmanned aerial vehicles (UAVs) and digital cameras integrated with smartphones. Notably, the MobileNet model has demonstrated a remarkable performance in leaf disease detection [3]. MobileNet is a specialised convolutional neural network (CNN) known for its lightweight nature, making it ideal for devices with low processing capability, such as mobile devices. Despite its lightness, MobileNet maintains a favourable level of accuracy and efficiency while consuming less memory and processing power. Compared to traditional convolutional models, MobileNet has fewer trainable parameters, making it a cost-effective and computationally efficient choice [3,9,10].

This research introduces a transfer learning-based DL architecture, GLD-Det, based on a modified MobileNet and designed specifically for the large-scale detection of guava leaf disease in agriculture. The model builds on the advantages of MobileNet, which make it fast, cost-effective, robust, and lightweight. MobileNet is modified by adding components such as different pooling layers, batch normalisation layers, dropout layers, dense layers, SoftMax as a classification layer, and ReLU as an activation function. Since the visual characteristics of guava leaf disease differ across regions, this research employs two benchmark datasets from distinct geographical locations, Pakistan and Bangladesh, to test the robustness of the GLD-Det architecture. A preprocessing step is introduced to streamline feature extraction from both datasets and enhance the classifier accuracy. The performance of the proposed model is compared with that of several existing models on the same datasets using standard evaluation metrics, and the Grad-CAM technique is then used to support the trust and transparency of the proposed model. Hence, this approach presents a practical and sustainable disease detection system for guava leaves, particularly when integrated into smartphones.

The study’s overall contributions encompass:

This paper presents GLD-Det, an advanced detection framework for guava leaf diseases that offers enhanced speed, robustness, precision, accuracy, efficiency, cost-effectiveness, and lightweight design. This framework can effectively identify various types of guava leaf diseases.
This paper examines the effect of the added components: additional max and global average pooling layers to reduce the spatial dimensions of the feature maps; dropout layers as a regularisation technique to tackle overfitting; batch normalisation layers to accelerate training; dense layers with a ReLU activation function to provide non-linearity with efficient computation and better training stability; and a lighter dense layer with a SoftMax activation function for multiclass classification. This study evaluates the impact and effectiveness of these elements in the proposed architecture.
This paper proposes a customised MobileNet architecture based on transfer learning, designed to be device-friendly and deployable on portable, resource-constrained computational devices in the future. This approach will enable the model to run directly on devices like smartphones, eliminating the need for a cloud-based prediction system.
This paper includes a comparative analysis between the proposed model and various existing base models, including EfficientNetV2B2, EfficientNetB0, EfficientNetB2, EfficientNetB1, and MobileNetV2.
This paper introduces a model capable of analysing guava leaf images in dynamic and unpredictable real-time settings. Two benchmark datasets are used to ensure its robustness. The model does not require leaf segmentation and uses the ReLU function, resulting in a lighter and more user-friendly solution for farmers. Furthermore, the Grad-CAM technique is applied to elucidate the functioning of the model.

The remaining sections of the paper are structured as follows: Section 2 provides a comprehensive literature review, Section 3 presents the materials and method of the proposed GLD-Det architecture, Section 4 contains the results and discussion of the models, and Section 5 concludes the study and outlines future work.

2. Literature Review

In the past, traditional machine learning techniques, such as random forest (RF), K-nearest neighbour (KNN), and support vector machine (SVM), were utilised for detecting leaf diseases [11,12]. For example, Song et al. [12] employed an SVM for corn leaf disease detection on a multi-class dataset and achieved a detection accuracy of 89.6%. Similarly, Abirami et al. [11] employed SVM and KNN-based classifiers to detect guava leaf diseases using 125 sample images, achieving detection accuracies of 97.2% and 92%, respectively. However, these methods exhibited relatively low detection accuracies and faced limitations in managing large datasets that contain various essential features.

In recent years, ML has demonstrated remarkable performance and has gained significant popularity in the field of precision agriculture [3]. ML techniques are proficient at managing large databases and adept at executing intricate tasks, including image and disease classification, relation extraction, and pattern analysis [13]. For instance, Ji et al. [14] utilised the PlantVillage dataset and proposed a CNN-based method to detect grape plant diseases. Their approach achieved validation and test accuracies of 99.17% and 98.57%, respectively. Jiang et al. [15] utilised a real-time DL approach to classify apple leaf diseases using the Apple Leaf Diseases Dataset (ALDD). Their CNN-based model achieved a processing speed of 23.13 frames per second (FPS) and a mean average precision (mAP) of 78.80%. Xu et al. [16] developed a VGG-16 architecture for corn disease identification using transfer learning, achieving an accuracy of 95.33% on a relatively small dataset of corn disease images taken against challenging field backgrounds. For recognising anthracnose in walnut leaves, Anagnostis et al. [17] proposed a CNN-based method that achieved accuracies ranging from 92.40% to 98.70%. To identify nine types of tomato diseases, Maeda-Gutiérrez et al. [18] developed an ensemble model combining AlexNet, GoogleNet, and InceptionV3, which achieved an accuracy of 99.72%. For corn leaf disease identification, Wenxia et al. [19] proposed an improved CNN model with an accuracy of 95.74%. Furthermore, Wang et al. [20] devised the AT-AlexNet architecture by incorporating a new down-sampling attention module with the Mish activation function, resulting in an accuracy of 99.53% for corn disease identification. Compared to traditional ML methods, DL approaches have demonstrated higher accuracy in detecting plant diseases. However, DL models tend to have large parameter counts and longer runtimes, making them less suitable for deployment on mobile terminals.

In very recent years, the You Only Look Once (YOLO) pipeline has gained popularity for real-time object detection, including applications in fruit and leaf disease detection and classification [3]. The YOLO architecture [21] is preferred for its faster processing speed, robustness, and comparatively higher accuracy relative to other object detection pipelines. For instance, Kateb et al. [21] proposed a model called FruitDet, built on the YOLO pipeline, for the real-time detection of multiple fruits within orchards. Their model utilised five benchmark datasets comprising eight different fruit classes. FruitDet employed the densely connected CNN (DenseNet) as the backbone architecture and incorporated attentive feature aggregation. This approach outperformed YOLOv3 and also provided a better performance than YOLOv4. To enhance detection performance, the method introduced blackout regularisation, which disregards the object size for head detection mapping, leading to improved results. Xu et al. [8] introduced YOLO-Jujube, a CNN-based method for jujube fruit detection and ripeness inspection using the YOLO architecture. YOLO-Jujube achieved a detection performance with 11.7 giga floating point operations per second (GFLOPs), an average precision (AP) of 88[…], and a speed of 245 frames per second (FPS). The model was trained and evaluated on three recorded jujube video datasets and a total of 1959 images, outperforming the YOLOv-tiny family, particularly YOLOv3, v4, v5, and v7. Fu et al. [22] proposed YOLO-Banana, a CNN-based rapid detection model for banana bunches and stalks. YOLO-Banana achieved high detection performance with an AP of 98.4% for banana bunches and 85.98% for banana stalks, resulting in an overall mean average precision (mAP) of 92.19%. To detect four types of kiwi fruit defects in real time, Yao et al. [23] developed a modified version of YOLOv5 using a specifically curated dataset. Their method achieved an mAP of 94.7% ± 0.5, demonstrating an effective defect detection performance. To identify different diseases on a single apple leaf, Roy et al. [24] proposed a real-time DL method using YOLOv4. Their approach achieved an mAP of 92.2%, a speed of 56.9 FPS, and an F1 score of 95.9% for disease detection. While the YOLO pipeline offers higher FPS and higher accuracy (in terms of mAP and AP) than traditional DL methods, it does have some limitations. The YOLO framework is computationally heavy, with a larger number of trainable parameters, which results in higher time and space complexity. This makes it impractical to embed the pipeline in mobile devices and inconvenient for farmers. Additionally, the YOLO pipeline may not be suitable for detecting all types of plant leaf diseases.

Researchers have conducted a small number of studies on the detection of guava leaf diseases. Al Haque et al. [25] developed a CNN-based DL method to detect multiple guava diseases such as rot, canker, and anthracnose. The model achieved 95% accuracy. Howlader et al. [26] proposed a novel approach for classifying guava leaf diseases using a Deep CNN (DCNN), with a 98% accuracy rate. They used a multi-class dataset covering rust, whitefly, algal spot, and healthy leaves. Based on Red-Green-Blue (RGB) images, Almadhor et al. [27] proposed a model to detect four types of diseases on guava leaves and fruits. The model was built on an AI-driven architecture and utilised five classifiers: cubic SVM, bagged tree, fine KNN, boosted tree, and complex tree. Among them, the bagged tree provided the best result with 99% accuracy. However, the main limitation of this method is the small dataset, consisting of only 393 sample images. Perumal et al. [28] proposed an SVM-based approach for detecting a single disease on individual guava leaves, achieving an accuracy of 98.17%. Mostafa et al. [1] introduced a DCNN-based method for guava fruit disease detection. They utilised five different CNNs and achieved a classification accuracy of 97.74% using ResNet-101. Rashid et al. [3] proposed a model to detect guava leaf disease that utilises a hybrid DL framework based on the YOLOv5 model. The method incorporates several components, including a modified MobileNetv2 and U-Net with the leaf segmentation method. The researchers collected two datasets specifically for this study. The model achieved 92% accuracy for U-Net, and 73% precision, 73% recall, and 71% ± 0.5 mAP for detection performance.

The aforementioned studies primarily focused on the development of large-sized models without proper consideration for model optimisation or reducing their size for practical application on end-user devices. However, there have been some research efforts in this direction. Yang et al. [29] utilised transfer learning techniques and proposed a model based on MobileNet and InceptionV3 architectures for plant leaf disease identification on mobile phones. They developed two crop disease classification models with a focus on mobile device implementation. To detect crop disease, Yu et al. [30] introduced the CDCNNv2 model, which is based on the ResNet50 architecture. This architecture can be embedded into Android applications. Similarly, Fan et al. [31] constructed a model by using an improved version of VGG16 architecture combined with transfer learning. Their model was developed for the identification of grape leaf diseases specifically designed for Android mobile phones. Overall, these studies aimed to optimise models and adapt them for mobile device applications, taking into account factors such as model size, computational efficiency, and compatibility with specific mobile operating systems. Notably, Howard et al. [10] proposed a streamlined DL architecture called MobileNets in 2017 based on depth-wise separable CNN, initially developed for mobile and embedded vision applications by building a lightweight deep neural network.

Therefore, there is a crucial need for additional research to gain a deeper understanding and knowledge of lightweight CNN-based architectures that can effectively detect leaf diseases. By reducing the model’s complexity and parameters, improved accuracy can be achieved. Inspired by this concept, this research has developed the GLD-Det model, based on a modified MobileNet, which is a lightweight CNN. This proposed model offers enhanced speed, efficiency, and robustness, and can accurately identify various types of diseases present on guava leaves, even in complex scenarios.

3. Materials and Methods

This research proposed a transfer learning-based DL method to detect leaf disease from guava orchards in real-time scenarios. The methodology pertains to the overall approach and rationale of a research project. It involves familiarizing oneself with the techniques and theories employed in the field to develop a strategy that aligns with the research objectives. The chosen approach and techniques should be carefully considered, ensuring that they are well-suited to the research goals and are capable of producing valid and dependable outcomes. By doing so, the methodology can provide clarity regarding the decision-making process behind the research design, demonstrating its appropriateness and potential for generating authentic and trustworthy findings. For this purpose, a 7-stage module has been constructed, which is depicted in Figure 1.

3.1. Data Collection

A dataset is comprised of a collection of unprocessed statistics and analytical materials. For this research, two guava leaf disease datasets have been chosen, constructed by experienced researchers from Pakistan and Bangladesh. To reduce the complexity, this research named these datasets Dataset D1 and Dataset D2, respectively, to track and describe these datasets throughout this research. Both datasets are benchmarked and accessed from Mendeley Data. Dataset D1 consists of 1842 images. Dataset D2 consists of 2243 images. The reason for choosing multiple datasets is to ensure the robustness of this research.

3.2. Dataset Description

Dataset D1 has four distinct guava leaf disease classes (canker, dot, mummification, and rust) and a healthy class, as shown in Table 1. This dataset originates from the tropical areas of Pakistan and was created and supervised by experienced researchers in this field during early 2021. All images are 6000 × 4000 pixels at a resolution of 300 dpi.

Dataset D2 has been divided into original image and augmented image types. Each type has four distinct disease classes (phytophthora, red rust, scab, and styler end rot) and a disease-free class, as shown in Table 2. All images are 512 × 512 pixels. This dataset originates from Bangladesh and was obtained from a large guava garden in early 2022. The data collection was carried out by a proficient team from Bangladesh Agricultural University, located in Mymensingh, Dhaka. They used a Nikon D3200 digital single-lens reflex (SLR) camera with an F mount. The camera has a CMOS sensor measuring 23.2 × 15.4 mm and a focal-length multiplier of 1.5×, which allows an efficient field of view. During the image-capturing process, they set the frame rate to 4 fps with manual focus and the shutter speed to 1/250 s, and kept the default values for other settings.

The sample images of Dataset D1 and Dataset D2 are shown in Figure 2 with five distinct classes.

3.3. Data Preprocessing

This research resized the images of Datasets D1 and D2 from 6000 × 4000 and 512 × 512 pixels, respectively, to a smaller size. This procedure shortened the training time and facilitated proficient training. The input size was set to 260 × 260 × 3 for EfficientNetB2 and EfficientNetV2B2, and to 240 × 240 × 3 for EfficientNetB1. For the proposed architecture, EfficientNetB0, and MobileNetV2, the input image size was set to 224 × 224 × 3. To train a machine learning model, raw image data are commonly converted into pixel arrays. Prior to model training, the images in the dataset underwent preprocessing to streamline feature extraction, which also enhanced the classifier accuracy. The image data were represented by RGB coefficients, with values ranging from 0 to 255. Values this large are difficult for the model to process at the learning rate used, so a scaling factor of 1/255 was employed to normalise the images in both datasets. As a result, all pixel values were transformed to a range between 0 and 1.
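The input sizes and the 1/255 scaling described above can be sketched in a few lines. This is a minimal illustration in plain Python rather than the code used in this research; the names `INPUT_SIZES`, `RESCALE`, and `normalise_pixels`, and the tiny example image, are assumptions introduced here.

```python
# Input sizes reported in the text, as (height, width, channels).
INPUT_SIZES = {
    "GLD-Det": (224, 224, 3),
    "EfficientNetB0": (224, 224, 3),
    "MobileNetV2": (224, 224, 3),
    "EfficientNetB1": (240, 240, 3),
    "EfficientNetB2": (260, 260, 3),
    "EfficientNetV2B2": (260, 260, 3),
}

RESCALE = 1.0 / 255.0  # scaling factor stated in the text


def normalise_pixels(image):
    """Map 8-bit RGB values (0..255) to floats in the range 0..1.

    `image` is a nested list shaped [row][column][channel].
    """
    return [[[value * RESCALE for value in pixel] for pixel in row] for row in image]


# A hypothetical 1 x 2 RGB image, used only to show the resulting value range.
example = [[[0, 128, 255], [255, 64, 32]]]
scaled = normalise_pixels(example)
```

In a Keras pipeline the same scaling is usually passed as a `rescale` argument to the data loader rather than applied by hand.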

Data augmentation is an essential step in data preprocessing that involves generating additional training examples by applying different transformations to existing ones. The main purpose of data augmentation is to artificially increase the size of the training dataset, thereby enhancing its diversity. This is achieved without the necessity of collecting new data, which can be expensive and time-consuming. In this research, the same moderate data augmentation was applied for both the base models and the proposed model, with parameters set as “width_shift_range” = 0.2, “height_shift_range” = 0.2, “rotation_range” = 0.2, “vertical_flip” = True, and “horizontal_flip” = False. Shuffling was set to “True” during training and “False” during validation and testing.
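The quoted parameter names match keyword arguments of the Keras `ImageDataGenerator` class, where shift ranges below 1 are read as fractions of the image size and `rotation_range` is given in degrees. The sketch below, in plain Python, only illustrates what one random draw from these ranges looks like; the function `sample_transform` and its output keys are hypothetical and do not belong to any library.

```python
import random

# Augmentation settings quoted in the text.
AUGMENTATION = {
    "width_shift_range": 0.2,
    "height_shift_range": 0.2,
    "rotation_range": 0.2,
    "vertical_flip": True,
    "horizontal_flip": False,
}


def sample_transform(params, width, height, rng):
    """Draw one random transformation within the configured ranges."""
    w_shift = params["width_shift_range"]
    h_shift = params["height_shift_range"]
    rot = params["rotation_range"]
    return {
        # Shifts are fractions of the image size, converted to pixels here.
        "dx_pixels": rng.uniform(-w_shift, w_shift) * width,
        "dy_pixels": rng.uniform(-h_shift, h_shift) * height,
        # Rotation is interpreted in degrees, as Keras does.
        "rotation_degrees": rng.uniform(-rot, rot),
        # An enabled flip is applied to roughly half of the samples.
        "flip_vertical": params["vertical_flip"] and rng.random() < 0.5,
        "flip_horizontal": params["horizontal_flip"] and rng.random() < 0.5,
    }


rng = random.Random(0)  # fixed seed so the draw is repeatable
transform = sample_transform(AUGMENTATION, width=224, height=224, rng=rng)
```

Under this reading, a `rotation_range` of 0.2 corresponds to rotations of at most 0.2 degrees, so the shifts and the vertical flip account for most of the variation.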

The datasets were split into two parts for training and testing purposes. This research randomly selected 75% of the images for training and 25% for testing from Dataset D1, and 80% and 20%, respectively, from Dataset D2. Dataset D1 consists of 1842 images in total, of which 1377 were chosen for training and 465 for testing. Dataset D2 consists of 5426 images, including 4899 augmented images. As this research already applies its own data augmentation procedure, only the 527 original images were considered, of which 422 and 105 were chosen for training and testing, respectively. Table 3 and Table 4 show the training and testing split for Datasets D1 and D2, respectively.
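A random split of this kind can be sketched as follows. The helper `split_dataset`, the seed, and the file names are assumptions made for illustration; the exact splitting procedure is not specified in the text. Applied to the 527 original images of Dataset D2 with an 80% training share, the sketch gives the reported 422/105 counts. The reported Dataset D1 counts (1377/465) correspond to about 74.8% and 25.2%, so a plain 75% cut would not reproduce them exactly, which may indicate that the split was made per class.

```python
import random


def split_dataset(items, train_fraction, seed=0):
    """Shuffle a copy of `items` and cut it into training and testing parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 527 original images of Dataset D2.
d2_images = [f"d2_{i:04d}.jpg" for i in range(527)]
train, test = split_dataset(d2_images, train_fraction=0.80)
```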

3.4. Environment Setup

This research used Python for the detection mechanisms. The TensorFlow [32] and Keras libraries, which are open-source and freely available tools for data flow and DL models, were utilised for training all the pre-trained models. This research used Anaconda as the Python environment and Jupyter Notebook as the editor. Input size, augmentation parameters, batch size, number of epochs, optimiser and learning rate, activation function, Explainable Artificial Intelligence (XAI) tool, evaluation metrics, etc., were all configured in this environment. Optimiser = “Adam”, Batch Size = “16”, Learning Rate = “1 × 10⁻⁵”, Loss = “categorical_crossentropy”, Activation Function = “ReLU”, Epochs = “80”, Patience = “3”, and Performance Metrics = “Accuracy, Precision, Recall, AUC” were used across all experiments, for both the base models and the proposed model. For early stopping, this research set monitor = “val_loss” and mode = “min”. To counter overfitting, “restore_best_weights” was set to True in all implementations. Grad-CAM was used as the XAI tool for the proposed model only. The input size was set according to each model’s requirements. The environmental setup details of this research are summarised in Table 5.
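The early-stopping rule described here (monitor the validation loss, stop after three epochs without improvement, then restore the best weights) can be illustrated with a short, framework-free sketch. The function `early_stopping` and the loss values are hypothetical. The sketch follows the general behaviour of a patience-based callback and is not the implementation used in this research.

```python
def early_stopping(val_losses, patience=3):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Epochs are 0-indexed. Training stops once the loss has failed to improve
    for `patience` consecutive epochs; `best_epoch` is the epoch whose weights
    would be restored when restore_best_weights is enabled.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:  # mode = "min": lower is better
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses; the final value is never reached.
losses = [0.90, 0.60, 0.45, 0.47, 0.46, 0.48, 0.40]
stop_epoch, best_epoch = early_stopping(losses, patience=3)
```

With these values, training stops at epoch 5 and the weights from epoch 2 are kept, which shows why a small patience can end training before a later improvement.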

3.5. Proposed GLD-Det Architecture

This research used transfer learning to construct a model called GLD-Det for guava leaf disease detection on Dataset D1 and Dataset D2. The model is a modified MobileNet, built by adding extra layers on top of MobileNet. When data availability is limited, transfer learning can be valuable in reducing both training time and computational expense. This research therefore modified a well-established and robust pre-trained model, MobileNet [10], which was originally trained on the ImageNet dataset. MobileNet comprises 28 convolution layers, which serve as the foundational feature extraction component of the model. MobileNet uses depth-wise separable convolutions, a type of factorised convolution that decomposes a regular convolution into two parts: a 3 × 3 depth-wise (dw) convolution and a 1 × 1 point-wise convolution. This factorisation uses roughly 8 to 9 times less computation than a standard convolution [10]. By employing width and resolution multipliers, MobileNet achieves a smaller and faster model at the cost of a moderate loss of accuracy. This trade-off allows for a reduction in size and latency while still maintaining reasonable performance. Approximately 95% of MobileNet’s computation time is spent in 1 × 1 convolutions, which also account for about 75% of the model’s parameters. The computational time for MobileNet per layer is shown in Table 6.
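The saving from the factorisation can be made explicit by dividing the cost of a depth-wise separable convolution by that of a standard convolution. In the worked ratio below, the kernel size D_K, the number of input channels M, the number of output channels N, and the feature map size D_F follow the notation of the MobileNet formulation [10]. The value N = 512 is only an illustrative assumption.

```latex
\frac{D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F}
     {D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F}
  = \frac{1}{N} + \frac{1}{D_K^{2}}

% Example: D_K = 3 and N = 512
\frac{1}{512} + \frac{1}{9} \approx 0.113
```

For a 3 × 3 kernel and 512 output channels, the separable form therefore needs about 11% of the standard cost, a reduction of about 8.8 times.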

The block of two combined convolutions, the first of which is a depth-wise convolution, is repeated five times. Applying the width multiplier to any model structure enables the creation of a smaller model with a balanced trade-off between accuracy, latency, and size. The computational cost for MobileNet associated with a depth-wise separable convolution, considering a width multiplier of α, is defined below.

D_{K} \cdot D_{K} \cdot α M \cdot D_{F} \cdot D_{F} + α M \cdot α N \cdot D_{F} \cdot D_{F}, (1)

where α ∈ (0, 1], with typical settings of 1, 0.75, 0.5, and 0.25 (α = 1 is the baseline MobileNet);

D_{K} is the spatial dimension of the kernel, assumed to be square;

M is the number of input channels;

D_{F} is the spatial width and height of a square input feature map;

N is the number of output channels.
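Equation (1) can be evaluated directly to see how the width multiplier shrinks a layer. The sketch below is only illustrative: the function name `separable_conv_cost` and the layer dimensions are assumptions, and α is taken to be strictly positive, as in the original MobileNet formulation [10].

```python
def separable_conv_cost(d_k, m, n, d_f, alpha=1.0):
    """Multiply-add count of one depth-wise separable layer, following Eq. (1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    thin_m = alpha * m  # thinned number of input channels
    thin_n = alpha * n  # thinned number of output channels
    depthwise = d_k * d_k * thin_m * d_f * d_f
    pointwise = thin_m * thin_n * d_f * d_f
    return depthwise + pointwise


# Hypothetical layer: 3 x 3 kernel, 512 input and output channels, 14 x 14 feature map.
for alpha in (1.0, 0.75, 0.5, 0.25):
    cost = separable_conv_cost(d_k=3, m=512, n=512, d_f=14, alpha=alpha)
    print(f"alpha={alpha:<4} cost={cost:,.0f}")
```

Because the point-wise term dominates and scales with α², halving α reduces the cost of this layer to roughly a quarter.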

In the proposed GLD-Det architecture, the input data enter MobileNet and then pass through the additional layers added after it. Batch normalisation is applied after every convolution and separable convolution layer. The workflow of the proposed GLD-Det architecture is shown in Figure 3, and Table 7 lists the parameters of its additional layers.

At first, this research resized the Dataset D1 and Dataset D2 images to 224 × 224 pixels and split all images for training and testing purposes. Dataset D1 consists of 1842 images in total, of which 1377 were chosen for training and 465 for testing. Dataset D2 consists of 527 images in total, of which 422 were chosen for training and 105 for testing. Data augmentation was then used to reduce overfitting and to ensure better prediction accuracy. To adapt the network to the five-class problem, the default classification layer was removed. Subsequently, the flatten layer was replaced by max pooling. Max pooling reduces the spatial dimensions of the input feature maps, making subsequent layers computationally less expensive. This reduction helps control the growth of model complexity and memory usage. Additionally, global average pooling was incorporated after the first dense and batch normalisation layers, as it aligns better with the convolutional structure by establishing connections between feature maps and categories. Moreover, these pooling techniques reduce the architecture’s parameter count, which helps mitigate overfitting.
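The two pooling operations mentioned above can be illustrated on a small feature map. This is a plain-Python sketch under stated assumptions: a 2 × 2 window with stride 2 is assumed for max pooling, since the pooling size used in GLD-Det is not given in this section, and the function names and input values are hypothetical.

```python
def max_pool_2x2(feature_map):
    """2 x 2 max pooling with stride 2 on a 2-D list with even height and width."""
    pooled = []
    for r in range(0, len(feature_map), 2):
        row = []
        for c in range(0, len(feature_map[0]), 2):
            window = (
                feature_map[r][c], feature_map[r][c + 1],
                feature_map[r + 1][c], feature_map[r + 1][c + 1],
            )
            row.append(max(window))
        pooled.append(row)
    return pooled


def global_average_pool(feature_maps):
    """Collapse each 2-D feature map to its mean, giving one value per channel."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


fmap = [
    [1, 3, 2, 0],
    [4, 2, 1, 1],
    [0, 1, 5, 6],
    [2, 2, 7, 8],
]
pooled = max_pool_2x2(fmap)  # 4 x 4 -> 2 x 2
channel_means = global_average_pool([fmap, pooled])
```

Max pooling quarters the number of activations passed on, while global average pooling leaves a single value per channel, which is why both keep the parameter count of the following dense layers small.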

This research also explored various regularisation techniques to enhance the model’s performance. One such technique was dropout regularisation with a rate of 0.3 [33]. During training, dropout randomly deactivates a subset of neurons, leading to improved accuracy and reduced loss as training progresses. Figure 4 provides a visual representation of how dropout operates.
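A minimal sketch of dropout at a rate of 0.3 follows. It assumes the common "inverted" formulation, in which surviving activations are rescaled by 1 / (1 − rate) during training so that no rescaling is needed at inference; the text does not specify which formulation was used, and the function name and toy input are illustrative.

```python
import random


def dropout(activations, rate=0.3, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate`, rescale the rest."""
    if not training:
        return list(activations)  # dropout is a no-op at inference time
    keep = 1.0 - rate
    return [a / keep if rng.random() >= rate else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
acts = [1.0] * 10_000
dropped = dropout(acts, rate=0.3, rng=rng)
zero_fraction = dropped.count(0.0) / len(dropped)
print(f"fraction dropped: {zero_fraction:.3f}")
print(f"mean after dropout: {sum(dropped) / len(dropped):.3f}")
```

Roughly 30% of the units are zeroed on each pass, while the rescaling keeps the expected activation close to its original value.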

In the GLD-Det architecture, this research incorporated four dense layers with a ReLU activation function after each batch normalisation layer. These dense layers allow the model to organise and classify the rich features extracted by the convolutional layers more efficiently. ReLU is a popular choice in DL models because it introduces non-linearity, which lets the network learn complex patterns and features in the data and generalise from them effectively. By zeroing negative values, ReLU keeps the activations sparse and efficient, and it reduces the risk of the vanishing gradient problem, which can impede training. This activation function is defined below.

f(x) = \max(0, x),

(2)

where

x = the input to the function;

\max(0, x) = the maximum value between 0 and x;

if x \geq 0, the function returns x itself;

if x < 0, the function outputs 0.
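Equation (2) translates directly into code. The short sketch below applies it to a handful of arbitrary sample inputs.

```python
def relu(x):
    """Equation (2): return x for x >= 0 and 0 otherwise."""
    return max(0.0, x)


inputs = [-2.0, -0.5, 0.0, 1.5, 3.0]  # arbitrary sample values
outputs = [relu(x) for x in inputs]
print(outputs)
```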

Moreover, in this research, batch normalisation [34] layers were introduced to normalise the activation of hidden layers. This method expedites the training process while also resolving internal covariate shift problems by ensuring that the input for each layer is distributed around a consistent mean and standard deviation. Batch normalisation stabilises and regularises the network during training, promoting faster convergence and improved performance. The mathematical formulation is shown below.

μ_{B} = \frac{1}{m} \sum_{i = 1}^{m} x_{i}

(3)

σ_{B}^{2} = \frac{1}{m} \sum_{i = 1}^{m} {(x_{i} - μ_{B})}^{2},

(4)

where

x_{i} = the i-th input in a minibatch;

m = the minibatch size;

μ_{B} = the minibatch mean;

σ_{B}^{2} = the minibatch variance.

The samples are then normalised to have zero mean and unit variance. To ensure numerical stability and prevent division by zero, a small constant ϵ is added to the variance in the denominator. The normalised activation {\hat{x}}_{i} is given below.

{\hat{x}}_{i} = \frac{x_{i} - μ_{B}}{\sqrt{σ_{B}^{2} + ϵ}} .

(5)

Finally, the normalised activation is scaled and shifted:

y_{i} = γ {\hat{x}}_{i} + β,

(6)

where

y_{i} = the output;

γ = the scale parameter, learned during training;

β = the shift parameter, learned during training.
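The four batch normalisation steps can be followed on a toy minibatch. The sketch below treats a single feature, uses the training-time statistics only, and assumes ϵ = 1e-5 together with the initial values γ = 1 and β = 0; these constants and the sample values are illustrative and are not reported in the text.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalise one feature over a minibatch, then scale and shift (Eqs. 3-6)."""
    m = len(batch)
    mu = sum(batch) / m                                        # Eq. (3)
    var = sum((x - mu) ** 2 for x in batch) / m                # Eq. (4)
    x_hat = [(x - mu) / math.sqrt(var + eps) for x in batch]   # Eq. (5)
    return [gamma * xh + beta for xh in x_hat]                 # Eq. (6)


batch = [2.0, 4.0, 6.0, 8.0]  # toy minibatch of one feature
normed = batch_norm(batch)
mean_after = sum(normed) / len(normed)
var_after = sum((v - mean_after) ** 2 for v in normed) / len(normed)
print(f"mean={mean_after:.6f} variance={var_after:.6f}")
```

With γ = 1 and β = 0 the output has approximately zero mean and unit variance; during training the network is free to learn other values of γ and β if a different scale or offset suits the next layer better.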

To create a classification layer with five classes (canker, dot, mummification, rust, and healthy for Dataset D1; phytophthora, red rust, scab, styler end rot, and disease-free for Dataset D2), this research used a dense layer of five neurons with a SoftMax activation function [35]. The SoftMax function is frequently used for multiclass classification tasks: it assigns a probability to each class, ensuring that each probability lies between 0 and 1 and that the probabilities sum to 1. The formula is given below.

\mathrm{softmax}(z_{i}) = \frac{\exp(z_{i})}{\sum_{j} \exp(z_{j})},

(7)

where

z_{i} = the input to the i-th neuron of the output layer, with j running over all output neurons;

\exp = the exponential function, a non-linear transformation.
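A small sketch of Equation (7) for a five-neuron output layer is given below. The logit values are hypothetical, and subtracting the maximum logit before exponentiating is a standard numerical-stability step added here; it leaves the result of Equation (7) unchanged.

```python
import math


def softmax(logits):
    """Equation (7), computed with a max-shift so exp() cannot overflow."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1, -1.0, 0.5]  # hypothetical outputs of the five neurons
probs = softmax(logits)
predicted_class = probs.index(max(probs))
print([round(p, 3) for p in probs], "sum =", round(sum(probs), 6))
print("predicted class index:", predicted_class)
```

The class with the largest logit receives the largest probability, and the five probabilities always sum to 1.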

In this research, the proposed model used Adam as the optimiser, with the learning rate set to $1 \times 10^{-5}$. Because the task is a multi-class detection problem, categorical cross-entropy, which is designed for multi-class problems with SoftMax output units, was used as the loss function. Its formula is shown below.

L_{i} = - \sum_{j} t_{i, j} \log(p_{i, j}),

(8)

where

p = the predictions;

t = the targets;

i = the index of the data point;

j = the index of the class.
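To make Equation (8) concrete, the sketch below evaluates the loss for one data point with a one-hot target. The probability vectors are invented for illustration, and the small clipping constant `eps` is an added safeguard against taking the logarithm of zero.

```python
import math


def categorical_cross_entropy(targets, predictions, eps=1e-12):
    """Equation (8) for a single data point with one-hot targets."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(targets, predictions))


target = [0, 0, 1, 0, 0]                    # the true class is index 2
confident = [0.01, 0.01, 0.96, 0.01, 0.01]  # hypothetical SoftMax output
unsure = [0.2, 0.2, 0.2, 0.2, 0.2]          # uniform guess over five classes

loss_confident = categorical_cross_entropy(target, confident)
loss_unsure = categorical_cross_entropy(target, unsure)
print(f"confident: {loss_confident:.4f}  unsure: {loss_unsure:.4f}")
```

Only the probability assigned to the true class contributes to the loss, so a confident correct prediction is penalised far less than a uniform guess.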

This research used a confusion matrix and other metrics to evaluate the performance of all models, both proposed and base. The results showed that the GLD-Det architecture outperformed all the base models for both Dataset D1 and Dataset D2. Moreover, the proposed architecture achieved a lower loss than the base models. Additionally, Grad-CAM was used to provide explanations and insights into how the proposed model operates.

3.6. Model Explainability Using Grad-CAM

In this research, gradient-weighted class activation mapping (Grad-CAM) [36] was incorporated to provide visual explanations for the proposed GLD-Det model, supporting trust and confidence in its predictions. Grad-CAM is a technique used to visualise the attention of a CNN by highlighting the important regions of an input image. The algorithm computes the gradients of the score for a target class with respect to the feature maps of a chosen convolutional layer. These gradients are global-average-pooled to obtain one weight per feature map, and the weighted sum of the feature maps is passed through a ReLU, resulting in a heatmap that highlights the most salient regions of the image. Grad-CAM offers an interpretable way to comprehend the internal workings of deep neural networks, aiding in debugging, model enhancement, and the communication of research findings. The mathematical details of Grad-CAM are provided below.

α_{k}^{c} = \frac{1}{Z} \sum_{i} \sum_{j} \frac{\partial y^{c}}{\partial A_{i j}^{k}}

(9)

L_{\mathrm{Grad\text{-}CAM}}^{c} = \mathrm{ReLU}\left(\sum_{k} α_{k}^{c} A^{k}\right),

(10)

where

y^{c} = the score of class c in the network before the SoftMax activation;

A^{k} = the activations of the k-th feature map, with A_{i j}^{k} its value at position (i, j);

α_{k}^{c} = the neuron importance weight of feature map k for class c;

Z = the number of pixels in the feature map.
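Equations (9) and (10) can be traced on a toy example. The sketch below assumes that the feature maps and the gradients of the class score with respect to them have already been obtained from the network (in practice by automatic differentiation); the two 2 × 2 maps and their gradients are made-up numbers, and `grad_cam` is a hypothetical helper name.

```python
def grad_cam(feature_maps, gradients):
    """Combine K feature maps (each H x W) into one class heatmap."""
    height, width = len(feature_maps[0]), len(feature_maps[0][0])
    z = height * width
    # Eq. (9): global-average-pool each gradient map into one weight alpha_k.
    alphas = [sum(sum(row) for row in g) / z for g in gradients]
    # Eq. (10): weighted sum of the feature maps, followed by ReLU.
    return [
        [max(0.0, sum(a * fm[i][j] for a, fm in zip(alphas, feature_maps)))
         for j in range(width)]
        for i in range(height)
    ]


# Two made-up 2x2 feature maps and the gradients of a class score w.r.t. them.
feature_maps = [[[1.0, 0.0], [0.0, 2.0]],
                [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[0.5, 0.5], [0.5, 0.5]],       # pools to alpha_1 = 0.5
             [[-1.0, -1.0], [-1.0, -1.0]]]   # pools to alpha_2 = -1.0
heatmap = grad_cam(feature_maps, gradients)
print(heatmap)
```

Positions supported only by the negatively weighted map are clipped to zero by the ReLU, so the heatmap keeps just the regions that raise the score of the target class.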

The Grad-CAM images produced in this research are described and analysed in the Results and Discussion section.

3.7. Evaluation Metrics

For evaluating the results of all models, this research used various evaluation metrics. Furthermore, a confusion matrix was employed as a visualisation tool. The confusion matrix compares the actual labels (ground truth) with the predicted labels from the model. Key metrics are derived from the information provided by the confusion matrix shown in Table 8, enabling a comprehensive evaluation of the model’s performance.

Accuracy = \frac{(TP + TN)}{(TP + TN + FP + FN)};

(11)

Precision = \frac{TP}{(TP + FP)};

(12)

Recall = \frac{TP}{(TP + FN)} .

(13)
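Equations (11) to (13) can be checked with a few lines of code. The counts below are hypothetical and do not correspond to any result reported in this article; the zero-denominator guards are an added convenience.

```python
def binary_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from confusion-matrix counts (Eqs. 11-13)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical counts for one class treated as "positive" against all others.
accuracy, precision, recall = binary_metrics(tp=40, fp=10, fn=5, tn=45)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```

For a multi-class problem such as this one, the same counts can be formed per class in a one-vs-rest fashion and then averaged; the averaging scheme is not specified in the text.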

4. Results and Discussion

This research introduces the GLD-Det model, which uses transfer learning to detect guava leaf disease in real-time on Dataset D1 and Dataset D2. To assess its robustness, the proposed model is compared with five existing CNN models: EfficientNetV2B2, EfficientNetB0, EfficientNetB2, EfficientNetB1, and MobileNetV2. Accuracy, precision, recall, and AUC are calculated to evaluate the models’ effectiveness. For Dataset D1, the base model EfficientNetV2B2 achieved the best performance with an accuracy of 0.85, whereas the base model MobileNetV2 performed unsatisfactorily with an accuracy of 0.56. For Dataset D2, however, MobileNetV2 achieved the best performance with an accuracy of 0.84, whereas EfficientNetV2B2 performed unsatisfactorily with an accuracy of 0.74. The EfficientNet family is heavyweight in comparison with the MobileNet family. This research aims to propose a model that is lightweight, accurate, precise, and robust, so that guava leaf disease can be detected in real-time. Here, robustness means that a model performs well across different settings and different types of datasets. Taking these considerations into account, MobileNet was chosen for further modification by adding layers. As guava leaf disease is highly infectious and lethal for guava plants, it is important to detect it accurately, precisely, and robustly. The GLD-Det architecture was constructed after several modifications of the base MobileNet model, and it outperformed all existing models. For Dataset D1, the proposed model achieved an accuracy of 0.98, a precision of 0.98, a recall of 0.97, and an AUC of 0.99; for Dataset D2, the corresponding values were 0.97, 0.97, 0.96, and 0.99.
It is also noteworthy that the proposed model has the smallest loss for both Dataset D1 and Dataset D2, compared to the base models. The GLD-Det architecture extends MobileNet with two pooling layers (max and global average), three batch normalisation layers, three dropout layers, four dense layers with ReLU as the activation function, a final lighter dense layer with SoftMax as the classification layer, and Adam as the optimiser, which together provide the best performance. Table 9 and Table 10 show the results of the models for Dataset D1 and Dataset D2, respectively.

The confusion matrices for Dataset D1 and Dataset D2 are shown in Figure 5. The deep blue cells on the diagonal indicate the number of instances that the model predicted correctly with respect to their ground truth labels.

Figure 6 and Figure 7 are for Dataset D1 and Dataset D2, respectively. These figures show the proposed model’s performance in terms of training and validation. The blue curves are for training and the red curves are for validation.

For further clarification of the proposed model, the AUC outputs of Dataset D1 and Dataset D2 are shown in Figure 8.

The AUC graphs in Figure 8 reveal that the proposed model achieved an AUC score approaching 1 for both Dataset D1 and Dataset D2. A higher AUC value indicates a stronger ability to distinguish between the different classes, so the proposed GLD-Det model performed very well in the detection task.

The graphs in Figure 6 and Figure 7 show that the validation curves for accuracy, precision, and recall consistently outperformed the corresponding training curves. While some minor underfitting was observed at the beginning of the recall curve in Figure 6 and of the precision and recall curves in Figure 7, the proposed model’s performance improved over the epochs as the gap between the training and validation lines decreased. To tackle both underfitting and overfitting concerns, this research implemented several measures. To prevent overfitting, early stopping was applied by monitoring the validation loss with a patience of three epochs. The loss graphs show that the training loss was higher than the validation loss for both datasets, which indicates that the proposed model performed well and did not overfit either dataset. Likewise, in the precision and recall graphs for both datasets, the training curves did not exceed the validation curves, which again suggests that the proposed model did not overfit. Overall, these efforts contributed to the robustness and performance of the proposed GLD-Det model during training and testing.
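The early stopping rule with a patience of three can be sketched as follows. This is a simplified stand-in for the behaviour of a typical early stopping callback, not the exact implementation used in this research; the validation losses are invented, and any improvement over the best loss seen so far resets the counter.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss   # new best validation loss: reset the counter
            waited = 0
        else:
            waited += 1   # one more epoch without improvement
            if waited >= patience:
                return epoch
    return None


# Invented validation losses: the best value is reached at epoch 3.
val_losses = [0.90, 0.60, 0.45, 0.47, 0.46, 0.48, 0.30]
stop = early_stopping_epoch(val_losses)
print("training stops after epoch:", stop)
```

In this toy run, training halts after three consecutive epochs without improvement and therefore never reaches the lower loss at epoch 7, which illustrates the trade-off of a short patience.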

The parameter counts and floating-point operations (FLOPs) of EfficientNetV2B2 [37], EfficientNetB0, EfficientNetB2, EfficientNetB1 [38], MobileNetV2 [9], and MobileNet [10], based on the ImageNet dataset, are shown in Table 11. MobileNetV2 has the lowest parameter count, 3.4 M, with 0.30 B FLOPs, whereas MobileNet has the second lowest parameter count, 4.2 M, but twice the FLOPs of MobileNetV2 at 0.60 B. EfficientNetV2B2 has the highest parameter count and FLOPs, with values of 10.1 M and 1.7 B, respectively; its parameter count is more than double that of MobileNetV2 and MobileNet. This research aims to construct a transfer learning model for guava leaf disease detection that is lightweight, robust, and computationally fast, and that can be deployed on mobile devices in the future. The MobileNets architectures were originally designed for mobile and embedded vision applications [10]. Thus, it is clear from Table 11 that MobileNetV2 and MobileNet meet these criteria. However, MobileNetV2 performed inconsistently across Dataset D1 and Dataset D2, which originate from two different regions, Pakistan and Bangladesh, respectively. Hence, in terms of robustness, MobileNetV2 performed inadequately. After several considerations, MobileNet, which has a low parameter count and a favourable computational cost of 0.60 B FLOPs, was chosen for further modification in the proposed GLD-Det architecture. The proposed GLD-Det architecture outperformed all existing models for both Dataset D1 and Dataset D2.

In this research, the proposed model was elucidated using Grad-CAM to analyse various convolutional layers. Grad-CAM is a technique that provides insights into how the model makes classifications based on specific areas of an input image. By generating a heatmap, Grad-CAM visualises the crucial regions in the image that influence the model’s decision-making process. This visualisation aids in making informed decisions and understanding the model’s focus on important areas within the input image. For brevity, only the Grad-CAM images of Dataset D1 are shown, in Figure 9.

This research used four convolution layers, conv1, conv_pw_5, conv_pw_10, and conv_pw_13_relu, to generate Grad-CAM images for Dataset D1, which has five classes. In Figure 9, the first column shows the input images. The second column shows the output image of conv1, the third that of conv_pw_5, the fourth that of conv_pw_10, and the fifth that of conv_pw_13_relu. In the initial convolution layer, the visualisation shows that the model focuses on detecting contours and borders in the images. Through the subsequent convolution layers, it becomes evident that the layers attempt to identify different concepts and features present in the images. To understand how well the GLD-Det model detects the relevant parts of the image, this research focused on the last convolution layer (conv_pw_13_relu). Grad-CAM highlighted the infected regions very well, which indicates that the GLD-Det architecture detects leaf disease by attending to the most highlighted areas in the image.

Overall, the proposed GLD-Det model demonstrated substantial improvements, achieving the highest accuracy and precision for both Dataset D1 and Dataset D2. The use of Grad-CAM further supports trust and transparency in the model’s predictions. However, more extensive research is required to explore lightweight guava leaf disease detection further and to compare the robustness of different models. It is important to highlight that the diseases affecting guava leaves vary across regions, which adds complexity to their detection using CV-based image processing. The limited availability of guava leaf disease datasets further compounds the challenges researchers face in training their DL models; the creation of additional guava leaf disease datasets is therefore imperative. Nevertheless, the GLD-Det architecture, built on a modified MobileNet using transfer learning, delivered outstanding performance on two benchmark datasets from distinct geographical locations, Pakistan and Bangladesh. Because the MobileNets architectures [10] were originally intended for mobile and embedded vision applications, a promising direction for future research is to integrate the proposed GLD-Det model into mobile devices. This would enable farmers to detect guava leaf disease using their smartphones without relying on cloud services, thereby providing them with direct benefits.

5. Conclusions

Guava leaf disease poses a significant threat to the health of guava plants, as it is highly infectious and can lead to plant death. Furthermore, it has a negative impact on both the quality and quantity of guava fruit. Using deep learning for early leaf disease detection can help mitigate these issues and assist farmers in achieving a successful harvest. However, detecting guava leaf disease is challenging due to factors such as illumination variation, leaf obstruction, changing brightness, and overlapping leaves. Existing leaf disease detection models often rely on heavyweight neural networks and leaf segmentation, which can be resource-intensive. In response to these challenges, this research explored a transfer learning-based approach to detect guava leaf disease in real-time. The aim was to create a lightweight yet robust model that delivers improved performance. Two benchmark datasets of guava leaf disease, from Pakistan and Bangladesh, were used for experimentation. Five pre-trained models (EfficientNetV2B2, EfficientNetB0, EfficientNetB2, EfficientNetB1, and MobileNetV2) were tested first. However, none of them performed satisfactorily or robustly across the two datasets. Following numerous experiments, this research introduced the GLD-Det model by incorporating various enhancements into the base MobileNet model: two pooling layers (max and global average), three batch normalisation layers, three dropout layers, four dense layers with ReLU as the activation function, and a final lighter dense layer with SoftMax as the classification layer. These supplementary blocks proved to be robust, enabling the extraction of more informative features, faster model training, and improved overall performance.
The proposed GLD-Det model outperformed all the base models it was compared with in terms of the evaluation metrics for both datasets. Subsequently, the proposed model was further elucidated using Grad-CAM to enhance its transparency and provide deeper insights into its decision-making process.

Future work will concentrate on integrating the proposed model into mobile devices, enabling marginal farmers to detect guava leaf disease in real-time using their smartphones without relying on any cloud service. Additionally, other explainability methods will be explored and compared to further enhance the transparency of the detection system.

Author Contributions

Conceptualisation, M.M.U.N. and M.R.; Data curation, M.M.U.N. and M.R.; Formal analysis, M.M.U.N., M.R., M.S. and S.A.; Investigation, M.M.U.N., M.R. and M.S.; Methodology, M.M.U.N., M.R., M.F.M., M.S.; Software, M.M.U.N. and M.R.; Supervision, M.F.M., M.S. and S.A.; Validation, M.F.M., D.C.; Visualisation, M.M.U.N., M.R., M.S. and D.C.; Writing—original draft, M.M.U.N., M.R. and M.F.M.; Writing—review & editing, M.M.U.N., M.F.M., M.S., S.A. and D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Deputyship for Research and Innovation, “Ministry of Education” in Saudi Arabia (IFKSUOR3-010-3).

Data Availability Statement

There is no statement regarding the data.

Acknowledgments

The authors extend their appreciation to the Deputyship for Research and Innovation, “Ministry of Education” in Saudi Arabia for funding this research (IFKSUOR3-010-3).

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Mostafa, A.M.; Kumar, S.A.; Meraj, T.; Rauf, H.T.; Alnuaim, A.A.; Alkhayyal, M.A. Guava Disease Detection Using Deep Convolutional Neural Networks: A Case Study of Guava Plants. Appl. Sci. 2022, 12, 239.
2. Barua, H.; Saha, S.R.; Ivy, N.A.; Rasul, G.; Islam, A.A. Genetic divergence of guava (Psidium guajava L.) genotypes in Bangladesh: Guava Genotypes in Bangladesh. SAARC J. Agric. 2022, 20, 15–28.
3. Rashid, J.; Khan, I.; Ghulam, A.; Rehman, S.; Alturise, F.; Alkhalifah, T. Real-Time Multiple Guava Leaf Disease Detection from a Single Leaf Using Hybrid Deep Learning Technique. Comput. Mater. Contin. 2022, 74, 1235–1257.
4. Sasaki, Y.; Okamoto, T.; Imou, K.; Torii, T. Automatic Diagnosis of Plant Disease-Spectral Reflectance of Healthy and Diseased Leaves. IFAC Proc. Vol. 1998, 31, 145–150.
5. Koo, C.; Malapi-Wight, M.; Kim, H.S.; Cifci, O.S.; Vaughn-Diaz, V.L. Development of a Real-time Microchip PCR System for Portable Plant Disease Diagnosis. PLoS ONE 2013, 8, e82704.
6. Wang, Y.-H.; Su, W.-H. Convolutional Neural Networks in Computer Vision for Grain Crop Phenotyping: A Review. Agronomy 2022, 12, 2659.
7. CRI. Bangladesh: Towards Achieving Food Security 2009–2019; Centre for Research and Information (CRI): Dhaka, Bangladesh, 2019.
8. Xu, D.; Zhao, H.; Lawal, O.M.; Lu, X.; Ren, R.; Zhang, S. An Automatic Jujube Fruit Detection and Ripeness Inspection Method in the Natural Environment. Agronomy 2023, 13, 451.
9. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520.
10. Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Wey, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861.
11. Abirami, M.T.S. Application of image processing in diagnosing guava leaf disease. Int. J. Sci. Res. Manag. 2017, 5, 5927–5933.
12. Song, K.; Sun, X.; Ji, J. Corn leaf disease recognition based on support vector machine method. Trans. CSAE 2007, 23, 155–157.
13. Rashid, J.; Khan, I.; Ali, G.; Almotiri, S.H.; AlGhamdi, M.A. Multi-level deep learning model for potato leaf disease recognition. Electronics 2021, 10, 2064.
14. Ji, M.; Zhang, L.; Wu, Q. Automatic grape leaf diseases identification via united model based on multiple convolutional neural networks. Inf. Process. Agric. 2020, 7, 418–426.
15. Jiang, P.; Chen, Y.; Liu, B.; He, D.; Liang, C. Real-time detection of apple leaf diseases using deep learning approach based on improved convolutional neural networks. IEEE Access 2019, 7, 59069–59080.
16. Xu, J.; Shao, M.; Wang, Y.; Han, W. Image recognition of corn diseases based on convolutional neural network based on transfer learning. Trans. Chin. Soc. Agric. 2020, 51, 230–236+253.
17. Anagnostis, A.; Asiminari, G.; Papageorgiou, E.; Bochtis, D. A convolutional neural networks-based method for anthracnose infected walnut tree leaves identification. Appl. Sci. 2020, 10, 469.
18. Maeda-Gutiérrez, V.; Galvan-Tejada, C.E.; Zanella-Calzada, L.A.; Celaya-Padilla, J.M.; Galván-Tejada, J.I.; Gamboa-Rosales, H.; Luna-Garcia, H.; Magallanes-Quintar, R.; Guerrero Méndez, C.A.; Olvera-Olvera, C.A. Comparison of convolutional neural network architectures for classification of tomato plant diseases. Appl. Sci. 2020, 10, 1245.
19. Bao, W.; Huang, X.; Hu, G.; Liang, D. Recognition of corn leaf diseases based on improved convolutional neural network model. Trans. Chin. Soc. Agric. Eng. 2021, 37, 160–167.
20. Wang, Y.; Tao, J.; Gao, H. Corn disease recognition based on attention mechanism network. Axioms 2022, 11, 480.
21. Kateb, F.A.; Monowar, M.M.; Hamid, M.A.; Ohi, A.Q.; Mridha, M.F. FruitDet: Attentive Feature Aggregation for Real-Time Fruit Detection in Orchards. Agronomy 2021, 11, 2440.
22. Fu, L.; Yang, Z.; Wu, F.; Zou, X.; Lin, J.; Cao, Y.; Duan, J. YOLO-Banana: A Lightweight Neural Network for Rapid Detection of Banana Bunches and Stalks in the Natural Environment. Agronomy 2022, 12, 391.
23. Yao, J.; Qi, J.; Zhang, J.; Shao, H.; Yang, J. A real-time detection algorithm for kiwifruit defects based on YOLOv5. Electronics 2021, 10, 1711.
24. Roy, A.M.; Bhaduri, J. A deep learning enabled multi-class plant disease detection model based on computer vision. AI 2021, 2, 413–428.
25. Al Haque, A.F.; Hafiz, R.; Hakim, M.A.; Islam, G.R. A computer vision system for guava disease detection and recommend curative solution using deep learning approach. In Proceedings of the 2019 22nd International Conference on Computer and Information Technology (ICCIT), Dhaka, Bangladesh, 18–20 December 2019; pp. 1–6.
26. Howlader, M.R.; Habiba, U.; Faisal, R.H.; Rahman, M.M. Automatic recognition of guava leaf diseases using deep convolution neural network. In Proceedings of the 2019 International Conference on Electrical, Computer and Communication Engineering (ECCE), Cox’sBazar, Bangladesh, 7–9 February 2019; pp. 1–5.
27. Almadhor, A.; Rauf, H.T.; Lali, M.I.U.; Damaševičius, R.; Alouffi, B.; Alharbi, A. AI-driven framework for recognition of guava plant diseases through machine learning from DSLR camera sensor based high resolution imagery. Sensors 2021, 21, 3830.
28. Perumal, P.; Sellamuthu, K.; Vanitha, K.; Manavalasundaram, V.K. Guava leaf disease classification using support vector machine. Turk. J. Comput. Math. Educ. (TURCOMAT) 2021, 12, 1177–1183.
29. Yang, L.; Quan, F.; Shuzhi, W. Plant disease identification method and mobile application based on lightweight CNN. Trans. Chin. Soc. Agric. Eng. 2019, 35, 194–204.
30. Yu, X.D.; Yang, M.J.; Zhang, H.Q.; Li, D.; Tang, Y.Q.; Yu, X. Research and application of crop pest detection method based on transfer learning. Trans. Chin. Soc. Agric. Eng. 2020, 51, 252–258.
31. Fan, X.; Xu, Y.; Zhou, J.; Li, Z.; Peng, X.; Wang, X. Grape leaf disease detection system based on transfer learning and improved CNN. Trans. Chin. Soc. Agric. Eng. 2021, 37, 151–159.
32. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M. Tensorflow: Large-scale machine learning on heterogeneous systems. arXiv 2015, arXiv:1603.04467.
33. Wan, L.; Zeiler, M.; Zhang, S.; Le Cun, Y.; Fergus, R. Regularisation of neural networks using dropconnect. In Proceedings of the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1058–1066.
34. Ioffe, S.; Szegedy, C. Batch normalisation: Accelerating deep network training by reducing internal covariate shift. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 448–456.
35. Sharma, S.; Sharma, S.; Athaiya, A. Activation Functions in Neural Networks. Towards Data Sci. 2017, 4, 310–316.
36. Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localisation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626.
37. Tan, M.; Le, Q. Efficientnetv2: Smaller models and faster training. In Proceedings of the 38th International Conference on Machine Learning, PMLR 2021, Online, 18–24 July 2021; pp. 10096–10106.
38. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning, PMLR 2019, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.

Figure 1. The figure depicts the methodology flowchart for this research.

Figure 2. Sample images of Dataset D1: (a) Canker, (b) Dot, (c) Healthy, (d) Mummification, (e) Rust and Dataset D2: (f) Disease-Free, (g) Phytophthora, (h) Red rust, (i) Scab, and (j) Styler end rot.

Figure 3. The figure shows all the inside components of the proposed GLD-Det architecture. Two pooling layers, such as max and global average, three batch normalisation layers, three dropout layers, ReLU as an activation function with four dense layers, and SoftMax as a classification layer with the last lighter dense layer have been added with MobileNet. The MobileNet body architecture is also shown in this figure.

Figure 4. The figure demonstrates the functionality and operation of the dropout layer.

Figure 5. The figure shows the confusion matrix for the proposed GLD-Det model. The left figure is for Dataset D1 and the right figure is for Dataset D2.

Figure 6. The images of this figure are for Dataset D1. The top left corner shows the graph of training and validation accuracy, the top right corner shows the graph of training and validation loss, the bottom left corner shows the graph of training and validation precision, and the bottom right corner shows the graph of training and validation recall.

Figure 7. The images in this figure are for Dataset D2. The top left corner shows the graph of training and validation accuracy, the top right corner shows the graph of training and validation loss, the bottom left corner shows the graph of training and validation precision, and the bottom right corner shows the graph of training and validation recall.

Figure 8. The figure illustrates the training and validation value of AUC. The left corner image is for Dataset D1 and the right corner image is for Dataset D2.

Figure 9. The Grad-CAM images are for Dataset D1, which has five classes. The top first row is for canker, second row is for dot, the third row is for healthy, the fourth row is for mummification, and the fifth row is for rust. The first column is for the input image, and the other four columns are for the output heatmap images generated by Grad-CAM.

Table 1. Dataset D1 image count.

Class	Total Images
Canker	223
Dot	143
Mummification	220
Rust	224
Healthy	1032
Total	1842

Table 2. Dataset D2 image count.

Class	Original Image	Augmented Image	Total
Phytophthora	114	942	1056
Red rust	87	1154	1241
Scab	106	864	970
Styler end rot	94	1063	1157
Disease Free	126	876	1002
Total	527	4899	5426

Table 3. Dataset D1 split summary.

Class	Training Images	Test Images	Total Images
Canker	166	57	223
Dot	106	37	143
Mummification	165	55	220
Rust	167	57	224
Healthy	773	259	1032
Total	1377	465	1842

Table 4. Dataset D2 split summary.

Class	Training Images	Test Images	Total Images
Phytophthora	91	23	114
Red rust	71	16	87
Scab	84	22	106
Styler end rot	75	19	94
Disease Free	101	25	126
Total	422	105	527

Table 5. The table provides an overview of the environmental setup.

Method	GPU Name	Input Size (Pixels)	Batch Size	Optimizer, Learning Rate	Epoch, Patience [Pn]	Activation Function	Data Augmentation
EfficientNet-V2B2	NVIDIA GeForce RTX 3080	260 × 260 × 3	16	Adam, $1 \times 10^{- 5}$	80 [Pn:3]	ReLU	Applied
EfficientNet-B0	NVIDIA GeForce RTX 3080	224 × 224 × 3	16	Adam, $1 \times 10^{- 5}$	80 [Pn:3]	ReLU	Applied
EfficientNet-B2	NVIDIA GeForce RTX 3080	260 × 260 × 3	16	Adam, $1 \times 10^{- 5}$	80 [Pn:3]	ReLU	Applied
EfficientNet-B1	NVIDIA GeForce RTX 3080	240 × 240 × 3	16	Adam, $1 \times 10^{- 5}$	80 [Pn:3]	ReLU	Applied
MobileNet-V2	NVIDIA GeForce RTX 3080	224 × 224 × 3	16	Adam, $1 \times 10^{- 5}$	80 [Pn:3]	ReLU	Applied
GLD-Det (Proposed Model)	NVIDIA GeForce RTX 3080	224 × 224 × 3	16	Adam, $1 \times 10^{- 5}$	80 [Pn:3]	ReLU	Applied

Table 6. Share of multiply-add operations (Mult-Adds) and parameters by layer type in MobileNet.

Type	Mult-Adds	Parameters
Conv 1 × 1	94.86%	74.59%
Conv DW 3 × 3	3.06%	1.06%
Conv 3 × 3	1.19%	0.02%
Fully Connected (FC)	0.18%	24.33%
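The dominance of 1 × 1 convolutions in Table 6 follows from the cost model of a depthwise separable convolution. The sketch below assumes the standard multiply-add counts for a square feature map with stride 1: k²·M·F² for the depthwise step and M·N·F² for the pointwise step, where k is the kernel size, M and N the input and output channels, and F the feature-map side. The layer size used (14 × 14 map, 512 channels in and out) is an illustrative choice, not a figure from this article.

```python
def depthwise_mult_adds(kernel, in_ch, fmap):
    """Multiply-adds of a depthwise k x k convolution (one filter per channel)."""
    return kernel * kernel * in_ch * fmap * fmap


def pointwise_mult_adds(in_ch, out_ch, fmap):
    """Multiply-adds of the 1 x 1 convolution that mixes channels."""
    return in_ch * out_ch * fmap * fmap


def standard_conv_mult_adds(kernel, in_ch, out_ch, fmap):
    """Multiply-adds of an ordinary k x k convolution, for comparison."""
    return kernel * kernel * in_ch * out_ch * fmap * fmap


# Illustrative layer: 3 x 3 kernel, 512 -> 512 channels, 14 x 14 feature map.
dw = depthwise_mult_adds(3, 512, 14)
pw = pointwise_mult_adds(512, 512, 14)
std = standard_conv_mult_adds(3, 512, 512, 14)

print(f"depthwise: {dw:,}  pointwise: {pw:,}  standard: {std:,}")
print(f"pointwise share of separable cost: {pw / (dw + pw):.4f}")
print(f"separable / standard cost: {(dw + pw) / std:.4f}")
```

For this layer the pointwise step accounts for about 98% of the separable cost, and the separable layer needs roughly 11% of the multiply-adds of a standard convolution (the ratio 1/N + 1/k²).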

Table 7. Parameters of the additional layers in the proposed GLD-Det architecture.

Additional Layers	Output	Parameters
“max_pooling2d (MaxPooling2D)”	(None, 3, 3, 1024)	0
“dense (Dense)”	(None, 3, 3, 2048)	2,099,200
“batch_normalization (BatchNormalization)”	(None, 3, 3, 2048)	8192
“global_average_pooling2d (GlobalAveragePooling2D)”	(None, 2048)	0
“dropout (Dropout)”	(None, 2048)	0
“dense_1 (Dense)”	(None, 1024)	2,098,176
“batch_normalization_1 (BatchNormalization)”	(None, 1024)	4096
“dropout_1 (Dropout)”	(None, 1024)	0
“dense_2 (Dense)”	(None, 512)	524,800
“batch_normalization_2 (BatchNormalization)”	(None, 512)	2048
“dropout_2 (Dropout)”	(None, 512)	0
“dense_3 (Dense)”	(None, 128)	65,664
“dense_4 (Dense)”	(None, 5)	645
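The parameter counts in Table 7 can be reproduced from the layer widths alone. The sketch below assumes that every dense layer has a bias term and that a batch normalisation layer is counted with four values per feature (scale, offset, moving mean, and moving variance), which is how Keras model summaries report it. The helper names and layer labels are illustrative.

```python
def dense_params(in_features, units):
    """Weights plus one bias per unit."""
    return in_features * units + units


def batch_norm_params(features):
    """Scale, offset, moving mean and moving variance: four values per feature."""
    return 4 * features


# (label, parameter count) for each parameterised layer, in Table 7 order.
# Pooling and dropout layers hold no parameters and are omitted.
HEAD_LAYERS = [
    ("dense 1024 -> 2048", dense_params(1024, 2048)),
    ("batch norm 2048", batch_norm_params(2048)),
    ("dense 2048 -> 1024", dense_params(2048, 1024)),
    ("batch norm 1024", batch_norm_params(1024)),
    ("dense 1024 -> 512", dense_params(1024, 512)),
    ("batch norm 512", batch_norm_params(512)),
    ("dense 512 -> 128", dense_params(512, 128)),
    ("dense 128 -> 5", dense_params(128, 5)),
]

# Non-zero parameter counts as listed in Table 7.
TABLE_7 = [2_099_200, 8192, 2_098_176, 4096, 524_800, 2048, 65_664, 645]

for (label, count), expected in zip(HEAD_LAYERS, TABLE_7):
    status = "ok" if count == expected else "mismatch"
    print(f"{label:<20} {count:>10,}  {status}")

head_total = sum(count for _, count in HEAD_LAYERS)
print(f"additional layers in total: {head_total:,}")
```

Every computed value matches the table, and the additional layers sum to 4,802,821 parameters on top of the MobileNet backbone.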

Table 8. Structure of the confusion matrix.

	Predicted Positive	Predicted Negative
Actual Positive	True Positive (TP)	False Negative (FN)
Actual Negative	False Positive (FP)	True Negative (TN)

Here, TP = the model correctly predicted a positive outcome; FP = the model incorrectly predicted a positive outcome; TN = the model correctly predicted a negative outcome; FN = the model incorrectly predicted a negative outcome. Accuracy, precision, and recall are computed from these four counts.
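As a worked illustration of how the three metrics follow from the four confusion matrix counts, the sketch below uses the standard definitions: accuracy = (TP + TN) / (TP + FP + TN + FN), precision = TP / (TP + FP), and recall = TP / (TP + FN). The counts are hypothetical and do not come from the article's experiments.

```python
def accuracy(tp, fp, tn, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + fp + tn + fn)


def precision(tp, fp):
    """Fraction of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual positives that the model finds."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical counts for one class treated as "positive" (illustration only).
tp, fp, tn, fn = 50, 5, 40, 5

print(f"accuracy  = {accuracy(tp, fp, tn, fn):.3f}")
print(f"precision = {precision(tp, fp):.3f}")
print(f"recall    = {recall(tp, fn):.3f}")
```

For a multi-class problem such as the five-class datasets here, these counts are usually taken per class in a one-versus-rest fashion and then averaged; the averaging scheme is a separate choice that this sketch does not cover.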

Table 9. Performance evaluation on Dataset D1. The proposed model (last row) obtains the best value in every column.

Model Name	Loss	Accuracy	Precision	Recall	AUC
EfficientNet-V2B2	0.4546	0.8559	0.8833	0.7978	0.9733
EfficientNet-B0	0.7351	0.6925	0.7655	0.6387	0.9305
EfficientNet-B2	0.5976	0.7828	0.8160	0.7441	0.9583
EfficientNet-B1	2.7399	0.0796	0.0799	0.0796	0.4826
MobileNet-V2	1.8567	0.5677	0.5897	0.5656	0.8349
GLD-Det (Proposed Model)	0.0485	0.9806	0.9806	0.9785	0.9997

Table 10. Performance evaluation on Dataset D2. The proposed model (last row) obtains the best value in every column.

Model Name	Loss	Accuracy	Precision	Recall	AUC
EfficientNet-V2B2	1.3641	0.7457	0.7910	0.7367	0.8157
EfficientNet-B0	1.6796	0.2762	0.2800	0.1333	0.6414
EfficientNet-B2	4.0642	0.2095	0.2095	0.2095	0.4756
EfficientNet-B1	1.7453	0.1810	0.2143	0.0286	0.5607
MobileNet-V2	0.5443	0.8481	0.8522	0.8217	0.9198
GLD-Det (Proposed Model)	0.1684	0.9714	0.9712	0.9619	0.9927

Table 11. Comparison of parameters and floating-point operations (FLOPs) of different models based on the ImageNet dataset.

Model	Parameters (Million)	FLOPs (Billion)
EfficientNetV2B2	10.1	1.7
EfficientNetB0	5.3	0.39
EfficientNetB2	9.2	1.0
EfficientNetB1	7.8	0.70
MobileNetV2	3.4	0.30
MobileNet	4.2	0.60

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

