*Article* **Guava Disease Detection Using Deep Convolutional Neural Networks: A Case Study of Guava Plants**

**Almetwally M. Mostafa <sup>1,</sup>\*, Swarn Avinash Kumar <sup>2</sup>, Talha Meraj <sup>3</sup>, Hafiz Tayyab Rauf <sup>4</sup>, Abeer Ali Alnuaim <sup>5</sup> and Maram Abdullah Alkhayyal <sup>1</sup>**


**Abstract:** Food production is a growing challenge as the global population increases. To increase yield, new biotechnology-based fertilization techniques must be adopted, and early prevention of plant disease must improve. Guava is an essential fruit in Asian countries such as Pakistan, which ranks fourth in its production. Several pathological and fungal diseases attack guava plants, and postharvest infections can cause significant output losses. Because the symptoms of different guava diseases vary only slightly, a professional opinion is essential for disease analysis; an incorrect diagnosis can lead farmers to misuse pesticides and incur financial losses. Computer-vision-based monitoring of guava plants in the field is therefore required. This research uses data enhancement based on color-histogram equalization and unsharp masking, followed by deep convolutional neural networks (DCNNs), to identify different guava plant species. Rotations at nine angles spanning 360° were applied to increase the number of transformed plant images. The data were first normalized and preprocessed, and the augmented data were then fed as input into state-of-the-art classification networks. A locally collected guava disease dataset from Pakistan was used for the experimental evaluation. The proposed study uses five neural network architectures, AlexNet, SqueezeNet, GoogLeNet, ResNet-50, and ResNet-101, to identify different guava plant species. In the experiments, ResNet-101 obtained the highest classification results, with 97.74% accuracy.

**Keywords:** data augmentation; deep learning; guava disease; plant disease detection

#### **1. Introduction**

Food production is currently one of the greatest challenges posed by the growing global population. It is estimated that food consumption will double by 2050. Therefore, food production needs a higher-yielding and more sustainable environment to increase plant yield [1,2]. Guava is an important plant that belongs to the Myrtaceae family. It was initially located in the American tropics, and guava was discovered in Portugal in the early 17th century [3]. It is popular in tropical and nontropical countries such as Bangladesh, India, Pakistan, Brazil, and Cuba [4]. Guava contains phosphorus, calcium, nicotinic acid, and many other essential food components [5]. Furthermore, it normalizes blood pressure, has benefits for diabetes, provides immunity against dysentery, and eliminates diarrhea [6]. Guava can grow in a variety of soils with a wide range of pH (4.4 to 4.9), and it can also withstand intensive and extensive climate change [7].

**Citation:** Mostafa, A.M.; Kumar, S.A.; Meraj, T.; Rauf, H.T.; Alnuaim, A.A.; Alkhayyal, M.A. Guava Disease Detection Using Deep Convolutional Neural Networks: A Case Study of Guava Plants. *Appl. Sci.* **2022**, *12*, 239. https://doi.org/10.3390/app12010239

Academic Editors: Anselme Muzirafuti and Dimitrios S. Paraforos

Received: 28 September 2021 Accepted: 17 December 2021 Published: 27 December 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

The delightful aroma of the generally spherical guava fruit makes it attractive [8]. Fruits and vegetables pass through various stages as they grow, which makes it challenging to recognize the factors that make them behave differently at each stage. Therefore, image acquisition of vegetables and fruits is the first important step to effectively analyze quality attributes such as color and texture. Illumination also affects how the sensor captures these features during fruit image collection [9]. Computer-vision-based fruit and vegetable disease detection can lead to large-scale automatic vegetable and fruit monitoring [10]. This helps in taking earlier action against specific hazards that reduce actual yields, such as identifying when fertilizers need to be applied to increase the growth rate [11]. Various diseases affect the production of guava fruits, such as anthracnose [5], canker, dot, mummification, and rust. Farmers are very knowledgeable about these diseases, but they mostly do not know of early prevention methods to protect their crops against further loss. This ultimately leads to significant loss in guava production [12]. These diseases have different causes; for example, canker is caused by algae and was first discovered by Ruehle [13].

Similarly, Dastur is another guava disease that is caused by dry rot [14]. These types of diseases affect guava production, which leads to economic and environmental loss [15]. Environmental loss covers any loss of energy, water, clean air, or land, whereas economic loss refers to financial loss in production. Pakistan is a country in the Asian Pacific whose economy is mostly based on agricultural production. The agricultural significance of Pakistan can be seen from its gross domestic product (GDP), with agriculture accounting for 25% of its annual GDP [16]. Many agricultural countries produce guava as a domestic product, and Pakistan is globally fourth in guava production, as it annually produces 1,784,300 t [17]. To diagnose guava diseases in a timely manner, accurate detection is necessary, as false detection may lead to the poor production of guava species. Manual observation may be time consuming and may lead to wrong interpretations.

This led us to produce an automatic system for guava disease detection [18], as the production of guava fruits creates severe issues in developed and underdeveloped countries [19,20]. The automation of disease detection is currently the fastest, least expensive, and most accurate solution [21]. It could cost more, but it can lead to a colossal time reduction by automating the disease detection process [22]. Prediction models primarily use RGB color channel images, in which diseases are visually distinguishable by color. Color features can therefore be strong descriptors for distinguishing different diseases. However, deep-feature-based models can be more robust, as they cover many other aspects such as geometry, pattern, texture, and other local features. For this purpose, a local guava disease RGB image dataset was collected with a high-display-quality camera. It contains four types of disease, namely canker, dot, rust, and mummification, with the fifth category as the healthy class. Further details of the dataset are discussed in Section 3. The proposed study was inspired by deep learning (DL), and a comparative analysis of various pretrained models is proposed. The main contributions are as follows:


The rest of the manuscript is divided into three sections. Section 2 presents the related work. Section 3 is the methodology. Section 4 outlines the results and discussion.

#### **2. Related Work**

Plant disease detection is becoming increasingly automated. Both machine-learning and deep-learning methods are used [23] to provide intelligent automated solutions; a few recent studies in both categories are discussed below.

#### *2.1. Machine-Learning-Based Plant Disease Detection*

Experts confirmed the labeling used to distinguish unhealthy from healthy guava fruits. Handcrafted features, namely local binary patterns (LBPs), were extracted and further reduced using principal component analysis (PCA). Multiple types of machine-learning (ML) classifiers were used, among which a cubic support vector machine performed the best [24]. Edge- and threshold-based segmentation was performed for plant disease detection using images of leaves. Multiple color, texture, and shape features were extracted and fed into a neural network classifier that classified different plant diseases [25].

Shadows are removed from the background by enhancing, resizing, and isolating the region of interest (ROI), with clustering performed by using K-means for pomegranate fruit disease detection in [26]. Guava disease detection was performed using basic color transformation functions in image processing to detect the actual diseased parts of plant leaves. Classification was performed using support vector machine (SVM) and K-nearest neighbor (KNN) in [27]. Apple disease detection was performed using the spot segmentation method, where feature extraction and fusion were also performed. The decorrelation method was used for the fusion of extracted features in [28]. A soil-based analysis to recognize the soil indicator that plays an important role in plant yield was also used in [29]. Similarly, the weather forecasting history could play an important role in plant monitoring systems to avoid any natural hazards, as in [30]. The hue–saturation–intensity (HSI) color space was initially used, where unhealthy areas were detected using textures. Multiple features were extracted after the color conversion of the data. Features such as homogeneity, energy, and other cluster-based features were extracted. Lastly, SVM was used for classification in [31].

#### *2.2. Deep-Learning-Based Plant Disease Detection*

Demand for deep-learning (DL)-based studies is increasing due to their promising results. Big data are used to train DL models for automated detection. One such study used more than 54,000 images of 14 crop species with 26 different disease types. A deep convolutional neural network (DCNN) was proposed, and a 99.35% accuracy was achieved on the held-out test dataset. Lastly, smartphone-assisted automatic crop disease detection was proposed and an app was suggested for development in [32]. A similar large dataset was used for plant disease detection: an open dataset of more than 87,000 images covering 25 different plant categories. Multiple DCNN architectures were used, and the best-performing network achieved a 99.53% accuracy. The reported results showed that this model can be used as a tool for real-time plant disease identification [33]. Other researchers found that symptom-based visualization was missing in previous architectures and addressed this gap by proposing their own CNN with a visualization technique.

Modified networks for plant disease identification were applied that improved the results of [34]. Deep features and transfer learning were applied using well-known model architectures. Deep-feature-based classification using SVM and other ML classifiers showed better results than those of the transfer-learning method. Moreover, the fully connected layer of state-of-the-art architectures such as VGG-16, VGG-19, and AlexNet showed better accuracy than that of other fully connected layers [35]. Images of specific conditions and various symptoms acquired in real time are missing from public datasets. To tackle this limitation, data augmentation was performed, which took a single input leaf image from multiple views that covered certain conditions on the same leaf input image. It also covered multiple diseases affecting leaves. Augmentation-based predictions increased the accuracy by 12%, and data augmentation was suggested as a remedy for data limitations in [36]. The PlantVillage dataset contains differently annotated apple black rot images; fine-tuned DL models were trained on it, and the best accuracy, 90.4%, was achieved by VGG-16 [37]. Different ML- and DL-based methods are shown in Table 1.


**Table 1.** Summary of recent studies on the detection of guava diseases.

Although DL has shown excellent results in plant disease detection, it still faces some challenges. The big data challenge compromises previous studies because they used limited data. The data limitation can be reduced by using various strategies that also produce a more confident model by covering a different aspect of a specific input sample of plant disease [44].

#### **3. Methodology**

The automation of plant disease monitoring is taking the place of manual monitoring. Many researchers have experimented with real-time plant disease monitoring and achieved satisfying results. In the proposed study, a local Pakistani dataset was collected for guava plant and fruit disease detection. Data augmentation was used to address the need for large training data and to reduce model overfitting. Data augmentation was performed using the affine transformation method; to enhance the region of interest (ROI), unsharp masking and histogram equalization were performed, which sharpened the ROI and removed existing noise in the augmented data. The final augmented and enhanced data were fed into various fine-tuned state-of-the-art classification methods. All the steps are shown in Figure 1.

In the framework, augmented and enhanced images were given to 5 different predefined architectures by replacing their last layer according to the given data classes. AlexNet was the first model in the ImageNet competition that changed the image classification and object detection using deep-learning models. SqueezeNet, GoogLeNet, and ResNet followed with many others. Famous ones with different kinds of architectures were used to check the effectiveness of a given local guava dataset. The proposed method showed an initial step on a newly collected local Pakistani dataset, where more methods can be adopted using a different ML and DL technology. The details of the dataset before and after augmentation and the details of other used CNN architectures are shown in Section 3.1 by discussing their fine-tuned parameters, with the weights in Sections 3.2–3.4. The achieved results on the validation data are shown in Section 4.

**Figure 1.** Proposed framework for guava plant disease detection.

#### *3.1. Dataset Normalization*

The dataset was initially collected using a high-definition camera with different resolutions. Images were of different angles and orientations, which may mislead the prediction model due to the illusion factor, spatial resolution changes, camera settings, background changes, and many other real-time factors. Therefore, the data were first resized to be equal in size using the bicubic interpolation method. Bicubic interpolation computes each output pixel from its 4 by 4 neighborhood, i.e., the 16 nearest input pixels, and is widely used in image-editing tools. This improved the results as compared to those of the bilinear and nearest-neighbor methods. The resized images were then augmented and enhanced; the histogram-based representation of one image is shown in Figure 2.

**Figure 2.** (**top**, **left**) Original and (**top**, **right**) enhanced. (**bottom**, **left**) Original and (**bottom**, **right**) enhanced image histogram.

The histogram shows that the data intensity levels were equalized after the preprocessing of data resizing and enhancement.

#### *3.2. Data Augmentation and Enhancement*

Resized images were rotated using the affine transformation method. Rotations covering a full 360° were used with an angle difference of 45°. In total, 9 angles were applied to each sample instance in all classes, namely 0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°, and 360°. The affine rotation was calculated as given in Equation (1), and an applied augmentation sample for each category is shown in Figure 3.

$$Rotation = \begin{bmatrix} \cos(a) & \sin(a) & 0\\ -\sin(a) & \cos(a) & 0\\ 0 & 0 & 1 \end{bmatrix} \tag{1}$$

Equation (1) gives the affine rotation matrix, where *a* is the rotation angle; its value was changed nine times for an image to obtain each new rotated image.
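The rotation matrices for the nine listed angles can be generated as in the following minimal sketch. It is an illustration only, not the authors' implementation: the function names `rotation_matrix` and `apply` are hypothetical, and the sketch rotates a single coordinate rather than resampling a full image.

```python
import math


def rotation_matrix(angle_deg):
    """3x3 homogeneous rotation matrix laid out as in Equation (1)."""
    a = math.radians(angle_deg)
    return [
        [math.cos(a), math.sin(a), 0.0],
        [-math.sin(a), math.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ]


def apply(matrix, x, y):
    """Multiply the matrix by the column vector (x, y, 1)."""
    v = (x, y, 1.0)
    xr, yr, _ = (sum(m * c for m, c in zip(row, v)) for row in matrix)
    return xr, yr


# The nine angles listed in the text, 45 degrees apart.
ANGLES = list(range(0, 361, 45))
MATRICES = {angle: rotation_matrix(angle) for angle in ANGLES}
```

Because 0° and 360° describe the same rotation, the matrices for these two angles coincide up to floating-point error, so the corresponding two augmented views of an image are expected to be identical.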

Data enhancement was applied as a combination of unsharp masking and color histogram equalization, calculated using Equations (2) and (3), respectively.

$$f(I) = \alpha f - \beta f_l \tag{2}$$

The enhanced output image is *f(I)*, where *f* is the input image, *f<sub>l</sub>* is its low-pass-filtered version (the mask), and *α* and *β* are constants: the original image is multiplied by *α*, and the low-pass mask multiplied by *β* is subtracted from it. Histogram equalization was used to enhance the given RGB image, and the three channels were individually evaluated using Equation (3).
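Equation (2) can be illustrated on a single-channel image stored as a 2D list. This is a sketch under stated assumptions: the source does not name the low-pass filter or the values of α and β, so the 3 × 3 box blur, `alpha=1.5`, and `beta=0.5` below are hypothetical choices, as are the function names.

```python
def box_blur(image):
    """3x3 mean filter with edge replication; stands in for the low-pass mask f_l."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr = min(max(r + dr, 0), h - 1)
                    cc = min(max(c + dc, 0), w - 1)
                    total += image[rr][cc]
            out[r][c] = total / 9.0
    return out


def unsharp_mask(image, alpha=1.5, beta=0.5, max_value=255):
    """Compute alpha * f - beta * f_l per pixel, clipped to [0, max_value]."""
    low = box_blur(image)
    return [
        [min(max(alpha * f - beta * fl, 0), max_value) for f, fl in zip(row, low_row)]
        for row, low_row in zip(image, low)
    ]
```

With α − β = 1, as assumed here, flat regions keep their intensity while intensity differences across edges are amplified.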

$$T_P = (L - 1)\,\mathrm{cdf}(P) \tag{3}$$

Here, *T<sub>P</sub>* is the transformed intensity for input intensity *P*, and *L* − 1 is the maximal intensity value. The cumulative distribution of the given intensity was calculated over the probability of occurrences, as in Equation (4).

$$\mathrm{cdf}(P) = \sum_{k=-\infty}^{P} Prob(k) \tag{4}$$

The cumulative distribution value was then multiplied by the maximal intensity value, and the resulting transformed intensity was mapped to the corresponding pixels throughout the given image.
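The per-channel equalization described by Equations (3) and (4) can be sketched as follows. The representation of an image as nested lists, the default of 256 intensity levels, the rounding to the nearest integer, and the function names are assumptions made for illustration.

```python
def equalize_channel(channel, levels=256):
    """Histogram-equalize one channel given as a 2D list of ints in [0, levels - 1]."""
    pixels = [p for row in channel for p in row]
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Running sum of occurrence counts divided by n gives cdf(P), Equation (4).
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running / n)
    # Equation (3): transformed intensity = (L - 1) * cdf(P).
    mapping = [round((levels - 1) * c) for c in cdf]
    return [[mapping[p] for p in row] for row in channel]


def equalize_rgb(image):
    """Equalize R, G, and B independently; image is a 2D list of (r, g, b) tuples."""
    channels = [[[px[k] for px in row] for row in image] for k in range(3)]
    eq = [equalize_channel(ch) for ch in channels]
    return [
        [(eq[0][r][c], eq[1][r][c], eq[2][r][c]) for c in range(len(image[0]))]
        for r in range(len(image))
    ]
```

Equalizing each channel independently follows the text; it can shift hues, which is one reason some pipelines equalize a luminance channel instead.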

**Figure 3.** Five types of guava species image samples with their enhanced and rotated 9 angle images in circular view in 4 circles.

Rotated images covered a different aspect of the actual time occurrence, which could be any of the orientations for the user. The training and predictions of rotated augmented data offered promising results.

#### *3.3. Basics of Convolutional Neural Networks*

There are many proposed CNN architectures used in various aspects of intelligent classification and object-detection systems. These architectures have slight differences in their networks, and the analyzed primary layers and components are discussed here. We discuss these basics before explaining state-of-the-art models of classification that were also used.

#### 3.3.1. Convolutional Layer

The convolutional layer is called as such when at least one convolutional operation is used in the input layers of an architecture. The convolutional operation uses various parameters such as kernels, where the kernel size is specified for parameter initialization. Similarly, padding and stride size are also initialized and used in convolutional operations. The convolutional operation is summarized in Equation (5).

$$Conv_i^l = Bias_i^l + \sum_{j=1}^{a_i^{(l-1)}} w_{i,j}^{(l-1)} \ast C_i^l \tag{5}$$

In Equation (5), *Conv*<sub>*i*</sub><sup>*l*</sup> is the output of the convolution operation, in which *Bias*<sub>*i*</sub><sup>*l*</sup> is the bias matrix for the *i*th iterative region of operation over which convolved window *w* is moving, and *i*, *j* represent the window size in rows and columns. Iterated convolved window *C*<sub>*i*</sub><sup>*l*</sup> is multiplied with the corresponding pixels of the given image, where the selected area is defined by window size *w*<sub>*i*,*j*</sub>.
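The roles of the kernel, bias, stride, and padding can be seen in a minimal single-channel sketch. It is not the layer implementation used in the study: the function name `conv2d` is hypothetical, zero padding is assumed, and, as in most CNN libraries, the kernel is applied without flipping (cross-correlation).

```python
def conv2d(image, kernel, bias=0.0, stride=1, padding=0):
    """Single-channel 2D convolution over a 2D list (kernel applied without flipping)."""
    if padding:
        width = len(image[0]) + 2 * padding
        zero_rows = [[0.0] * width for _ in range(padding)]
        image = (zero_rows
                 + [[0.0] * padding + list(row) + [0.0] * padding for row in image]
                 + [list(r) for r in zero_rows])
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(0, h - kh + 1, stride):
        out_row = []
        for c in range(0, w - kw + 1, stride):
            acc = bias  # the bias is added once per output value
            for i in range(kh):
                for j in range(kw):
                    acc += kernel[i][j] * image[r + i][c + j]
            out_row.append(acc)
        out.append(out_row)
    return out
```

For example, a 3 × 3 input convolved with a 2 × 2 kernel at stride 1 and no padding yields a 2 × 2 output, while padding of 1 with a 3 × 3 kernel preserves the 3 × 3 input size.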

#### 3.3.2. Batch Normalization

Batch normalization is a normalization operation, comparable to the min–max data normalization performed in data cleaning, in which a batch of input data is normalized. It can be written as in Equation (6).

$$x'_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} \tag{6}$$

It transforms the data so that the output mean is close to 0 and the output standard deviation is close to 1. In Equation (6), the mean (*μ<sub>B</sub>*) of batch *B* is subtracted from input *x<sub>i</sub>* of a particular instance, and the difference is divided by the standard deviation of the batch to which the instance belongs, i.e., the square root of the batch variance (*σ<sub>B</sub>*<sup>2</sup>) plus a small constant *ϵ* that avoids division by zero; the normalized value *x′<sub>i</sub>* is returned as output.
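A minimal sketch of the normalization step on a batch of scalar activations follows. It divides by the batch standard deviation (the square root of the batch variance plus a small `eps`), which is what yields an output standard deviation close to 1 as described above. The function name and the value of `eps` are assumptions, and the learnable scale and shift parameters of full batch normalization layers are omitted.

```python
import math


def batch_norm(batch, eps=1e-5):
    """Normalize a list of scalar activations to roughly zero mean and unit variance."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased batch variance
    return [(x - mean) / math.sqrt(var + eps) for x in batch]
```

In a CNN, the same computation is applied per feature channel across the batch rather than to a flat list.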

#### 3.3.3. Pooling Layer

Pooling in a CNN reduces each defined kernel window to a single output value by taking the max, min, or average of the values in that window. The stride is also used as a parameter to define the step by which the window moves between pooled values. The output dimension is expressed as in Equation (7).

$$D_{output} = x_h \times x_w \times x_d \tag{7}$$

The output dimension after performing pooling is represented as *D<sub>output</sub>*, where *x* is the input instance, *h* and *w* are its height and width, and *d* is its color channel dimension.
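The three pooling modes and the effect of window size and stride on the output dimensions can be sketched as follows. The function names are hypothetical, no padding is assumed, and the output-size rule `(n - size) // stride + 1` is the standard one for unpadded pooling rather than a formula stated in the source.

```python
def pool2d(feature_map, size=2, stride=2, mode="max"):
    """Slide a size x size window over a 2D list and reduce each window to one value."""
    reducers = {"max": max, "min": min, "avg": lambda v: sum(v) / len(v)}
    reduce_fn = reducers[mode]
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for r in range(0, h - size + 1, stride):
        out.append([
            reduce_fn([feature_map[r + i][c + j]
                       for i in range(size) for j in range(size)])
            for c in range(0, w - size + 1, stride)
        ])
    return out


def pooled_shape(height, width, depth, size=2, stride=2):
    """Output dimensions; pooling acts per channel, so depth is unchanged."""
    return ((height - size) // stride + 1, (width - size) // stride + 1, depth)
```

For instance, `pooled_shape(55, 55, 96, size=3, stride=2)` gives `(27, 27, 96)`: pooling shrinks the height and width while leaving the channel dimension untouched.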

#### 3.3.4. Rectified Linear Unit

ReLU is an activation unit used in various studies, alongside other activation units such as tanh and sigmoid. ReLU is a piecewise linear function: it outputs the input unchanged if it is >0; otherwise, it outputs 0. In this way, it excludes misguiding negative values when a prediction model calculates an output class. We can simply write the ReLU as in Equation (8).

$$ReLU(x) = \max(0, x) \tag{8}$$

The linear behavior of this activation function makes it a commonly used activation function. In Equation (8), the maximum of 0 and input *x* is taken: if the input value is positive (>0), it is output as is, whereas negative values (<0) are output as 0.
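Equation (8) translates directly into code; the following short sketch uses hypothetical function names.

```python
def relu(x):
    """Return x for positive inputs and 0 otherwise."""
    return max(0.0, x)


def relu_map(values):
    """Apply ReLU element-wise to a list of activations."""
    return [relu(v) for v in values]
```

Applied to the activations `[-2.0, 0.0, 3.5]`, only the positive value passes through unchanged.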

#### 3.3.5. Softmax

The softmax function returns probability values in the range of 0–1, where the most likely class receives the highest probability value. It corresponds to multinomial logistic regression, where multiple classes are predicted using internal values. The softmax function can be calculated as in Equation (9).

$$\sigma(\vec{z})_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}} \tag{9}$$

In Equation (9), the softmax operation is applied to input vector *z*, where *z<sub>i</sub>* are the input values of vector *z*. Exponential function *e* is applied to each value, which gives a positive value greater than 0. The denominator ensures that all output values sum to one. *K* is the number of output classes, which changes from application to application.
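The softmax computation for a five-class output can be sketched as below. The scores and the class order are made-up values for illustration; subtracting the maximum score before exponentiation is a common numerical-stability step that does not change the result.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of class scores."""
    shift = max(z)  # subtracting the max leaves the result unchanged
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the five classes; the class order is an assumption.
CLASSES = ["canker", "dot", "healthy", "mummification", "rust"]
scores = [2.0, 0.5, 0.1, 1.2, -0.3]
probs = softmax(scores)
predicted = CLASSES[probs.index(max(probs))]
```

The class with the largest score receives the largest probability, so the predicted label is simply the index of the maximum.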

#### *3.4. Classification Using the AlexNet Architecture*

AlexNet was the first model in deep learning to change the trend of image identification and classification tasks. It was initially proposed for detecting and classifying objects using a benchmark dataset from ImageNet. Using the AlexNet architecture, the image size for the input layer is taken to be 227 × 227 × 3. In this architecture, there are only 5 convolutional layers and 3 fully connected layers; counting the accompanying activation, normalization, pooling, dropout, and input and output layers gives 25 layers in total. The last layers were altered in the proposed framework, and fine-tuned network parameters were then used. The modifications are shown in Table 2.

In Table 2, all layers above remain the same as in AlexNet, where Fc-8 is first altered with four layers, and the two layers of the softmax activation class output correspondingly give the output for the five categories of guava species.


**Table 2.** AlexNet for guava disease detection.



#### *3.5. Classification Using GoogLeNet Architecture*

Google developers focused on the proposed AlexNet model and then introduced the inception module and changed it sequentially by stacking up layers. This introduced different and smaller kernel size windows with more layers in them. It became the winner of the 2014 ILSVRC competition. The inception modules that were the fundamental contribution by GoogLeNet are shown for the trained architecture on the guava dataset, and the layer-by-layer parameters are shown in Table 3.



The overall architecture remains similar: the last three layers are altered, and the upper-layer connections remain connected.

#### *3.6. Classification Using the SqueezeNet Architecture*

SqueezeNet was introduced with fire modules. It proposes replacing 3 × 3 kernels with 1 × 1 kernels, reducing the overall number of parameters. Downsampling is also delayed to later layers, so that more feature maps are learned by the layers. The introduced fire module contains a squeeze layer with 1 × 1 filters, whose output is fed into an expand layer that is a mixture of 1 × 1 and 3 × 3 kernels. The SqueezeNet layer architecture and parameters with altered layers are shown in Table 4.


**Table 4.** SqueezeNet network used for guava disease detection.

Each fire module has three tunable hyperparameters, namely s1x1, e1x1, and e3x3. AlexNet's level of accuracy was achieved by the original SqueezeNet with 50× fewer parameters, and the model size was reduced to just 0.5 MB because of the decrease in the kernel sizes and the fire modules used in this architecture.

#### *3.7. Classification Using the ResNet-50 Architecture*

ResNet was introduced with the residual block concept mainly to answer the overfitting issue created in DL models. It uses a considerable number of layers, such as 50, 101, and 152. As suggested by GoogLeNet, it uses small convolutional kernels, with denser or more layers used to meet or improve the validity of the data. The introduced residual block uses a 1 × 1 layer that reduces the dimension, a 3 × 3 layer, and a 1 × 1 layer that restores the dimensions of the given input. We used the 50- and 101-layer ResNet architectures; the 50-layer architecture of ResNet is shown in Table 5.


**Table 5.** ResNet-50 network used for guava disease detection.

The basic residual branches of ResNet-50 with their learned parameters in guava disease detection are shown in Table 5. The residual block re-concatenates spatial information from the previous block to preserve information in each calculated feature map of the residual block. For the proposed framework, the last layers are altered with five categories to classify them on the basis of previous learning on the augmented guava data of the proposed study.

#### *3.8. Classification Using ResNet-101 Architecture*

The ResNet-based study produced many variants of its introduced residual blocks, such as 18, 19, 34, 50, and 101, and the densest of 152. Making the network increasingly denser did not improve the accuracy after a certain point. This may be due to many factors, such as learning saturation, loopholes in the proposed architecture, and hyperparameter optimization. Therefore, the mainly used networks were ResNet-50 and ResNet-101. The learned-weight-based architecture of the residual blocks using the ResNet-101 architecture for guava disease detection is described in Table 5; the difference between ResNet-50 and ResNet-101 is in their architecture. There are 347 layers in total in ResNet-101 and 177 layers in the ResNet-50 model. Learning mainly changed after Res-branch 4a, as hundreds of layers are added after it that learn in different ways, as described in the ResNet-101 architecture.

#### **4. Results and Discussion**

The proposed study used augmented data of actual given locally collected data in Pakistan for guava disease detection. The data had enough images to train the DL model. The dataset details for before and after augmentation are shown in Table 6.


**Table 6.** Dataset description with and without augmentation.

#### *4.1. Evaluation Measure*

There are mainly four types of prediction instances, which we can consider in the formulation of these above-mentioned evaluation measures: true positive (*TP*), false positive (*FP*), true negative (*TN*), and false negative (*FN*). These are described in detail below.

#### 4.1.1. Accuracy

Accuracy is the most commonly used measure in the ML and DL domains for classification. It can briefly be described as truly predicted instances over total instances, including wrong and right predictions. In terms of the four types used above, the equation for accuracy can be written as follows:

$$Accuracy = \frac{(TP + TN)}{(TP + TN + FP + FN)}\tag{10}$$

The equation can be described as the ratio of the summation of *TP* and *TN* over the summation of *TP*, *TN*, *FP*, and *FN*.

#### 4.1.2. Specificity

Specificity measures the correctly predicted negative instances over all actual negative instances. Briefly, true-negative instances are measured over the total of true-negative and false-positive instances. It can be written as:

$$Specificity = \frac{TN}{(TN + FP)}\tag{11}$$

The specificity equation is defined as the ratio over *TN* and the summation of *TN* and *FP*.

#### 4.1.3. F1 Score

The F1 score is an essential measure, as it is calculated for both the essential measures of recall and precision. Recall is also known as sensitivity and is the measure to detect positive class predictions among true positives and false negatives. For the F1 score, we need to calculate the sensitivity as follows:

$$Recall\ (Sensitivity) = \frac{TP}{(TP + FN)}\tag{12}$$

The second measure, precision, is also separately calculated and used in the F1 score measurement. Precision is calculated to obtain a truly predicted class over true and false positives. It is represented as:

$$Precision = \frac{TP}{(TP + FP)}\tag{13}$$

After obtaining the precision and recall, the F1 score can be calculated. After having *TP* over *TP* and *FN* by recall and by obtaining *TP* over *TP* and *FP*, we can obtain more precise measurements for true-positive predictions. The final F1 score can be calculated as:

$$F1\text{-}score = 2 \times \frac{Precision \times Recall}{Recall + Precision} \tag{14}$$

Therefore, the F1 score equation can be defined as two multiplied by the ratio of the product and summation of precision and recall.
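Equations (10)–(14) can be computed together from the four prediction counts, as in the sketch below. For a multiclass problem such as the five guava categories, the counts for one class can be obtained in a one-vs-rest fashion from the confusion matrix. The function names and the convention that rows are true classes and columns are predicted classes are assumptions.

```python
def classification_metrics(tp, fp, tn, fn):
    """Accuracy, specificity, recall, precision, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    specificity = tn / (tn + fp)
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "specificity": specificity,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


def one_vs_rest_counts(confusion, k):
    """TP, FP, TN, FN for class k; confusion[i][j] counts true class i predicted as j."""
    n = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(row[k] for row in confusion) - tp
    tn = n - tp - fn - fp
    return tp, fp, tn, fn
```

The per-class values can then be obtained by calling `classification_metrics(*one_vs_rest_counts(confusion, k))` for each class index `k`.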

#### 4.1.4. Kappa–Cohen Index

The last measure provides confidence in the statistical analysis of the results, as statistical analysis is broadly used in many aspects of scientific work. Therefore, a statistical measure that gives confidence over confusion matrix values was calculated for evaluation. The kappa index gives the confidence over a certain range of confusion-matrix-based calculated values. If its value lies in the range of 0–0.20, it promises that 0–4% of the data are reliable to use for prediction. If its value lies in the range of 0.21–0.39, it promises a 4–15% data reliability. If it lies between 0.40 and 0.59, it promises 15–35% data reliability. If it lies between 0.60 and 0.79, it promises a 35% to 63% reliability. If it lies between 0.80 and 0.90, it promises strong data reliability, and if it is more than 0.90, that means 82% to 100% reliability. The agreement term can be calculated as:

$$Agreement = \frac{\left(\frac{cm_1 \times rm_1}{n}\right) + \left(\frac{cm_2 \times rm_2}{n}\right)}{n} \tag{15}$$

The agreement term was calculated using *cm*<sub>1</sub>, *cm*<sub>2</sub>, *rm*<sub>1</sub>, and *rm*<sub>2</sub>, where *cm*<sub>1</sub> and *cm*<sub>2</sub> represent the totals of Columns 1 and 2, *rm*<sub>1</sub> and *rm*<sub>2</sub> represent the totals of Rows 1 and 2 of any two-class confusion matrix, and *n* is the total number of instances. This formulation is the general form for a two-class confusion matrix; in the case of our proposed methodology, there are five columns and rows, which extends the formulation to five rows and columns.
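The extension to five rows and columns can be sketched for a confusion matrix of any size. The chance-agreement term follows Equation (15); combining it with the observed agreement as (observed − expected) / (1 − expected) is the standard definition of Cohen's kappa, which is assumed here because the source does not write it out. The function name is hypothetical.

```python
def cohen_kappa(confusion):
    """Cohen's kappa for a square confusion matrix of any size."""
    n = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed agreement: fraction of instances on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / n
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    # Chance agreement: Equation (15) extended to k rows and columns.
    expected = sum(r * c / n for r, c in zip(row_totals, col_totals)) / n
    return (observed - expected) / (1 - expected)
```

For the two-class matrix `[[20, 5], [10, 15]]`, the observed agreement is 0.7 and the chance agreement is 0.5, giving a kappa of 0.4.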

The total augmented data were then split in a 70/30 ratio into training and testing data; the fine-tuned parameters differed for each of the five trained models, which showed different results. The training and testing sets contained 2023 and 866 instances, respectively.
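The reported counts are consistent with a 70/30 split of 2023 + 866 = 2889 augmented images with the training size rounded up. The sketch below reproduces those counts; the random shuffling, the seed, the rounding rule, and the function name are assumptions, since the source does not state how the split was drawn or whether it was stratified by class.

```python
import math
import random


def split_indices(n_samples, train_fraction=0.7, seed=0):
    """Shuffle sample indices and split them into train and test lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = math.ceil(train_fraction * n_samples)
    return indices[:n_train], indices[n_train:]


TOTAL = 2023 + 866  # total augmented images implied by the reported split
train_idx, test_idx = split_indices(TOTAL)
```

A stratified split, applied class by class, would keep the class proportions equal in both subsets and may give slightly different totals because of per-class rounding.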

There were five different kinds of architectures applied to classify guava diseases using the augmented image data. The individual class testing data prediction results are discussed with their overall results. The evaluation measures accuracy, sensitivity, specificity, precision, recall, and the kappa index were also used as statistical measures. The results are shown in Table 7.


**Table 7.** Results obtained on the basis of individual class testing data.

The individual class and overall results for each model were evaluated. Several evaluation measures were used: accuracy, specificity, F1 score, precision, and kappa.
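To make the per-class measures concrete, the sketch below derives them from a multi-class confusion matrix using the usual one-vs-rest counts of TP, FP, FN, and TN. It assumes rows hold the true classes and columns the predicted classes, and it uses the standard textbook definitions, which may differ in detail from the exact formulas used in the study. The function name and the toy matrix are hypothetical.

```python
def one_vs_rest_metrics(cm, k):
    """Metrics for class k of a square confusion matrix (rows: true, columns: predicted)."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # class-k instances predicted as another class
    fp = sum(row[k] for row in cm) - tp   # other-class instances predicted as class k
    tn = total - tp - fn - fp
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": recall,
        "specificity": tn / (tn + fp),
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical three-class matrix, used only to illustrate the calculation.
toy_cm = [[8, 1, 1],
          [2, 7, 1],
          [0, 1, 9]]
metrics = one_vs_rest_metrics(toy_cm, 0)
```

The sketch assumes every class has at least one true and one predicted instance; otherwise the divisions would need a guard against zero denominators.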

The first model, AlexNet, showed 98% accuracy, 99% specificity, a 97.60% F1 score, 97.14% precision, and a kappa value of 0.5296 for the canker class on the testing data. Accuracy is a general measure over all positive and negative instances, whether correctly or wrongly predicted. The 98% accuracy of the canker class indicates accurate predictions of both positive and negative instances. Specificity, the share of TN among TN and FP, shows that true negatives were mostly predicted correctly. Precision, the share of TP among TP and FP, was 97.14%, which means that a small share of the positive predictions were wrong. To see the combined effect for the positive class, the F1 score was used; it was 97.60% and summarizes the measures above. The kappa index showed a weak level of agreement for the canker class.

The dot class showed 98.04% accuracy, 99.69% specificity, a 98.52% F1 score, 99% precision, and a kappa value of 0.53. Its accuracy was slightly lower than that of the canker class, so the other evaluation measures are needed to analyze the predictions of positive and negative instances. Specificity (TN among TN and FP) was 99.69%, and precision (TP among TP and FP) was 99%. The F1 score of 98.52% summarizes recall and precision and can be considered more informative than sensitivity, specificity, or precision alone. The kappa index again showed a weak level of agreement for this class.

The third class, healthy, was predicted accurately by all networks, and its data agreement value was 0.97, an extraordinary level of agreement on the given data. The next class, mummification, showed 98.66% accuracy, 99.53% specificity, a 98.66% F1 score, 99.53% precision, and a kappa value of 0.485. Its accuracy was slightly better than that of the canker and dot classes, and its specificity differed only slightly from theirs. Its precision was higher than that of the canker class and close to that of the dot class. The F1 score, used to summarize precision and recall, was close to that of the dot class and higher than that of the canker class. The kappa value of 0.48 was lower than those of the canker and dot classes and likewise indicates weak agreement. The kappa value for the last class, rust, was 0.56, with a value of one for both precision and specificity; 0.56 also lies in the weak agreement range.

Lastly, the overall (mean) results were 98.50% accuracy, 99.60% specificity, a 98.75% F1 score, 98.75% precision, and a kappa index of 0.9531. The mean values represent the model as a whole: accuracy covers all predictions, specificity reflects TN, precision reflects TP, and the F1 score combines recall and precision. A kappa value above 0.90 indicates a strong level of agreement and high data reliability. Therefore, AlexNet showed satisfactory results overall.

GoogLeNet showed the following testing results for the canker class: 96.635% accuracy, 99.69% specificity, a 97.81% F1 score, 99.01% precision, and a kappa value of 0.5285. Accuracy was promising; specificity, measured over the TN values, was 99.69%; precision, measured over the TP values, was 99.01%; and the F1 score over precision and recall was 97.81%, which gives a more balanced summary than precision or specificity alone. The kappa index of 0.5285 showed weak agreement because it covers only one class against the others; the mean value shows the actual effect of the kappa statistic.

The dot class showed 97.56% accuracy, 98.638% specificity, a 96.618% F1 score, 95.69% precision, and a kappa value of 0.5269. Its accuracy was higher than that of the canker class, whereas its specificity was slightly lower and its precision was lower. The F1 score based on precision and recall was slightly higher for the canker class, and the Cohen kappa index was similar and fell in the weak agreement range.

The mummification class showed 99.42% accuracy, 99.37% specificity, a 97.29% F1 score, 98.18% precision, and a kappa value of 0.49. Its accuracy was higher than those of the canker and dot classes, while its specificity was slightly lower than that of the canker class and higher than that of the dot class, which means that some TN instances varied in these cases. Its F1 score was lower than that of the canker class and higher than that of the dot class, which means that the combined effect of TP and FP was more promising for mummification than for the dot class.

The last class, rust, showed 99.47% accuracy, 99.114% specificity, a 98.172% F1 score, 96.907% precision, and 0.56 for the Cohen index. Its accuracy was higher than those of the canker, dot, and mummification classes. Its F1 score of 98.17% was close to those of the canker, mummification, and dot classes. Therefore, accuracy was the main measure for analyzing the test predictions, while other values such as the F1 score also mattered and made the results distinguishable.

SqueezeNet showed testing results for the canker class of 97.596% accuracy, 99.392% specificity, a 97.838% F1 score, 98% precision, and a kappa value of 0.52. Accuracy, a general indication of model performance, was lower than specificity, precision, and the F1 score. Specificity was much higher, which means that true negatives over TN and FP had higher rates for the canker class. TP cases were also predicted well, with a precision of 98% over FP and TP. The recall- and precision-based F1 score was 97.83%, close to the precision value. The kappa value was 0.52, which falls in the weak agreement range.

The dot class showed 100% accuracy, 100% specificity, a 96.68% F1 score, 100% precision, and a kappa value of 0.51. Both accuracy and specificity reached 100% for this class, whereas the F1 score, which covers recall and precision, was 96.68%. The mummification class showed 98.214% accuracy, 98.91% specificity, a 97.56% F1 score, 96.916% precision, and 0.48 for the kappa index. The last class, rust, showed 91.53% accuracy, 100% specificity, a 95.58% F1 score, and 100% precision. Its accuracy was lower than that of the other three classes, although precision and specificity were both 100%.

The overall results across all classes revealed the main weakness of this model. Accuracy was 97.11%, lower than that of AlexNet and GoogLeNet; specificity was also lower than that of both models; and the F1 score, precision, and kappa were lower as well for this model's predictions on the testing data. The kappa index nevertheless showed promise for these data.

ResNet-50 and ResNet-101 produced considerably better results than AlexNet, SqueezeNet, and GoogLeNet. ResNet-50 showed canker class results of 99.51% accuracy, 99.68% specificity, a 99.28% F1 score, 99.03% precision, and a kappa index of 0.51. Compared with AlexNet, GoogLeNet, and SqueezeNet, the results improved overall for all measures. For the dot class, accuracy was 100%, with 99.697% specificity, a 99.515% F1 score, 99.034% precision, and a kappa index of 0.52. The canker class results were not better than those of the dot class in terms of accuracy, and the F1 scores differed only slightly. The mummification class showed 98.661% accuracy, 100% specificity, a 100% F1 score, 98.25% precision, and a kappa value of 0.48. Although its accuracy was lower than that of the dot class, its specificity was higher than those of the canker and dot classes. The rust class showed 100% accuracy, 100% specificity, a 100% F1 score, and 100% precision.

The overall results improved on the mean results of the models above. The global mean accuracy was 99.54%, better than that of SqueezeNet, GoogLeNet, and AlexNet. Specificity, the F1 score, and precision were also better than in those three models. Although the kappa value was higher than that of the previous models, it fell in the same confidence class. The data reliability was high for all models.

The last model, ResNet-101, also achieved good results compared with the other models. The canker class showed 99.519% accuracy, 98.784% specificity, an F1 score of 97.87%, a precision of 96.27%, and a kappa index of 0.51. Its accuracy was dominant compared with the canker class results of the earlier models, although not every other evaluation value improved to the same degree; across the different classes, however, the results improved overall. The dot class showed 98.049% accuracy, 99.84% specificity, 99.505% precision, and a kappa value of 0.53. Accuracy, specificity, precision, and the F1 score improved overall for each class, which did not happen for any class in the previous models' results. The mummification class again showed 100% accuracy in this model's testing, as it did in other models, where the remaining values did not improve to the same extent: 98.41% accuracy, 99.84% specificity, a 98.93% F1 score, and 99.47% precision. The last class was consistent with the improvement seen in every class and also showed promising results. Lastly, the overall mean results of ResNet-101 proved it to be more accurate than the four other models, and the other measures also showed excellent performance. The graphical illustration of all five models' mean testing results is shown in Figure 4.

**Figure 4.** Guava disease classification result visualization.

The overall results showed that the kappa value indicated excellent data reliability for all models. The mummification, healthy, and dot classes were easier to distinguish than the other two classes, as they reached 100% correct results many times. The densest model, with more residual connections, gave the most accurate results, which suggests that using a small kernel size with an increasingly dense network improved the classification results.

To analyze individual test cases, confusion matrices were computed and evaluated; they are shown in Table 8.


**Table 8.** Confusion matrix obtained using several state-of-the-art networks.

The confusion matrices of all architectures showed that the rust class was the most distinguishable among the guava diseases. For AlexNet, four canker instances were wrongly predicted as rust and mummification, and four dot instances were wrongly predicted as canker and rust. There were three wrong predictions for mummification, all in the canker class, and two wrong predictions for rust, both in the dot class.

GoogLeNet made 201 correct predictions for the canker class; its seven wrong predictions fell in the dot, mummification, and rust classes, and none fell in the healthy class. Of the five wrong predictions for the dot class, two were predicted as mummification and three as rust, whereas 200 were predicted correctly. This makes it one case less accurate than AlexNet, which made four wrong predictions for the dot class. For mummification, nine instances were wrongly predicted across the canker, dot, and rust categories, and 216 were predicted correctly. For rust, 188 cases were predicted correctly and one was wrongly predicted as dot. GoogLeNet was therefore slightly more accurate than AlexNet in this category, as AlexNet predicted two rust cases as dot.

SqueezeNet made 203 correct and five wrong predictions for the canker class; the five wrong predictions were in the mummification class. For the dot class, there were no wrong predictions: all 205 test instances were predicted correctly. For mummification, there were 220 correct and four wrong predictions, three of which fell in the dot class and one in the canker class. Rust had 16 wrong cases, most of them (11 of 16) in the dot class.

ResNet-50 is similar to ResNet-101; they differ mainly in the number of layers and parameters. ResNet-50 made one wrong prediction for the canker class, which fell in the dot class, and 207 correct predictions. All 205 dot instances were predicted correctly. For mummification, it made 221 correct predictions, with one wrong prediction in the dot class and two in the canker class. ResNet-101 also had higher accuracy than the other models. According to its confusion matrix, it made one wrong prediction for the canker class, into the rust class; ResNet-50 also made one wrong canker prediction, but into a different class. For the dot class, ResNet-50 made no wrong predictions, whereas ResNet-101 made four. For mummification, ResNet-101 made four wrong predictions and ResNet-50 made three. For the last class, rust, ResNet-101 made three wrong predictions and 186 correct ones.
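The counts quoted in this discussion can be related to the per-class percentages reported earlier with a short calculation. The sketch below takes the correct and wrong counts stated in the text for a few model and class pairs and computes the share of correctly predicted instances per class. Reading the reported per-class "accuracy" as correct / (correct + wrong) is an assumption, though it reproduces the quoted figures; the dictionary layout and the helper name are illustrative.

```python
def class_hit_rate(correct, wrong):
    """Percentage of a class's test instances that were predicted correctly."""
    return 100.0 * correct / (correct + wrong)


# (correct, wrong) counts as stated in the confusion-matrix discussion.
stated_counts = {
    ("GoogLeNet", "canker"): (201, 7),
    ("GoogLeNet", "dot"): (200, 5),
    ("GoogLeNet", "rust"): (188, 1),
    ("SqueezeNet", "canker"): (203, 5),
    ("SqueezeNet", "mummification"): (220, 4),
    ("ResNet-50", "canker"): (207, 1),
}

hit_rates = {
    key: round(class_hit_rate(correct, wrong), 3)
    for key, (correct, wrong) in stated_counts.items()
}
```

For example, GoogLeNet's 201 correct and seven wrong canker predictions give 96.635%, which matches the accuracy reported for that class.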

The above analysis of the five models shows that ResNet-50 had a high overall rate of correct predictions. Most wrong predictions involved the dot class, which was highly difficult for each model. The other classes also misled the models, but the most challenging and least robust class was dot. Therefore, the dot class may need more confident and robust approaches to separate it from the other classes. The mummification class had no wrong predictions in ResNet-50 and ResNet-101 and a higher rate of wrong predictions with SqueezeNet, at four; the two remaining models were also not very accurate for this class. The healthy class remained accurate in all models, with no wrong prediction by any model, which makes the normal class easily distinguishable. The challenge, however, was differentiating among the guava disease categories.

#### **5. Conclusions**

Guava is an important plant to monitor because, with the growing population, demand for its production is also increasing. Pakistan is a leading global guava producer. Hence, this study proposed a DL-based guava disease detection system for automatic monitoring. Data were preprocessed and enhanced using color-histogram equalization and unsharp masking. The enhanced data were then augmented over nine angles using the affine transformation method, and the augmented data were used by five DL networks whose last layers were altered: AlexNet, GoogLeNet, SqueezeNet, ResNet-50, and ResNet-101. The results of all networks showed adequate measurements, and ResNet-101 was the most accurate model. Future work should use more data augmentation methods, such as generative adversarial networks. Federated-learning-based DL architectures can also be applied for classification to obtain more robust and confident results for this Pakistani guava disease dataset.

**Author Contributions:** Conceptualization, A.M.M., S.A.K., T.M., H.T.R., A.A.A. and M.A.A.; funding acquisition, A.M.M., A.A.A. and M.A.A.; methodology, A.M.M., S.A.K., T.M., H.T.R., A.A.A. and M.A.A.; software, A.M.M. and S.A.K.; visualization, H.T.R.; writing—original draft, A.M.M., S.A.K., T.M., H.T.R., A.A.A. and M.A.A.; supervision, H.T.R. All authors have read and agreed to the published version of the manuscript.

**Funding:** The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through Research Group No. RG-1441-425.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare that there is no conflict of interest.

#### **References**

