FF-PCA-LDA: Intelligent Feature Fusion Based PCA-LDA Classification System for Plant Leaf Diseases

Ali, Safdar; Hassan, Mehdi; Kim, Jin Young; Farid, Muhammad Imran; Sanaullah, Muhammad; Mufti, Hareem

doi:10.3390/app12073514


by Safdar Ali ¹, Mehdi Hassan ²,³, Jin Young Kim ³,*, Muhammad Imran Farid ⁴, Muhammad Sanaullah ⁵ and Hareem Mufti ⁶

¹ Directorate General National Repository, Islamabad 44000, Pakistan

² Department of Computer Science, Air University, Islamabad 44000, Pakistan

³ Department of ICT Convergence System Engineering, Chonnam National University, Gwangju 500757, Korea

⁴ Department of Electrical and Computer Engineering, Air University, Islamabad 44000, Pakistan

⁵ Department of Computer Science, Bahauddin Zakariya University, Multan 60800, Pakistan

⁶ Department of Physics, Allama Iqbal Open University, Islamabad 44310, Pakistan

* Author to whom correspondence should be addressed.

Appl. Sci. 2022, 12(7), 3514; https://doi.org/10.3390/app12073514

Submission received: 18 March 2022 / Revised: 23 March 2022 / Accepted: 26 March 2022 / Published: 30 March 2022

(This article belongs to the Special Issue Intelligent Systems Applications to Multiple Domains Based on Innovative Signal and Image Processing)


Abstract:

Crop leaf disease management and control have a significant impact on yield and quality, and thus on meeting consumer needs. For smart agriculture, an intelligent leaf disease identification system is essential for efficient crop health monitoring. To this end, a novel approach is proposed for crop disease identification using feature fusion and PCA-LDA classification (FF-PCA-LDA). Handcrafted hybrid and deep features are extracted from RGB images. TL-ResNet50 is used to extract the deep features. The fused feature vector is obtained by combining the handcrafted hybrid and deep features. After fusing the image features, PCA is employed to select the most discriminant features for LDA model development. Potato crop leaf disease identification is used as a case study for the validation of the approach. The developed system is experimentally validated on a potato crop leaf benchmark dataset. It offers a high accuracy of 98.20% on an unseen dataset that was not used during the model training process. Performance comparison of the proposed technique with other approaches shows its superiority. Owing to its better discrimination and learning ability, the proposed approach does not require a leaf segmentation step. The developed approach may be used as an automated tool for crop monitoring and management control, and can be extended to other crop types.

Keywords:

deep learning; ResNet50; feature fusion; PCA; Alternaria solani; LDA

1. Introduction

Smart agriculture is attracting the interest of machine learning researchers as a way to solve the real and challenging problem of crop disease detection. In the USA, more than USD 6 billion is lost to leaf diseases, through either crop management costs or yield reduction [1]. Excessive use of agro-chemicals without precise knowledge of the crop disease increases production costs and adversely affects the environment [2].

Several diseases may attack crop leaves simultaneously. For instance, Phytophthora infestans and Alternaria solani, which cause the diseases commonly known as ‘Late Blight’ and ‘Early Blight’, respectively, affect potato leaves. Late blight is a fungal disease that spreads very quickly in warm and humid weather conditions. This disease also affects potato tubers, causing rapid tuber rot [3]. ‘Early Blight’, on the other hand, is a fungal disease that damages crop foliage. Major factors in ‘Early Blight’ include relative humidity, temperature, and the duration of leaf wetness [4]. At the initial stages, most leaf diseases are difficult for ordinary growers to differentiate, and they therefore apply agro-chemicals excessively. Domain experts can differentiate infected crop leaves even at the initial stages; however, such experts are not easily available for crop health assessment, especially in developing countries. The early detection of crop leaf diseases may prevent huge economic losses. In this scenario, an intelligent and automatic computer-aided diagnostic system is essential for identifying leaf diseases precisely.

In this paper, we present a new technique to classify crop leaf diseases by feature fusion followed by PCA-based feature reduction and LDA classification. Owing to the fusion of deep and handcrafted hybrid features, the proposed approach bypasses the major step of segmentation. Potato crop leaf images are used as a case study for system validation. The proposed approach may be used as SaaS with an ordinary camera to acquire leaf images for better monitoring of crop health.

The rest of the paper is organized as follows. Section 2 presents the related work. Section 3 describes the proposed approach. Section 4 reports the experimental results, and Section 5 discusses them. Finally, Section 6 provides the conclusions and future recommendations.

2. Related Work

Several methods based on RGB, multi-spectral, and hyper-spectral imaging technologies have been developed for plant leaf disease classification [5,6,7]. Dhaygude and Kumbhar proposed an imaging-based leaf disease severity system that relies on segmentation. It comprises two stages: (i) segmentation and (ii) classification. The classification efficiency depends heavily on the quality of the segmentation, and a slight variation in segmentation may considerably degrade the classification performance [8].

Bindushree and Sivasankari proposed a plant leaf disease detection system that employs the K-means algorithm to segregate the region of interest, followed by feature extraction and SVM classification. Their approach mainly depends on precise segmentation of the affected leaf areas [9]. Similarly, Revathi and Hemalatha reported a cotton leaf disease detection technique that uses edge identification for segmentation to identify the affected region. Their approach requires user intervention and may not be useful for large-scale analysis [10].

Tian et al. [11] proposed image segmentation and performed statistical analysis for plant leaf disease identification. The acquired images are passed through a kernel and then compared with the original images. Statistical analysis of the results determines the disease type. Although the process seems useful, the choice of kernel size for various diseases may significantly affect the whole disease identification process.

Hu et al. [12] proposed a hyper-spectral-image-based potato leaf disease detection approach. This technique produced acceptable results; however, the technology for acquiring hyper-spectral imaging might be inaccessible for an ordinary grower.

Recently, deep-learning-based approaches have gained the attention of researchers to solve real world problems in computer vision, object detection in videos and images, human disease diagnosis, and crop disease detection [13,14,15,16]. It has an advantage of automatic image feature extraction by the stacking of convolution and pooling layers. On the other hand, conventional approaches utilized handcrafted features for classification.

Several deep-learning-based approaches have been reported in the literature for crop disease classification [13]. Selvarj et al. [17] proposed banana plant disease detection using deep learning and obtained an accuracy of 90.0%. In another study, Zeng and Li [18] proposed a deep learning model based on self-attention for crop disease detection. Saleem et al. [19] used convolutional neural networks for plant disease classification and achieved an F-score of 97.0%. Several other researchers have used deep learning to develop different types of models for crop disease prediction [20,21,22,23].

In summary, conventional approaches face several challenges: the quality of image segmentation, the number and types of handcrafted features used for classification, and the need for user intervention at certain stages. To minimize these limitations, an automatic, efficient, and scalable approach to crop disease classification is highly desirable. To this end, we propose a PCA-LDA classifier for plant disease diagnosis that is based on feature fusion and on the selection of the most discriminant features. This not only enhances accuracy but also reduces the computational complexity of the classifier.

3. Material and Methods

In this section, we first describe the material used in this research and then present the proposed methodology for plant leaf disease detection.

3.1. Material

The benchmark dataset of plant leaf disease images is obtained from PlantVillage (plantvillage.psu.edu). It comprises labeled images of a variety of crops. For validation of the proposed model, the potato image dataset is considered. It has three classes, namely, Healthy, Early Blight, and Late Blight. The Healthy, Early Blight, and Late Blight classes contain 142, 234, and 961 images, respectively, of varying sizes. Figure 1 shows sample images of potato leaves from each class.

3.2. Methods

The proposed approach comprises four parts: (i) handcrafted feature extraction, (ii) deep feature extraction, (iii) feature fusion and selection of discriminant features by PCA, and (iv) LDA-based classification. The benchmark dataset was split 75%:25% into training and testing sets. The 75% portion was used for model training. During model development, 5-fold cross validation was applied to the training data, where 4 folds were used for training and the fifth was used for validation. This process continued for several iterations. Once model training was finished, the unseen 25% of the data was fed to the trained model for evaluation. The schematic block diagram of the proposed approach is shown in Figure 2. The methodology of the proposed technique is explained in detail in the following subsections.
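The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the paper does not say whether the split was stratified by class or how samples were shuffled, so the stratification, the fixed seeds, and the helper names `stratified_split` and `k_fold` are choices made here, not details from the study.

```python
import random

def stratified_split(labels, test_fraction=0.25, seed=0):
    """Return (train_idx, test_idx), holding out about test_fraction of each class.

    Stratifying by class is an assumption of this sketch.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

def k_fold(indices, k=5, seed=0):
    """Yield (fit_idx, val_idx) pairs: k - 1 folds for fitting, one for validation."""
    rng = random.Random(seed)
    shuffled = indices[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        fit_idx = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield fit_idx, val_idx

# Class counts as reported in Section 3.1.
labels = ["Healthy"] * 142 + ["Early"] * 234 + ["Late"] * 961
train_idx, test_idx = stratified_split(labels)
folds = list(k_fold(train_idx))
```

The held-out `test_idx` is never touched inside `k_fold`, which mirrors the requirement that the 25% test portion stays unseen during model development.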

3.3. Handcrafted Feature Extraction

In this step, four different types of handcrafted features, namely, moments of gray level histogram (MGH), local binary patterns (LBP), histogram of oriented gradients (HOG), and gray level co-occurrence matrix (GLCM), are extracted from an input image. These features are widely reported in the literature for object identification [24].

3.3.1. Moments of Gray-Level Histogram

The MGH features are computed from the intensity histogram, which provides a compact representation of the image. Nine MGH features are extracted from the input image [24,25]. In general, these features are derived from the nth-order moment given in expression (1):

u_{n} = \sum_{i = 0}^{L - 1} {(z_{i} - m)}^{n} \cdot p (z_{i})

(1)

where $z_{i}$ is the ith intensity level and $p (z_{i})$ is the relative frequency of that intensity in the given image. The total number of intensity levels and the mean intensity of the image are represented by L and m, respectively.

The moment features (MF) are computed from potato leaf images using the following expressions:

MF 1 = average = \sum_{i = 0}^{L - 1} z_{i} \cdot p (z_{i})

(2)

MF 2 = S.D. = \sqrt{u_{2}}

(3)

MF 3 = Smoothness = 1 - \frac{1}{1 + M F 2}

(4)

MF 4 = 3^{rd} moment = \sum_{i = 0}^{L - 1} {(z_{i} - m)}^{3} \cdot p (z_{i})

(5)

MF 5 = Uniformity = \sum_{i = 0}^{L - 1} p^{2} (z_{i})

(6)

MF 6 = Entropy = - \sum_{i = 0}^{L - 1} p (z_{i}) \cdot {log}_{2} (p (z_{i}))

(7)

MF 7 = \sum_{i = 0}^{L - 1} p^{3} (z_{i})

(8)

MF 8 = \sum_{i = 0}^{L - 1} p^{4} (z_{i})

(9)

MF 9 = \sum_{i = 0}^{L - 1} p^{5} (z_{i})

(10)
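As a rough illustration of Equations (2)–(10), the following Python sketch computes the nine moment features from a flat list of gray levels. The function name `histogram_moment_features` and the toy four-pixel image are hypothetical; the smoothness term follows Equation (4) as written in the text, using MF2.

```python
import math

def histogram_moment_features(pixels, levels=256):
    """Compute MF1..MF9 from a flat list of integer gray levels in [0, levels)."""
    n = len(pixels)
    p = [0.0] * levels                      # normalized histogram p(z_i)
    for z in pixels:
        p[z] += 1.0 / n
    mean = sum(z * p[z] for z in range(levels))                  # MF1
    u2 = sum((z - mean) ** 2 * p[z] for z in range(levels))      # second moment
    std = math.sqrt(u2)                                          # MF2
    smoothness = 1.0 - 1.0 / (1.0 + std)                         # MF3, Eq. (4) as written
    third = sum((z - mean) ** 3 * p[z] for z in range(levels))   # MF4
    uniformity = sum(q ** 2 for q in p)                          # MF5
    entropy = -sum(q * math.log2(q) for q in p if q > 0)         # MF6
    higher = [sum(q ** e for q in p) for e in (3, 4, 5)]         # MF7..MF9
    return [mean, std, smoothness, third, uniformity, entropy] + higher

# Toy image: half the pixels black, half white.
features = histogram_moment_features([0, 0, 255, 255])
```

For this toy image the histogram has two equally likely levels, so the entropy is one bit and the third moment vanishes by symmetry.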

3.3.2. Local Binary Patterns

LBPs use the spatial information of an image pixel within a certain radius. Binary patterns are generated by thresholding the neighboring pixels within the chosen radius against the central pixel, which captures the local texture [26]. LBPs are well suited to representing the area affected by leaf disease. A total of 256 LBP features are obtained, with radii of 1, 2, and 3 pixels selected empirically. Details of the LBP can be found in [27].

The following expressions are used to obtain LBP features:

LBP_{R, P} (x, y) = \sum_{p = 0}^{P - 1} s (i_{p} - i_{c}) 2^{p}

(11)

where $i_{p}$ and $i_{c}$ are the intensities of the pth neighboring pixel and of the central pixel at (x, y), P is the number of neighbors, and R is the radius. The pictorial representation of the LBP is shown in Figure 3.

The function s in Equation (11) is defined as:

s (x) = \begin{cases} 1, & if x \geq 0 \\ 0, & otherwise \end{cases}

(12)
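A minimal Python sketch of Equations (11) and (12) for the simplest configuration (P = 8 neighbors, R = 1) is given below. The clockwise neighbor ordering and the helper names `lbp_code` and `lbp_histogram` are assumptions of this sketch; the text does not specify how the radii of 1, 2, and 3 pixels are combined into the 256 features.

```python
def lbp_code(patch):
    """LBP code of the center pixel of a 3x3 patch (P = 8, R = 1)."""
    center = patch[1][1]
    # Neighbors are visited clockwise from the top-left corner.
    # This ordering is a convention chosen for the sketch.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for p, (r, c) in enumerate(offsets):
        if patch[r][c] - center >= 0:      # s(i_p - i_c) from Eq. (12)
            code += 2 ** p
    return code

def lbp_histogram(image):
    """256-bin histogram of LBP codes over the interior pixels of a 2-D list."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist

patch = [[6, 5, 2],
         [7, 6, 1],
         [9, 8, 7]]
code = lbp_code(patch)
hist = lbp_histogram(patch)
```

With 8 neighbors the code ranges over 0 to 255, which is one way to arrive at a 256-bin feature vector.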

3.3.3. Histogram of Oriented Gradients

HOG describes the shape of an object in an input image through its intensity gradients, computed after splitting the image into smaller regions. A total of 81 HOG features are obtained from the potato leaf images [28,29]. HOG features describe object shapes well and are therefore appealing here, since a disease spot creates local variations in a crop leaf.
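The core of HOG, a magnitude-weighted histogram of gradient orientations within one region, can be sketched as follows. This is a simplified illustration: the 9 unsigned orientation bins, the central-difference gradients, and the function name `cell_orientation_histogram` are assumptions, and block normalization and bin interpolation are omitted. The text does not state which cell and bin layout yields the 81 features.

```python
import math

def cell_orientation_histogram(cell, bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell."""
    hist = [0.0] * bins
    bin_width = 180.0 / bins
    rows, cols = len(cell), len(cell[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]   # horizontal central difference
            gy = cell[r + 1][c] - cell[r - 1][c]   # vertical central difference
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % bins] += magnitude
    return hist

# Intensity ramps: one changing along columns, one changing along rows.
horizontal_ramp = [[c for c in range(4)] for _ in range(4)]
vertical_ramp = [[r for _ in range(4)] for r in range(4)]
h_hist = cell_orientation_histogram(horizontal_ramp)
v_hist = cell_orientation_histogram(vertical_ramp)
```

The two ramps place all gradient energy in different bins, which is the property that lets HOG distinguish edge directions around a lesion.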

3.3.4. Gray Level Co-Occurrence Matrix

GLCM features are statistical features that explore spatial relationships in an image. Different types of relationships (e.g., horizontal, diagonal, and off-diagonal) exist between image pixels, and these are captured through the frequency of spatial adjacency of gray levels [25,30]. A total of 22 GLCM features are extracted from the image using the following expressions:

contrast = \sum_{n = 0}^{N_{g} - 1} n^{2} \left\{ \sum_{i = 1}^{N_{g}} \sum_{j = 1}^{N_{g}} p (i, j) \right\}, \quad | i - j | = n

(13)

Correlation = \frac{\sum_{i} \sum_{j} (i j) p (i, j) - μ_{x} μ_{y}}{σ_{x} σ_{y}}

(14)

Sum of squares = \sum_{i} \sum_{j} {(i - μ)}^{2} p (i, j)

(15)

Inverse Difference = \sum_{i} \sum_{j} \frac{1}{1 + {(i - j)}^{2}} p (i, j)

(16)

Sum Average = \sum_{i = 2}^{2 N_{g}} i \cdot p_{x + y} (i)

(17)

Sum Variance = \sum_{i = 2}^{2 N_{g}} {(i - f_{g})}^{2} p_{x + y} (i)

(18)

Sum Entropy = - \sum_{i = 2}^{2 N_{g}} p_{x + y} (i) \log_{2} \{p_{x + y} (i)\}

(19)

Entropy = - \sum_{i} \sum_{j} p (i, j) \log_{2} (p (i, j))

(20)

Difference Variance = variance of P_{x - y}

(21)

Difference Entropy = - \sum_{i = 0}^{N_{g} - 1} p_{x - y} (i) \log_{2} (p_{x - y} (i))

(22)

\begin{matrix} Information Measure of Correlation (IMC) : \\ IMC 1 = \frac{HXY - HXY 1}{\max \{HX, HY\}} \\ IMC 2 = {(1 - \exp [- 2.0 (HXY 2 - HXY)])}^{0.5} \\ HXY = - \sum_{i} \sum_{j} p (i, j) \log_{2} (p (i, j)) \\ where HX and HY are the entropies of p_{x} and p_{y}, and \\ HXY 1 = - \sum_{i} \sum_{j} p (i, j) \log_{2} (p_{x} (i) p_{y} (j)) \\ HXY 2 = - \sum_{i} \sum_{j} p_{x} (i) p_{y} (j) \log_{2} (p_{x} (i) p_{y} (j)) \end{matrix}

(23)

Homogeneity = \sum_{i = 0}^{N - 1} \sum_{j = 0}^{N - 1} \frac{p_{i, j}}{1 + {(i - j)}^{2}}

(24)

Dissimilarity = \sum_{i = 0}^{N - 1} \sum_{j = 0}^{N - 1} p_{i, j} | i - j |

(25)

Additionally, auto-correlation, maximum probability, and cluster prominence are extracted.
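To make the co-occurrence idea concrete, the sketch below builds a normalized, symmetric GLCM for a single pixel offset and derives four of the listed features from it. It is an illustration only: the symmetric counting, the horizontal offset, and the function names `glcm` and `glcm_features` are assumptions, and homogeneity uses the common 1 + (i − j)² denominator so that diagonal entries stay finite. Contrast is computed in the equivalent form of a sum of (i − j)² p(i, j).

```python
import math

def glcm(image, levels, dr=0, dc=1):
    """Normalized, symmetric co-occurrence matrix for the pixel offset (dr, dc)."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1       # pair (i, j)
                counts[j][i] += 1       # mirrored pair, making the matrix symmetric
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def glcm_features(p):
    """Contrast, dissimilarity, homogeneity, and entropy of a normalized GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "dissimilarity": sum(abs(i - j) * v for i, j, v in cells),
        "homogeneity": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
    }

# Two-level toy images: flat rows versus alternating columns.
flat = glcm_features(glcm([[0, 0], [1, 1]], levels=2))
striped = glcm_features(glcm([[0, 1], [0, 1]], levels=2))
```

The flat image has zero contrast along the horizontal offset, while the striped image has maximal contrast for two gray levels, which shows how the same statistics react to texture direction.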

The four sets of handcrafted features, MGH, LBP, HOG, and GLCM, are concatenated to form a hybrid feature vector of size 437. These combined handcrafted features are later fused with the deep features for the development of the classification model.

3.4. Deep Image Feature Extraction

There are several advantages of deep feature extraction over common methods, such as user intervention not being required to define the kernel size, types, and numbers of features for any types of crop images. In comparison, in conventional feature extraction, user intervention is vital, and it is not necessarily the case that a universal configuration of features will work for all types of images. This encourages us to extract deep features and combine them with handcrafted hybrid features for the improved classification of crop disease detection.

Deep learning models have the ability to learn hidden patterns of input images at various convolutional and pooling layers. In this research, we have modified the ResNet50 model by employing the transfer learning (TL) concept for feature extraction. TL reuses part of the weights of the original model and replaces its final layers with new ones to solve a new problem [31]. When relatively little annotated data is available, TL is an effective way of solving classification problems.

The original ResNet50 was trained for 1000 classes, and its last 3 layers are ‘FC_1000’, ‘FC1000_Softmax’, and ‘FC_Classification’. In the current study, ResNet50 is modified by replacing these layers with the new layers ‘FC_3’, ‘FC3_Softmax’, and ‘Class_output’ for our three classes ‘Healthy’, ‘Early’, and ‘Late Blight’. The ‘avg_pool’ layer of the TL-ResNet50 has 2048 neurons connected to a succeeding layer of 3 neurons, each representing a crop class.

The pictorial representation of the modified ResNet50 deep feature extraction using the TL concept is shown in Figure 4. The numbers ‘3, 4, 6, and 3’ in the blue blocks of Figure 4 indicate how many times the group of layers in each block is repeated. For example, the ‘conv2_x’ block has 3 layers of 1 × 1, 64; 3 × 3, 64; and 1 × 1, 256, and these are repeated 3 times, giving 9 layers in total in this part. Similarly, the ‘conv3_x’, ‘conv4_x’, and ‘conv5_x’ blocks have a total of 3 × 4 = 12, 3 × 6 = 18, and 3 × 3 = 9 layers, respectively. Owing to the residual function, ResNet has better learning and generalization ability compared to its competitors.
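The layer arithmetic above can be checked with a short Python snippet. The per-stage counts come from the text; the two extra layers added at the end (the initial convolution and the final fully connected layer) are standard for ResNet50 but are not stated in this paragraph, so they are labeled as an assumption in the code.

```python
# (layers per bottleneck unit, number of repeats) for each residual stage,
# as described for Figure 4.
stages = {
    "conv2_x": (3, 3),
    "conv3_x": (3, 4),
    "conv4_x": (3, 6),
    "conv5_x": (3, 3),
}

stage_layers = {name: per_unit * repeats for name, (per_unit, repeats) in stages.items()}

# Assumption: one initial convolution layer before conv2_x and one fully
# connected layer after pooling complete the weighted-layer count.
total_layers = 1 + sum(stage_layers.values()) + 1
```

The residual stages contribute 48 layers, and the two additional layers bring the total to the 50 that give the network its name.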

The benefit of features extracted using deep learning algorithms is that the network learns image features automatically, layer by layer. Generally, the last layer of a deep learning network such as ResNet50 produces an output class prediction using softmax classification. However, high-level features can be extracted before the FC layers.

In this research, deep (high-level) features are extracted using the TL-ResNet50 model after the fifth residual block (Conv5_x, Figure 4) at the ‘avg_pool’ layer. This output serves as the deep image feature vector, which has a dimension of 2048. These features are obtained once training of the TL-ResNet50 is completed. Details of the deep feature extraction and its use for classification can be found in [14]. For better performance, the hyper-parameters of TL-ResNet50 are set empirically as follows: a learning rate of 0.0001, a batch size of 32, data augmentation of (−30, 30), and 600 epochs.

3.5. Feature Fusion

This paper comprises two feature extraction parts: (i) handcrafted hybrid features and (ii) deep features from TL-ResNet50. A total of 437 handcrafted features are concatenated with 2048 deep features to form a single fused vector of length 2485. The combined features are utilized to design the PCA-LDA classification model.
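Fusion here is plain concatenation, which a few lines of Python can illustrate. The function name `fuse_features`, the ordering (handcrafted first, deep second), and the placeholder vectors are assumptions of this sketch; only the dimensions come from the text.

```python
HANDCRAFTED_DIM = 437   # MGH + LBP + HOG + GLCM, as reported in the text
DEEP_DIM = 2048         # 'avg_pool' output of TL-ResNet50, as reported in the text

def fuse_features(handcrafted, deep):
    """Concatenate one image's handcrafted and deep feature vectors."""
    if len(handcrafted) != HANDCRAFTED_DIM:
        raise ValueError("expected %d handcrafted features" % HANDCRAFTED_DIM)
    if len(deep) != DEEP_DIM:
        raise ValueError("expected %d deep features" % DEEP_DIM)
    return list(handcrafted) + list(deep)

# Placeholder vectors standing in for real extracted features.
fused = fuse_features([0.0] * HANDCRAFTED_DIM, [1.0] * DEEP_DIM)
```

Because the two feature families can have very different numeric ranges, the per-feature scaling step of PCA matters for the fused vector.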

3.6. Dimension Reduction Using PCA

PCA uses orthogonal transformations to identify the correlated variables and convert them into uncorrelated ones. For efficient classification, the dimension of the fused feature (FF) vectors is reduced by employing PCA. Consequently, the most discriminant features are obtained for the construction of the model. FF has n dimensions, $FF = \{x_{1}, x_{2}, \dots, x_{n}\}$, which need to be reduced to $k ≪ n$ dimensions. The following steps are used by the PCA to obtain a reduced fused feature (RFF) set:

Data scaling:

$x_{j}^{i} = \frac{x_{j}^{i} - \bar{x}_{j}}{σ_{j}}$

(26)
Covariance matrix computation:

$\Sigma = \frac{1}{m} \sum_{i = 1}^{m} (x^{i}) {(x^{i})}^{T}, \quad \Sigma \in R^{n \times n}$

(27)
Eigenvector and eigenvalue calculation:

$\Sigma u = λ u, \quad U = [\begin{matrix} | & | & & | \\ u_{1} & u_{2} & \dots & u_{n} \\ | & | & & | \end{matrix}], \quad u_{i} \in R^{n}$

(28)
Eigenvalue selection. We selected the eigenvectors corresponding to the top k = 100 eigenvalues and projected each sample onto them to obtain the k-dimensional representation used in LDA classification, as shown in the following equation:

$x_{new}^{i} = [\begin{matrix} u_{1}^{T} x^{i} \\ u_{2}^{T} x^{i} \\ \vdots \\ u_{k}^{T} x^{i} \end{matrix}] \in R^{k}$

(29)
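The four PCA steps can be sketched in pure Python on a toy matrix. This is a minimal illustration, not the authors' implementation: eigenvectors are found by power iteration with deflation (a technique chosen here for self-containment), the function names are hypothetical, and a numerical library would be the practical choice for 2485-dimensional vectors with k = 100.

```python
import math

def standardize(rows):
    """Eq. (26): zero mean and unit variance per feature column."""
    m, n = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / m for j in range(n)]
    stds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / m) or 1.0
            for j in range(n)]
    return [[(r[j] - means[j]) / stds[j] for j in range(n)] for r in rows]

def covariance(rows):
    """Eq. (27): covariance matrix of already standardized rows."""
    m, n = len(rows), len(rows[0])
    return [[sum(r[a] * r[b] for r in rows) / m for b in range(n)] for a in range(n)]

def top_eigenpairs(matrix, k, iterations=500):
    """Eq. (28): leading (eigenvalue, eigenvector) pairs by power iteration with deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for idx in range(k):
        v = [1.0 / (i + 1 + idx) for i in range(n)]     # deterministic non-zero start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iterations):
            w = [sum(work[a][b] * v[b] for b in range(n)) for a in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * sum(work[a][b] * v[b] for b in range(n)) for a in range(n))
        pairs.append((lam, v))
        # Deflation removes the component just found from the working matrix.
        work = [[work[a][b] - lam * v[a] * v[b] for b in range(n)] for a in range(n)]
    return pairs

def pca_transform(rows, k):
    """Eq. (29): project each standardized sample onto the top-k eigenvectors."""
    z = standardize(rows)
    pairs = top_eigenpairs(covariance(z), k)
    return [[sum(u[j] * r[j] for j in range(len(r))) for _, u in pairs] for r in z]

# Toy data: the second feature is exactly twice the first.
data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pairs = top_eigenpairs(covariance(standardize(data)), 1)
reduced = pca_transform(data, 1)
```

In this toy case the two features are perfectly correlated, so a single principal component carries all of the variance, which is the redundancy-removal effect that the reduction step relies on.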

3.7. PCA-LDA Model Development

The classification model is developed using the RFF set:

Z = RFF \times W

(30)

During model development, LDA searches for the linear combinations of features that best separate the classes. It finds the optimal weight matrix $W = \{w_{1}, w_{2}, \dots, w_{l}\}$, where l candidate solutions exist, and selects the one that maximizes the ratio of between-class to within-class scatter. The between-class scatter is given by the following equation:

CS_{bc} = \sum_{i = 1}^{C} (μ_{i} - μ) {(μ_{i} - μ)}^{T}

(31)

where $CS_{bc}$ represents the scatter between classes. The scatter within classes is given by the expression below:

CS_{wc} = \sum_{i = 1}^{C} \sum_{j = 1}^{m_{i}} (x_{j} - μ_{i}) {(x_{j} - μ_{i})}^{T}

(32)

where C is the number of classes, $μ_{i}$ is the mean of the ith class, μ is the overall mean, $m_{i}$ is the number of observations in the ith class, $x_{j}$ is the jth observation of that class, and T denotes the transpose. The objective function obtained by combining (31) and (32) is as follows:

J (W) = \frac{W^{T} CS_{bc} W}{W^{T} CS_{wc} W}

(33)

LDA finds the weight vector $W^{*}$ of the discriminant function for which J is maximal. The resultant Z matrix is a compact representation of the original features that discriminates each class from the others. Details of the LDA can be found in [14,32].
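For intuition, the two-class, two-feature special case of this objective has a closed-form maximizer: the direction proportional to the inverse within-class scatter times the difference of the class means. The sketch below implements only that special case, with hypothetical function names and made-up points; the paper's model is a three-class LDA on the reduced fused features.

```python
def mean(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(2)]

def within_scatter(classes):
    """Sum over classes of the scatter of each point around its class mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points in classes:
        mu = mean(points)
        for x in points:
            d = [x[0] - mu[0], x[1] - mu[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    return s

def fisher_direction(class0, class1):
    """Two-class maximizer of the scatter ratio: w = S_w^{-1} (mu_1 - mu_0)."""
    sw = within_scatter([class0, class1])
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    if det == 0:
        raise ValueError("within-class scatter matrix is singular")
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    mu0, mu1 = mean(class0), mean(class1)
    diff = [mu1[0] - mu0[0], mu1[1] - mu0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(points, w):
    """Scalar discriminant scores, the one-column analogue of Z = RFF x W."""
    return [w[0] * x[0] + w[1] * x[1] for x in points]

# Made-up, well-separated toy classes.
class0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
class1 = [(6, 5), (7, 6), (7, 4), (8, 5)]
w = fisher_direction(class0, class1)
scores0, scores1 = project(class0, w), project(class1, w)
```

After projection the two toy classes occupy disjoint intervals on a single axis, so a simple threshold separates them.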

3.8. Model Testing

The testing step is straightforward: the trained FF-PCA-LDA model is applied to an independent dataset to predict the ‘Healthy’, ‘Early’, or ‘Late Blight’ classes. As mentioned in Section 3.2, the model is tested on the unseen 25% of the data, which was not presented during training. The developed model is quantitatively evaluated using various standard quality measures, which are given below.

Evaluation Measures

Performance evaluation of the developed model is carried out using various standard quantitative measures such as accuracy, sensitivity, specificity, F-score, and AUC. Moreover, the ROC of the developed model is plotted. For multi-class problems, a one-vs.-all strategy is employed to compute the following evaluation measures:

Accuracy: It is a measure used to evaluate the model effectiveness for identification of correct classifications:

$Accuracy = \frac{T P + T N}{T P + T N + F P + F N}$

(34)

where $T P$ and $T N$ are true positives and true negatives, respectively, whereas $F P$ and $F N$ are false positives and false negatives, respectively.
Sensitivity: It is used to evaluate the ability of a classifier to identify correct positive samples:

$Sensitivity = \frac{T P}{T P + F N}$

(35)
Specificity: It is used to check the model performance on the identification of negative samples in the dataset:

$Specificity = \frac{T N}{T N + F P}$

(36)
The F-Score: It is the harmonic mean of precision and recall and ranges between 0 and 1:

$\begin{matrix} Precision = \frac{T P}{T P + F P} \\ Recall = \frac{T P}{T P + F N} \\ F - Score = \frac{2 \times Precision \times Recall}{Precision + Recall} \end{matrix}$

(37)
ROC Curve and Area Under the Curve (AUC): An ROC curve is an effective way to indicate model performance. It plots the true positive rate against the false positive rate as the decision threshold varies. The AUC value is associated with the ROC and is used to assess the overall classification performance [33].
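The one-vs.-all computation of these measures can be sketched from a multi-class confusion matrix as follows. The function name `one_vs_all_metrics` and the 3 × 3 matrix are hypothetical; the numbers are made up for illustration and are not results from this study. The F-score is computed as the harmonic mean, 2 × Precision × Recall / (Precision + Recall).

```python
def one_vs_all_metrics(confusion, k):
    """Metrics for class k, treating it as positive and all other classes as negative.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(map(sum, confusion))
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp
    fp = sum(confusion[i][k] for i in range(n)) - tp
    tn = total - tp - fn - fp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": recall,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": precision,
        "f_score": f_score,
    }

# Made-up confusion matrix (rows: true class, columns: predicted class).
confusion = [[8, 1, 1],
             [2, 7, 1],
             [0, 0, 10]]
metrics = one_vs_all_metrics(confusion, 0)
```

Averaging the per-class dictionaries over the three classes gives the kind of average sensitivity, specificity, and F-score reported for a multi-class model.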

4. Results

The performance of the proposed FF-PCA-LDA approach is evaluated on the benchmark dataset described in Section 3.1. The experimental setup involves a Core i5 PC equipped with 16 GB RAM, along with an NVIDIA GeForce GTX 1050 TI GPU and Matlab 2020(a). Experiments are performed to determine the capability of feature fusion along with PCA-based feature reduction followed by LDA classification using ordinary RGB potato leaf images. Figure 5a,b show the graphical representation of PC1 and PC2 of the handcrafted and deep features, respectively. The feature fusion visualization with respect to its first and second principal components, PC1 and PC2, is shown in Figure 5c. The performance evaluations in terms of sensitivity, specificity, and F-Score are shown in Figure 6. Class-wise ROC curves of the developed approach are depicted in Figure 7.

Table 1 shows the classification performance of the LDA on 437 handcrafted features, achieving an accuracy of 92.21%. The feature dimension is reduced by employing PCA, and the top 100 PCs (principal components) are utilized to build the LDA model. The $PCA\text{-}LDA_{HH}$ classification model offers an average accuracy of 96.53%. Although the average accuracy of handcrafted-features-based PCA and LDA classifications is sufficiently high, among classes, we can observe that the Healthy class’s performance is relatively low.

Table 2 shows the LDA classification performance on deep features extracted by the modified ResNet50, which offered an average accuracy of 97.96%. Moreover, the PCA-based FF and LDA offer an average accuracy of 98.20%.

The comparison of the proposed approach with other classifiers is shown in Table 3. The developed approach performance is compared with published approaches in Table 4. The average AUC of 98.46% is obtained from the ROC curves (Figure 7).

5. Discussion

In smart agriculture, the automatic identification of plant leaf diseases is vital for yield, quality, and management control. It will not only reduce production costs but also reduce the excessive use of agro-chemicals. The advancement of digital technologies and machine learning algorithms may help to improve crop disease diagnosis with high precision.

Deep learning is being used to solve real-world problems efficiently in medical imaging, computer vision, and object recognition. It is gaining the attention of researchers to use it for smart agriculture. It reduces the problems of conventional approaches such as user intervention, threshold and kernel size selection, and optimal feature extraction for segmentation and classification. These limitations make systems complex, and even a small variation in image acquisition or in parameter selection may reduce classification performance.

To address these challenges, a novel approach is developed by the fusion of handcrafted hybrid and deep features followed by PCA-LDA (FF-PCA-LDA) classification. As a case study, we have considered a potato plant leaf disease identification problem using benchmark dataset. Classification is performed on images acquired by an ordinary camera without performing segmentation and data oversampling steps. Experiments are conducted to validate the efficacy of the proposed approach.

In this research, for the performance comparison of the proposed approach, we designed the following five types of models:

$LDA_{HH}$ model using all handcrafted hybrid features without PCA;
$PCA\text{-}LDA_{HH}$ model using only handcrafted hybrid (HH) features;
$TL\text{-}ResNet50$ deep learning model without PCA-LDA;
$PCA\text{-}LDA_{DF}$ model using only deep features (DF);
$PCA\text{-}LDA_{FF}$ model using fused features (FF).

It can be observed from Table 1 that the handcrafted-hybrid-features-based $LDA_{HH}$ model offers an average accuracy of 92.21%. However, some features might be redundant and contribute little to model development. PCA has therefore been employed for dimension reduction, and the 100 most important PCs are selected and fed to LDA for the development of the model. $PCA\text{-}LDA_{HH}$ improves the average accuracy by 4.32%. This indicates the existence of redundant and less important features, which may not have contributed to classification. The visual representation of features using PC1 and PC2 is shown in Figure 5a. It can be observed that except for the ‘Late Blight’ class, the other class features have low intraclass similarity.

The $PCA\text{-}LDA_{HH}$ model improves the average performance for potato disease detection. However, this may not hold for other crops, because the leaves of every crop have their own feature representation. The current parameter settings of the handcrafted hybrid features for the potato crop may not offer such high performance for other plant diseases. Hence, an automatic feature extraction technique that classifies crop leaf images with high precision is highly desirable.

To overcome the problems of handcrafted hybrid features mentioned previously, we developed a TL-ResNet50 deep learning model for the classification of potato crop leaf diseases. From Table 1 and Table 2, it is inferred that the TL-ResNet50 model improves accuracy compared to the $LDA_{HH}$ and $PCA\text{-}LDA_{HH}$ models. Furthermore, we extracted deep features by employing the modified ResNet50 deep learning model. It automatically learned and extracted the most dominant image features at the ‘avg_pool’ layer of the network, as shown in Figure 4. $PCA\text{-}LDA_{DF}$ classification is performed on deep features, and an average accuracy of 97.96% is achieved for potato crop leaf disease identification, as shown in Table 2. The high average accuracy indicates that TL-ResNet50 has successfully learned the image features, which are then classified efficiently by LDA. It improved performance by 6% compared to the $LDA_{HH}$ model, as shown in Table 1. Visually, Figure 5b shows the low interclass scatter of the deep features using the first two principal components, i.e., PC1 and PC2.

In this research, the fusion of handcrafted hybrid and deep features is performed, followed by a dimension reduction using PCA and plant disease classification with LDA. A fused feature vector of dimension 2485 for every input image is obtained and fed to PCA for dimension reduction. Visual representation of reduced features with PC1 and PC2 is shown in Figure 5c. It is observed that PCA has efficiently mapped the features with low intraclass and high interclass separation. An average accuracy of 98.20% is achieved by the $PCA\text{-}LDA_{FF}$ model for the identification of potato leaf diseases, as shown in Table 2. Sufficient improvement is observed over handcrafted hybrid and deep-features-based classification models.

The average performance offered by the developed approach in terms of sensitivity, specificity, and F-score is 98.79%, 92.67%, and 95.33%, respectively, as shown in Figure 6. It is observed that $PCA\text{-}LDA_{FF}$ successfully identified all samples of the ‘Healthy’ and ‘Early Blight’ classes, with 100% sensitivity, and achieved 96.0% sensitivity for the ‘Late Blight’ class. The high values of these performance parameters indicate the effectiveness of the FF-PCA-LDA approach.

Overall, it is deduced that the feature-fusion-based $PCA\text{-}LDA_{FF}$ model enhanced the classification accuracy by 1.45% over the TL-ResNet50 model for potato crop leaf diseases. Similarly, the proposed $PCA\text{-}LDA_{FF}$ model also performed better than the handcrafted-hybrid-based $PCA\text{-}LDA_{HH}$ and $LDA_{HH}$ models and the deep-features-based $PCA\text{-}LDA_{DF}$ model. This can be observed from Table 1 and Table 2. One of the objectives of this study is to improve the performance of the proposed FF-PCA-LDA crop leaf disease diagnosis system. It is achieved through the fusion of deep and handcrafted hybrid features followed by PCA-LDA classification.

Similarly, the class-wise performance for potato crop leaf diseases in terms of ROC curves is shown in Figure 7. The ROC curves for all classes lie very close to the vertical axis, which demonstrates the efficacy of the developed model. The FF-PCA-LDA approach yields an average AUC of 98.46%. The high value of the AUC shows the usefulness of the approach; hence, it can be used for plant leaf disease classification. For comparison purposes, we have developed various classifiers such as MLBPNN, KNN, and PNN using a five-fold cross validation technique on the same dataset. The performance of the proposed approach and of these classifiers is presented in Table 3. The analysis shows that the proposed approach offers superior performance compared to the other classifiers.

Moreover, the developed model is compared with some of the existing potato leaf disease detection techniques, as shown in Table 4. The proposed FF-PCA-LDA approach offers high performance as compared to other state-of-the-art techniques. The high performance is attributed to the fused features used to design the $PCA\text{-}LDA_{FF}$ classification model.

In this study, we have developed an efficient approach using feature fusion followed by $PCA - {LDA}_{FF}$ classification and achieved 98.20% accuracy for potato leaf disease detection. It does not require any specialized image acquisition equipment: RGB images acquired with a smartphone or an ordinary camera are used to monitor crop health. Unlike segmentation-based techniques, the developed approach works without any user intervention for parameter selection. The proposed approach can be extended to mobile or web apps, giving growers accurate and swift crop health monitoring. It is an automatic, straightforward, cost-effective, efficient, scalable, and reproducible approach. It may be extended to leaf disease diagnosis of other crops such as sugar beet, cotton, tomato, and apple.

6. Conclusions and Future Directions

In this paper, we developed a novel FF-PCA-LDA system for crop leaf disease detection based on feature fusion, dimension reduction by PCA, and LDA classification. Deep features are extracted using a modified ResNet50 and combined with handcrafted hybrid image features. A considerable improvement of 6% in the average accuracy has been observed over the handcrafted-hybrid-features-based models. The developed approach offers average accuracy, sensitivity, specificity, F-Score, and AUC values of 98.21%, 98.79%, 92.67%, 95.33%, and 98.46%, respectively. It outperformed the MLBPNN, KNN, and PNN classifiers on the same dataset. These improvements are due to feature fusion and dimension reduction followed by LDA classification. The performance indicators show that the developed approach can be used for crop leaf disease detection and can be extended to the identification of other crop diseases. Moreover, it might be offered as SaaS for mobile or web apps, available around the clock for better crop health monitoring and management.

Author Contributions

S.A. contributed to the model development, data analysis and interpretation, and manuscript writeup. M.H. designed the study, developed the idea, conducted the experiments, and contributed to the manuscript writeup. J.Y.K. performed data analysis and study design validation. M.I.F., M.S. and H.M. helped in the discussion of the manuscript and data analysis. The manuscript was revised by all authors. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the BK21 FOUR Program (Fostering Outstanding Universities for Research, 5199991714138) funded by the Ministry of Education (MOE, Korea) and the National Research Foundation of Korea (NRF).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data and code are available for further study. Please contact the corresponding authors for the computer code and data.

Conflicts of Interest

The authors declare that they have no conflict of interest. No funding agency has been involved to conduct the experiments.

References

1. Haverkort, A.; Boonekamp, P.; Hutten, R.; Jacobsen, E.; Lotz, L.; Kessel, G.; Visser, R.; Van der Vossen, E. Societal costs of late blight in potato and prospects of durable resistance through cisgenic modification. Potato Res. 2008, 51, 47–57.
2. Vibhute, A.; Bodhe, S. Applications of image processing in agriculture: A survey. Int. J. Comput. Appl. 2012, 52, 34–40.
3. Guenthner, J.; Michael, K.; Nolte, P. The economic impact of potato late blight on US growers. Potato Res. 2001, 44, 121–125.
4. Yellareddygari, S.; Taylor, R.J.; Pasche, J.S.; Zhang, A.; Gudmestad, N.C. Predicting potato tuber yield loss due to early blight severity in the Midwestern United States. Eur. J. Plant Pathol. 2018, 152, 71–79.
5. Wahabzada, M.; Mahlein, A.K.; Bauckhage, C.; Steiner, U.; Oerke, E.C.; Kersting, K. Plant phenotyping using probabilistic topic models: Uncovering the hyperspectral language of plants. Sci. Rep. 2016, 6, 22482.
6. Nejat, N.; Vadamalai, G. Diagnostic techniques for detection of phytoplasma diseases: Past and present. J. Plant Dis. Prot. 2013, 120, 16–25.
7. Thomas, S.; Kuska, M.T.; Bohnenkamp, D.; Brugger, A.; Alisaac, E.; Wahabzada, M.; Behmann, J.; Mahlein, A.K. Benefits of hyperspectral imaging for plant disease detection and plant protection: A technical perspective. J. Plant Dis. Prot. 2018, 125, 5–20.
8. Patil, S.B.; Bodhe, S.K. Leaf disease severity measurement using image processing. Int. J. Eng. Technol. 2011, 3, 297–301.
9. Kamlapurkar, S.R. Detection of Plant Leaf Disease Using Image Processing Approach. Int. J. Sci. Res. Publ. 2016, 6, 73–76.
10. Revathi, P.; Hemalatha, M. Classification of cotton leaf spot diseases using image processing edge detection techniques. In Proceedings of the International Conference on Emerging Trends in Science, Engineering and Technology (INCOSET), Tiruchirappalli, India, 13–14 December 2012; pp. 169–173.
11. Tian, Y.; Wang, L.; Zhou, Q. Grading method of Crop disease based on Image Processing. In Proceedings of the International Conference on Computer and Computing Technologies in Agriculture, Nanchang, China, 22–25 October 2011; pp. 427–433.
12. Hu, Y.; Ping, X.; Xu, M.; Shan, W.; He, Y. Detection of Late Blight Disease on Potato Leaves Using Hyperspectral Imaging Technique. Guang Pu Xue Yu Guang Pu Fen Xi = Guang Pu 2016, 36, 515–519.
13. Liu, J.; Wang, X. Plant diseases and pests detection based on deep learning: A review. Plant Methods 2021, 17, 1–18.
14. Hassan, M.; Ali, S.; Alquhayz, H.; Safdar, K. Developing intelligent medical image modality classification system using deep transfer learning and LDA. Sci. Rep. 2020, 10, 12868.
15. Ali, S.; Hassan, M.; Saleem, M.; Tahir, S.F. Deep transfer learning based hepatitis B virus diagnosis using spectroscopic images. Int. J. Imaging Syst. Technol. 2021, 31, 94–105.
16. Lai, Z.; Deng, H. Medical image classification based on deep features extracted by deep model and statistic feature fusion with multilayer perceptron. Comput. Intell. Neurosci. 2018, 2018, 2061516.
17. Selvaraj, M.G.; Vergara, A.; Ruiz, H.; Safari, N.; Elayabalan, S.; Ocimati, W.; Blomme, G. AI-powered banana diseases and pest detection. Plant Methods 2019, 15, 92.
18. Zeng, W.; Li, M. Crop leaf disease recognition based on Self-Attention convolutional neural network. Comput. Electron. Agric. 2020, 172, 105341.
19. Saleem, M.H.; Khanchi, S.; Potgieter, J.; Arif, K.M. Image-based plant disease identification by deep learning meta-architectures. Plants 2020, 9, 1451.
20. Zhong, Y.; Zhao, M. Research on deep learning in apple leaf disease recognition. Comput. Electron. Agric. 2020, 168, 105146.
21. Sharma, P.; Berwal, Y.P.S.; Ghai, W. Performance analysis of deep learning CNN models for disease detection in plants using image segmentation. Inf. Process. Agric. 2020, 7, 566–574.
22. Saleem, M.H.; Potgieter, J.; Arif, K.M. Plant disease classification: A comparative evaluation of convolutional neural networks and deep learning optimizers. Plants 2020, 9, 1319.
23. Sujatha, R.; Chatterjee, J.M.; Jhanjhi, N.; Brohi, S.N. Performance of deep learning vs machine learning in plant leaf disease detection. Microprocess. Microsyst. 2021, 80, 103615.
24. Iscan, Z.; Yüksel, A.; Dokur, Z.; Korürek, M.; Ölmez, T. Medical image segmentation with transform and moment based features and incremental supervised neural network. Digit. Signal Process. 2009, 19, 890–901.
25. Hassan, M.; Chaudhry, A.; Khan, A.; Kim, J.Y. Carotid artery image segmentation using modified spatial fuzzy c-means and ensemble clustering. Comput. Methods Programs Biomed. 2012, 108, 1261–1276.
26. Ojala, T.; Pietikainen, M.; Maenpaa, T. Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 971–987.
27. Huang, D.; Shan, C.; Ardabilian, M.; Wang, Y.; Chen, L. Local binary patterns and its application to facial image analysis: A survey. IEEE Trans. Syst. Man Cybern. Part C 2011, 41, 765–781.
28. Dalal, N.; Triggs, B. Histograms of oriented gradients for human detection. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Diego, CA, USA, 21–23 September 2005; Volume 1, pp. 886–893.
29. Du, J.X.; Wang, X.F.; Zhang, G.J. Leaf shape based plant species recognition. Appl. Math. Comput. 2007, 185, 883–893.
30. Tahir, M. Pattern analysis of protein images from fluorescence microscopy using Gray Level Co-occurrence Matrix. J. King Saud Univ. Sci. 2018, 30, 29–40.
31. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
32. Amin, A.; Ghouri, N.; Ali, S.; Ahmed, M.; Saleem, M.; Qazi, J. Identification of new spectral signatures associated with dengue virus infected sera. J. Raman Spectrosc. 2017, 48, 705–710.
33. Heagerty, P.J.; Lumley, T.; Pepe, M.S. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 2000, 56, 337–344.
34. Islam, M.; Dinh, A.; Wahid, K.; Bhowmik, P. Detection of potato diseases using image segmentation and multiclass support vector machine. In Proceedings of the IEEE 30th Canadian Conference on Electrical and Computer Engineering (CCECE), Windsor, ON, Canada, 30 April–3 May 2017; pp. 1–4.
35. Athanikar, G.; Badar, P. Potato leaf diseases detection and classification system. Int. J. Comp. Sci. Mob. Comput. 2016, 5, 76–78.

Figure 1. Sample images of potato leaf dataset: (a,d) Healthy, (b,e) Late Blight, and (c,f) Early Blight.

Figure 2. Block diagram of the proposed ‘FF-PCA-LDA’ system for plant leaf disease classification.

Figure 3. The radius (R) and its corresponding pixels.

Figure 4. Modified ResNet50 architecture for Deep feature extraction.

Figure 5. Representation of class discrimination of various feature types by PCs.

Figure 6. Performance evaluation of the proposed approach on test dataset.

Figure 7. Class-wise ROC curves of the proposed FF-PCA-LDA approach.

Table 1. Classification accuracy of handcrafted hybrid (HH)-features-based models for potato leaf disease identification.

	Accuracy (%)
Class/Model	${LDA}_{HH}$	$PCA - {LDA}_{HH}$
Healthy	89.26	96.20
Early Blight	93.43	98.20
Late Blight	90.96	95.20
Average	92.21	96.53

Table 2. Classification accuracy (%) of the TL-ResNet50, $PCA - {LDA}_{DF}$, and $PCA - {LDA}_{FF}$ models using transfer learning, deep features, and fused features, respectively.

Class/Model	TL-ResNet50	$PCA - {LDA}_{DF}$	Proposed $PCA - {LDA}_{FF}$
Healthy	96.76	98.53	98.20
Early blight	96.51	94.73	99.10
Late blight	96.26	97.63	97.31
Average	96.51	97.96	98.20

Table 3. Performance comparison of the proposed approach with other classifiers.

Classifiers	Accuracy (%)
MLBPNN (multi-layer back-propagation neural network, hidden units = 50)	72.00
KNN (K-nearest neighbor, k = 3)	76.90
PNN (probabilistic neural network)	71.25
The proposed FF-PCA-LDA approach	98.20

Table 4. Performance comparison of the proposed and existing approaches for Potato leaf disease classification.

Techniques	Description	Accuracy (%)
Islam et al. [34]	Segmentation-based technique	95.00
Hu et al. [12]	Hyperspectral imaging and segmentation	94.87
Athanikar et al. [35]	Segmentation-based technique	94.00
The proposed approach	Feature fusion, PCA and LDA	98.20

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

