The Evaluation of the Grade of Leaf Disease in Apple Trees Based on PCA-Logistic Regression Analysis

Xing, Bingqian; Wang, Dian; Yin, Tianzhen

doi:10.3390/f14071290


by Bingqian Xing ¹, Dian Wang ¹,* and Tianzhen Yin ²

¹ School of Technology, Beijing Forestry University, Beijing 100083, China
² National R&D Center for Agro-Processing Equipment, College of Engineering, China Agricultural University, Beijing 100083, China
* Author to whom correspondence should be addressed.

Forests 2023, 14(7), 1290; https://doi.org/10.3390/f14071290

Submission received: 13 May 2023 / Revised: 12 June 2023 / Accepted: 16 June 2023 / Published: 22 June 2023

(This article belongs to the Section Forest Health)


Abstract

Extensive research has suggested that the scientific use of pesticides depends on the careful and accurate determination of crop disease severity. Existing grading standards for plant leaf diseases rely on a single criterion, so diseases are only roughly divided into general and severe grades. To address this problem, this study considered the effect of the distribution of disease spots, and two new evaluation indicators (termed the imbalance degree and the main vein distance) were added to optimize the grading criteria of apple leaf diseases. Combined with other factors, the grade evaluation indicators were determined through principal component analysis (PCA). A stepwise multivariate logistic regression algorithm was proposed to evaluate the apple leaf disease grade, and an optimized apple leaf disease grade evaluation model was built through PCA-logistic regression analysis. In addition, 4500 pictures of three common apple leaf diseases (i.e., black rot, scab, and rust) were selected from several open-source datasets as the subjects of this paper. An object detection algorithm was then used to verify the effectiveness of the new model. The loss curve shows that the loss stabilizes at around epoch 70. Compared with Faster R-CNN, the average accuracy of Mask R-CNN for the type and grade recognition of apple leaf disease was higher by 4.91%, and the average recall rate was higher by 5.19%. The average accuracy of the optimized apple leaf disease grade evaluation model was 90.12%, marking an overall increase of 20.48%. Thus, the effectiveness of the new model was confirmed.

Keywords: apple leaf diseases; ratio of leaf spot area to total leaf area; distribution of disease spots; grading standards

1. Introduction

Apple cultivation in China has a history of over 2000 years. Apple is one of the vital fruit crops in China, and the area of cultivation has been expanding year by year [1,2]. From 1978 to 2021, China’s annual apple production rose from 2,275,200 tons to 45,973,400 tons [3]. With the upgrading of the conventional apple industry, China has become a major apple producer, currently ranking first worldwide. Apple disease control has attracted significant attention from farmers, who usually identify apple diseases using costly and ineffective methods (i.e., following experience, books, the internet, and expert advice) [4]. Accordingly, automatic detection methods should be adopted to identify disease types and levels more accurately in apple production [5,6,7,8]. Such methods can be employed to monitor apple health in the long term and to assess apple diseases effectively, so that timely control can be carried out and economic efficiency can be increased [9].

In modern agriculture, monitoring based on plant leaf spot characteristics has become a research hotspot for automatic plant disease monitoring [10,11,12,13,14,15]. In 2009, Dae Gwan Kim, Thomas F. Burks, Jianwei Qin [16], Savita N. Ghaiwat, Parul Arora, et al. [17] reported that tools (e.g., probabilistic neural networks, principal component analysis, artificial neural networks, and fuzzy logic) had been adopted to build leaf classification models. With advances in computer vision, deep learning has been employed more often for crop disease detection (e.g., in cassava [18], potato [19], and apple [20]) for its convenience and high accuracy. Tobias Bauma, Aura Navarro-Quezada, Wolfgang Knogge, et al. [21] classified fungal colonies using genetic algorithms. Moreover, support vector machines, k-nearest neighbor algorithms, artificial neural networks, and decision trees have been extensively employed for addressing classification problems [22,23,24,25].

Convolutional neural networks exhibit strong feature learning capability and prominent generalization ability for multi-classification tasks. Since convolutional neural networks were first adopted to solve classification tasks, several classical classification networks have been developed (e.g., AlexNet [26], Image net [27], and VGG16 [28]). Novel networks are constantly proposed to increase classification accuracy. However, Internet and hardware conditions are poor in remote planting areas of China. Thus, the complexity of the network and the size of the model are part of the selection criteria for the network. As novel networks are proposed one after another, their accuracy is rising, while the number of layers and the number of parameters are also increasing. As a result, it is difficult to deploy such high-performance networks in real-life applications. For instance, VGG16 has nearly 138 million parameters, and such a large number of parameters hinders its application. Chowdhury R. Rahman, Preetom S. Arko, Mohammed E. Ali, et al. [29] proposed an optimized lightweight model for rice pest identification.

A numerical model refers to a mathematical structure expressed in general or approximate terms through quantitative analysis using mathematical language regarding the characteristics or quantitative dependence of a system of things [30,31,32,33,34]. The occurrence of crop diseases refers to a series of qualitative morphological, physiological, and biochemical pathological changes under the effects of biotic or abiotic factors. The numerical model construction of disease is elucidated as follows: the qualitative practical problems of leaf disease occurrence are converted into corresponding quantitative mathematical problems, and the weighting ratios of various factors leading to disease occurrence are analyzed, and then the results are returned to the production reality. Regression analysis in mathematical and statistical models exhibits the advantages of solving a wide range of problems, requiring a moderate amount of data, and being easy to operate. Furthermore, it is most generally adopted to predict a wide variety of diseases [35,36,37,38].

Wei, Yingwen, et al. [39] developed a model to predict the incidence of cyanobacteria by analyzing soil and meteorological factors in Leizhou, Guangdong, based on a gray system and multiple regression methods. Shi Mingwang et al. [40] suggested, from statistics of climatic conditions and the incidence of cyanobacteria in the Wuchuan area, that the gray system forecasting model is more accurate than the multiple linear regression model. Pan, Haize, Huang, Yuanchun, et al. [41] employed mathematical methods (e.g., the Mahalanobis distance method) to build a grade evaluation model for tunnel seepage and water leakage. Zhao and Changzhou [42], in 2018, analyzed the effect of different meteorological factors (e.g., atmospheric temperature, atmospheric humidity, and sunshine hours) on disease prevalence using a multiple regression method. They suggested that these factors are significantly correlated with the development of disease. Afterward, they built a disease prediction model and reported its statistical analysis indicators to establish the theoretical basis of the model.

To alleviate the harm arising from diseases, spraying pesticides has been confirmed as one of the fastest and most effective ways to prevent diseases [43,44,45]. However, how to set the pesticide dosage scientifically according to the severity of the disease remains a problem to be solved. Solving it can reduce the harm arising from pesticide damage and pesticide residues. The core of the problem is how to determine the degree of crop disease accurately and promptly. Traditionally, trained experts assess the severity of plant diseases by visual inspection. However, the high labor and time costs make this approach impractical for routine disease monitoring [46,47,48,49].

With powdery mildew and stripe rust of wheat as the research object, Wenxia Bao, Jian Zhao, Gensheng Hu, et al. [50] proposed an algorithm to identify wheat leaf diseases and their severity through elliptical maximum margin criterion (E-MMC) metric learning, which combines an elliptical metric with the maximum margin criterion to represent the nonlinear transformation of the spatial structure or semantic information of wheat leaf disease images. Tao Fang, Peng Chen, Jun Zhang, et al. [51] determined the ratio of the number of pixels in the diseased area to the number of pixels in the diseased leaf area. They incorporated the ratio as the classification threshold of disease classes into the convolutional neural network ResNet-50 to classify ten diseases of eight plants.

In the above-mentioned research, the disease spot area ratio was used as the single threshold for judging the disease class. Good results were achieved; however, the method has some limitations. For instance, at the late stage of infection, multiple spots fuse and leaves lose parts of their edges and area, so that inaccurate spot area ratio values are obtained. Consequently, disease classes are judged inaccurately, i.e., a certain correlation exists between leaf mutilation rates and disease classes. The relationship may vary for different crops, different tree species, or different diseases. Thus, the assessment and judgment should take the specific situation into account in practical applications.

M. Bertalmio, L. Vese, G. Sapiro, et al. [52] proposed an algorithm to fill both texture and structure in regions with missing image information. The basic idea is as follows. The image is first decomposed into the sum of two functions with different basic features, and then each function is reconstructed separately using structure-filling and texture-filling algorithms. Jian Sun, Lu Yuan, and Jiaya Jia [53] presented a method to automatically generate synthetic patches for unknown regions in the input image. The method generates synthetic patches from known regions in the input image, and one or more curves are generated to provide the missing structures for the unknown regions. Deepak Pathak, Philipp Krahenbuhl, and Jeff Donahue [54] proposed an unsupervised visual feature learning algorithm based on contextual pixel prediction using the surrounding image information. The aim of this algorithm is to infer the image information of the lost part. Chao Yang, Xin Lu, Zhe Lin, et al. [55] proposed a multi-scale CNN matching method combining image content and texture. The proposed method can preserve the texture structure and generate high-frequency details by extracting the intermediate layer of the classification network. Thus, the problem of high-definition image completion can be addressed to a certain extent. Haoyun Wang and Haihong Xiao et al. [56] proposed a leaf-shape completion network based on a multi-scale feature extraction module combined with a point cloud pyramid decoder (MSF-PPD) to achieve global extraction and fusion and multi-stage generation of leaf point clouds for completion in natural backgrounds (e.g., occluded green leaves).

The leaf veins are one of the important organs of plants, whose main function is to transport water and nutrients. The primary leaf veins (i.e., the main veins) are located at the center of the leaf and are the core of the leaf veins. As revealed by numerous studies, leaf veins carry the most critical physiological and genetic information regarding plants. The leaf veins, especially the primary veins, serve as highly valuable indicators to identify plant species and analyze the growth and development of plants. They take on critical significance in research on the classification of leaf diseases. Xin Cui et al. [57] investigated the spatial distribution pattern of black spot disease spots on leaves of four-season mallow in Changchun. As indicated by the results, the lower the number of spots per leaf, the greater the relative clustering indicator. Using the diffusion coefficient C and other methods, Tang Xiaoqin [58] studied the spatial distribution pattern of rust disease on the leaves of apple trees in Gongbu Nature Reserve in southeastern Tibet. Liu, Zeyong, and Zhang et al. [59] investigated the distribution of ash narrow girdling on ash trees with different degrees of damage over the vertical height of the tree. Their result suggested that the degree of aggregation of spots, i.e., the distribution of spots on the leaves, can exert a certain effect on the judgment of the disease grade. Q Xiang [60] used a fruit image classification method based on MobileNetV2, a lightweight neural network, together with transfer learning to recognize fruit images. Amarasingam N [61] evaluated the performance of existing DL models such as YOLOv5, YOLOR, DETR, and Faster R-CNN in recognizing WLD in sugarcane crops. Albarrak K [62] used convolutional neural networks to identify and classify date fruits through deep learning models. Gulzar Y [63] proposed a classification system for seeds by employing CNN and transfer learning, which contains a model that classifies 14 commonly known seeds with the application of advanced deep learning techniques.

In brief, given the effect of the distribution of disease spots in the leaves on the determination of apple leaf disease severity, this study placed a focus on how to optimize the grading criteria of apple leaf diseases and built a novel grading evaluation model. Based on the target detection algorithm and convolutional neural network, an apple leaf disease grading evaluation method was proposed in this study based on PCA-logistic regression analysis by integrating the conventional disease spot area ratio and the newly added evaluation indicators through statistical correlation analysis. Furthermore, an optimized novel model was built for apple leaf disease severity identification.

2. Materials and Methods

This section mainly introduces the experimental design, data collection, and data preprocessing of the study, as well as the definition of grade evaluation indicators and apple leaf disease grade evaluation criteria.

2.1. Study Design

This study aimed to use the PCA-logistic regression analysis method to establish a multi-scale evaluation model for the severity of apple leaf diseases. The method was divided into three main stages. At the first stage, the dataset of three common apple leaf disease images was preprocessed using conventional disease spot segmentation and data enhancement methods. At the second stage, SPSS 26.0 software was used to apply principal component analysis (PCA) to refine the apple leaf disease grade evaluation indicators. In addition, stepwise multiple logistic regression analysis was applied to obtain the apple leaf disease grade evaluation models before and after optimization. At the third stage, the evaluation indicator parameters of apple leaf diseases were identified using a target detection algorithm. Thus, the workload of manually measuring disease characteristic parameters was reduced, and the identification accuracy was compared with that of the model using the disease spot area ratio as the single grading criterion. Details of the respective stages are presented as follows.

2.2. Data Collection and Data Processing

To ensure the authenticity of the experiments, the dataset used in the study was first taken from PlantVillage [64] (https://github.com/spMohanty/PlantVillage-Dataset, accessed on 1 May 2023), a public dataset that is often used because of its large number of images, its uniform laboratory background, and its classification into general and severe grade labels based on disease severity. It has been commonly employed in the field of crop pest and disease severity identification. In this study, a dataset containing three common leaf diseases of apples (i.e., black rot, scab, and rust) was selected as the original dataset for this experiment.

First, the three diseases in the original dataset were classified by botanical experts, based on expertise in plant pathology, into four severity classes (grades 1–4), i.e., mild, average, moderate, and severe disease. Due to the small number of grade 4 leaves in this dataset, pictures of severely diseased apple leaves were selected from the PlantDoc [65] public dataset to supplement the grade 4 samples and complete the sample dataset.

Since the number of leaves differed among disease grades, the dataset was expanded after grading by flipping and other augmentation methods, so that each of the three apple leaf diseases had four disease grades and 1500 pictures. In other words, the dataset used in this study covers three types of apple leaf diseases, namely black rot, scab, and rust. Each disease is divided into four grades, each grade contains 375 pictures on average, and the dataset totals 4500 pictures. Subsequently, all pictures in the dataset were resized to 256 × 256 (Figure 1). In the subsequent experiments, 90% of the data were randomly assigned to the training set to fit the model, and 10% of the data served as the test set to evaluate the model performance and observe the convergence speed and change trend.
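The dataset arithmetic and the random 90/10 split described above can be sketched as follows. This is a minimal illustration only: the record layout, the function names, and the fixed random seed are assumptions made for the example and are not taken from the study.

```python
import random

DISEASES = ["black_rot", "scab", "rust"]
GRADES = [1, 2, 3, 4]
IMAGES_PER_CLASS = 375  # average per disease and grade, as stated in the text


def build_index():
    """Return placeholder (disease, grade, image_id) records for the whole dataset."""
    return [(d, g, i) for d in DISEASES for g in GRADES for i in range(IMAGES_PER_CLASS)]


def split_train_test(records, train_fraction=0.9, seed=0):
    """Randomly split records into a training list and a test list."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)  # seed is a hypothetical choice for repeatability
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


records = build_index()
train, test = split_train_test(records)
print(len(records), len(train), len(test))  # 4500 4050 450
```

The split here is purely random, as in the text; a split stratified by disease and grade would be a different design choice.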

The real values of the disease-spot-associated parameters should be labeled during data preprocessing, both to facilitate the subsequent establishment of a grade evaluation model for apple leaf diseases and to verify the validity of the experiment. In this study, all 4500 images of the completed sample dataset were segmented and labeled with the parameter values (Figure 2) using conventional preprocessing methods (e.g., HSV color threshold segmentation and smoothing denoising). The parameter values corresponding to each image (with all units in pixels) can thus be determined (e.g., the area of the disease spots, the total area of the leaf, the distance between the disease spots, and the distance between the disease spots and the main vein). Finally, the results were summarized in an Excel table in preparation for the subsequent statistical analysis.
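As a rough illustration of HSV color threshold segmentation, the sketch below labels each pixel as background, healthy tissue, or lesion and counts the pixels in each class. The threshold values and function names are hypothetical and would need tuning for real images. The study does not report its thresholds, and the smoothing and denoising steps are omitted here.

```python
import colorsys


def classify_pixel(r, g, b):
    """Label one RGB pixel (0-255 per channel) by simple HSV thresholds.

    All thresholds below are illustrative assumptions, not values from the study.
    """
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    if v < 0.15:                       # very dark pixels: treated as background
        return "background"
    if 0.20 <= h <= 0.45 and s >= 0.25:  # green hues: healthy leaf tissue
        return "healthy"
    return "lesion"                    # everything else on the leaf: disease spot


def count_pixels(image):
    """Count pixels per class; image is a list of rows of (r, g, b) tuples."""
    counts = {"background": 0, "healthy": 0, "lesion": 0}
    for row in image:
        for r, g, b in row:
            counts[classify_pixel(r, g, b)] += 1
    return counts


green, brown, black = (40, 160, 40), (120, 70, 30), (5, 5, 5)
tiny_image = [[green, green], [brown, black]]
print(count_pixels(tiny_image))  # {'background': 1, 'healthy': 2, 'lesion': 1}
```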

2.3. Variables Selection

2.3.1. Definition of Spot Area Ratio

In general, the grade of plant foliar disease severity is examined using the ratio of leaf spot area to leaf area [50]. On that basis, for each leaf, the ratio of leaf spot area to total leaf area (RST) was calculated with the following equation, where A1 denotes the area of the leaf spot, A represents the total area of the leaf, N1 is the number of leaf spot pixels in the picture, and N is the total number of leaf pixels in the picture.

RST = \frac{A_{1}}{A} = \frac{N_{1}}{N}

(1)

Since the healthy part of a diseased apple leaf always remains largely intact whether or not the leaf is mutilated, the spot area ratio can be determined with the following variant when the spot area is not well identified, where Garea denotes the number of pixels fitted to the healthy area of the leaf and Marea denotes the number of pixels fitted to the overall area of the leaf.

RST = 1 - \frac{Garea}{Marea}

(2)
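Equations (1) and (2) can be checked numerically with a small sketch. The function names and the pixel counts are hypothetical and chosen only for illustration.

```python
def rst_from_spot(spot_pixels, leaf_pixels):
    """Equation (1): RST = N1 / N, from lesion and total leaf pixel counts."""
    if leaf_pixels <= 0:
        raise ValueError("leaf_pixels must be positive")
    return spot_pixels / leaf_pixels


def rst_from_healthy(healthy_pixels, fitted_leaf_pixels):
    """Equation (2): RST = 1 - Garea / Marea, from healthy and fitted leaf pixel counts."""
    if fitted_leaf_pixels <= 0:
        raise ValueError("fitted_leaf_pixels must be positive")
    return 1.0 - healthy_pixels / fitted_leaf_pixels


# Made-up counts: 1200 lesion pixels on a leaf of 20000 pixels.
print(round(rst_from_spot(1200, 20000), 4))      # 0.06
print(round(rst_from_healthy(18800, 20000), 4))  # 0.06
```

When the lesion pixels and healthy pixels together make up the fitted leaf area, the two forms agree, as in this example.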

2.3.2. Definition of Leaf Blade Mutilation

The blade mutilation rate is the percentage difference in the area of the blade before and after restoration. The proportion of mutilated leaves in the dataset was relatively small, and, given uncontrollable real-life factors (e.g., humans or the weather), the root cause of leaf edge mutilation cannot be determined. Therefore, only the black mutilated part on the green background of the leaf in the pixel map was considered. To facilitate the calculation and the analysis of the grade evaluation model, this type of data was treated as the most severe grade of disease spot, i.e., grade 4 disease.

2.3.3. Definition of Imbalance Degree

Considerable research [57] has suggested that the distribution of the disease on the leaf affects the grading of the leaf disease. The diffusion coefficient is a basic indicator for testing the spatial distribution of biological populations, and the spatial distribution type of the disease can be determined by the diffusion coefficient, as shown in the following equation, where S² is the variance of the data and X is the mean value. C = 1 indicates a random (Poisson) distribution, C > 1 an aggregated distribution, and C < 1 a uniform distribution.

C = \frac{S^{2}}{X}

(3)
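A short sketch of Equation (3) follows. It assumes that S² is the sample variance (n − 1 denominator), which the text does not specify, and it uses a hypothetical tolerance to decide when C counts as equal to 1. The spot counts are made up.

```python
from statistics import mean, variance


def diffusion_coefficient(counts):
    """Equation (3): C = S^2 / X, using the sample variance (an assumption)."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; C is undefined")
    return variance(counts) / m


def distribution_type(c, tol=0.05):
    """Map C to a distribution type; tol is a hypothetical tolerance around 1."""
    if abs(c - 1.0) <= tol:
        return "random"
    return "aggregated" if c > 1.0 else "uniform"


clustered = [0, 0, 1, 9, 0, 10]  # made-up spot counts per sampling unit
even = [3, 3, 3, 3]
print(distribution_type(diffusion_coefficient(clustered)))  # aggregated
print(distribution_type(diffusion_coefficient(even)))       # uniform
```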

The conventional spatial distribution indicator method describes the distribution of all disease spots on all leaves, whereas the main object of this study was the distribution of disease on a single leaf. Thus, a new grade evaluation indicator, the imbalance degree, was added to study how the degree of aggregation between different spots on the same leaf contributes to the final disease grade evaluation. Moreover, the grading criteria of imbalance were summarized with reference to the diffusion coefficient, i.e., the grade was recorded as grade 1 when the imbalance exceeded 100 pixels, grade 2 when the imbalance was between 50 and 100 pixels, grade 3 when the imbalance was between 20 and 50 pixels, and grade 4 when the imbalance was less than 20 pixels. It should be noted that distances under this indicator are expressed in pixels. In addition, for the accuracy of the parameter measurement, when the number of spots exceeded four, only the three spots with the largest area percentages were selected, i.e., this evaluation indicator had at most three pixel values on the same leaf. The minimum straight-line distance between the disease spots was used in the actual measurement. Furthermore, the average value was taken for each group of three measurements to increase the accuracy and reduce the error.
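The imbalance grading rule can be sketched as below. Several details are assumptions made for the example: each spot is represented by its area and a list of boundary points, the imbalance value is taken as the mean of the minimum pairwise distances among the (at most) three largest spots, and values falling exactly on 50 or 20 pixels are assigned to the less severe grade. A leaf with a single spot is given grade 0, following the note on Table 1.

```python
from itertools import combinations
from math import hypot


def min_distance(points_a, points_b):
    """Minimum straight-line distance (pixels) between two sets of boundary points."""
    return min(hypot(ax - bx, ay - by) for ax, ay in points_a for bx, by in points_b)


def imbalance_value(spots, max_spots=3):
    """Mean minimum distance among the largest spots; spots are (area, points) pairs."""
    largest = sorted(spots, key=lambda s: s[0], reverse=True)[:max_spots]
    if len(largest) < 2:
        return 0.0
    dists = [min_distance(a[1], b[1]) for a, b in combinations(largest, 2)]
    return sum(dists) / len(dists)


def imbalance_grade(value, n_spots):
    """Grade thresholds from the text; boundary handling is an assumption."""
    if n_spots <= 1:
        return 0
    if value > 100:
        return 1
    if value >= 50:
        return 2
    if value >= 20:
        return 3
    return 4


spots = [
    (300, [(10, 10), (12, 10)]),
    (200, [(40, 10)]),
    (100, [(10, 50)]),
    (5, [(11, 11)]),  # smallest spot, ignored because only three are kept
]
value = imbalance_value(spots)
print(round(value, 2), imbalance_grade(value, len(spots)))  # 39.33 3
```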

2.3.4. Definition of Main Vein Distance

The proximity of the lesion to the main vein can affect the transport of water and nutrients by the main vein. Thus, the position of the lesion relative to the main vein affects the determination of the disease grade. In this study, the evaluation indicator of main vein distance was introduced, and its grading standard was summarized. To be specific, the grade was recorded as grade 1 for a main vein distance over 50 pixels, grade 2 for a main vein distance from 30 to 50 pixels, grade 3 for a main vein distance from 15 to 30 pixels, and grade 4 for a main vein distance of less than 15 pixels. The shortest perpendicular distance between the edge of the lesion and the main vein was used. Moreover, when the number of lesions exceeded four, only the three lesions with the largest areas were selected, i.e., this evaluation indicator had at most three pixel values on the same leaf. Each group was measured three times and the average value was taken in the actual measurement process to increase the accuracy and reduce the error.
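A minimal sketch of the main vein distance grading follows. It assumes that the main vein can be approximated by a straight segment between two hypothetical endpoints, that each lesion is given as its area plus a list of edge points, and that the per-lesion shortest distances are averaged over the (at most) three largest lesions. Values exactly on 30 or 15 pixels are assigned to the less severe grade, which is also an assumption.

```python
from math import hypot


def point_to_segment(p, a, b):
    """Shortest distance from point p to the segment a-b (all in pixel coordinates)."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def main_vein_distance(lesions, vein_start, vein_end, max_lesions=3):
    """Mean shortest edge-to-vein distance over the largest lesions."""
    if not lesions:
        raise ValueError("at least one lesion is required")
    largest = sorted(lesions, key=lambda s: s[0], reverse=True)[:max_lesions]
    dists = [min(point_to_segment(p, vein_start, vein_end) for p in pts)
             for _, pts in largest]
    return sum(dists) / len(dists)


def mvd_grade(distance):
    """Grade thresholds from the text; a distance of 0 (spot on the vein) gives grade 4."""
    if distance > 50:
        return 1
    if distance >= 30:
        return 2
    if distance >= 15:
        return 3
    return 4


lesions = [(300, [(100, 50), (110, 60)]), (200, [(160, 100)]), (50, [(128, 200)])]
d = main_vein_distance(lesions, (128, 0), (128, 255))
print(round(d, 2), mvd_grade(d))  # 16.67 3
```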

2.3.5. Optimized Grading Standards

Table 1 lists the grading criteria of the four grade evaluation indicators, i.e., spot area ratio, number of spots, imbalance degree, and main vein distance, each as a single evaluation indicator with the other variables controlled. The number of apple leaf disease pictures with a spot area ratio over 20% in the dataset was very small, and leaves with too large a spot area ratio cannot benefit effectively from subsequent control measures in practical application. This study therefore primarily addressed how to classify leaf disease grades effectively and more finely for spot area ratios within 25%. Notably, if a leaf has only one spot, the distance between spots is 0 pixels, the imbalance is 0, and the leaf disease grade under the single imbalance indicator was recorded as grade 0. Moreover, when the distance between the spot and the main vein was 0 pixels, the spot grew on the main vein, and the leaf disease grade under the single main vein distance indicator was recorded as grade 4.

To be specific, RST represents the ratio of leaf spot area to total leaf area, NUM denotes the number of disease spots, IID expresses the indicator of imbalance degree, and MVD is the main vein distance.

2.4. Model Development

Following the classification criteria based on the original spot area ratio [10], two new evaluation indicators, imbalance degree and main vein distance, were added to account for the effect of the number of spots, the distance between spots, and the distance between spots and the main veins on the classification of apple leaf disease classes. Accordingly, to build an apple leaf disease model that can finely distinguish different classes and to improve the multi-scale classification criteria of plant leaf disease severity, the following statistical analyses were conducted in this study using SPSS 26.0 software: correlation analysis, quantitative ordinal analysis, principal component analysis, and stepwise multiple regression analysis.

2.4.1. Correlation Analysis

Correlation analysis [66] is a statistical method that measures the closeness of the relationship between two or more variables. It does not consider causal relationships between variables but only studies the correlation between them. A simple correlation analysis was first conducted, i.e., the correlation coefficient between each pair of variables was calculated to judge whether they were significantly correlated. In a multivariate setting, however, the simple correlation coefficient of two variables is affected by the other variables, so it reflects the relationship only superficially and often does not truly reveal the degree of linear correlation between them. It is therefore necessary to conduct partial correlation analysis, i.e., to control the other evaluation indicator variables and investigate only the effect of a single grade evaluation indicator on the apple leaf disease grade. The results are listed in the table below, which gives the correlation coefficient matrix between the respective grade evaluation indicators and the leaf disease grade.

Spearman’s correlation analysis was conducted between the four grade evaluation indicators (i.e., single spot area ratio, number of spots, imbalance, and main vein distance) and the apple leaf disease grade. This coefficient has been extensively used to measure the degree of correlation between two variables. Figure 3 presents the heat map of the correlation coefficients between indicators and grades. As indicated by the results, the significance values were all less than 0.01, i.e., the correlations were significant and passed the correlation test. Moreover, the degree of correlation differed between the variables. To be specific, the strongest positive correlation existed between disease spot area ratio and leaf disease grade (correlation coefficient 0.87), whereas the weakest correlation existed between disease spot number and leaf disease grade (correlation coefficient 0.51). Since principal component analysis requires a certain correlation between independent variables, all four grade evaluation indicators could be used as independent variables in subsequent modeling. In brief, the area ratio of leaf spots was still the main factor in the grading of apple leaf diseases. It was followed by the imbalance (the aggregation distribution between spots), the main vein distance (the distribution of the distance between the spots and the main veins), and finally the number of spots on the leaves.
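Spearman's coefficient is the Pearson correlation of the ranks, with tied values given their average rank. The sketch below implements this with the standard library only; the study itself used SPSS, and the grade vectors are made up for illustration.

```python
def average_ranks(values):
    """Rank values from 1, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up ordinal grades (1-4) for one indicator and the overall disease grade.
indicator_grade = [1, 1, 2, 2, 3, 3, 4, 4]
disease_grade = [1, 1, 2, 3, 3, 3, 4, 4]
rho = spearman(indicator_grade, disease_grade)
print(0.0 < rho <= 1.0)  # True
```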

2.4.2. Quantitative Ordered Analysis

Logistic regression analysis in statistics can generally be classified into three categories: binary logistic regression analysis, i.e., the study of the effect relationship for two-category data; multi-categorical logistic regression analysis, i.e., the study of the effect relationship for multiple unordered categorical data; and ordered logistic regression analysis, i.e., the study of the effect relationship for multiple ordered categorical data.

This study focused on the correlation between apple leaf disease classes and the effect of classes under four single evaluation indicators: single spot area ratio, single spot number, single imbalance, and single main vein distance class, where all five variables were ordered multi-categorical variables. Ordered multi-categorical variables are very common forms of variables that usually have multiple possible values in the variable, and hierarchical relationships exist between the values. Unlike unordered multi-categorical variables, the options of ordered multi-categorical variables show a direct relationship of increasing or decreasing in one direction. Since the data variables of the four grades of evaluation indicators were the four grades of 1–4 of the disease, and the higher values revealed higher grades of disease severity, multivariate ordered logistic regression analysis was conducted in this study.
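For reference, an ordered logistic regression with four ordered outcome grades is commonly written in the cumulative (proportional odds) form below. The symbols \alpha_{j} (thresholds), \beta_{k} (coefficients), and x_{k} (the indicator grades) are introduced here for illustration only, and whether the software output of the study uses exactly this parameterization is an assumption.

\log \frac{P(Y \le j \mid x)}{1 - P(Y \le j \mid x)} = \alpha_{j} - \sum_{k} \beta_{k} x_{k}, \quad j = 1, 2, 3

Under this form, a positive \beta_{k} means that a higher value of indicator x_{k} shifts probability toward the higher disease grades, and the probability of a single grade follows as P(Y = j) = P(Y \le j) - P(Y \le j - 1).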

The output R-squared values, i.e., the goodness of model fit, are listed in Table 2. The output contained three R-squared indicators, i.e., McFadden R-squared, Cox and Snell R-squared, and Nagelkerke R-squared. These indicators are pseudo-R-squared values; the larger the value, the better the model fit. The first two pseudo-R-squared values exceeded 0.8, suggesting that the model fit met the expectations of the experiment. The overall validity of the model was then analyzed with the model likelihood ratio test. The significance was p = 0.000 < 0.05, i.e., the null hypothesis was rejected. As revealed by the above result, the independent variables entered into the model were valid, and the model construction was meaningful.
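The three pseudo-R-squared values and the likelihood ratio statistic can be computed from the log-likelihoods of the null (intercept-only) model and the fitted model using their standard definitions. In the sketch below the function name and the log-likelihood values are made up, and the numbers are not those of Table 2.

```python
from math import exp


def pseudo_r2(ll_null, ll_model, n):
    """Return McFadden, Cox-Snell and Nagelkerke pseudo R-squared and the LR statistic.

    ll_null: log-likelihood of the intercept-only model
    ll_model: log-likelihood of the fitted model
    n: number of observations
    """
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - exp(2.0 * (ll_null - ll_model) / n)
    nagelkerke = cox_snell / (1.0 - exp(2.0 * ll_null / n))
    lr_statistic = 2.0 * (ll_model - ll_null)  # compared against a chi-square distribution
    return {
        "mcfadden": mcfadden,
        "cox_snell": cox_snell,
        "nagelkerke": nagelkerke,
        "lr_statistic": lr_statistic,
    }


# Hypothetical log-likelihoods for 100 observations.
result = pseudo_r2(ll_null=-130.0, ll_model=-40.0, n=100)
print({k: round(v, 3) for k, v in result.items()})
```

Nagelkerke's value rescales the Cox and Snell value so that a perfect fit reaches 1, which is why it is never smaller than the Cox and Snell value.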

After the specific analysis of the effect relationship, the results of the ordered logistic regression model analysis are summarized (Figure 4). As indicated by the results, the z-values were all positive. This result suggested that each independent variable, i.e., each of the four grade evaluation indicators, was positively correlated with the dependent variable of disease grade. In addition, the p-values indicate the significance of each relationship. The p-values of the coefficients for the single number of disease spots at grades one and three, the single imbalance at grade four, and the single main vein distance at grade one exceeded 0.05 (Figure 4). This result suggested that these particular variables were not significantly correlated with the disease grade of apple leaves.

2.4.3. PCA Analysis

In pathology, the same disease differs in the color, texture, and shape of the spots on the leaves at different severities, i.e., at different disease grades. Thus, this part of the study considered the number of spots and the newly added indicators (imbalance degree and main vein distance) together with the color and texture (HSV, contrast, entropy, angular second moment, and inverse difference moment) and the shape (spot area ratio) of the spots, for a total of six grade evaluation indicators. Principal component analysis (PCA) [67] was adopted to rank the importance of these indicators for apple leaf disease grading (Figure 5). Given the insufficient clarity of the apple leaf disease pictures in the dataset and external factors (e.g., sunshine and shooting angles), the first four grade evaluation indicators, with a cumulative contribution over 85%, were included in the subsequent multivariate logistic regression modeling as the principal components.
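The 85% cumulative contribution rule described above can be illustrated with a short sketch. It assumes the contribution of each component is its eigenvalue divided by the sum of all eigenvalues; the function name and the eigenvalues are hypothetical and are not the values behind Figure 5.

```python
def components_for_threshold(eigenvalues, threshold=0.85):
    """Return how many leading components reach the cumulative contribution
    threshold, together with the cumulative contribution actually reached."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value / total
        if cumulative >= threshold:
            return k, cumulative
    return len(ordered), cumulative


# Hypothetical eigenvalues for six indicators, for illustration only.
k, reached = components_for_threshold([2.4, 1.5, 0.9, 0.6, 0.4, 0.2])
print(k, round(reached, 3))
```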

2.4.4. Stepwise Multiple Logistic Regression Analysis

Regression analysis is a statistical method for determining the quantitative interdependence between two or more variables, and it has been extensively employed. Regression analysis can be divided into univariate regression analysis and multiple regression analysis according to the number of independent variables involved. Based on the type of correlation between the independent and dependent variables, it can also be divided into linear regression analysis and nonlinear regression analysis. If only one independent variable and one dependent variable are covered in the regression analysis, and the correlation between them can be approximated by a straight line, the analysis is termed univariate linear regression analysis. If two or more independent variables are included, and there is a linear correlation between the dependent variable and the independent variables, the analysis is termed multiple linear regression analysis. The basic formula of the model is Y = b0 + b1X1 + b2X2 + b3X3 + b4X4 + … + bnXn + K, where b0 denotes a constant term, b1–bn represent the regression coefficients, and K expresses a constant. For the estimation of the regression parameters, the least squares principle was generally adopted, i.e., the sum of squares of the residuals was minimized.
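The least squares principle mentioned above can be written compactly. The following is a sketch in standard textbook notation rather than a formula taken from this study: m denotes the number of samples, X_{ij} the value of the j-th independent variable for sample i, and the matrix \mathbf{X} is assumed to carry a leading column of ones for b_0. Treating the trailing term K as the residual is also an assumption.

```latex
\hat{b} = \arg\min_{b_0,\dots,b_n} \sum_{i=1}^{m}
  \Bigl( Y_i - b_0 - \sum_{j=1}^{n} b_j X_{ij} \Bigr)^{2},
\qquad
\hat{b} = \bigl( \mathbf{X}^{\top} \mathbf{X} \bigr)^{-1} \mathbf{X}^{\top} \mathbf{Y}
```

The closed form on the right holds when \mathbf{X}^{\top}\mathbf{X} is invertible, which is one reason the collinearity diagnosis of the independent variables matters.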

The stepwise multiple regression analysis method was adopted in this study to build the multiple regression models used for the subsequent selection of the optimal model and for the comparison of the methods before and after optimization. Four multiple regression models were built under the following conditions: a variable entered the model when its probability was less than 0.050 and was removed from the model when its probability exceeded 0.100. Model 3 comprised the spot area ratio, number of spots, and imbalance. Model 4 added the main vein distance to Model 3. The details are listed in Table 3.

The Durbin–Watson (D–W) test is the most commonly used method to test for first-order autocorrelation. A DW value close to 0 or 4 indicates positive or negative autocorrelation, respectively, whereas a DW value close to 2 indicates no first-order autocorrelation. The Durbin–Watson value for model 4 was 1.826, which is close to 2. Thus, the autocorrelation in the multiple linear regression model of this study was not significant, i.e., the model was well designed.
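For reference, the Durbin–Watson statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below uses this standard definition; the function name and the residual values are hypothetical and are not the residuals of model 4.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic of a sequence of regression residuals.

    Values near 2 suggest no first-order autocorrelation; values near 0 or 4
    suggest positive or negative autocorrelation, respectively.
    """
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Hypothetical residuals, for illustration only.
print(durbin_watson([0.5, -0.3, 0.2, -0.4, 0.1, 0.3]))
```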

In the stepwise multiple logistic regression analysis, the adjusted R-squared was adopted to evaluate the regression. The adjusted value avoids the overestimation of the R-squared caused by adding independent variables; the closer this value is to 1, the better the model fit. The adjusted R-squared values of all four models exceeded 0.7 (Figure 6). The adjusted R-squared values of model 2, model 3, and model 4 increased with the number of variables, and the explanatory power of the models increased accordingly. It is generally understood that the explanatory power of a multiple linear regression model is strong at an R-value over 0.8. As depicted in Figure 6, the adjusted R-squared value of the optimized model 4 was the highest. Thus, model 4 exhibited the best quality of fit among the four models, and the standard error of the estimate declined as independent variables were added. This result suggested that the stepwise multiple logistic regression analysis conformed to the requirements.
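The adjustment referred to above penalizes the R-squared for the number of independent variables. A minimal sketch of the standard formula follows; the function name and the sample size and R-squared values in the example are hypothetical and are not those of Figure 6.

```python
def adjusted_r_squared(r_squared, n_samples, n_predictors):
    """Adjusted R-squared for a linear model with an intercept."""
    return 1.0 - (1.0 - r_squared) * (n_samples - 1) / (n_samples - n_predictors - 1)


# With the same raw R-squared, a model with more predictors is penalized more.
print(adjusted_r_squared(0.80, n_samples=101, n_predictors=1))
print(adjusted_r_squared(0.80, n_samples=101, n_predictors=4))
```

An adjusted value that still rises when a variable is added, as reported for models 2 to 4, indicates that the added variable contributes more than the penalty it incurs.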

Table 4 and Table 5 list the regression coefficients and statistics of the conventional grade evaluation model 1 before optimization and of the optimized grade evaluation model 4, respectively. The dependent variable of both models was the true grade of leaf disease. The significance in the coefficient tables indicates whether an independent variable has a significant effect on the dependent variable. The significance test values in the tables were all less than 0.05, suggesting that the above four factors significantly affect the apple leaf disease grade. The VIF (the inverse of the tolerance) was adopted for collinearity diagnosis (the degree of correlation between the independent variables), and all values in the tables satisfied 0 < VIF < 10. Accordingly, there was no multicollinearity, and the expected criteria of the model were satisfied.

Table 5 presents the standardized coefficient values of the variables in model 4, which compare as 0.626 > 0.241 > 0.137 > 0.062. Among the four factors for apple leaf disease, the spot area ratio had the largest effect; the imbalance, i.e., the distance between the spots, ranked second; and the main vein distance, i.e., the distance between the spots and the main veins, ranked third. The number of spots exerted the least influence. Moreover, based on the non-standardized coefficients B in the above two tables, the conventional model, which relies only on the single indicator of disease spot area ratio for grade evaluation, and the optimized model for the grade evaluation of apple leaf diseases are written as follows:

Y_1 = 0.812 X_1 + 0.794

(4)

Y_4 = 0.595 X_1 + 0.072 X_2 + 0.244 X_3 + 0.194 X_4 - 0.160

(5)

where Y denotes the true rating of apple leaf diseases (Grade); X1 represents grades under the single spot area ratio indicator (RST); X2 expresses rating under the single spot count indicator (NUM); X3 denotes grades under a single imbalance metric (IID); X4 is the rank under the single main vein distance indicator (MVD).
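Equations (4) and (5) can be evaluated directly once the four indicator grades of a leaf are known. The sketch below uses the coefficients given above; the function names, the rounding of the continuous score to an integer grade between 1 and 4, and the example leaf are assumptions for illustration and are not described in this study.

```python
def grade_score_y1(rst):
    """Conventional model, Equation (4): spot area ratio grade only."""
    return 0.812 * rst + 0.794


def grade_score_y4(rst, num, iid, mvd):
    """Optimized model, Equation (5): RST, NUM, IID, and MVD grades."""
    return 0.595 * rst + 0.072 * num + 0.244 * iid + 0.194 * mvd - 0.160


def to_grade(score, lowest=1, highest=4):
    """Assumption: round the score to the nearest grade and clamp it to 1-4."""
    return max(lowest, min(highest, round(score)))


# Hypothetical leaf: small spot area (grade 2) but unbalanced spots near the vein.
y1 = grade_score_y1(rst=2)
y4 = grade_score_y4(rst=2, num=1, iid=4, mvd=3)
print(to_grade(y1), to_grade(y4))
```

In this hypothetical case the two models disagree, which is the kind of situation in which the spot distribution indicators change the assigned grade.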

Figure 7 presents the histogram and normal P–P plot of the optimized model 4. In the figure, the dependent variable was the apple leaf disease grade, the histogram was very close to the normal distribution, and the scatter points were clustered near the diagonal. These results suggested that the apple leaf disease grade evaluation model under this method satisfies the basic assumptions of the classical regression model. In other words, the random error terms conform to a normal distribution, and this multiple linear regression analysis is feasible.

2.5. Object Detection Algorithm

As the object detection algorithm, this paper uses a convolutional neural network model to obtain the relevant parameters of apple leaf disease spots, including the type and coordinates of the disease spots and the coordinates of the main vein. These parameters are then substituted into the formulas of the grade evaluation indicators to obtain values such as the area ratio of the disease spots and the main vein distance. The proposed apple leaf disease detection and parameter measurement method based on Mask R-CNN can automatically measure parameters for quantitative analysis, thereby helping researchers obtain accurate leaf disease information in an efficient and simple way.

Since ordinary convolutional neural networks require accurate prediction over multiple regions, which consumes considerable computing time, this study directly chose the more suitable R-CNN approach.

Mask R-CNN [68] is an instance segmentation algorithm that can be applied to scenarios such as target detection, target instance segmentation, and target keypoint detection. The algorithm is based on the original Faster R-CNN algorithm, with the addition of an FCN to generate the corresponding mask branch. It combines object detection and semantic segmentation, thus achieving instance segmentation. In the preliminary preparation work, the image labeling tool LabelMe was first used to label the dataset. The polygon tool was adopted to label the location of disease spots, leaf areas, and main veins of apple leaves. Three labels were set, i.e., rust (disease spot), vein (main vein complex), and break (stump), to facilitate the subsequent calculation of each indicator parameter using the algorithmic network (Figure 8).

After data enhancement, a total of 4500 images of apple leaf diseases were obtained. Most of them were used for model training and a small portion for model testing, with a ratio of the training set to the test set of about 9:1. Next, the pre-labeled dataset pictures were input into the target detection algorithm, as shown in Figure 9. Then, all the parameter pixel values required under the four indicators were obtained, including the area of diseased spots on apple leaves (e.g., the incomplete parts on the leaves), the total leaf area, the location and quantity of lesions, and the location of the leaf main vein. The values of RST, NUM, IID, and MVD were obtained through the formulas of the grade evaluation indicators. At the same time, the network classified the types of apple leaf diseases. Finally, the parameter values of each grade indicator were substituted into the apple leaf disease grade evaluation models before and after optimization, i.e., Y1 and Y4, and the recognition accuracy of the two models was compared.
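The approximately 9:1 division of the 4500 images can be sketched as a seeded random split. The splitting procedure actually used is not described in this study, so the function name, the random seed, and the file names below are hypothetical.

```python
import random


def split_dataset(items, test_fraction=0.1, seed=0):
    """Shuffle the items reproducibly and return (training set, test set)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical file names standing in for the 4500 labeled images.
image_ids = [f"leaf_{i:04d}.jpg" for i in range(4500)]
train_set, test_set = split_dataset(image_ids)
print(len(train_set), len(test_set))
```

If augmented copies of the same original photograph exist, splitting by original image rather than by augmented file would avoid near-duplicates appearing in both sets; whether this applies here is not stated.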

2.6. Experiment Setup

After the disease grades of apple leaves were recalibrated, the new dataset labeled with LabelMe was input into the convolutional neural network, which was run in the environment shown in Table 6.

2.7. Performance Metrics

A multi-classification task was performed in this study. To measure the performance of the model effectively, the following metrics were selected: accuracy, recall, average accuracy, and average recall of the three diseases, where the average accuracy and average recall denote the averages of the accuracy and recall over the four grades of the three diseases. To evaluate these metrics, several parameters were required, comprising true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). TP represents the number of images correctly classified into a given category, FP represents the number of images from other categories misclassified into that category, TN expresses the number of images correctly classified as belonging to the other categories, and FN denotes the number of images of the given category that were misclassified. The metrics are defined as follows:

Accuracy = \frac{TP + TN}{TP + FP + TN + FN} = \frac{TP + TN}{P + N}

(6)

Recall = \frac{TP}{TP + FN} = \frac{TP}{P}

(7)
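Equations (6) and (7) can be applied to a multi-class task by counting TP, FP, TN, and FN for one category at a time against all others. The sketch below follows that one-versus-rest reading; the function names and the example labels are hypothetical and do not reproduce Table 7.

```python
def per_class_counts(true_labels, predicted_labels, target):
    """Count TP, FP, TN, and FN for one category against all other categories."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == target and truth == target:
            tp += 1
        elif predicted == target:
            fp += 1
        elif truth == target:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Equation (6)."""
    return (tp + tn) / (tp + fp + tn + fn)


def recall(tp, fn):
    """Equation (7)."""
    return tp / (tp + fn)


# Hypothetical grades for five images, for illustration only.
true_grades = [1, 1, 2, 2, 3]
predicted_grades = [1, 2, 2, 2, 3]
tp, fp, tn, fn = per_class_counts(true_grades, predicted_grades, target=2)
print(accuracy(tp, fp, tn, fn), recall(tp, fn))
```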

3. Results

To investigate the advantages of the Mask R-CNN target detection model over other mainstream neural networks, its detection results for the three apple leaf diseases were compared with those of the Faster R-CNN algorithm. The accuracy, recall, average accuracy, and average recall of the two networks for the three diseases are listed in Table 7.

Table 7 shows that the accuracy and recall of Mask R-CNN are higher than those of Faster R-CNN. The output of the network is shown in Figure 10, which compares the network input and output for a black rot image. The output includes the type of apple leaf disease, the coordinates of the disease spots with their recognition accuracy, and the coordinates of the main vein of the leaf with its recognition accuracy.

The training loss is plotted in Figure 11, which depicts the training loss curves of Faster R-CNN and Mask R-CNN for disease detection in apple leaves from depth images. The local fluctuation of the curves is caused by repeatedly adjusting the size of the bounding box to accurately delineate the apple leaf disease spot in the depth image, whereas the overall downward trend of the curves indicates that the training converged.

Figure 12 presents the comparison results of one random selection. The horizontal coordinates of Figure 12 represent the picture numbers, and the vertical coordinates represent the four classes of apple leaf diseases. Moreover, the three curves represent the disease classes judged by the conventional method using a single spot area ratio, the optimized apple leaf disease classes, and the true apple leaf disease classes, respectively. The true disease classes were obtained by selecting 20 pathology professionals to examine the apple leaves in the dataset based on the color of the disease spots [69], their number [70] and texture [71], as well as other angles to judge the disease grade, recorded as true grade.

In addition, the accuracy of apple leaf disease grade recognition before and after optimization was obtained by substituting the parameter values output by Mask R-CNN into the models Y1 and Y4, as shown in Table 8. Compared with the single criterion, the overall average recognition accuracy of the optimized model was increased by 20.48%. Thus, when multiple parameters are used to measure the degree of disease, the multidimensional measurement makes the model judgment more effective, which further validates the apple leaf disease grade evaluation method based on PCA-logistic regression analysis in this study.

4. Discussion

First, the statistical analysis in SPSS 26.0 indicated a strong correlation between the four grade evaluation indicator variables of this study, and the correlation passed the significance test. The multivariate ordered logistic regression analysis showed that the output pseudo-R-squared values all exceeded 0.8. This result suggested that the model fit of this study was good and met the expected experimental standards. The effects of the four apple leaf grade evaluation indicators on the determination of the true grade of apple leaf diseases are discussed below. The indicators comprise the disease spot area ratio, the number of disease spots, and the two indicators that consider the distribution of disease spots, i.e., the imbalance degree and the main vein distance.

The correlation analysis indicated that the Spearman correlation coefficient between the spot area ratio and apple leaf disease grade was 0.87, and the multiple regression analysis gave a standardized coefficient of 0.626 for this variable. The single RST thus had the largest weight of influence on the grade identification of apple leaf disease, with the highest significance, and it was highly correlated with the leaf grade. In general, the conventional spot area ratio remained the main factor in determining the grade of apple leaf diseases. The Spearman correlation coefficient between the number of spots and apple leaf disease grade was 0.51, with a significance test value of 0.046. As revealed by multiple regression analysis (standardized coefficient of 0.062), the NUM indicator exerted the least effect on grade identification. Furthermore, the first and third grades of the number of disease spots did not significantly affect the disease grade of apple leaves. Likewise, the Spearman correlation coefficients between the IID and MVD and apple leaf disease grade reached 0.67 and 0.61, respectively, with a significance test value of 0.00. The standardized coefficients of these variables were 0.241 and 0.137. As revealed by the ordered logistic regression model, there was a positive correlation between each grade evaluation indicator and the dependent variable of disease grade. Nevertheless, the fourth grade of the IID indicator and the first grade of the MVD indicator did not have a significant relationship with the disease grade of apple leaves.

In brief, given the distribution of disease spots on apple leaves, the two newly added grade evaluation indicators, IID and MVD, significantly affected the grade of apple leaf diseases. Thus, they can be used to improve and optimize the conventional leaf disease grade identification model. Notably, the results of the statistical analysis were consistent with the assumptions formed from the literature at the early stage of this study. On that basis, SPSS 26.0 was used for PCA-logistic regression analysis.

First, PCA was adopted to consider five indicators that may affect apple leaf disease grade evaluation. Based on the importance coefficient, the indicators with a cumulative contribution greater than 85% (RST, IID, MVD, and NUM) served as the principal components applied in the evaluation model. Next, the stepwise multiple regression method was used. Since the random error term of the optimized model obeyed the normal distribution, this multiple linear regression analysis was feasible. Finally, the conventional model, which relied only on the single indicator of spot area ratio for grade evaluation, can be summarized as follows: apple leaf disease grade = 0.812 × RST + 0.794. The optimized apple leaf disease grade evaluation model in this study can be summarized as follows: apple leaf disease grade = 0.595 × RST + 0.072 × NUM + 0.244 × IID + 0.194 × MVD − 0.160.

The disease grade evaluation models before and after optimization obtained in the study can also be adopted to predict the disease grade of apple leaves. Accordingly, to confirm the feasibility of the new model and new method, the target detection algorithm was selected to calculate and predict the parameters under the grade evaluation indicators. The previously labeled images of the three apple leaf diseases were input into the neural network for training. The Mask R-CNN algorithm used in this study reached an accuracy of 89.47% and a recall of 91.05% (Table 7). Compared with the conventional target detection algorithm Faster R-CNN, the accuracy and recall were increased by 4.91% and 5.19%, respectively, and the average accuracy and average recall for the three apple leaf diseases were also increased. On that basis, the target detection algorithm Mask R-CNN was used to obtain the types of leaf disease and the parameter values of the disease spots under each grade indicator. Next, the obtained data were substituted into the apple leaf disease grade evaluation models Y1 and Y4 before and after optimization. The overall average accuracy of the optimized apple leaf disease grade evaluation model was 90.12%, 20.48% higher than that of the model before optimization (Table 8), which supports the effectiveness of the method and model. Among the three diseases, the recognition accuracy of the black rot grade was increased by 5.99%, that of the scab grade by 10.5%, and that of the rust grade by 39.33%. The improvement was greatest for rust, for which the grade recognition accuracy was only 55.17% when the spot area ratio was the only criterion for measuring the disease grade.

Table 9 compares the results of this study with those of other studies. The comparison suggests that the method and model selected in this study are more effective in the grade identification of apple leaf diseases.

The reason for this result is that when the spot area ratio was used as the only measure of disease grade, its variation was not closely correlated with the variation of the rust grade in real situations. As depicted in Figure 12, the severity grades obtained for most of the images using the optimized grade evaluation model were almost the same as the true grades of apple leaf diseases determined by experts, suggesting that the apple leaf disease grade evaluation method based on PCA-logistic regression analysis in this study is feasible and effective. Compared with other studies using deep learning methods to identify leaf diseases, this study mainly proposed novel evaluation indicators, thus improving the grade evaluation model of apple leaf diseases.

5. Conclusions

The mainstream method for the classification of existing plant leaf diseases has been the spot area ratio, i.e., the ratio of leaf spot area to total leaf area. However, this classification standard is insufficient to divide the leaf disease grade carefully. Thus, a PCA-logistic regression analysis-based apple leaf disease rating evaluation method was proposed in this study.

This method takes three common apple leaf diseases as the research objects, i.e., black rot, scab, and rust. In addition to the conventional decision factors (e.g., disease spot area ratio, number, color, and texture), two new grade evaluation indicators were added to account for the effect of apple leaf damage and spot distribution on the judgment of leaf disease grades, i.e., the imbalance degree and the main vein distance. SPSS 26.0 was employed for statistical analysis, and PCA was adopted to determine the four grade evaluation indicators from multiple factors. After stepwise multiple regression analysis, the models for identifying apple leaf disease grades before and after optimization were established. The validity of the method was verified through correlation analysis and ordered variable analysis.

As indicated by the final comparative experimental results, the accuracy and recall of Mask R-CNN recognition were higher than those of Faster R-CNN by 4.91% and 5.19%, respectively. With the distribution of lesions taken into account through the newly added evaluation indicators of imbalance degree and main vein distance, the recognition accuracy of the grade evaluation model was increased by 20.48% on average after optimization. The above results confirmed the effectiveness of the proposed model. The evaluation of the grade of leaf disease in apple trees based on the PCA-logistic regression analysis proposed in this study can technically support the establishment of plant disease grade standards, since it is capable of dividing apple diseases into four grades at a disease spot area ratio of less than 25%. Furthermore, the model presented in this study can provide a theoretical basis for the subsequent spraying of pesticides against apple leaf diseases.

In subsequent work, the new methods will be applied to disease classification and grading to further improve this study. Moreover, from the perspective of plant pathology, how to set the pesticide dosage for the various grades subdivided in this study remains an important research direction. Additionally, extending the experimental methods and objects of this study to the grade identification of leaf diseases in other common crops or plants is an area that needs attention in the future.

Author Contributions

Conceptualization and methodology, B.X. and D.W.; data curation, B.X. and T.Y.; writing—original draft preparation, B.X.; writing—review and editing, B.X., T.Y. and D.W.; supervision, D.W.; project administration, D.W. All authors have read and agreed to the published version of the manuscript.

Funding

Funded by contract research of non-government funded projects (2022092803003).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

Cai, S.; Zheng, B.; Zhao, Z.; Zheng, Z.; Yang, N.; Zhai, B. Precision Nitrogen Fertilizer and Irrigation Management for Apple Cultivation Based on a Multilevel Comprehensive Evaluation Method of Yield, Quality, and Profit Indices. Water 2023, 15, 468. [Google Scholar] [CrossRef]
Zhou, J.; Zhao, D.; Chen, Y.; Kang, G.; Cheng, C. Analysis of changes in apple production areas in China. J. Fruit Trees 2021, 38, 372–384. [Google Scholar] [CrossRef]
Huo, X.-x.; Liu, T.-j.; Liu, J.-d.; Wei, Y.-a.; Yao, X.-s.; Ma, X.-y.; Lu, F. China Apple Industry Development Report in 2020 (Condensed Version). China Fruit Veg. 2022, 42, 1–6. [Google Scholar] [CrossRef]
Zhang, C.; Zhang, S.; Yang, J.; Shi, Y.; Chen, J. Apple leaf disease identification using genetic algorithm and correlation based feature selection method. Int. J. Agric. Biol. Eng. 2017, 10, 74–83. [Google Scholar]
Khan, A.; Nawaz, U.; Ulhaq, A.; Robinson, R.W. Real-time Plant Health Assessment Via Implementing Cloud-based Scalable Transfer Learning On AWS DeepLens. PLoS ONE 2020, 15, e0243243. [Google Scholar] [CrossRef] [PubMed]
Bensaci, O.A.; Aliat, T.; Berdja, R.; Popkova, A.V.; Kucher, D.E.; Gurina, R.R.; Rebouh, N.Y. The Use of Mycoendophyte-Based Bioformulations to Control Apple Diseases: Toward an Organic Apple Production System in the Aurès (Algeria). Plants 2022, 11, 3405. [Google Scholar] [CrossRef] [PubMed]
Zhang, W.; Zhou, G.; Chen, A.; Hu, Y. Deep multi-scale dual-channel convolutional neural network for Internet of Things apple disease detection. Comput. Electron. Agric. 2022, 194, 106749. [Google Scholar] [CrossRef]
Sottocornola, G.; Baric, S.; Stella, F.; Zanker, M. Case study on the development of a recommender for apple disease diagnosis with a knowledge-based Bayesian Network. In Proceedings of the Workshop Proceedings of the 3rd Edition of Knowledge-Aware and Conversational Recommender Systems (KaRS) and the 5th Edition of Recommendation in Complex Environments (ComplexRec), Amsterdam, The Netherlands, 27 September–1 October 2021; Volume 2960. [Google Scholar]
Khan, M.A.; Akram, T.; Sharif, M.; Awais, M.; Javed, K.; Ali, H.; Saba, T. CCDF: Automatic system for segmentation and recognition of fruit crops diseases based on correlation coefficient and deep CNN features. Comput. Electron. Agric. 2018, 155, 220–236. [Google Scholar] [CrossRef]
Singh, V.; Misra, A.K. Detection of plant leaf diseases using image segmentation and soft computing techniques. Inf. Process. Agric. 2017, 4, 41–49. [Google Scholar] [CrossRef]
Gogoi, N.K.; Deka, B.; Bora, L.C. Remote sensing and its use in detection and monitoring plant diseases: A review. Agric. Rev. 2018, 39, 307–313. [Google Scholar] [CrossRef]
Shi, T.; Liu, Y.; Zheng, X.; Hu, K.; Huang, H.; Liu, H.; Huang, H. Recent advances in plant disease severity assessment using convolutional neural networks. Sci. Rep. 2023, 13, 2336. [Google Scholar] [CrossRef]
Gonzalez-Dominguez, E.; Caffi, T.; Rossi, V.; Salotti, I.; Fedele, G. Plant disease models and forecasting: Changes in principles and applications over the last 50 years. Phytopathology 2023, 113, 588–752. [Google Scholar] [CrossRef] [PubMed]
Kuswidiyanto, L.W.; Noh, H.H.; Han, X. Plant Disease Diagnosis Using Deep Learning Based on Aerial Hyperspectral Images: A Review. Remote Sens. 2022, 14, 6031. [Google Scholar] [CrossRef]
Wang, H.; Shang, S.; Wang, D.; He, X.; Feng, K.; Zhu, H. Plant Disease Detection and Classification Method Based on the Optimized Lightweight YOLOv5 Model. Agriculture 2022, 12, 931. [Google Scholar] [CrossRef]
Kim, D.G.; Burks, T.F.; Qin, J.; Bulanon, D.M. Classification of citrus peel diseases using color texture feature analysis. Int. J. Agric. Biol. Eng. 2009, 2, 41–50. [Google Scholar]
Ghaiwat, S.N.; Arora, P. Detection and Classification of Plant Leaf Diseases Using Image processing Techniques: A Review. Int. J. Recent Adv. Eng. Technol. 2014, 2, 1–7. [Google Scholar]
Ramcharan, A.; Baranowski, K.; McCloskey, P.; Ahmed, B.; Legg, J.; Hughes, D.P. Using Transfer Learning for Image-Based Cassava Disease Detection. arXiv 2017, arXiv:1707.03717. [Google Scholar]
Fuentes, A.; Yoon, S.; Kim, S.C.; Park, D.S. A Robust Deep-Learning-Based Detector for Real-Time Tomato Plant Diseases and Pests Recognition. Sensors 2017, 17, 2022. [Google Scholar] [CrossRef] [PubMed]
Chao, X.; Sun, G.; Zhao, H.; Li, M.; He, D. Identification of Apple Tree Leaf Diseases Based on Deep Learning Models. Symmetry 2020, 12, 1065. [Google Scholar] [CrossRef]
Baum, T.; Navarro-Quezada, A.; Knogge, W.; Douchkov, D.; Schweizer, P.; Seiffert, U. Hypharea--automated analysis of spatiotemporal fungal patterns. J. Plant Physiol. 2011, 168, 72–78. [Google Scholar] [CrossRef]
Mokhtar, U.; Ali, M.A.; Hassenian, A.E.; Hefny, H. Tomato leaves diseases detection approach based on Support Vector Machines. In Proceedings of the 2015 11th International Computer Engineering Conference (ICENCO), Cairo, Egypt, 29–30 December 2015; pp. 246–250. [Google Scholar]
Wang, Z.; Zheng, C.; Li, T.; He, X. Analysing the preference for pesticide spray to be deposited at leaf-tips. Biosyst. Eng. 2021, 204, 247–256. [Google Scholar] [CrossRef]
Rezk, N.G.; Attia, A.F.; El-Rashidy, M.A.; El-Sayed, A.; Hemdan, E.E.D. An Efficient Plant Disease Recognition System Using Hybrid Convolutional Neural Networks (CNNs) and Conditional Random Fields (CRFs) for Smart IoT Applications in Agriculture. Int. J. Comput. Intell. Syst. 2022, 15, 65.
Lakhdari, K.; Saeed, N. A new vision of a simple 1D Convolutional Neural Networks (1D-CNN) with Leaky-ReLU function for ECG abnormalities classification. Intell.-Based Med. 2022, 6, 100080.
Le-Tien, T.; To, T.N.; Vo, G. Graph-based signal processing to convolutional neural networks for medical image segmentation. Seatuc. J. Sci. Eng. 2022, 3, 9–15.
Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. arXiv 2016, arXiv:1602.07360.
Tomita, G.; Matsubara, K.; Ido, T.; Kitazawa, Y. Computerized digital image analysis of optic nerve heads with a three-dimensional image analyzer, IMAGEnet and a comparison with the Optic Nerve Head Analyzer. Nippon. Ganka Gakkai Zasshi 1989, 93, 741–746.
Abas, M.A.H.; Ismail, N.; Yassin, A.I.M.; Taib, M.N. VGG16 for Plant Image Classification with Transfer Learning and Data Augmentation. Int. J. Eng. Technol. 2018, 7, 90–94.
Rahman, C.R.; Arko, P.S.; Ali, M.E.; Khan, M.A.I.; Apon, S.H.; Nowrin, F.; Wasif, A. Identification and Recognition of Rice Diseases and Pests Using Convolutional Neural Networks. Biosyst. Eng. 2020, 194, 112–120.
Kumar Gupta, R.; Kumar Rai, R.; Kumar Tiwari, P.; Kumar Misra, A.; Martcheva, M. A mathematical model for the impact of disinfectants on the control of bacterial diseases. J. Biol. Dyn. 2023, 17, 2206859.
Gao, S.; Binod, P.; Chukwu, C.W.; Kwofie, T.; Safdar, S.; Newman, L.; Choe, S.; Datta, B.K.; Attipoe, W.K.; Zhang, W.; et al. A mathematical model to assess the impact of testing and isolation compliance on the transmission of COVID-19. Infect. Dis. Model. 2023, 8, 427–444.
Sutulo, S.; Soares, C.G. Application of an offline identification algorithm for adjusting parameters of a modular manoeuvring mathematical model. Ocean. Eng. 2023, 279, 114328.
Edwards, A.J.; Benson, L.; Guo, Z.; López-García, M.; Noakes, C.J.; Peckham, D.; King, M.F. A mathematical model for assessing transient airborne infection risks in a multi-zone hospital ward. Build. Environ. 2023, 238, 110344.
Dai, C. Life Cycle Management of Heavy-duty Railway Public Works Equipment Based on Digital Model. Railw. Constr. 2022, 62, 37–41.
Chen, H.; Wang, L.; Ran, Y. Research progress in forecasting technology of major tobacco diseases based on digital models. Plant Med. 2023, 2, 18–24.
Anggriani, N.; Amelia, R.; Istifadah, N.; Arumi, D. Optimal control of plant disease model with roguing, replanting, curative, and preventive treatment. J. Phys. Conf. Ser. 2020, 1657, 012050.
Shaw, P.K.; Kumar, S.; Momani, S.; Hadid, S. Dynamical analysis of fractional plant disease model with curative and preventive treatments. Chaos Solitons Fractals Interdiscip. J. Nonlinear Sci. Nonequilibrium Complex Phenom. 2022, 164, 112705.
Wei, Y.; Deng, Y.; Shi, Z.; Yang, G.; Qing, D.Y.; Sin, S.; Qin, Y. Prediction of the epidemic model of Eucalyptus glauca. Guangxi For. Sci. 1998, 4, 170–173.
Shi, M.; Zhang, M.J. Study on detection and reporting model of eucalyptus wilt. J. Cent. South Univ. For. Technol. 1997, 17, 26–31.
Pan, H.; Huang, Y.; Cao, J. Research on the evaluation system of tunnel water seepage disease level. J. Railw. Eng. 2010, 27, 63–67.
Cao, Z.; Li, B.; Zhao, C.; Gao, J.; Ma, G. Research on the Growth Model and Prediction Model of Tobacco Red Spot Disease and Target Spot Disease. J. Jilin Agric. 2020, 42, 1–8.
Tang, R.; Zhou, Z.; Chen, Y.; Long, J.; Li, X.; Yang, L. Research on the technology of measuring and reporting the occurrence of tobacco erysipelas in Hunan. J. Hunan Agric. Univ. (Nat. Sci. Ed.) 2005, 2, 165–169.
Dietz-Pfeilstetter, A.; Mendelsohn, M.; Gathmann, A.; Klinkenbuß, D. Considerations and Regulatory Approaches in the USA and in the EU for dsRNA-Based Externally Applied Pesticides for Plant Protection. Front. Plant Sci. 2021, 12, 974.
Belakhov, V.V.; Boikova, I.V.; Novikova, I.I.; Kolodyaznaya, V.A. Results of Examination of the Biological Activity of Nonmedical Antibiotics with a View to Finding Environmentally Friendly Pesticides for Plant Protection. Russ. J. Gen. Chem. 2018, 88, 2982–2989.
Emilio, M.; Eduard, B. ChemInform Abstract: Synthetic Antimicrobial Peptides as Agricultural Pesticides for Plant-Disease Control. Chem. Biodivers. 2008, 5, 1225–1237.
Patil, R.R.; Kumar, S.; Chiwhane, S.; Rani, R.; Pippal, S.K. An Artificial-Intelligence-Based Novel Rice Grade Model for Severity Estimation of Rice Diseases. Agriculture 2022, 13, 47.
Guo, W.; Feng, Q.; Li, X. Research progress of convolutional neural network model based on crop disease detection and identification. Chin. J. Agric. Mech. 2022, 43, 157–166.
Folle, L.; Fenzl, P.; Fagni, F.; Thies, M.; Christlein, V.; Meder, C.; Simon, D.; Minnopoulou, I.; Sticherling, M.; Schett, G.; et al. DeepNAPSI multi-reader nail psoriasis prediction using deep learning. Sci. Rep. 2023, 13, 5329.
Bao, W.; Zhao, J.; Hu, G.; Zhang, D.; Huang, L.; Liang, D. Identification of wheat leaf diseases and their severity based on elliptical-maximum margin criterion metric learning. Sustain. Comput. Inform. Syst. 2021, 30, 100526.
Fang, T.; Chen, P.; Zhang, J.; Wang, B. Crop leaf disease grade identification based on an improved convolutional neural network. J. Electron. Imaging 2020, 29, 1.
Bertalmio, M.; Vese, L.; Sapiro, G.; Osher, S. Simultaneous structure and texture image inpainting. In Proceedings of the IEEE Computer Society Conference on Computer Vision & Pattern Recognition, Madison, WI, USA, 18–20 June 2003.
Sun, J.; Yuan, L.; Jia, J.; Shum, H.Y. Image completion with structure propagation. Acm. Trans. Graph. 2005, 24, 861–868.
Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context Encoders: Feature Learning by Inpainting. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
Yang, C.; Lu, X.; Lin, Z.; Shechtman, E.; Wang, O.; Li, H. High-Resolution Image Inpainting using Multi-Scale Neural Patch Synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
Wang, H.-Y.; Xiao, H.-H.; Ma, S.-H. MSF-PPD network-based point cloud complementation method for green leaf under shading conditions. J. Agric. Mach. 2021, 52, 1–13.
Cui, X.; Duan, X.D.; Zheng, Y.; Li, Y.L.; Yang, X.D. Spatial distribution of black spot on leaves of four-season mallow and sampling technique. J. Northeast. For. Univ. 2015, 43, 104–106.
Tang, X.; Lu, J.; Zang, J. Spatial distribution characteristics of rust spots on apple tree leaves. North. Hortic. 2017, 8, 119–123.
Liu, Z.; Zhang, J.; Wang, F.; Liu, Y.; Cui, J. Study on the vertical distribution of ash narrow girdling on ash trees of different damage classes. Hebei For. Sci. Technol. 2022, 4, 21–24.
Xiang, Q.; Wang, X.; Li, R.; Zhang, G.; Lai, J.; Hu, Q. Fruit Image Classification Based on MobileNetV2 with Transfer Learning Technique. In Proceedings of the 3rd International Conference, Sanya, China, 22–24 October 2019.
Amarasingam, N.; Gonzalez, F.; Salgadoe, A.S.A.; Sandino, J.; Powell, K. Detection of White Leaf Disease in Sugarcane Crops Using UAV-Derived RGB Imagery with Existing Deep Learning Models. Remote Sens. 2022, 14, 6137.
Albarrak, K.; Gulzar, Y.; Hamid, Y.; Mehmood, A.; Soomro, A.B. A Deep Learning-Based Model for Date Fruit Classification. Sustainability 2022, 14, 6339.
Gulzar, Y.; Hamid, Y.; Soomro, A.B.; Alwan, A.A.; Journaux, L. A Convolution Neural Network-Based Seed Classification System. Symmetry 2020, 2020, 2018.
Mohanty, S.P.; Hughes, D.P.; Salathé, M. Using Deep Learning for Image-Based Plant Disease Detection. Front. Plant Sci. 2016, 7, 1419.
Singh, D.; Jain, N.; Jain, P.; Kayal, P.; Kumawat, S.; Batra, N. PlantDoc: A Dataset for Visual Plant Disease Detection. In Proceedings of the 7th ACM IKDD CoDS and 25th COMAD, Hyderabad, India, 5–7 January 2020; pp. 249–253.
Li, S.; Wang, D.; Xu, S.; Zhang, M.; Li, H. Correlation analysis between the fungal community structure and physical and chemical factors in the rhizosphere soil of A. chinensis in different planting areas. J. Trop. Crops. 2023, 46, 1–14.
Li, M.; Niu, R.; Song, Q.; Su, C.; Wang, H.; Zhang, B.; Wu, Z. Research on the relationship between human agricultural and veterinary drug chemical pollutant residues and hypercholesterolemia based on principal component analysis-Logistic regression method. Chin. J. Food Hyg. 2022, 34, 1–13.
Liu, Q.; Wen, X.; Luo, W.; Wang, Y.; Hui, C. A method for identifying cracks in railway tunnels based on MASK R-CNN. Sci. Technol. Innov. 2023, 3, 5–9.
Gao, S.Y. Occurrence and control of brown rot of apple. Northwest Hortic. (Fruit Tree) 2012, 2, 53.
Li, C.; Yu, N.; Liu, Z. Survey and control measures of apple rust in Liaonan region. North. Fruit Trees 2020, 4, 29–30.
Wang, K.; Gong, X.; Liu, L. Investigation and evaluation of apple spotted leaf drop resistance in apple local variety resources. China Fruit Tree 2015, 5, 81–84.

Figure 1. Common diseases of apple leaves: black rot, scab (black star disease), and rust [32,33].

Figure 2. HSV color threshold segmentation pretreatment for apple leaf diseases. In this example, the lesion area on the leaf was 697 pixels and the total leaf area was 26,602 pixels.

Figure 3. Correlation heat map between grade evaluation indicators.

Figure 4. Summary of Ordinal Logistic Regression Model Analysis Results.

Figure 5. Importance degree coefficient of grade evaluation indicator.

Figure 6. A summary of the results for the four models.

Figure 7. The optimized standardized residual P–P scatterplot of apple leaf disease grade evaluation model 4.

Figure 8. LabelMe software was used to label and preprocess the apple leaf disease pictures in the dataset.

Figure 9. The entire network process of putting the labeled dataset into the algorithm.

Figure 10. Comparison chart of black rot input and output networks.

Figure 11. Convergence curve of training results of the two networks.

Figure 12. A random selection of results from the apple leaf disease grade evaluation models before and after optimization, compared with the actual disease grades judged by experts.

Table 1. Optimized Grading Standards for Apple Leaf Diseases.

Disease Grade	RST	NUM	IID	MVD
1	0%–5%	0–5	100 px or more	50 px or more
2	5%–10%	5–10	50–100 px	30–50 px
3	10%–15%	10–15	20–50 px	15–30 px
4	15% or more	15 or more	0–20 px	0–15 px
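
The thresholds in Table 1 can be read as a lookup from each indicator value to a grade from 1 to 4. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, RST is taken to be the lesion-to-leaf area ratio in percent, and because the table ranges share their end points (for example, 5% appears in both grade 1 and grade 2), a value on a boundary is assigned to the more severe grade for RST and NUM and to the less severe grade for IID and MVD. The pixel counts are those quoted in the Figure 2 caption; the other inputs are made up for illustration.

```python
# Illustrative sketch only. Thresholds are copied from Table 1;
# the boundary handling and all function names are assumptions.

def grade_increasing(value, upper_bounds):
    """Grade for indicators where larger values mean a higher grade (RST, NUM)."""
    for grade, bound in enumerate(upper_bounds, start=1):
        if value < bound:
            return grade
    return len(upper_bounds) + 1


def grade_decreasing(value, lower_bounds):
    """Grade for indicators where smaller values mean a higher grade (IID, MVD)."""
    for grade, bound in enumerate(lower_bounds, start=1):
        if value >= bound:
            return grade
    return len(lower_bounds) + 1


def indicator_grades(rst_percent, num, iid_px, mvd_px):
    """Map the four indicator values to their Table 1 grades."""
    return {
        "RST": grade_increasing(rst_percent, (5, 10, 15)),
        "NUM": grade_increasing(num, (5, 10, 15)),
        "IID": grade_decreasing(iid_px, (100, 50, 20)),
        "MVD": grade_decreasing(mvd_px, (50, 30, 15)),
    }


# Lesion and leaf areas from the Figure 2 caption (pixels).
rst = 100 * 697 / 26602
print(f"RST = {rst:.2f}%")  # RST = 2.62%

# NUM, IID and MVD below are hypothetical values, not taken from the paper.
print(indicator_grades(rst, num=3, iid_px=120, mvd_px=60))
```

With these assumptions, the Figure 2 leaf falls in grade 1 on the RST column, since 2.62% lies in the 0%–5% range.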

Table 2. Pseudo-R-squared values indicating how well the model fits.

Cox–Snell	Nagelkerke	McFadden
0.812	0.866	0.602
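
For reference, the three pseudo-R-squared measures usually reported under these names (Cox and Snell, Nagelkerke, McFadden) have the standard definitions below. This is a sketch of the textbook formulas, on the assumption that the paper's statistics package used them; the symbols are introduced here and do not appear in the original: $L_0$ is the likelihood of the intercept-only model, $L_M$ the likelihood of the fitted model, and $n$ the sample size.

```latex
R^2_{\mathrm{CS}} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n}, \qquad
R^2_{\mathrm{N}} = \frac{R^2_{\mathrm{CS}}}{1 - L_0^{2/n}}, \qquad
R^2_{\mathrm{McF}} = 1 - \frac{\ln L_M}{\ln L_0}
```

The Nagelkerke value rescales the Cox and Snell value so that its maximum is 1, which is why it is the larger of the two in Table 2.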

Table 3. Summary results of model input variables.

Model	The Input Variable	Method
1	RST	Stepwise (criteria: probability of X to enter less than 0.050; probability of X to be removed greater than 0.100)
2	NUM
3	IID
4	MVD

Table 4. Coefficient weights of the variables in apple leaf disease grade evaluation model 1 (before optimization).

Model	Variable	B (Unstandardized Coefficient)	Standard Error	Beta (Standardized Coefficient)	Significance	VIF
1	(Constant)	0.794	0.085		0.000
1	RST	0.812	0.035		0.000	1.000

Table 5. Coefficient weights of the variables in apple leaf disease grade evaluation model 4 (after optimization).

Model	Variable	B (Unstandardized Coefficient)	Standard Error	Beta (Standardized Coefficient)	Significance	VIF
4	(Constant)	−0.160	0.146		0.275
	RST	0.595	0.039	0.626	0.000	1.778
	NUM	0.072	0.043	0.062	0.046	1.456
	IID	0.244	0.044	0.241	0.000	1.976
	MVD	0.194	0.057	0.137	0.001	1.694
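
Reading the B columns of Tables 4 and 5 as the coefficients of a linear predictor gives a score of the form constant + Σ B_i·x_i. The sketch below only evaluates that expression; it is not the authors' code. It assumes that each input x_i is the indicator's grade (1 to 4) from Table 1, which this section does not state, and it does not attempt to map the score to a final grade. The names `MODEL_1`, `MODEL_4`, `linear_score`, and the example leaf are hypothetical.

```python
# Illustrative sketch only: coefficients are the B values from Tables 4 and 5.
# Treating the inputs as indicator grades (1-4) is an assumption.

MODEL_1 = {"const": 0.794, "RST": 0.812}
MODEL_4 = {"const": -0.160, "RST": 0.595, "NUM": 0.072, "IID": 0.244, "MVD": 0.194}


def linear_score(model, indicators):
    """Return const + sum(B_i * x_i) over the variables the model uses."""
    return model["const"] + sum(
        coef * indicators[name]
        for name, coef in model.items()
        if name != "const"
    )


# Hypothetical leaf: mild by area, but with unbalanced spots close to the main vein.
leaf = {"RST": 1, "NUM": 2, "IID": 3, "MVD": 3}

print(f"model 1 score: {linear_score(MODEL_1, leaf):.3f}")
print(f"model 4 score: {linear_score(MODEL_4, leaf):.3f}")
```

Under these assumptions, model 1 sees only RST, so the distribution indicators cannot change its score, whereas model 4 raises the score when IID and MVD indicate a more severe distribution of spots.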

Table 6. The summary of the network operating environment.

GPU	NVIDIA GeForce RTX 2060Ti
CPU	Intel(R) Core(TM) i7-9700
Operating Environment	Windows 10 (64-bit)
Programming Language Version	Python 3.6
Deep Learning Framework	TensorFlow 2.4.0
Training Batch	100
Learning Rate	0.001

Table 7. Comparison of network output results of three apple leaf diseases.

Network	Accuracy (%)	Recall (%)	Average Accuracy, Black Rot (%)	Average Accuracy, Scab (%)	Average Accuracy, Rust (%)	Average Recall, Black Rot (%)	Average Recall, Scab (%)	Average Recall, Rust (%)
Faster R-CNN	84.56	85.86	83.67	73.75	96.25	84.20	77.15	96.23
Mask R-CNN	89.47	91.05	90.12	79.96	98.32	91.37	83.23	98.55
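
The figures in Table 7 are internally consistent: the first two columns equal the unweighted means of the three per-disease columns, and the differences between the two networks reproduce the 4.91 and 5.19 percentage-point gains quoted in the abstract. The short check below uses only values copied from the table; the variable names are illustrative.

```python
# Arithmetic check on Table 7; all numbers are copied from the table.
from statistics import mean

per_disease = {  # order: black rot, scab, rust
    "Faster R-CNN": {"accuracy": [83.67, 73.75, 96.25], "recall": [84.20, 77.15, 96.23]},
    "Mask R-CNN": {"accuracy": [90.12, 79.96, 98.32], "recall": [91.37, 83.23, 98.55]},
}

# Unweighted mean over the three diseases, rounded as in the table.
overall = {
    network: {metric: round(mean(values), 2) for metric, values in metrics.items()}
    for network, metrics in per_disease.items()
}

# Gain of Mask R-CNN over Faster R-CNN, in percentage points.
gain = {
    metric: round(overall["Mask R-CNN"][metric] - overall["Faster R-CNN"][metric], 2)
    for metric in ("accuracy", "recall")
}

print(overall)
print(gain)  # {'accuracy': 4.91, 'recall': 5.19}
```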

Table 8. The results of recognition accuracy of apple leaf disease grade evaluation before and after optimization.

	Model 1 before Optimization	Model 4 after Optimization
Black Rot	81.56%	87.55%
Scab	75.17%	85.67%
Rust	55.17%	94.50%
Overall Average	69.64%	90.12%
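
The "overall increase of 20.48%" quoted in the abstract is the difference between the two overall averages in Table 8, that is, a gain in percentage points rather than a relative change. The sketch below computes both readings from the table values; the variable names are illustrative. Note that the overall averages in the table are not the unweighted means of the three disease rows, and this section does not state how they were computed, so the code takes them as given.

```python
# Arithmetic check on Table 8; all numbers are copied from the table.
before = {"Black Rot": 81.56, "Scab": 75.17, "Rust": 55.17, "Overall Average": 69.64}
after = {"Black Rot": 87.55, "Scab": 85.67, "Rust": 94.50, "Overall Average": 90.12}

# Absolute gain in percentage points for each row.
gain_points = {row: round(after[row] - before[row], 2) for row in before}

# Relative gain of the overall average, in percent of the pre-optimization value.
relative_gain = 100 * (after["Overall Average"] - before["Overall Average"]) / before["Overall Average"]

for row, points in gain_points.items():
    print(f"{row}: +{points:.2f} percentage points")
print(f"Relative overall gain: {relative_gain:.2f}%")
```

The largest gain is for rust, which is also the disease on which model 1 performed worst.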

Table 9. Comparison of data with other studies.

Subject	Method	Model	Accuracy
Apple leaf disease	Hyperspectral imaging	GoogLeNet	89%
Artemisiae argyi folium	PCA-Logistic regression analysis	SVM	91.25%
Apple leaf disease	PCA-Logistic regression analysis	Mask R-CNN	90.12%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Xing, B.; Wang, D.; Yin, T. The Evaluation of the Grade of Leaf Disease in Apple Trees Based on PCA-Logistic Regression Analysis. Forests 2023, 14, 1290. https://doi.org/10.3390/f14071290
