1. Introduction
Apple cultivation in China has a history of over 2000 years. Apple is one of the most important fruit crops in China, and its cultivation area has been expanding year by year [1,2]. From 1978 to 2021, China’s annual apple production rose from 2,275,200 tons to 45,973,400 tons [3]. With the upgrading of the conventional apple industry, China has become a major force in the global apple industry and currently ranks first worldwide. Apple disease control has drawn considerable attention from farmers, who usually identify apple diseases using costly and ineffective methods (i.e., relying on experience, books, the internet, and expert advice) [4]. Accordingly, automatic detection methods should be adopted to identify disease types and severity levels more accurately in apple production [5,6,7,8]. Such methods can be employed to monitor apple health over the long term and to assess apple diseases effectively, so that timely control is facilitated and economic efficiency is increased [9].
In modern agriculture, monitoring based on plant leaf spot characteristics has become a research hotspot for automatic plant disease monitoring [10,11,12,13,14,15]. In 2009, Dae Gwan Kim, Thomas F. Burks, Jianwei Qin [16], Savita N. Ghaiwat, Parul Arora, et al. [17] noted that tools (e.g., probabilistic neural networks, principal component analysis, artificial neural networks, and fuzzy logic) have been adopted to build leaf classification models. With advances in computer vision, deep learning has been increasingly employed for disease detection in crops such as cassava [18], potato [19], and apple [20], owing to its convenience and high accuracy. Tobias Bauma, Aura Navarro-Quezada, Wolfgang Knogge, et al. [21] classified fungal colonies using genetic algorithms. Moreover, support vector machines, k-nearest neighbor algorithms, artificial neural networks, and decision trees have been extensively employed to address classification problems [22,23,24,25].
Convolutional neural networks exhibit strong feature learning capability and prominent generalization ability for multi-classification tasks. Since convolutional neural networks were first adopted for classification tasks, several classical classification networks have been developed (e.g., AlexNet [26], ImageNet [27], and VGG16 [28]). Novel networks are constantly proposed to increase classification accuracy. However, internet and hardware conditions remain poor in remote planting areas of China, so network complexity and model size must be included in the criteria for selecting a network. As novel networks are proposed one after another, accuracy keeps rising, but so do the number of layers and the number of parameters, which makes these high-performance networks difficult to deploy in real-life applications. For instance, VGG16 has nearly 138 million parameters, and such a large parameter count hinders its application. Chowdhury R. Rahman, Preetom S. Arko, Mohammed E. Ali, et al. [29] proposed an optimized lightweight model for rice pest identification.
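The parameter count quoted for VGG16 can be checked with a short calculation. The sketch below assumes the standard 16-layer VGG configuration (thirteen 3 × 3 convolutions, five 2 × 2 max-pooling layers, three fully connected layers, 224 × 224 RGB input, 1000 output classes) and 32-bit floating-point storage; the function and variable names are illustrative and are not taken from the cited works.

```python
# Illustrative parameter count for VGG16, assuming the standard
# configuration with a 224x224 RGB input and 1000 output classes.
# "M" marks a 2x2 max-pooling layer, which has no parameters.
CONV_LAYOUT = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
               512, 512, 512, "M", 512, 512, 512, "M"]
FC_SIZES = [4096, 4096, 1000]


def vgg16_param_count(in_channels=3, input_size=224):
    """Return the number of weights and biases in the assumed layout."""
    total = 0
    channels, size = in_channels, input_size
    for item in CONV_LAYOUT:
        if item == "M":
            size //= 2  # pooling halves the spatial resolution
            continue
        # 3x3 kernel weights for every input channel, plus one bias per filter
        total += (3 * 3 * channels + 1) * item
        channels = item
    features = channels * size * size  # flattened input of the first FC layer
    for out_features in FC_SIZES:
        total += (features + 1) * out_features  # weights plus biases
        features = out_features
    return total


params = vgg16_param_count()
size_mb = params * 4 / 1e6  # assumption: 4 bytes per float32 parameter
print(f"parameters: {params:,}")
print(f"approximate float32 size: {size_mb:.1f} MB")
```

Under these assumptions, most of the parameters sit in the fully connected layers, which is one reason lightweight architectures tend to shrink or remove them.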
A numerical model is a mathematical structure that expresses, in general or approximate terms and in mathematical language, the characteristics or quantitative dependencies of a system [30,31,32,33,34]. The occurrence of a crop disease involves a series of qualitative morphological, physiological, and biochemical pathological changes under the effects of biotic or abiotic factors. Building a numerical model of a disease proceeds as follows: the qualitative practical problem of leaf disease occurrence is converted into a corresponding quantitative mathematical problem, the weights of the various factors leading to disease occurrence are analyzed, and the results are then fed back into production practice. Among mathematical and statistical models, regression analysis has the advantages of handling a wide range of problems, requiring a moderate amount of data, and being easy to operate. It is therefore the approach most commonly adopted to predict a wide variety of diseases [35,36,37,38].
Wei, Yingwen, et al. [39] developed a model to predict the incidence of cyanobacteria by analyzing soil and meteorological factors in Leizhou, Guangdong, based on a gray system and multiple regression methods. Shi Mingwang et al. [40] suggested, based on statistics of climatic conditions and the incidence of cyanobacteria in the Wuchuan area, that the gray system forecasting model is more accurate than the multiple linear regression model. Pan, Haize, Huang, Yuanchun, et al. [41] employed mathematical methods (e.g., the Mahalanobis distance method) to build a grade evaluation model for tunnel seepage and water leakage. In 2018, Zhao and Changzhou [42] used multiple regression to analyze the effect of different meteorological factors (e.g., atmospheric temperature, atmospheric humidity, and sunshine hours) on disease prevalence. They suggested that these factors are significantly correlated with the development of disease. They then built a disease prediction model and reported its statistical indicators to establish the theoretical basis of the model.
To alleviate the harm arising from diseases, spraying pesticides has been confirmed as one of the fastest and most effective means of control [43,44,45]. However, dosing pesticides scientifically according to disease severity remains an open problem, and solving it can reduce pesticide damage and pesticide residues. The core of this problem is how to determine the degree of crop disease accurately and promptly. Traditionally, trained experts assess the severity of plant diseases by visual inspection. However, the high labor and time costs make this form of disease monitoring impractical [46,47,48,49].
With powdery mildew and stripe rust of wheat as the research objects, Wenxia Bao, Jian Zhao, Gensheng Hu, et al. [50] proposed an algorithm that identifies wheat leaf diseases and their severity through elliptical maximum margin criterion (E-MMC) metric learning, which combines an elliptical metric with the maximum margin criterion to capture the nonlinear transformation of the spatial structure or semantic information of wheat leaf disease images. Tao Fang, Peng Chen, Jun Zhang, et al. [51] determined the ratio of the number of pixels in the diseased area to the number of pixels in the area of the diseased leaf. They incorporated this ratio as the classification threshold for disease classes into the convolutional neural network ResNet-50 to classify ten diseases of eight plants.
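The pixel-based spot area ratio described above can be illustrated with a minimal sketch. It assumes that segmentation has already produced two binary masks of equal size, one for the leaf and one for the disease spots; the mask values and the function name `spot_area_ratio` are hypothetical and are not taken from [51].

```python
def spot_area_ratio(leaf_mask, spot_mask):
    """Ratio of diseased pixels to leaf pixels.

    Both masks are 2-D lists of 0/1 values with the same shape
    (an assumption of this sketch). Spot pixels outside the leaf
    are ignored.
    """
    leaf_pixels = sum(sum(row) for row in leaf_mask)
    if leaf_pixels == 0:
        raise ValueError("leaf mask contains no leaf pixels")
    spot_pixels = sum(
        leaf * spot
        for leaf_row, spot_row in zip(leaf_mask, spot_mask)
        for leaf, spot in zip(leaf_row, spot_row)
    )
    return spot_pixels / leaf_pixels


# Toy 4x4 masks (synthetic, for illustration only)
leaf_mask = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]
spot_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

ratio = spot_area_ratio(leaf_mask, spot_mask)
print(f"spot area ratio: {ratio:.2%}")
```

Because the denominator is the number of visible leaf pixels, any leaf tissue that has already been lost shrinks the denominator and distorts the ratio.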
In the above-mentioned research, the disease spot area ratio served as the single threshold for judging the disease class. Good results were achieved; however, the method has limitations. For instance, at the late stage of infection, multiple spots fuse and parts of the leaf edge and leaf area go missing, so the measured spot area ratio is inaccurate and disease classes are misjudged. In other words, a certain correlation exists between the leaf mutilation rate and the disease class. This relationship may vary across crops, tree species, and diseases, so assessment and judgment should be adapted to the specific situation in practical applications.
M. Bertalmio, L. Vese, G. Sapiro, et al. [52] proposed an algorithm to fill both texture and structure in regions with missing image information. The basic idea is as follows: the image is first decomposed into the sum of two functions with different basic features, and each function is then reconstructed separately using structure-filling and texture-filling algorithms. Jian Sun, Lu Yuan, and Jiaya Jia [53] presented a method that fills unknown regions of the input image with patches synthesized from its known regions, using one or more curves to provide the missing structures in the unknown regions. Deepak Pathak, Philipp Krahenbuhl, and Jeff Donahue [54] proposed an unsupervised visual feature learning algorithm based on contextual pixel prediction, which uses the surrounding image information to infer the content of the missing part. Chao Yang, Xin Lu, Zhe Lin, et al. [55] proposed a multi-scale CNN matching method that combines image content and texture. By extracting the intermediate layers of a classification network, the method preserves the texture structure and generates high-frequency details, so that high-resolution image completion can be addressed to a certain extent. Haoyun Wang, Haihong Xiao, et al. [56] proposed a leaf-shape completion network based on a multi-scale feature extraction module combined with a point cloud pyramid decoder (MSF-PPD), which achieves global feature extraction and fusion and multi-stage generation of leaf point clouds for completion in natural backgrounds (e.g., occluded green leaves).
Leaf veins are among the important organs of plants, and their main function is to transport water and nutrients. The primary veins (i.e., the main veins) are located at the center of the leaf and form the core of the vein system. As revealed by numerous studies, leaf veins carry critical physiological and genetic information about plants. The leaf veins, especially the primary veins, are highly valuable indicators for identifying plant species and analyzing plant growth and development, and they are of critical significance in research on the classification of leaf diseases. Xin Cui et al. [57] investigated the spatial distribution pattern of black spot disease spots on leaves of four-season mallow in Changchun. Their results indicated that the fewer the spots per leaf, the greater the relative clustering indicator. Using the diffusion coefficient C and other methods, Tang Xiaoqin [58] studied the spatial distribution pattern of rust disease on the leaves of apple trees in Gongbu Nature Reserve in southeastern Tibet. Liu, Zeyong, and Zhang et al. [59] investigated the distribution of ash narrow girdling over the vertical height of ash trees with different degrees of damage. Their results suggested that the degree of aggregation of spots, i.e., the distribution of spots on the leaves, can affect the judgment of the disease grade. Q. Xiang [60] recognized fruit images using a classification method built on MobileNetV2, a lightweight neural network, combined with transfer learning. N. Amarasingam [61] evaluated the performance of existing deep learning models such as YOLOv5, YOLOR, DETR, and Faster R-CNN in recognizing white leaf disease (WLD) in sugarcane crops. K. Albarrak [62] used convolutional neural networks to identify and classify date fruits. Y. Gulzar [63] proposed a seed classification system based on a CNN and transfer learning, which includes a model that classifies 14 commonly known seeds.
In brief, given that the distribution of disease spots on the leaf affects the determination of apple leaf disease severity, this study focused on optimizing the grading criteria for apple leaf diseases and built a novel grading evaluation model. Based on a target detection algorithm and a convolutional neural network, an apple leaf disease grading evaluation method based on PCA-logistic regression analysis was proposed, which integrates the conventional disease spot area ratio with newly added evaluation indicators through statistical correlation analysis. Furthermore, an optimized model was built for apple leaf disease severity identification.
4. Discussion
First, statistical analysis in SPSS 26.0 indicated that the four grade evaluation indicator variables of this study were strongly correlated and that the correlations passed the significance test. The multivariate ordered logistic regression analysis showed that the output pseudo-R-squared values all exceeded 0.8. This result suggested that the model fit was good and met the expected experimental standards. The effects of the four apple leaf grade evaluation indicators on the determination of the true grade of apple leaf diseases are discussed below. The indicators are the disease spot area ratio (RST), the number of disease spots (NUM), and two indicators that account for the distribution of disease spots, namely the imbalance degree (IID) and the main vein distance (MVD).
The correlation analysis indicated that the Spearman correlation coefficient between the spot area ratio (RST) and apple leaf disease grade was 0.87. In the multiple regression analysis, the standardized coefficient of this variable was 0.626. RST alone therefore had the largest weight in the grade identification of apple leaf disease, with the highest significance, and was highly correlated with leaf grade. Overall, the conventional spot area ratio remained the main factor in determining the grade of apple leaf diseases. The Spearman correlation coefficient between the number of spots (NUM) and apple leaf disease grade was 0.51, with a significance test value of 0.046. With a standardized coefficient of 0.062 in the multiple regression analysis, the NUM indicator had the smallest effect on grade identification. Furthermore, the first and third grades of the number of disease spots did not significantly affect the disease grade of apple leaves. Likewise, the Spearman correlation coefficients between IID and MVD and apple leaf disease grade were 0.67 and 0.61, respectively, with a significance test value of 0.00, and the standardized coefficients of these variables were 0.241 and 0.137. The ordered logistic regression model showed a positive correlation between each grade evaluation indicator and the dependent variable, disease grade. Nevertheless, the fourth grade of the IID indicator and the first grade of the MVD indicator did not have a significant relationship with the disease grade of apple leaves.
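For readers unfamiliar with the Spearman coefficient used above, the following sketch shows how it is obtained: both variables are replaced by their ranks (tied values share the average rank), and the Pearson correlation of the ranks is computed. The study itself used SPSS; this pure-Python version is only an illustration, and the `rst_values` and `grades` lists are synthetic toy data, not measurements from this study.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Synthetic example: spot area ratios and expert grades for eight leaves
rst_values = [0.02, 0.05, 0.08, 0.12, 0.15, 0.20, 0.22, 0.24]
grades = [1, 1, 2, 2, 3, 3, 4, 4]

rho = spearman(rst_values, grades)
print(f"Spearman correlation: {rho:.3f}")
```

Because the coefficient depends only on ranks, it suits ordinal variables such as disease grades, where the spacing between grades is not meaningful.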
In brief, given the distribution of disease spots on apple leaves, the two newly added grade evaluation indicators, IID and MVD, affected the grade of apple leaf diseases more strongly than NUM (standardized coefficients of 0.241 and 0.137 versus 0.062). Thus, they can be used to improve and optimize the conventional leaf disease grade identification model. Notably, the results of the statistical analysis were consistent with the hypotheses formed from the literature review at the early stage of this study. On that basis, SPSS 26.0 statistical software was used for PCA-logistic regression analysis.
First, principal component analysis (PCA) was adopted to consider five indicators that may affect apple leaf disease grade evaluation. Based on the importance coefficient, the indicators exceeding 85% (RST, IID, MVD, and NUM) were retained as the principal components applied in the evaluation model. Next, the stepwise multiple regression method was used. Since the random error term of the optimized model followed a normal distribution, multiple linear regression analysis was feasible. Finally, the conventional model, which relies only on the single indicator of spot area ratio for grade evaluation, can be summarized as follows: apple leaf disease grade = 0.812 × RST + 0.794. The optimized apple leaf disease grade evaluation model of this study can be summarized as follows: apple leaf disease grade = 0.595 × RST + 0.072 × NUM + 0.244 × IID + 0.194 × MVD − 0.160.
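The two regression equations can be evaluated directly, as in the sketch below. The coefficients come from the equations above; everything else is an assumption made for illustration: the indicator inputs are taken to be graded values on the same scale used to fit the models, the continuous output is rounded half up to the nearest grade, and grades are clipped to the range 1 to 4. The function names and example inputs are hypothetical.

```python
import math


def grade_baseline(rst):
    """Conventional model: spot area ratio (RST) only."""
    return 0.812 * rst + 0.794


def grade_optimized(rst, num, iid, mvd):
    """Optimized model: RST, NUM, IID, and MVD."""
    return 0.595 * rst + 0.072 * num + 0.244 * iid + 0.194 * mvd - 0.160


def to_grade(score, lowest=1, highest=4):
    """Round half up and clip to the grade range (an assumption of this sketch)."""
    return max(lowest, min(highest, math.floor(score + 0.5)))


# Hypothetical leaf: moderate spot area, but spots unevenly distributed
# and close to the main vein (high IID and MVD indicator grades).
rst, num, iid, mvd = 2, 3, 4, 4

score_before = grade_baseline(rst)
score_after = grade_optimized(rst, num, iid, mvd)
print(f"baseline:  score {score_before:.3f} -> grade {to_grade(score_before)}")
print(f"optimized: score {score_after:.3f} -> grade {to_grade(score_after)}")
```

In this hypothetical case the two models disagree by one grade, which illustrates how the distribution indicators can shift the result when the spot area ratio alone understates severity.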
The disease grade evaluation models before and after optimization can also be used to predict the disease grade of apple leaves. Accordingly, to confirm the feasibility of the new model and method, a target detection algorithm was used to calculate and predict the parameters of the grade evaluation indicators. The previously labeled images of three apple leaf diseases were input into the neural network for training. The Mask R-CNN algorithm used in this study achieved an accuracy of 89.47% and a recall of 91.05% (Table 7). Compared with the conventional target detection algorithm Faster R-CNN, the accuracy and recall were increased by 4.91% and 5.19%, respectively, and the average accuracy and average recall for the three apple leaf diseases were also higher than before optimization. On that basis, Mask R-CNN can obtain the types of leaf disease and the parameter values of the disease spots under each grade. Next, the obtained data were substituted into the apple leaf disease grade evaluation models before and after optimization, Y1 and Y4. The overall average accuracy of the optimized apple leaf disease grade evaluation model was 90.12%, 20.48% higher than that of the model before optimization (Table 8), which supports the effectiveness of the method and model. Specifically, the grade recognition accuracy was increased by 5.99% for black rot, 10.5% for scab, and 39.33% for rust. The improvement was largest for rust because its grade recognition accuracy reached only 55.17% when the spot area ratio was the only criterion for measuring the disease grade.
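The reported gains can be related to the values they imply with simple arithmetic. The sketch below assumes that every reported increase is a difference in percentage points rather than a relative change; the text does not state this explicitly, so the derived figures should be read as implied values under that assumption and checked against Tables 7 and 8. The function name is illustrative.

```python
def value_before(after_pct, gain_points):
    """Value before optimization, assuming the gain is in percentage points."""
    return round(after_pct - gain_points, 2)


# Overall grade accuracy of the model before optimization
overall_before = value_before(90.12, 20.48)

# Faster R-CNN accuracy and recall implied by the Mask R-CNN gains
faster_rcnn_accuracy = value_before(89.47, 4.91)
faster_rcnn_recall = value_before(91.05, 5.19)

# Rust grade accuracy after optimization, from the 55.17% baseline
rust_after = round(55.17 + 39.33, 2)

print(f"overall accuracy before optimization: {overall_before}%")
print(f"Faster R-CNN accuracy / recall: {faster_rcnn_accuracy}% / {faster_rcnn_recall}%")
print(f"rust grade accuracy after optimization: {rust_after}%")
```

Under the percentage-point assumption, rust moves from the lowest baseline accuracy to a level above the overall average, which is consistent with the statement that rust benefited most from the added indicators.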
Table 9 compares the results of this study with those of other studies and supports the validity of this work. The comparison shows that the method and model selected in this study are more effective in the grade identification of apple leaf diseases.
The reason for the large improvement for rust is that, when the spot area ratio was used as the only measure of disease grade, its variation was not closely correlated with the variation of rust grade in real situations. As depicted in Figure 12, the severity grades obtained for most of the datasets using the optimized grading evaluation model were almost the same as the real grades of apple leaf diseases determined by experts, indicating that the apple leaf disease grading evaluation method based on PCA-logistic regression analysis is feasible and effective. Compared with other studies that use deep learning methods to identify leaf diseases, the main contribution of this study is a set of novel evaluation indicators that improve the grade evaluation model of apple leaf diseases.
5. Conclusions
The mainstream criterion for grading plant leaf diseases has been the spot area ratio, i.e., the ratio of leaf spot area to total leaf area. However, this criterion alone is insufficient to grade leaf diseases finely. Thus, a PCA-logistic regression analysis-based apple leaf disease grading evaluation method was proposed in this study.
This method takes three common apple leaf diseases as the research objects, i.e., black rot, scab, and rust. Building on conventional decision factors (e.g., disease spot area ratio, number, color, and texture), two new grade evaluation indicators, the imbalance degree and the main vein distance, were added to account for the effect of apple leaf damage and spot distribution on the judgment of leaf disease grades. SPSS 26.0 software was employed for statistical analysis, and principal component analysis (PCA) was adopted to determine the four grade evaluation indicators from multiple factors. After stepwise multiple regression analysis, the models for identifying apple leaf disease grades before and after optimization were obtained. The validity of the method was verified through correlation analysis and quantitative ordered variable analysis.
As indicated by the final comparative experiments, the accuracy and recall of Mask R-CNN recognition were higher than those of Faster R-CNN by 4.91% and 5.19%, respectively. With the distribution of lesions taken into account through the newly added evaluation indicators of imbalance degree and main vein distance, the recognition accuracy of the grade evaluation model was increased by 20.48% on average after optimization. These results confirmed the effectiveness of the proposed model. The PCA-logistic regression-based grade evaluation of apple leaf diseases proposed in this study can technically support the establishment of plant disease grade standards, as it is capable of dividing apple diseases into four grades when the disease spot area ratio is less than 25%. Furthermore, the model presented in this study can lay a theoretical basis for the subsequent spraying of pesticides against apple leaf diseases.
In subsequent work, the new method will be applied to further disease classification and grading to improve this study. Moreover, drawing on plant pathology knowledge, determining the pesticide dosage for each of the disease grades subdivided in this study remains an important research direction. Additionally, extending the experimental methods of this study to grade identification for other common crops or plant leaf diseases also deserves attention in the future.