1. Introduction
Cotton is one of the dominant commercial crops in major cotton-producing countries and regions such as China, the United States, India, Pakistan, Uzbekistan, and West Africa [1,2]. It is not only a raw material for the textile industry, but also plays an important role in the chemical industry, so the growth and production of cotton are significant to economic and social development. However, cotton disease infection is one of the main factors affecting the profitability and sustainability of agricultural management [3], so it is important to monitor cotton health in a timely manner and to detect infection severity precisely. Among cotton diseases and insect pests, cotton aphids have attracted increasing attention. The cotton aphid is one of the most destructive sucking pests in the world, and its population increases rapidly [4,5]. Cotton aphid infestation causes damage through direct feeding, virus transmission, and honeydew contamination. Once infestation occurs, the cotton leaves curl downward and take on a crinkled appearance. Heavy damage may decrease photosynthesis, resulting in stunted seedlings and yield losses. Insecticide application is an effective method of cotton aphid control; however, the abuse of insecticides can lead to control failure and to environmental pollution such as soil contamination. Therefore, aphid infestation urgently needs to be monitored and controlled more effectively and efficiently so that prevention and control measures can be formulated to reduce economic losses.
The traditional ground-based survey of plant infestation suffers from high labor costs, low efficiency, and subjective human error, which makes it difficult to accurately estimate the infected area and severity at a large scale. The most common strategy is a visual survey by experienced producers, who identify subtle changes in plant phenotype, such as plant color and leaf curl, and scout the infected area of the crop. In comparison, remote sensing technology can collect the spectral information of plant growth over a large area at a relatively low cost [6,7,8]. The stresses induced by pests and diseases affect photosynthesis and the physical structure of plants, which alters some characteristic bands among the different severity levels of pests and diseases [9,10,11,12]. Hence, the crop damage severity caused by diseases and insect pests can be identified with the sensitive wavelengths. On this basis, a spectral index (SI) built from the sensitive bands is closely related to the physiological and biochemical processes of crop infection.
A typical severity identification of crop infestation based on hyperspectral remote sensing mainly includes two steps: sensitive feature selection and classification model determination. In the feature selection step, the features can be reflectance or transformed reflectance data (derivative, continuum removed, etc.) and vegetation indices [13]. Index construction is a simple and effective method, and numerous indices have been formulated to detect plant infection with remote sensing technologies [14]. The extracted features are then usually input into a classical machine learning classifier to determine the crop health status. Commonly used classification methods in the model determination step include random forest (RF) [15], support vector machine (SVM) [16], and K-nearest neighbor (KNN) [17]. These classification models are usually trained on a labeled dataset, and the labels of the test dataset are then predicted by the trained model. In related work, the iterative self-organizing data analysis techniques algorithm (ISODATA), an unsupervised clustering method, was applied to recognize root rot infested areas of cotton [18]. Zhao Hengqian et al. [19] proposed an automatic crop disease severity classification method based on vegetation index normalization, using six typical vegetation indices on cotton fields infected with root rot. Elhadi et al. [20] developed a phaeosphaeria leaf spot identification method based on random forest, which could be used for early disease detection in maize fields with six extracted key features. Nagasubramanian et al. [21] used a genetic algorithm (GA) to identify the optimal combination of six bands from 240 hyperspectral bands, and then classified the combination with SVM for the early-stage detection of charcoal rot disease in soybean stems. Mirik et al. [22] adopted the maximum likelihood algorithm to classify Landsat 5 Thematic Mapper (TM) images in Texas to discriminate the wheat streak mosaic virus. In particular, Feng Wei et al. [23] established an optimum dual-green vegetation index to improve the detection of wheat powdery mildew, which provided a new idea and analytical method for the spectral monitoring of wheat diseases; they also showed that incorporating the spectra of healthy wheat when constructing the dual-green vegetation index significantly improved the disease estimation precision compared with the optimal common vegetation index. Lili Luo et al. [24] constructed a sensitive index for measuring the spectral differences between healthy leaves and leaves infected by the maize dwarf mosaic virus (MDMV), and developed a classification model that integrates linear discriminant analysis (LDA) with SVM to detect the damage severity caused by the virus.
Despite the promising results of previous studies, spot-level hyperspectral measurements have rarely been used to detect cotton aphids, and the crops are usually treated with disease or insect inoculum, which makes the infection severity of the crops relatively uniform. In a natural environment, however, crop diseases and pests often originate in one place and then spread to other regions. Moreover, it is challenging to analyze and reconstruct hyperspectral indices so that they provide valuable information on the physiological and biochemical processes of pest and disease damage in plants. In addition, the severity level identification of pest and disease damage in crops mainly relies on classification models. Research has paid more attention to selecting an effective model by comparing different classification models, whereas little effort has been made to analyze the inherent relations among different classes or grades. Each severity class is treated as an independent class, although the classes of disease damage correspond to specific, ordered ranges of disease severity. Ignoring these inter-class associations affects the classification performance and increases the complexity of the inter-class comparison. Therefore, to address the issues presented above, we constructed six typical types of spectral indices from the full spectral bands, which were then reconstructed based on the comparison with healthy samples. Furthermore, a novel approach to selecting sensitive spectral indices was proposed, together with a corresponding method that sets thresholds between adjacent disease levels, instead of selecting a classification model, to identify the severity of damage caused by cotton aphids.
2. Materials and Methods
2.1. Experimental Site and Sampling
The experiment was performed in the Korla region (41°44′59″ N, 85°48′30″ E) of Xinjiang, China, a typical cotton production area located at the southern foot of the Tianshan Mountains and the northeastern edge of the Tarim Basin. It has a temperate continental arid climate, with a total annual sunshine duration of 2990 h, an annual average temperature of 11.4 °C, a minimum temperature of −28 °C, an average annual precipitation of 58.6 mm, and an annual maximum evaporation of 2788.2 mm. Cotton was planted at a large scale with a simple planting structure, and cotton aphids were the main cotton pest. The field experiments were conducted at the budding stage from 3 July to 11 July 2019. No disease or insect inoculum was applied, and no pesticide was applied until after data collection in the cotton fields. During this period, sampling plots were selected randomly for the measurement of canopy hyperspectral reflectance and the corresponding aphid damage grade.
2.2. Data Collection
2.2.1. Spectral Measurements
All spectral measurements were taken from a height of 50 cm above the canopy under a clear sky with minimal or no wind between 12:00 and 16:00 (Beijing local time) using a Field Spec HandHeld spectrometer. The spectral measurement range was from 325 to 1075 nm with a resolution of 1.4 nm, and the canopy spectra were averaged over ten scans. Before and after each sample measurement, the sensor was calibrated for baseline reflectance using a white polytetrafluoroethylene panel. To reduce the impact of environmental conditions, measurements were taken at randomly selected sample points of 4 m² (2 m × 2 m), and the average value was used as the representative spectral reflectance of each sample point. In this study, a total of 61 field-measured samples were obtained and used to verify the effectiveness of the proposed method. The samples were randomly split into training (75%) and testing (25%) datasets.
2.2.2. Disease Severity Assessment
The investigation was carried out according to the National Standard Pesticide Guidelines for the field efficacy trials, Fungicides against cotton aphid (GB/T15799-2011), which stipulates the requirements for judging the disease level listed in Table 1. The disease progression of each plant was calculated according to the percentage of infected leaves. As shown in Table 1, each selected plant was categorized into one of five levels of cotton aphid damage severity, from Grade 0 (healthy) and Grade 1 (mild infestation) to Grade 4 (severe damage).
Cotton samples in our study were selected from each root quadrat according to the traditional five-point sampling method (the central area and the four surrounding corners). In order to comprehensively consider the incidence and damage severity caused by cotton aphids at the plot scale, the disease index (DI) was calculated with Equation (1) [25,26] and used as a comprehensive index to evaluate the incidence degree of cotton aphids in this study.
The disease index was divided into five grades (G0–G4): G0 for healthy (0), G1 for mild (0–25%), G2 for moderate (25–50%), G3 for severe (50–75%), and G4 for profound (>75%).
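Equation (1) is not reproduced in this text, so the following is only a minimal sketch of how a plot-level DI and its grade could be computed. It assumes the widely used form DI = 100 × Σ(plant grades) / (highest grade × number of plants); the function names and the handling of values that fall exactly on a boundary (25%, 50%, 75%) are illustrative assumptions, not the definition used in [25,26].

```python
def disease_index(plant_grades, max_grade=4):
    """Plot-level disease index in percent.

    ASSUMED form (Equation (1) is not shown in the text):
    DI = 100 * sum(grade of each plant) / (max_grade * number of plants).
    """
    if not plant_grades:
        raise ValueError("at least one plant grade is required")
    return 100.0 * sum(plant_grades) / (max_grade * len(plant_grades))


def plot_grade(di):
    """Map a DI (%) to G0-G4 using the ranges stated in the text.

    Assigning a boundary value to the lower grade is an assumption.
    """
    if di == 0:
        return 0
    if di <= 25:
        return 1
    if di <= 50:
        return 2
    if di <= 75:
        return 3
    return 4


# Five plants from a hypothetical five-point sample, graded 0 to 4.
di = disease_index([0, 1, 2, 3, 4])
print(di, plot_grade(di))
```

If the cited standard defines DI differently, only `disease_index` would need to change; the grade ranges in `plot_grade` follow the text directly.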
2.3. Construction and Reconstruction of the Initial Spectral Indices
Under disease stress, the physiological and biochemical characteristics as well as the apparent morphology of crops change, which in turn changes the spectral indices. The spectral response of a crop can therefore be regarded as a function of changes in pigment, water, morphology, and structure [27,28,29]. An SI formulated with the sensitive bands can reflect the fluctuation of the spectral response and thus detect the disease in light of the pathological mechanism [30,31]. In this study, we constructed the SIs in two stages. The first stage was the construction of the initial SIs, where all possible band combinations of six types of hyperspectral SIs, three two-band SIs and three three-band SIs, were constructed to simplify the spectral detection of cotton aphid damage. In the second stage, these initial SIs were reconstructed with the healthy samples to evaluate the cotton aphid damage effectively.
In the first stage, three types of two-band SIs, namely the differential spectral index (DSI), normalized difference spectral index (NDSI), and ratio spectral index (RSI), were calculated as the two-band initial indices, and three types of three-band SIs, namely the improved differential spectral index (IDSI), photochemical spectral index (PSI), and improved ratio spectral index (IRSI), were constructed as the three-band initial indices based on the raw spectral response over the full band range. These were formulated as follows:
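In Equations (2)–(4), $(\lambda_1, \lambda_2)$ were all possible two-band combinations in the range of 325–1075 nm, and $R_{\lambda_1}$ and $R_{\lambda_2}$ were the reflectance values at $\lambda_1$ and $\lambda_2$, respectively. In Equations (5)–(7), $(\lambda_1, \lambda_2, \lambda_3)$ were all possible three-band combinations, and $R_{\lambda_1}$, $R_{\lambda_2}$, and $R_{\lambda_3}$ were the reflectance values at $\lambda_1$, $\lambda_2$, and $\lambda_3$, respectively.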
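Because Equations (2)–(4) are not reproduced here, the following sketch uses the forms in which these two-band indices are commonly written (difference, normalized difference, and ratio of two reflectances); whether the paper uses exactly these forms and band orderings is an assumption. The dictionary layout and the function names are hypothetical, and the three-band indices are omitted because their formulas cannot be inferred from the text.

```python
from itertools import permutations


def dsi(r1, r2):
    """Differential spectral index (assumed form): R1 - R2."""
    return r1 - r2


def ndsi(r1, r2):
    """Normalized difference spectral index (assumed form)."""
    return (r1 - r2) / (r1 + r2)


def rsi(r1, r2):
    """Ratio spectral index (assumed form): R1 / R2."""
    return r1 / r2


def two_band_indices(spectrum):
    """Compute DSI, NDSI and RSI for every ordered pair of distinct bands.

    spectrum: dict mapping wavelength in nm to reflectance (hypothetical
    layout). Returns a dict keyed by (wavelength_1, wavelength_2).
    Reflectances are assumed to be strictly positive.
    """
    table = {}
    for w1, w2 in permutations(sorted(spectrum), 2):
        r1, r2 = spectrum[w1], spectrum[w2]
        table[(w1, w2)] = {
            "DSI": dsi(r1, r2),
            "NDSI": ndsi(r1, r2),
            "RSI": rsi(r1, r2),
        }
    return table


# Toy three-band spectrum; the reflectance values are illustrative only.
toy = {643: 0.25, 656: 0.5, 732: 0.75}
indices = two_band_indices(toy)
print(indices[(656, 643)])
```

Ordered pairs are enumerated because the ratio and difference forms are not symmetric in their two bands.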
Let $m$ be the number of training samples. After the construction and calculation of the initial SIs for each training sample, we reconstructed them based on the SI difference between sample $i$ and each healthy sample in the training dataset to highlight their possible difference. The RISI, defined as the mean absolute value of this difference, is represented by Equation (8). The higher the RISI, the greater the difference from the healthy samples and the more serious the disease, and vice versa. To some extent, the RISI can suppress the influence of illumination, canopy, and leaf structure.

$$RISI_i = \frac{1}{n}\sum_{j=1}^{n}\left| SI_i - SI_j^{h} \right| \qquad (8)$$

where $SI_i$ is the SI of training sample $i$, $SI_j^{h}$ is the SI of the healthy sample $j$, and $n$ is the number of healthy samples in the training dataset.
According to Equation (8), the RISI sequence of the training samples, $(RISI_1, RISI_2, \dots, RISI_m)$, can be obtained; it can be the sequence of reconstructed DSI (RDSI), reconstructed RSI (RRSI), reconstructed NDSI (RNDSI), reconstructed IDSI (RIDSI), reconstructed IRSI (RIRSI), or reconstructed PSI (RPSI), where $RISI_i$ represents the RISI of training sample $i$.
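The reconstruction step can be sketched directly from its verbal definition, the mean absolute difference between a sample's SI and the SI of each healthy training sample. The function name and the list-based inputs below are hypothetical.

```python
def risi(sample_si, healthy_sis):
    """Reconstructed index of one sample.

    Mean absolute difference between the sample's SI and the SI of each
    healthy sample in the training dataset, as described for Equation (8).
    """
    if not healthy_sis:
        raise ValueError("at least one healthy training sample is required")
    return sum(abs(sample_si - h) for h in healthy_sis) / len(healthy_sis)


# Illustrative values only: two healthy samples and two samples to score.
healthy = [0.25, 0.75]
print(risi(0.5, healthy))   # a sample lying between the healthy ones
print(risi(2.0, healthy))   # a sample far from the healthy ones
```

A sample whose SI lies far from every healthy SI receives a large RISI, which is what allows the later steps to rank samples by severity.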
2.4. Selection of the Sensitive RISI
Denote by $S$ the sequence of the RISI values of the training samples sorted in ascending order, and by $G$ the corresponding sequence of severity levels caused by cotton aphids. The RISI indicates the difference between a sample and all of the healthy samples in the training dataset. The lower the severity level of a sample, the closer it is to the healthy samples and the earlier it ranks in the ideal level sequence, whereas a higher severity level places it later in the desired sorting sequence. Therefore, denoting by $m_0$, $m_1$, $m_2$, $m_3$, and $m_4$ the numbers of samples in the training set at levels 0, 1, 2, 3, and 4, respectively, the ideal disease level sequence of $S$ can be formulated as $G^{*} = (\underbrace{0,\dots,0}_{m_0}, \underbrace{1,\dots,1}_{m_1}, \underbrace{2,\dots,2}_{m_2}, \underbrace{3,\dots,3}_{m_3}, \underbrace{4,\dots,4}_{m_4})$, where there are $m_0$ samples with G0, followed by $m_1$ samples with G1, $m_2$ samples with G2, $m_3$ samples with G3, and $m_4$ samples with G4. Then, the consistency KI between $G$ and the ideal disease level sequence is represented as follows:

$$KI = \frac{1}{m}\sum_{r=0}^{4} c_r \qquad (9)$$

where $r \in \{0, 1, 2, 3, 4\}$; $c_r$ is the number of samples with grade $r$ at the same position in both sequences; and $m$ is the number of samples in the training dataset. According to Equation (9), the KIs of all possible combinations in RDSI, RRSI, RNDSI, RIDSI, RIRSI, and RPSI were calculated.
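The consistency check can be sketched as follows. Since the exact form of Equation (9) is not shown in the text, this sketch assumes that KI is the fraction of positions at which the grade sequence obtained by sorting the RISI values agrees with the ideal sequence; the function name is hypothetical.

```python
def consistency_index(risi_values, grades):
    """Agreement between the RISI-sorted grade sequence and the ideal one.

    ASSUMED form: matching positions / number of training samples.
    risi_values[i] and grades[i] belong to the same training sample.
    """
    if len(risi_values) != len(grades) or not grades:
        raise ValueError("inputs must be non-empty and of equal length")
    # Grade sequence G obtained by sorting the samples by ascending RISI.
    order = sorted(range(len(risi_values)), key=lambda i: risi_values[i])
    observed = [grades[i] for i in order]
    # Ideal sequence: all G0 samples first, then G1, and so on.
    ideal = sorted(grades)
    matches = sum(1 for o, t in zip(observed, ideal) if o == t)
    return matches / len(grades)


# A band combination that orders the samples perfectly ...
print(consistency_index([0.01, 0.02, 0.03, 0.04], [0, 1, 2, 3]))
# ... and one that swaps a G1 and a G2 sample.
print(consistency_index([0.01, 0.03, 0.02, 0.04], [0, 1, 2, 3]))
```

Under this assumption, a band combination that sorts every training sample into its correct grade block reaches the maximum value of 1.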
After the KIs were calculated for all RISIs, the sensitive band combination for each type of RISI was selected as the one with the maximum KI; that is, the band combination whose severity grade sequence was most similar to the ideal sequence was taken as the sensitive band combination of RDSI, RRSI, RNDSI, RIDSI, RIRSI, and RPSI, respectively. If the maximum KI was obtained by more than one RISI, the severity ratio between the inter- and intra-grade differences, given by Equation (10), was used to choose among them. Equation (10) is built from the minimum RISI of grade $r+1$ in a sequence that reached the maximum KI, together with the maximum and the standard deviation of the RISI of grade $r$ in that sequence. The numerator of the ratio describes the boundary difference between adjacent grades: the larger the value, the stronger the discrimination. The denominator denotes the intra-grade difference: the smaller the denominator, the stronger the discrimination. Therefore, if more than one RISI attains the maximum KI value, the sensitive RISI is the one with the largest ratio, selected separately for RDSI, RRSI, RNDSI, RIDSI, RIRSI, and RPSI.
2.5. Cotton Aphid Severity Grading
Suppose that $S^{*}$ is the sequence of the optimal RISI sorted in ascending order, that $RISI^{\min}_{r+1}$ is the minimum RISI of grade $r+1$, and that $RISI^{\max}_{r}$ is the maximum RISI of grade $r$. The division threshold $T_r$ between grades $r$ and $r+1$ is then given by Equation (11).
Therefore, if the optimal RISI of a training sample is greater than or equal to the threshold $T_r$, the sample is assigned to grade $r+1$; otherwise, it is assigned to grade $r$. The thresholds $T_0$, $T_1$, $T_2$, and $T_3$ thus divide the RISI into five intervals, as shown in Figure 1.
The cotton aphid severity of each test sample was graded based on the level division thresholds of the optimal band index. First, the initial SI of the test sample was calculated according to the selected optimal spectral index type and band combination, and it was then reconstructed with the healthy samples of the training set according to Equation (8). Finally, the RISI of the test sample was compared with the thresholds calculated with Equation (11) to find the interval in which it fell, which gave the severity level of the cotton aphid damage.
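The threshold-based grading can be sketched as below. Equation (11) is not reproduced in the text, so the sketch assumes that each threshold is the midpoint between the maximum RISI of grade r and the minimum RISI of grade r+1; this midpoint rule and both function names are assumptions made for illustration.

```python
from bisect import bisect_right


def grade_thresholds(risi_values, grades, n_grades=5):
    """Thresholds between adjacent grades, learned from training samples.

    ASSUMED rule: T_r = (max RISI of grade r + min RISI of grade r+1) / 2.
    """
    by_grade = {g: [] for g in range(n_grades)}
    for value, grade in zip(risi_values, grades):
        by_grade[grade].append(value)
    thresholds = []
    for r in range(n_grades - 1):
        if not by_grade[r] or not by_grade[r + 1]:
            raise ValueError("every grade needs at least one training sample")
        thresholds.append((max(by_grade[r]) + min(by_grade[r + 1])) / 2.0)
    if thresholds != sorted(thresholds):
        raise ValueError("thresholds are not ascending; grades overlap too much")
    return thresholds


def assign_grade(risi_value, thresholds):
    """Grade of a sample: a value >= T_r falls in grade r+1 or higher."""
    return bisect_right(thresholds, risi_value)


# Illustrative training data: two samples per grade.
train_risi = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
train_grades = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]
thresholds = grade_thresholds(train_risi, train_grades)
print(thresholds)
print([assign_grade(v, thresholds) for v in (0.2, 3.5, 9.9)])
```

With four ascending thresholds, `bisect_right` counts how many thresholds a value has reached, which directly yields one of the five grade intervals.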
4. Discussion
With the transmission of viruses by aphids in a cotton field, ruinous damage to yield and cotton quality is inevitable, which causes huge economic losses and environmental pollution problems [3]. Automatic detection of crop disease severity levels has become increasingly important in the field of agricultural diseases and pests [2,7,10]. In most studies on crop diseases and pests, the crops were treated with inoculum [23,33], which made it easier to detect the infected samples. In this study, the cotton field was in a natural environment without any inoculum treatment, and the cotton aphid infestation was the result of natural processes. Previous studies have explored the discrimination of cotton aphids at the leaf scale and showed a clear correlation between the leaf reflectance and the cotton aphid DI [34]. This correlation provides a basis for exploring automatic grading of aphid damage on a larger scale. In this study, the natural environment and the monitoring scale made automatic grading both more challenging and more valuable.
Traditionally, binary classifiers or threshold segmentation methods have been adopted to distinguish between healthy and infected samples, but not among multiple disease severity levels. Meanwhile, grading disease severity has been regarded as a multi-class classification problem in machine learning, in which the levels are generally assumed to be uncorrelated. However, the severity levels of crop diseases and insect pests are not independent classes, and disease severity classification based on existing machine learning classifiers usually has weak interpretability. The automatic disease severity grading approach proposed in this study could help to fill this gap. The proposed approach mined the inherent relations among different severity levels with RISIs and calculated the level intervals, which gave it stronger interpretability than traditional multi-class classifiers, and the threshold setting between adjacent disease levels made it easier to determine the severity grades of diseases and pests. In grading the disease severity caused by cotton aphids, the damage severity could be divided into five ranges based on the sensitive RIRSI (R732, R878, R712), with 0.006, 0.018, 0.029, and 0.041 as the thresholds between adjacent intervals. When multi-class classifiers are used to grade cotton aphid damage, pairwise class discrimination is needed to optimize the classifier parameters, that is, 10 inter-class discriminations, and it is difficult to obtain the level ranges from high-dimensional data. In contrast, the proposed method only distinguishes between adjacent levels, so the thresholds and level ranges could be acquired with five discriminations.
A large number of contiguous narrow bands in disease detection results in data redundancy, which increases the difficulty of data processing [19]. It is therefore important to develop a method for selecting sensitive spectra or spectral indices so that a limited number of wavebands strongly correlated with crop diseases can be used to grade the damage severity level. In this study, it was difficult to detect the cotton aphid severity level directly because of the large number of hyperspectral wavebands over the full range of 345–1075 nm. The complex problem of sensitive waveband selection was converted into distance ranking based on the spectral indices reconstructed with healthy samples, which is a major difference between this study and other studies on the severity classification of disease and infection [14]. The proposed dimension reduction method, which selects the sensitive wavebands, was evaluated using three types of hyperspectral indices on all possible two- and three-band combinations. As shown in Table 3, among the sensitive band combinations selected from each type of spectral index, the minimum OA, AA, and Kc values were 77.8%, 72%, and 0.714, respectively, and the best performance was OA = 94.4%, AA = 90%, and Kc = 0.928. This strong grading performance shows that the proposed sensitive band selection method could be used to grade the damage severity of cotton aphids. Additionally, in [23,24], the difference between the healthy and infected samples was introduced in disease detection, which contributed to the improvement in the detection results. In this study, the distance from the healthy samples was computed, which not only highlighted the difference between the infected and healthy samples, but also suppressed the growing conditions they share, such as management practice, weather, and soil conditions. Thus, the distance between healthy cotton and cotton infected by cotton aphids has great potential to eliminate the influence of growing conditions and improve the disease severity grading performance.
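The three evaluation measures quoted above can be computed from a confusion matrix, as sketched below. OA, AA, and Kc are not defined in this part of the text; the sketch assumes they denote overall accuracy, average per-grade accuracy, and Cohen's kappa coefficient, which is the usual reading of these abbreviations. The function name and the toy labels are hypothetical.

```python
def grading_metrics(y_true, y_pred, n_grades=5):
    """Return (OA, AA, kappa) under the ASSUMED definitions:
    OA    = correctly graded samples / all samples,
    AA    = mean of per-grade accuracies over grades that occur in y_true,
    kappa = (OA - Pe) / (1 - Pe), with Pe the chance agreement.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    # cm[t][p] counts samples of true grade t that were graded as p.
    cm = [[0] * n_grades for _ in range(n_grades)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    oa = sum(cm[g][g] for g in range(n_grades)) / n
    per_grade = [cm[g][g] / sum(cm[g]) for g in range(n_grades) if sum(cm[g])]
    aa = sum(per_grade) / len(per_grade)
    row_totals = [sum(cm[g]) for g in range(n_grades)]
    col_totals = [sum(row[g] for row in cm) for g in range(n_grades)]
    pe = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    kappa = 1.0 if pe == 1.0 else (oa - pe) / (1.0 - pe)
    return oa, aa, kappa


# Toy two-grade example: one G0 sample is over-graded as G1.
print(grading_metrics([0, 0, 1, 1], [0, 1, 1, 1], n_grades=2))
```

Reporting AA alongside OA matters when the grades are unevenly represented, because a rarely occurring grade can be graded poorly without lowering OA much.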
The relationship between NDSI, RSI, and aphid infestation severity was examined with a traditional linear regression method at the leaf scale in [34]. As shown in the contour maps of that study, the NDSI of 600–1100 nm and 1400–2000 nm and the RSI of 1850–2100 nm were “hotspots” with high coefficients, and NDSI (678, 1471) and RSI (1975, 1904) were determined from these sensitive regions. However, in order to be consistent with Landsat Thematic Mapper spectral bands 1–5, the 690–760 nm range was not analyzed in that study, although it is exactly the red edge region, which is closely related to many plant diseases and insect pests [23,24]. In contrast, as shown in Figure 3, Figure 4 and Figure 5, the sensitive spectral index RIRSI (R732, R878, R712) in our study was derived from the full range of 345–1075 nm at a 1 nm interval and includes the 732 and 712 nm bands in the red edge area.
Figure 3b,c shows the contour maps of RNDSI and RRSI, which were based on the traditional NDSI and RSI. Both contour maps had a similar distribution of “hotspots”: the three edge regions (the red edge, yellow edge, and blue edge) as well as 850–900 nm obtained greater KI values. The red areas showed the highest correlation, while the blue areas showed the lowest, which was in accordance with [34]. However, the optimal bands of RNDSI and RRSI selected in this study were not the same as those in [34], which might be due to the following reasons. (1) The authors in [34] regarded disease severity grading as a regression problem, and their contour maps corresponded to the continuous disease index, while the contour maps in Figure 3 in our study corresponded to the discrete ideal sequence. (2) Their study used leaf-level hyperspectral measurements, whereas this study adopted the spot level, so the contour maps were at different scales. (3) In their study, the red edge regions were not considered.
After the spectral index reconstruction, the sensitive spectral indices RDSI (R702, R715), RNDSI (R643, R656), RRSI (R656, R643), RIDSI (R1051, R1049, R747), RPSI (R519, R438, R706), and RIRSI (R732, R878, R712) were selected from each type of RISI over the possible band combinations. RRSI (R656, R643) obtained the best performance among the two-band combinations with OA = 88.9%, AA = 80%, and Kc = 0.855, and RIRSI (R732, R878, R712) showed the best performance overall with OA = 94.4%, AA = 90%, and Kc = 0.928, which demonstrated that the ratio type of RISI, including the improved ratio RISI, had a stronger correlation with the damage severity levels than the other types of RISIs. Additionally, the red edge is indicative of crop growth and nutrition status [23]. In this study, it was found that the reflectance of the red-edge region was highly correlated with cotton aphid damage.
Although the method achieved good grading results on cotton aphids, the number of samples in this study was limited. In future research, a larger number of samples will be collected under more complex conditions, such as longer time periods, various regions and crops, and different diseases and pests, and more types of SIs will be used to further evaluate and improve the performance of the grading method.