1. Introduction
Powdery mildew (Blumeria graminis), a crop disease, and aphids (Sitobion avenae), an insect pest, are both destructive and occur almost every year in the major winter wheat growing regions of China [
1,
2]. These two threats can cause serious losses of grain yield and quality: the annual average occurrence area of powdery mildew was recorded to be as high as 10 million ha during 2000 to 2016, while annual aphid damage affects 13 million ha and causes up to 40% wheat yield loss in China [
3,
4,
5,
6,
7]. In practice, wheat powdery mildew and aphids tend to occur in fields unpredictably, so real-time characterization, identification, and classification of different diseases and pests are necessary to mitigate the problems associated with disease and pest monitoring and pesticide overuse [
8].
Remote sensing technology is an important alternative to traditional manual scouting in crop disease and pest monitoring. Some researchers have demonstrated the feasibility of remote sensing technology for detecting and differentiating crop diseases and pests through hyperspectral analysis. For instance, Feng et al. [
9] suggested that the best two-band vegetation index ranges for powdery mildew detection of different incidence levels were between 570–590 nm and 536–566 nm for the ratio index, and 568–592 nm and 528–570 nm for the normalized difference index. Riedell et al. [
10] characterized leaf reflectance spectra of wheat damaged by Russian wheat aphids and greenbugs, finding the chlorophyll concentrations of the plants damaged by the two aphids significantly influenced the reflectance in the 625–635 nm and the 680–695 nm ranges. Huang et al. [
11] found that single wavelengths around 400 nm, 500 nm, and 750 nm were highly relevant for wheat leaves diseased with powdery mildew, single wavelengths around 540 nm and 750 nm were relevant to wheat yellow rust, and a single wavelength around 400 nm was relevant to wheat aphid infestation. Based on these findings, they developed four new spectral indices and used them to successfully distinguish healthy leaves from leaves affected by powdery mildew, yellow rust, and aphids. In addition, based on an advanced hyperspectral analysis technique, continuous wavelet analysis, Shi et al. [
4] determined the most sensitive wavelet features (WFs) for the identification of yellow rust and powdery mildew in winter wheat. Although these hyperspectral-based studies gave more detailed information and demonstrated the effectiveness of hyperspectral sensors in detecting and discriminating crop diseases and pests, the high hardware and computational costs of such sensors restrict their application over large areas [
12,
13]. Given their acceptable spatial and temporal resolutions, multispectral satellite techniques have become a feasible means of monitoring crop diseases and pests [
8,
14,
15]. For instance, based on Landsat-5 Thematic Mapper (TM) data, Mirik et al. [
16] successfully assessed the infection and progression of wheat streak mosaic. Navrozidis et al. [
17] demonstrated that field spectroscopy and wide area remote sensing (i.e., Landsat-8) can be used to create sufficiently accurate quantification models of crop disease severity. Furthermore, relying on a relative spectral response function (RSR function) spectral simulation, Yuan et al. [
12] converted canopy hyperspectral signals to broadband reflectance corresponding to seven high-resolution satellite sensors and channel settings, and simulated some classic vegetation indices to discriminate three typical diseases and an insect pest of winter wheat. Their results indicated the feasibility of high-resolution multispectral satellite sensors for discriminating crop diseases and pests. By developing a set of normalized bi-temporal vegetation indices using PlanetScope image datasets at a 3-m spatial resolution, Shi et al. [
8] mapped and evaluated the damage caused by rice dwarf, rice blast, and glume blight at fine spatial scales. These results motivated us to attempt to discriminate wheat powdery mildew and aphids using multispectral satellite imagery.
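The two-band index forms mentioned above (the ratio index and the normalized difference index) can be illustrated with a minimal Python sketch. The function names and the reflectance values are hypothetical and chosen only for illustration; they are not taken from the cited studies.

```python
import numpy as np

def ratio_index(r1, r2):
    """Two-band ratio index: RI = R1 / R2."""
    r1 = np.asarray(r1, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    return r1 / r2

def normalized_difference_index(r1, r2):
    """Two-band normalized difference index: NDI = (R1 - R2) / (R1 + R2)."""
    r1 = np.asarray(r1, dtype=float)
    r2 = np.asarray(r2, dtype=float)
    return (r1 - r2) / (r1 + r2)

# Hypothetical reflectance of two leaves at two wavelengths (assumed values).
r_band1 = np.array([0.10, 0.14])
r_band2 = np.array([0.08, 0.10])

ri = ratio_index(r_band1, r_band2)
ndi = normalized_difference_index(r_band1, r_band2)
```

Both indices reduce a pair of bands to a single value per sample; the normalized form is bounded between -1 and 1 for positive reflectance, which makes values easier to compare across scenes.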
On the one hand, different diseases and pests can cause similar stresses and symptoms such as discoloration, wilting, and rot. On the other hand, the occurrence and epidemic patterns of different diseases and pests differ across growth periods. Both factors may cause confusion when multiple types of damage are detected from single-date satellite imagery. The information gathered on within-field variability in growth conditions and disease and pest infestations is important for precise crop disease and pest monitoring through multi-temporal remote sensing imagery [
8,
14,
18]. Furthermore, the occurrence and development of crop diseases and pests are not only related to crop growth conditions, but also require appropriate environmental conditions such as temperature and humidity [
19]. The monitoring accuracy of crop diseases and pests could be improved by integrating environmental information [
3,
19]. The effectiveness of field environmental parameters such as land surface temperature (LST), soil water content (SWC), and the tasseled cap transformation features (Greenness and Wetness), derived from the remotely sensed shortwave infrared and thermal infrared information of Landsat-8 imagery, has been demonstrated for crop disease and pest monitoring [
3,
19,
20]. However, most existing remote sensing models for crop diseases and pests focus on detecting and monitoring crop damage using the corresponding single-date imagery, and crop environmental characteristics have not been considered [
13,
20,
21,
22]. Other scholars considered either temporal information or the crop environment in disease and pest monitoring rather than both; few studies have combined the information from these two aspects for disease and pest monitoring and differentiation [
8,
19]. Therefore, it is necessary to evaluate the feasibility of a remotely sensed feature set that integrates multi-temporal crop growth indices and environmental factors for monitoring and discriminating crop diseases and pests.
In the field, one crop stress is often dominant while other stresses are mild but still important. In a field survey, an imbalanced data set arises when one class has a significantly different number of samples from the other classes. For an imbalanced data set, more attention needs to be paid to the minority class, which contains more valuable information [
23]. However, when samples of the majority class in a training data set vastly outnumber those of the minority class, traditional data mining algorithms tend to ignore the minority class because of the pursuit of global accuracy [
24,
25]. The synthetic minority oversampling technique (SMOTE) proposed by Chawla et al. [
26] is a popular method that generates synthetic minority-class samples by interpolating along the line segments connecting existing minority-class samples, thereby balancing the class proportions. This method has been widely used in combination with a variety of traditional classification methods to solve the classification problem of imbalanced data [
27,
28].
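The interpolation step behind SMOTE can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this study: the function name `smote`, the neighbor count `k`, the random seed, and the toy minority samples are all hypothetical.

```python
import numpy as np

def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by interpolating between each
    chosen sample and one of its k nearest minority-class neighbors.
    Requires at least two minority samples."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    n = len(minority)
    k = min(k, n - 1)  # cannot have more neighbors than other samples

    # Pairwise Euclidean distances; a sample is never its own neighbor.
    dist = np.linalg.norm(minority[:, None, :] - minority[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)
    neighbors = np.argsort(dist, axis=1)[:, :k]

    # Pick a base sample and one of its neighbors for each synthetic point.
    base = rng.integers(0, n, size=n_synthetic)
    chosen = neighbors[base, rng.integers(0, k, size=n_synthetic)]

    # Place the new point at a random position on the connecting segment.
    gap = rng.random((n_synthetic, 1))
    return minority[base] + gap * (minority[chosen] - minority[base])

# Hypothetical minority-class feature vectors (assumed values).
minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote(minority, n_synthetic=6, k=2)
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region already occupied by the minority class, which is also why noisy samples can appear when minority and majority clusters overlap.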
The back propagation neural network (BPNN) is a popular classification method, named for its back propagation learning algorithm, which is a supervised learning algorithm based on gradient descent or one of its variants [
29]. BPNN is a multilayer mapping network in which information is transmitted forward while the error is propagated backward and minimized [
30]. It has been proven mathematically that a BPNN can implement any complex nonlinear mapping and approximate an arbitrary nonlinear function with satisfactory precision, which makes BPNN popular for predicting complex nonlinear systems [
31,
32]. The BPNN method has some advantages such as simple architecture, easy model construction and rapid calculation speed [
33]. The BPNN method has been widely used for classification [
33,
34,
35,
36,
37]. These existing successful cases support the use of BPNN in this study for the discrimination of wheat powdery mildew and aphid.
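The forward pass and backward error propagation described above can be sketched with a small network in NumPy. This is an illustrative sketch only: the single hidden layer, tanh activation, softmax output, learning rate, epoch count, and toy three-class data are assumptions made for the example and do not describe the network configuration used in this study.

```python
import numpy as np

def train_bpnn(X, y, n_hidden=8, n_classes=3, lr=0.2, epochs=1000, seed=0):
    """Train a one-hidden-layer network with full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, (n_hidden, n_classes))
    b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]  # one-hot targets
    losses = []
    for _ in range(epochs):
        # Forward pass: information flows from input to output.
        H = np.tanh(X @ W1 + b1)
        logits = H @ W2 + b2
        logits = logits - logits.max(axis=1, keepdims=True)
        P = np.exp(logits)
        P = P / P.sum(axis=1, keepdims=True)
        losses.append(-np.mean(np.sum(Y * np.log(P + 1e-12), axis=1)))

        # Backward pass: the output error is propagated to earlier layers.
        d_logits = (P - Y) / len(X)
        dW2 = H.T @ d_logits
        db2 = d_logits.sum(axis=0)
        dZ1 = (d_logits @ W2.T) * (1.0 - H ** 2)  # tanh derivative
        dW1 = X.T @ dZ1
        db1 = dZ1.sum(axis=0)

        # Gradient descent update.
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return (W1, b1, W2, b2), losses

def predict(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)

# Hypothetical, well-separated three-class data standing in for
# healthy, powdery mildew infected, and aphid damaged samples.
rng = np.random.default_rng(1)
centers = np.array([[0.0, 0.0], [3.0, 3.0], [0.0, 3.0]])
X = np.vstack([c + 0.3 * rng.normal(size=(30, 2)) for c in centers])
y = np.repeat(np.arange(3), 30)

params, losses = train_bpnn(X, y)
accuracy = float(np.mean(predict(params, X) == y))
```

In a SMOTE-BPNN workflow, the training set passed to a function like `train_bpnn` would first be rebalanced by oversampling the minority classes.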
In this study, a coupled SMOTE-BPNN model integrating bi-temporal growth indices and environmental factors was developed to discriminate different types of damage in winter wheat accurately. The simultaneous outbreak of wheat powdery mildew and wheat aphids was chosen for the case study. Wheat powdery mildew and aphids occurred in the Shijiazhuang area of Hebei Province, China, during the spring of 2014. Bi-temporal Landsat-8 imagery was used in this study, and wheat growth and environmental parameters were used in combination. The aims of this study were: (1) to evaluate the performance of the coupled SMOTE-BPNN classification models for mapping the damage from the disease and pest; and (2) to assess the impact of the feature set consisting of bi-temporal growth indices and environmental factors on the accuracy of the classification models when it is used as the model input.
4. Discussion
We found that classification using bi-temporal growth indices and environmental factors resulted in the highest accuracies among the four different feature sets for discriminating healthy, powdery mildew infected, and aphid damaged winter wheat across the three methods. Only the SMOTE-BPNN model obtained acceptable results for all three classes (i.e., healthy, powdery mildew infected, and aphid damaged) among the three SDG-based models. The BTGE-based SMOTE-BPNN method was also found to produce the most accurate classification for the two minority classes (i.e., healthy and aphid damaged). This suggests that our proposed SMOTE-BPNN method combining bi-temporal growth and environmental parameters improved the overall accuracy of crop disease and pest discrimination.
Typically, disease causes changes in the biophysical and biochemical parameters of plants, such as pigments, water content, and canopy structure, as well as leaf color changes due to pustules or lesions [
72]. Meanwhile, pest damage can also cause a reduction in pigment concentrations (especially chlorophylls) and leaf water content in the infested leaf, as well as the destruction of infested leaf tissues [
6,
73]. These changes can influence the tissue optical properties and alter the spectral response characteristics [
13,
74,
75]. The reduction of chlorophyll concentrations and water content in a leaf damaged by aphids, which pierce the leaf and suck out its juice, results in a higher reflectance in the visible and SWIR regions than in the non-infested leaf [
6]. The leaf tissue destroyed by aphid infestation leads to a lower reflectance in the NIR region than in the non-infested leaf [
73]. The raw reflectance of leaves diseased by powdery mildew shows a significant increase in the visible spectral region and a slight decrease in the NIR region relative to that of healthy wheat leaves. Similar spectral characteristics of powdery mildew and aphids were also observed in the present study (
Figure 7). The chosen indices (i.e., DVI, PDI, SIPI, SIWSI, SR and TVI) performed well in monitoring and discriminating powdery mildew and aphids. These indices transform raw spectra into more meaningful metrics of the disease and pest damage. Furthermore, two environmental factors, LST and Greenness, were also used for discriminating the wheat disease and pest in this study, and their contributions to classification were evaluated. LST extracted from satellite imagery has been identified as one of the key parameters controlling the physical, chemical, and biological processes at the interface between the earth and the atmosphere [
76,
77]. LST is also an effective means of partitioning latent heat fluxes, which provides information on micro-environmental conditions such as crop respiration and evapotranspiration [
19,
78]. Meanwhile, Greenness is responsive to the characteristic of healthy green vegetation that has high absorption of chlorophyll in the visible region and high reflectance of leaf tissue in the NIR region, so it reflects overall crop growth conditions and is suitable for the characterization of field environment [
19,
79]. These two environmental parameters influence the occurrence of crop diseases and pests. Our results revealed the relationship between the environmental factors and the development of the crop conditions affected by the disease and pest (
Figure 3) and demonstrated the positive contributions of environmental factors for the discrimination of different diseases and pests (
Figure 5,
Table A2,
Table A3 and
Table A4).
Compared with single-date feature sets, the newly proposed bi-temporal feature sets performed better in discriminating and mapping healthy, powdery mildew infected, and aphid infested wheat (
Figure 5,
Table A2,
Table A3 and
Table A4). Factors such as phenology, cultivation practices, and crop condition may cause the same features to fluctuate in ways that resemble their response to disease and pest infestations [
80]. The occurrence and development characteristics of different diseases and pests are different. The bi-temporal variations help to eliminate field anomalies other than the disease and pest infestations [
8]. For example, the bi-temporal features characterized variations in pigments and canopy morphology better, which indicates the importance of combining multi-temporal features for discriminating crop diseases and pests. Currently, the rapid development of precision agriculture requires finer field details and higher temporal resolutions. Owing to the special location of the study area, two scenes with a time interval of less than 16 days were successfully acquired. Hence, the Landsat-8 imagery could be used for discriminating the wheat disease and pest in this case study. Furthermore, although the dates of image acquisition and field survey were very hard to keep consistent because of the sensor revisit period and cloud cover, the chosen investigation dates fell within a critical period when the occurrence and development potential of the crop disease and pest remained consistent and stable [
41,
42,
43,
81]. Therefore, the obtained samples could effectively reflect the occurrence and development of the disease and pest, and the classification results support this. On the other hand, although some satellite images with high spatial and temporal resolution (e.g., WorldView-2, PlanetScope, and SPOT-6) have been used to monitor and discriminate different crop diseases and pests, the corresponding environmental characteristics cannot be obtained from these sensors because of the limitations of their available bands [
8,
13,
19]. Therefore, in the future, field investigation and satellite acquisition dates should be as consistent as possible, and multi-source remote sensing data should be fused for crop disease and pest discrimination.
The feasibility of each type of remotely sensed feature set for describing different crop damage was assessed using different feature combinations balanced by SMOTE as input variables in BPNN (SMOTE-BPNN). For disease and pest discrimination, the positive contribution of environmental information and the importance of the two temporal images were confirmed in this research by developing models using the four different feature sets (BTGE, BTG, SDGE, and SDG) (
Figure 5). Based on the bi-temporal growth indices and environmental factors, all three methods (i.e., SMOTE-BPNN, BPNN and SVM) had similar overall accuracy values, and their classification accuracies for the healthy plots were all acceptable. However, the classification accuracy for the aphid damaged plots using SMOTE-BPNN increased by 25.0% compared with the accuracy using the other two methods. Meanwhile, the G-means of the SMOTE-BPNN model based on this feature set was 8.0% and 9.0% higher than that for BPNN and SVM, respectively (
Table 6). The results indicate that SMOTE-BPNN based on the bi-temporal growth indices and environmental factors is an effective approach for automatic discrimination among healthy wheat, powdery mildew infected wheat, and aphid damaged wheat. This approach performed well in the classification of the small or rare classes. Additionally, although the imbalance of the training data was reduced by the SMOTE algorithm, the result was not optimal (
Table 5). A reported disadvantage of SMOTE is that, because the separation between majority and minority class clusters is often not clear, noisy samples may be generated, resulting in a new sample set that may not be the best one [
82]. Some modifications of SMOTE have been proposed [
83,
84]. Therefore, more suitable methods for imbalanced data should be further studied for application to the discrimination of imbalanced crop diseases and pests.
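To illustrate why a metric such as G-means is informative for imbalanced classes, the sketch below computes it as the geometric mean of the per-class recalls. This multi-class definition is an assumption made for illustration, since the exact formula is not given in this section; the function name and the labels are hypothetical.

```python
import numpy as np

def g_mean(y_true, y_pred, labels):
    """Geometric mean of per-class recall (assumed multi-class definition).
    Every label in `labels` must occur at least once in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = []
    for label in labels:
        mask = y_true == label
        recalls.append(np.mean(y_pred[mask] == label))
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

# Hypothetical labels: class 1 is the majority, classes 0 and 2 are minorities.
y_true = np.array([0, 0, 1, 1, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 1, 1, 2, 2])

overall_accuracy = float(np.mean(y_true == y_pred))
g = g_mean(y_true, y_pred, labels=[0, 1, 2])
```

In this toy case a single error in a two-sample minority class leaves the overall accuracy at 0.875 but lowers the geometric mean to about 0.79, and a classifier that missed a minority class entirely would score zero.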
Overall, the proposed SMOTE-BPNN model integrating bi-temporal growth indices and environmental factors performed well in discriminating damage in winter wheat from Landsat-8 satellite imagery, with an accuracy of 82.6%. In this study, our goals were to improve the accuracy of the discrimination models through the integration of multi-source and multi-temporal remotely sensed data, thus providing a detailed spatial distribution of crop diseases and pests to meet the current needs of precision agriculture. Meanwhile, the combination of the SMOTE resampling algorithm and the BPNN classification method made the classification of imbalanced data more accurate. However, the study was limited by the spatial-temporal resolution of Landsat-8 images. Although typical areas of disease or pest infestation were first chosen, and a diagonal five-point sampling method combining random sampling and representative sampling was then used to characterize the damage severity of each sample plot, it is still very difficult to eliminate the influence of the mixed pixel problem. In addition, our research was based solely on remote sensing data. In future research, multi-source information such as climate data and geographic data, which can help reduce classification uncertainty, should also be incorporated. Satellite data with finer spatial-temporal resolution should also be used in crop pest and disease monitoring and discrimination, fused with medium-resolution satellite data containing environmental information. Additionally, more advanced methods that are more sensitive to imbalanced data can be tested to further improve the stability and reliability of crop disease and pest discrimination.