Early Onset Yellow Rust Detection Guided by Remote Sensing Indices

Thirugnana Sambandham, Venkatesh; Shankar, Priyamvada; Mukhopadhaya, Sayan

doi:10.3390/agriculture12081206


by Venkatesh Thirugnana Sambandham ¹,²,*,†, Priyamvada Shankar ¹ and Sayan Mukhopadhaya ¹,†

¹ BASF Digital Farming GmbH, 50678 Köln, Germany

² Faculty of Computer Science, Otto-Von-Guericke-Universität, 39106 Magdeburg, Germany

* Author to whom correspondence should be addressed.

† These authors are the co-first authors of this work.

Agriculture 2022, 12(8), 1206; https://doi.org/10.3390/agriculture12081206

Submission received: 30 June 2022 / Revised: 8 August 2022 / Accepted: 10 August 2022 / Published: 12 August 2022

(This article belongs to the Section Digital Agriculture)


Abstract:

Early warning systems help combat crop diseases and enable sustainable plant protection by optimizing the use of resources. The application of remote sensing to detect plant diseases like wheat stripe rust, commonly known as yellow rust, is based on the presumption that the presence of a disease has a direct link with the photosynthesis capability and physical structure of a plant at both canopy and tissue level. This causes changes to the solar radiation absorption capability and thus alters the reflectance spectrum. In comparison to existing methods and technologies, remote sensing offers access to near real-time information at both the field and the regional scale to build robust disease models. This study shows the capability of multispectral images along with weather, in situ and phenology data to detect the onset of yellow rust disease. Crop details and disease observation data from field trials across the globe spanning four years (2015–2018) are combined with weather data to model disease severity over time as a value between 0 and 1, with 0 being no disease and 1 being the highest infestation level. Various tree-based ensemble algorithms like CatBoost, Random Forest and XGBoost were evaluated. The XGBoost model performs best, with a mean absolute error of 0.1568 and a root mean square error of 0.2081 between the measured and the predicted disease severity. Because yellow rust spreads fast and has caused epidemics in the past, it is important to detect it early so that farmers can be warned in advance and favorable management practices can be implemented. Vegetation indices like NDVI, NDRE and NDWI from remote-sensing images were used as auxiliary features along with disease severity predictions over time derived by combining weather, in situ and phenology data. A rule-based approach is presented that uses a combination of both model output and changes in vegetation indices to predict an early disease progression window. Analysis on test trials shows that in 80% of the cases, the predicted progression window was ahead of the first disease observation on the field, offering an opportunity to take timely action that could save yield.

Keywords:

yellow rust; multispectral images; weather data; XGBoost; winter wheat; vegetation indices; NDVI; NDWI; NDRE; auxiliary feature

1. Introduction

Agriculture is faced with the enormous challenge of feeding a growing world population [1]. In this regard, sustainable intensification of agriculture production is needed to combat a growing global food scarcity and world hunger problem. A major threat regarding this challenge is caused by plant diseases and pests [1,2]. The Food and Agriculture Organization (FAO) of the United Nations (UN) estimates a global yield loss of 20 to 40 percent each year due to plant pests and diseases resulting in about USD 300 billion loss to the global economy [3]. The top five crops, wheat, rice, maize, potato and soybean, which account for half the global human calorie intake, are at threat too. It is concerning that the top two major (wheat and rice) staple crops are the most hard-hit by pests and diseases, accounting for 20–30% of yield losses each year, putting a major strain on the United Nations sustainable development goal of “zero hunger” [4,5].

Wheat stripe rust (WSR), or yellow rust as it is commonly known, is a disease caused by a fungus called Puccinia striiformis f.sp. tritici (Pst) [6]. Over the years, yellow rust outbreaks have been observed in Asia and Africa which resulted in losses of 5.5 million tons of yield each year [7,8]. Due to its capability to migrate long distances and mutate to adapt to different climates, farmers are extremely cautious with this wheat pathogen. A severity of more than 5% is considered to be a “damaging epidemic” [9]. Hence, it is necessary to detect this disease at an early stage. Scouting, which means field measurements of disease incidence and crop injury, is a highly crucial task in any agricultural activity [10]. Traditionally, this has been carried out by farmers themselves or by experienced assessors, called scouts. However, as the size of farms increases, the workload around farming increases too, so looking after the farm manually becomes a tedious job overall. Plant diseases are heterogeneously distributed across fields, making it cumbersome to know the right location and extent to scout [11]. This can lead to farmers missing out on crucial information at the right time regarding the extent and severity of disease occurrence; hence, there is a need to plan disease management strategies.

The latest advancements in the field of remote sensing have made it possible to collect valuable information regarding crop health without any physical contact or extensive manual labor [12,13]. Such non-invasive scouting is carried out with the help of optical sensors [10]. Capabilities of optical sensors combined with advancements in GIS and IoT allow us to obtain scouting results that could be incorporated into precision agriculture [14]. Optical sensors used for disease detection are of four types [15]:

RGB (red, green and blue) sensors;
Multi- and hyperspectral reflectance sensors;
Thermal sensors;
Fluorescence imaging.

Using multi- and hyperspectral reflectance sensors for disease detection is based on the amount of light reflected from a crop canopy in specific wavelengths of the electromagnetic spectrum (EMS). The two types of spectral sensors, multispectral and hyperspectral, differ in the number of bands and in how narrow those bands are. Multispectral sensors produce imagery with 3 to 10 bands such as red, green, blue, red edge, near infrared (NIR) and short-wave infrared (SWIR). Hyperspectral sensors, on the other hand, have many narrower bands (100–1000 bands) [16]. Instead of monitoring reflectance from individual bands, the bands can also be combined in the form of a Vegetation Index (VI). Vegetation indices such as the Normalized Difference Vegetation Index (NDVI) [17], the Normalized Difference Red Edge Index (NDRE) [18] and the Normalized Difference Water Index (NDWI) [19] are good indicators of vegetation condition and disease incidence in a field [20,21]. Therefore, such spectral sensor-based vegetation indices are considered to be most valuable for site-specific disease management [22,23]. Figure 1 shows the changes in NDVI at a specific field affected by yellow rust over a time series of satellite images. In the time series, it is clearly visible that as the severity of yellow rust disease increases in the field, the NDVI value drops significantly, as shown by the loss of greenness of the field. In general, healthy plants look green as they absorb blue and red light and mostly reflect NIR and green light [24]. However, once a disease manifests, spectral reflectance and absorption are reversed; hence, there is a drop in greenness.

In this paper, our major focus lies on disease detection from multispectral imagery. Remote sensing in this regard relies on the aforementioned optical sensors mounted on satellites to measure the amount of light reflected and/or emitted from plants at specific wavelengths to estimate the chlorophyll content which is an indication of plant health. Healthy plants are usually green because they absorb the blue and red light and have higher reflectance of NIR and green in the visible wavelength [24]. However, when the plant is diseased the spectral reflectance and absorption are reversed [25]. Therefore, remote sensing can be a supplement to manual scouting, used as a diagnostic tool that can detect and quantify plant diseases in an automated and objective manner versus manual scouting which is less efficient and subjective to human judgement [15]. However, it must be considered that this is not true for all plant diseases. Only those diseases causing a visible effect on the physiology of plants such as reduction in biomass, decrease in Leaf Area Index (LAI), lesions caused by infections, destruction of pigments, wilting, etc., have shown to be a good candidate for remote-sensing-based detection [26,27]. Diseases that do not cause such physiological changes but still correspond to crop stress are difficult to detect with remote sensing [27]. Since yellow rust causes visible effects on the physiology of plants, as shown in Figure 1, it is a good candidate for being detected with remote sensing.

Disease models aimed at forecasting yellow rust disease and giving early warning to farmers that allows for sufficient time to plan and execute management practices have been around for a while. Many existing works focus on identifying the relationship between weather parameters and disease progression [9,28]. The estimates from these models are then used to determine a time window at which the progression of the pathogen is intense, during which mitigation strategies like spraying can be carried out. However, climate change has shown to alter host–pathogen interactions, which prompted the exploration of other data sources that can improve model predictions [29]. Images from Unmanned Aerial Vehicles (UAVs) [30], hyperspectral remote-sensing data [31] and remote-sensing-based indices derived from multispectral imagery have been used for yellow rust prediction [13,32,33,34]. Apart from identifying a good combination of vegetation indices from the existing list that can indicate the presence of disease in a field, a new spectral index aimed at detection of yellow rust has been proposed as well [34]. A detailed review of various disease-forecasting models showed that most disease models can be categorized into three classes: models built on weather data, models built on imagery data and models built on a heterogeneous combination of the multiple data sources including weather and imagery data [35]. While most works fall into the first two categories, there are very few works that explored a combination of multiple data sources for disease forecasting. However, those that combined different data sources showed improved performance. In this regard, three works have been identified that combine weather and remote-sensing data for prediction of yellow rust [13,33,36]. In the recent work by [36], it was pointed out that the addition of crop-growth information may enable prediction at a regional scale but has not been explored yet. 
To the best of our knowledge, a combination of in situ observations, weather data, remote sensing and phenology data for yellow rust prediction has also not yet been explored.

In this study, a method for early onset detection of yellow rust is proposed. Since yellow rust is a fast-spreading disease, and a disease severity above 5% is considered an epidemic, early detection of the disease is crucial. However, because multiple factors are involved, vegetation indices derived from remote sensing alone are not enough to identify disease onset. Therefore, the effect of combining in situ observations, weather data, remote sensing and phenology data for onset detection of yellow rust is studied. Novel approaches to combine these data features as a mix of primary and auxiliary features are explored.

2. Materials and Methods

This study involves three kinds of data: in situ data or field collected data, weather data and remote-sensing data.

2.1. Field Data

Field trials are the most important part of any agricultural research work. These trials help gather absolute ground truth data for a targeted use case or feature. In this case, the target feature is the severity of yellow rust on winter wheat crops. The field data were collected across 16 countries over a period of four years from 2015 to 2018, with a minimum plot size of 50 m in both length and breadth. To understand the true disease dynamics, no control measures were taken; the trials were kept untreated. This helps to formulate better management practices, such as when to spray pesticides or fungicides so that the growth of the disease can be controlled. A handheld Global Positioning System (GPS) device with an overall average accuracy of 2.0 m to 5.0 m was used to obtain the geo-coordinates of the in situ observations. This accuracy is sufficient because most of the remote-sensing sources have a spatial resolution ranging from 10 m to 30 m, so location inaccuracies in the in situ measurements are negligible. The distribution of all the fields in the study across Europe can be found in Figure 2.

During field trials, disease assessments are made at individual levels of the leaf. However, this is too granular and makes generalization difficult for a machine-learning algorithm. Studies have shown that the top three leaves have the highest impact on the yield and contribute up to about 80% of the wheat yield [37,38]. According to [38], infections on FL (flag leaf) and FL-1 have the highest impact on overall yield of the plant, where FL represents the flag leaf level or the canopy of the crop, and FL-n represents n levels below flag leaf. Based on these studies, an aggregation function was defined. The aggregation function is a weighted sum of disease severity at different leaf levels. The highest weight is given to the severity at FL and the subsequent leaf layers are given a lower weightage. A patent (this formula has been filed for patenting at the European Patent Office and internationally published with the following details: Application EP2021056340 2021-03-12 Publication WO2021180925 2021-09-16) has been filed representing this aggregation function [39].
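The weighted-sum idea can be illustrated with a minimal sketch. The weights below are purely hypothetical placeholders chosen so that the flag leaf dominates; the actual weights of the patented aggregation function are not given in this text, and the names `LEAF_WEIGHTS` and `aggregate_severity` are introduced here only for illustration.

```python
# Hypothetical weights (NOT the patented values): highest weight on the flag
# leaf (FL), lower weights on the leaf layers below it. They sum to 1.
LEAF_WEIGHTS = {"FL": 0.5, "FL-1": 0.3, "FL-2": 0.2}


def aggregate_severity(leaf_severity, weights=LEAF_WEIGHTS):
    """Weighted sum of per-leaf-layer severities, each on a 0-100 scale."""
    return sum(weights[leaf] * leaf_severity.get(leaf, 0.0) for leaf in weights)


# Example: light infection on the flag leaf, heavier infection lower down.
plot_severity = aggregate_severity({"FL": 10.0, "FL-1": 20.0, "FL-2": 40.0})
```

Because the weights sum to 1, the aggregated value stays on the same 0 to 100 scale as the per-leaf assessments.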

The output of the aggregation function is the intensity of yellow rust severity ranging from 0 to 100, with 0 being no severity and 100 being a complete infestation of Puccinia striiformis in the crop. Figure 3 and Figure 4 represent the severity of yellow rust and its distribution on different leaf layers addressed with respect to the flag leaf. Apart from the disease severity, phenological and temporal features like growth stage, previous crops and sowing date were also acquired from in situ measurement. The growth stage is represented on the BBCH (Biologische Bundesanstalt, Bundessortenamt und Chemische Industrie) scale [40] of 0–100, with 0 representing seed treatment before planting and 100 representing post-harvest or storage harvest stage. The values are recorded at a gap of two to three weeks.

A total of 221 trials were conducted, resulting in around 700 Puccinia striiformis observations. The field observations within each trial were made at intervals of around 5–10 days; therefore, the severity values on the intermediate days were interpolated using a Gompertz function [41]. After interpolation, there were around 6700 data points to train and test the models. Figure 3 shows the field-collected data with interpolated disease severity in the observation window.
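As a rough illustration of this interpolation step, the sketch below fits a Gompertz curve of the form s(t) = A·exp(−b·exp(−c·t)) to a handful of sparse observations and evaluates it on every intermediate day. The fitting procedure is an assumption (the text does not describe how the curve was fitted): the asymptote A is fixed at 100 and the remaining two parameters are found by linear least squares on the transformed values ln(−ln(s/A)), which requires every observed severity to lie strictly between 0 and A. The observation values are made up.

```python
import math


def fit_gompertz(days, severity, asymptote=100.0):
    """Fit s(t) = A * exp(-b * exp(-c * t)) with A fixed (assumption).

    Linearisation: z = ln(-ln(s / A)) = ln(b) - c * t, solved by ordinary
    least squares. Requires 0 < s < A for every observation.
    """
    z = [math.log(-math.log(s / asymptote)) for s in severity]
    n = len(days)
    mean_t = sum(days) / n
    mean_z = sum(z) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxz = sum((t - mean_t) * (zi - mean_z) for t, zi in zip(days, z))
    slope = sxz / sxx
    intercept = mean_z - slope * mean_t
    return math.exp(intercept), -slope  # (b, c)


def gompertz(t, b, c, asymptote=100.0):
    return asymptote * math.exp(-b * math.exp(-c * t))


# Made-up observations roughly one week apart (day index, severity in %).
obs_days = [0, 7, 14, 21]
obs_severity = [2.0, 8.0, 25.0, 50.0]

b, c = fit_gompertz(obs_days, obs_severity)
daily_severity = [gompertz(t, b, c) for t in range(obs_days[0], obs_days[-1] + 1)]
```

Here four observations become 22 daily values; applied per trial, the same kind of densification would account for the growth from about 700 observations to about 6700 data points.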

2.2. Weather Data

Yellow rust caused by Puccinia striiformis is strongly correlated with the daily mean and maximum temperature [9]. In addition, precipitation, sunshine duration and other weather parameters were also found to have a significant impact on the progression of yellow rust on winter wheat crops. Weather parameters are therefore important features when training a model to predict the severity of yellow rust disease. The geo-location details that were collected from in situ measurements were used to fetch the grid from which the weather data is acquired. The weather data is based on Iteris (https://docs.clearag.com; last accessed: 1 August 2022) and Arable weather API (https://www.meteomatics.com/en/weather-api; last accessed: 1 August 2022). These weather APIs are grid-based systems that contain observed data from weather stations and modeled data. The grids have a high spatial resolution of 1 km × 1 km and are up-sampled using models that were built in-house.

The weather APIs return parameters such as air temperature (maximum, minimum and average), air temperature at a height of 5 cm, cloud cover percentage, dew point, precipitation (mm and duration), relative humidity (maximum, minimum and average), sunshine duration in hours and wind speed (maximum, minimum and average). Both daily and hourly data of these parameters can be fetched, but for this study, only the daily weather features were used since the temporal resolution of the available disease data is daily as well.

2.3. Remote-Sensing Data

The structure of the yellow rust in situ dataset is organized in such a way that each trial has a point geometry (latitude and longitude). A buffer is given to this point geometry and this buffered region is the region of interest to clip the acquired multispectral images. Different remote-sensing indices (NDVI, NDRE and NDWI) are calculated by different band combinations of acquired multispectral images of the field. A daily availability of these images is not possible considering the constraints with satellite revisits and cloud cover. Hence, to acquire as much data as possible to conduct an efficient interpolation of these remote-sensing features on the missing days, images from multiple providers are collected for dates ranging from 2015 to 2018 (scope of the dataset) for every field involved in this study.

Information about the high- to medium-resolution multispectral satellite sources used in this work and their specifications can be found in Table 1. All these sources capture images in at least four spectral bands (red, green, blue, NIR). To handle the differences in resolution, all the images from different sources are bilinearly resampled to 10 m resolution. The mean vegetation indices (VIs), as mentioned in Table 2, are calculated from the reflectances of these multispectral images and are mapped into the dataset. The intermediate points between the mapped values are linearly interpolated and then a rolling mean with a 15-day window is applied to the interpolated values. As depicted in Figure 5, this smooths out the jumps in the indices.

NDVI and all other remote-sensing indices mentioned in Table 2 are calculated and the mean NDVIs of the acquired images are mapped to the table. All the available points are linearly interpolated and then smoothed with a rolling average with a 15-day window.
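The interpolation and smoothing of the index time series can be sketched in plain Python as follows. Whether the original rolling window is centred or trailing is not stated, so a centred window that shrinks at the series edges is assumed here; the NDVI values are invented. With pandas, `Series.interpolate` followed by `Series.rolling(15).mean()` would be a natural equivalent.

```python
def interpolate_daily(observations):
    """Linearly interpolate {day_index: value} onto every day in between."""
    days = sorted(observations)
    daily = []
    for d0, d1 in zip(days, days[1:]):
        v0, v1 = observations[d0], observations[d1]
        for d in range(d0, d1):
            daily.append(v0 + (v1 - v0) * (d - d0) / (d1 - d0))
    daily.append(observations[days[-1]])
    return daily


def rolling_mean(values, window=15):
    """Centred rolling mean (assumption); the window shrinks at the edges."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


# Invented mean NDVI values on the days with a usable (cloud-free) image.
ndvi_obs = {0: 0.80, 6: 0.82, 18: 0.70, 30: 0.55}
ndvi_daily = interpolate_daily(ndvi_obs)
ndvi_smooth = rolling_mean(ndvi_daily, window=15)
```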

2.4. Modeling Pipeline

The flowchart of the complete modeling pipeline of the rule-based onset-detection system is described in Figure 6. All the weather, in situ and phenological parameters are compiled and preprocessed. The best regression model is selected from the pool of regression algorithms. The predictions from this model in combination with remote-sensing parameters are then used to detect onset of the disease. This section discusses the modelling pipeline in detail.

2.4.1. Data Preprocessing

After collecting the data as described in Sections 2.1, 2.2 and 2.3, all the features are compiled into a tabular structure. The table consists of groups of trials with records containing the weather, remote sensing, phenological and temporal features (planting date, days after sowing, etc.) starting from the date at which the seed was sown until harvest. In addition to all these features, the disease severity interpolation for the trials is also added to this table. The train–test split of the dataset is made at the trial level and not at the data-point level. The target variable is interpolated across trials.

The data was split into train and test sets at a 70:30 proportion. The trials range through a time span of 2015 until 2018 and the train–test split is effected in such a way that the trials that occurred in earlier years (2015–2016) were kept in the train set and in the later years (2017–2018) were kept out for testing the models. In the end, there were 142 trials with 4857 data points for training and 71 trials and 1854 data points for testing the models. All the numerical features like the weather parameters were scaled between 0 and 1 using a min–max scaler and all the categorical features are numerically encoded. All the data processing, interpolation and manipulation are handled using the Python-based pandas library [42] and the data visualization is effected using the Python-based Plotly library [43].
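A minimal sketch of such a trial-level, year-based split with min–max scaling is shown below. The field names (`trial_id`, `trial_year`, `t_max`) and the records are hypothetical, and fitting the scaler on the training rows only is an assumption about the workflow rather than something stated in the text.

```python
def split_trials_by_year(rows, train_years=(2015, 2016)):
    """Split at trial level: all rows of a trial share one trial_year,
    so a trial can never end up in both the train and the test set."""
    train = [r for r in rows if r["trial_year"] in train_years]
    test = [r for r in rows if r["trial_year"] not in train_years]
    return train, test


def fit_min_max(values):
    """Return a function that scales values to [0, 1] over the fitted range."""
    lo, hi = min(values), max(values)
    span = hi - lo

    def scale(v):
        return 0.0 if span == 0 else (v - lo) / span

    return scale


# Hypothetical records: one row per trial and day.
rows = [
    {"trial_id": "T1", "trial_year": 2015, "t_max": 12.0},
    {"trial_id": "T1", "trial_year": 2015, "t_max": 18.0},
    {"trial_id": "T2", "trial_year": 2016, "t_max": 22.0},
    {"trial_id": "T3", "trial_year": 2018, "t_max": 25.0},
]

train, test = split_trials_by_year(rows)
scale_t_max = fit_min_max([r["t_max"] for r in train])  # fitted on train only
train_scaled = [scale_t_max(r["t_max"]) for r in train]
test_scaled = [scale_t_max(r["t_max"]) for r in test]
```

With a scaler fitted on the training years only, test values can fall slightly outside the 0 to 1 range, as the last record does here.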

2.4.2. Models

As mentioned in Section 2.1, there are only about 6700 data points to train and test the models. Studies have shown that tree-based models such as gradient boosted Decision Trees give superior performance on smaller tabular datasets [44]. Hence, the following three tree-based algorithms were experimented with:

Random Forest [45]: Decision Trees (DT) are a combination of if-then-else statements organized in flowchart-like structures. The objective of the DT algorithm is to find a tree structure that gives an optimal solution for a given input with respect to the intended output. These DTs are built using different algorithms like ID3, C4.5 and CART, which were developed over time. However, a single tree is often termed a weak estimator due to its inability to handle more complex problems. This has been handled by putting together results from different trees. Random Forest is one such approach: it randomly generates trees, and the final decision aggregates the outputs of the individual trees (majority voting for classification, averaging for regression) [46]. The scikit-learn [47] implementation of the Random Forest regressor was used in this work.
XGBoost [48]: XGBoost, also called Extreme Gradient Boosting, is a scalable gradient-boosting algorithm with GPU support. It is known to have produced state-of-the-art results in many regression and classification tasks. Unlike a Random Forest, where trees are added randomly, each new tree is learned from the impurities (errors) of the previous trees. This approach boosts the overall performance of the intended model more efficiently. It offers several advantages over other gradient-boosting algorithms, such as regularization and handling of null and sparse values. Due to its explainability, easy handling and scalability, it has been used for many applications in the medical and finance domains, where consistently monitoring that the model's behavior is sound matters more than marginal improvements in metrics [49,50].
CatBoost [51]: tabular data is a combination of heterogeneous data types. Categorical features are generally vectorized using numerical or one-hot encoding before feeding the inputs into the model. CatBoost is also a gradient-boosting algorithm, with built-in functionality to handle categorical variables in the inputs. CatBoost uses Minimal Variance Sampling (MVS) to perform boosting. This technique reduces the number of samples that are required for each iteration of boosting, thereby improving the quality of the intended model.
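To make the boosting idea concrete (each new tree is fitted to the errors of the trees before it), here is a deliberately tiny gradient-boosting sketch with one-split trees (stumps) on a single feature and squared-error loss. It is a teaching illustration only, not the XGBoost or CatBoost algorithm, and the data are invented.

```python
def fit_stump(x, residuals):
    """Best single-split regression stump on one feature (squared error)."""
    best = None
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        l_mean, r_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - l_mean) ** 2 for r in left)
               + sum((r - r_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    _, threshold, l_mean, r_mean = best
    return lambda xi: l_mean if xi <= threshold else r_mean


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for xi, pi in zip(x, pred)]
    return lambda xi: base + learning_rate * sum(s(xi) for s in stumps)


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Invented data: days after first observation vs. severity scaled to [0, 1].
x = list(range(10))
y = [0.0, 0.0, 0.0, 0.05, 0.1, 0.3, 0.6, 0.8, 0.9, 0.95]

model = boost(x, y)
baseline_mse = mse(y, [sum(y) / len(y)] * len(y))
boosted_mse = mse(y, [model(xi) for xi in x])
```

A Random Forest would instead fit each tree independently on a bootstrap sample and average the outputs; the residual-fitting loop in `boost` is what distinguishes boosting.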

Apart from the metrics calculated from the model outputs and predictions, these models can also give feature importance scores for each feature based on the normalized gain values.

2.4.3. Evaluation Metrics

Once the models are trained, the predictions from each model are evaluated against the ground truths. The target variable in this analysis is a continuous variable; hence, the most commonly used regression metrics are used to evaluate the models. The regression models are evaluated with the following metrics,

MAE: the mean absolute difference between the ground truth and the model prediction, given in Equation (1),

$MAE = \frac{1}{n} \sum_{i = 1}^{n} | y_i - \hat{y}_i |$

(1)

where n is the number of data points, $y_i$ is the interpolated severity value of the i-th data point and $\hat{y}_i$ is the corresponding predicted severity. The target values are scaled between 0 and 1; hence, the MAE too will be between 0 and 1.
MSE: the mean of the squared differences between the ground truth and the model prediction, given in Equation (2),

$MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_i - \hat{y}_i)}^{2}$

(2)
RMSE: the square root of the mean squared error, given in Equation (3),

$RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_i - \hat{y}_i)}^{2}}$

(3)
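The three metrics translate directly into code. The following sketch uses plain Python and invented severity values on the 0 to 1 scale.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, Equation (1)."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error, Equation (2)."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, Equation (3)."""
    return math.sqrt(mse(y_true, y_pred))


# Invented interpolated and predicted severities.
y_true = [0.0, 0.5, 1.0]
y_pred = [0.1, 0.5, 0.7]
scores = {"MAE": mae(y_true, y_pred), "MSE": mse(y_true, y_pred), "RMSE": rmse(y_true, y_pred)}
```

Because RMSE squares the errors before averaging, it is never smaller than MAE and penalizes a few large misses more heavily.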

2.4.4. Rule-Based Onset-Detection Approach

The basic idea is to not use these remote-sensing indices as a feature to train a regression model but to use them as an auxiliary feature to the yellow rust disease prediction model to determine early onset of the disease. The same train–test split is used in this analysis. A regression model was trained on this data with the weather, phenological and in situ features.

$\hat{y} = F (X_{Weather}, X_{Phenological}, X_{in\text{-}situ})$

(4)

where $\hat{y}$ is the predicted target variable and X represents the corresponding input features to the model F. A simple rule was fixed based on the trend of the model-predicted disease progression and these remote-sensing indices to estimate the onset of yellow rust severity.

$\frac{d \hat{y}}{d t} > 0 \land \frac{d X_{Remote Sensing}}{d t} < 0$

(5)

Equation (5) was used as a rule; ∧ represents a logical AND operator. The first part of the equation is the gradient of the model predictions and the latter part is the gradient of the remote-sensing indices over time. The rule states that if there is an increasing trend in the model disease prediction and if there is a decreasing trend in the remote-sensing index, a signal to indicate the onset of yellow rust has to be triggered.
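A minimal sketch of the rule in Equation (5) follows. It approximates both gradients by day-to-day differences, which is an assumption (the text does not specify how the trends are estimated, and a windowed slope would be less sensitive to noise). The series are invented.

```python
def daily_gradient(series):
    """Day-to-day change; element i is series[i + 1] - series[i]."""
    return [b - a for a, b in zip(series, series[1:])]


def onset_signal(predicted_severity, vegetation_index):
    """Equation (5): rising predicted severity AND falling vegetation index."""
    d_pred = daily_gradient(predicted_severity)
    d_vi = daily_gradient(vegetation_index)
    return [dp > 0 and dv < 0 for dp, dv in zip(d_pred, d_vi)]


def first_onset_day(predicted_severity, vegetation_index):
    """Index of the first day on which the rule fires, or None."""
    signal = onset_signal(predicted_severity, vegetation_index)
    return signal.index(True) + 1 if True in signal else None


# Invented daily series: model prediction (0-1) and smoothed NDVI.
pred = [0.0, 0.0, 0.1, 0.2, 0.3]
ndvi = [0.80, 0.82, 0.83, 0.80, 0.75]
onset = first_onset_day(pred, ndvi)
```

In this example the prediction starts rising on day 2, but the rule only fires on day 3, when NDVI begins to fall as well.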

3. Results and Discussion

Yellow rust infestation affects the spectral responses of winter wheat crop and hence forms the basis of remote-sensing studies [52]. In this section, the results from experiments using different algorithms are explained. Early onset estimation of yellow rust using a rule-based system guided by a trained model with remote-sensing features as auxiliaries is presented.

3.1. Data-Driven Disease-Prediction Models

The problem at hand is to build a system that could estimate the target feature, i.e., yellow rust disease severity, given the weather, remote sensing, in situ and phenological features. Since disease severity is a continuous variable, a regression model is trained which is given by the following equation,

$\hat{y} = F (X_{Weather}, X_{Remote Sensing}, X_{Phenological}, X_{in\text{-}situ})$

(6)

where $\hat{y}$ is the predicted target variable and X represents the input features corresponding to the model F.

Regression Model

Regression models based on the previously described algorithms are trained. The hyper-parameters are tuned using a grid-search strategy, where a discrete grid of possible hyper-parameters is fixed. Models are trained and evaluated on all combinations of parameters on the grid to choose the best set of hyper-parameters.
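The grid-search strategy can be sketched generically as below. The grid values are hypothetical, and `evaluate` is a toy stand-in for "train a model with these hyper-parameters and return its validation MAE"; only the exhaustive loop over all combinations reflects the strategy described above.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Evaluate every combination on the grid; return the lowest-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = evaluate(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical grid; the grid actually used in the study is not given.
param_grid = {
    "max_depth": [2, 4, 6],
    "learning_rate": [0.01, 0.1, 0.3],
}


def evaluate(params):
    """Toy stand-in for training and validating a model (lower is better)."""
    return abs(params["max_depth"] - 4) * 0.01 + abs(params["learning_rate"] - 0.1)


best_params, best_score = grid_search(param_grid, evaluate)
```

The cost grows multiplicatively with each added hyper-parameter (here 3 × 3 = 9 evaluations), which is why the grid is kept discrete and small.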

The final obtained metrics on the test set from all three algorithms are as given in Table 3.

The feature importance metrics give a score to each feature based on the impurity reduction in the trees generated by the gradient-boosting models. This could sometimes create some inconsistencies since the hyper-parameters in these models could differ. Therefore, the feature importance is also calculated using Shapley (SHAP) scores [53], which is a game theory-based data-dependent metric. The SHAP scores give an insight into how every feature contributed to the predictions. The overall feature importance is the mean SHAP score of all the features across all the instances of the dataset.
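As an illustration of the aggregation step only, the sketch below turns a matrix of per-instance SHAP values into one importance score per feature. It assumes the SHAP values have already been computed elsewhere, and it averages absolute values, which is the common convention but an assumption here since the text only says "mean SHAP score". The feature names and numbers are invented.

```python
def mean_abs_shap(shap_values, feature_names):
    """Global importance: mean |SHAP value| per feature over all instances."""
    n = len(shap_values)
    importance = {
        name: sum(abs(row[j]) for row in shap_values) / n
        for j, name in enumerate(feature_names)
    }
    # Sort from most to least important.
    return dict(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))


# Invented SHAP values: one row per instance, one column per feature.
feature_names = ["ndvi", "t_max", "bbch"]
shap_values = [
    [0.10, -0.02, 0.05],
    [-0.20, 0.01, 0.05],
    [0.15, -0.03, -0.02],
]
ranking = mean_abs_shap(shap_values, feature_names)
```

Taking absolute values prevents positive and negative contributions of the same feature from cancelling out across instances.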

In summary, the regression models reached an MAE of around 0.16; the XGBoost model outperformed the other two models by only around 0.002 MAE. The limited performance could be attributed to the imbalance in the data. In all the available trials, ground truths are available only when disease severity is present. Figure 7 represents the distribution of the target feature at different stages of severity. The dataset is imbalanced, with the majority of observations made at later stages (>15%) of infestation after the onset of disease. As a result, the model is biased and tends to predict the existence of disease even if there is none present. An effort to balance the dataset with conventional resampling techniques like SMOTE and NN did not have a huge impact on the model predictions. Although the model performance is not optimal, the feature importance plots in Figure 8 show that remote-sensing-based features (NDVI and NDWI) appear in the top five features, ranking above all the weather parameters that were used to train these models. Therefore, another set of models was trained without remote-sensing features as input to observe the effect of remote-sensing indices on model performance metrics; the results can be found in Table 4. As seen from the table, the remote-sensing features do not have a major impact on the final model performance, which motivates exploring alternative ways of using remote-sensing data to work with such imbalanced data.

3.2. Rule-Based System for Disease Onset Identification

Based on the observations in the previous section and the need to estimate the early onset of yellow rust owing to its quick development and spread, remote-sensing data of the field could be exploited to analyze the behavior of the vegetation in the field even before the first observation of disease. NDVI and NDRE represent the biomass in a field and NDWI gives an overview of the leaf moisture content in the field. Sentinel-2 also covers the red-edge spectrum, which can be used to calculate NDRE. NDRE is calculated like NDVI, but from the red-edge band rather than the red band. It is known to compensate for some saturation issues occurring in the NDVI at later growth stages of the crop [34]. For all the following analyses, only images from Sentinel-2 are used because of the consistent availability of images after 2016.

3.2.1. Remote-Sensing Indices from Sentinel-2

Previously, NDVI and NDWI mentioned in Table 2 were used to train models. In this analysis, red-edge-based indices such as NDRE, calculated from the near-infrared and red-edge channels of Sentinel-2 images, were also included. The behavior of all these indices across all the test trials was explored. Example time series plots with NDVI, NDWI and NDRE indices can be seen in Figure 9. NDRE and NDWI indices from different red-edge and short-wave infrared bands of Sentinel-2 images were analyzed. The NDRE1, NDRE2 and NDRE3 indices are calculated using the NDRE formula in Table 2 with band B8A of Sentinel-2 as NIR and bands B5, B6 and B7 as red edge, respectively. The NDWI1 and NDWI2 indices are calculated using band B8A as NIR and bands B11 and B12 as SWIR, respectively. For further analysis, NDRE1 and NDVI are considered, since these have been shown to be directly affected by the presence of fungus on leaves, which alters the reflectance of the light that falls on them and is then captured by the satellite sensors [34].
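The band combinations described above can be sketched as follows. Table 2 is not reproduced in this text, so the usual normalized-difference form (a − b)/(a + b) is assumed for both NDRE and NDWI; the reflectance values are invented and the dictionary keys simply mirror the Sentinel-2 band names used in the paragraph.

```python
def normalized_difference(a, b):
    """(a - b) / (a + b); returns NaN where the denominator is zero."""
    denominator = a + b
    return (a - b) / denominator if denominator != 0 else float("nan")


def sentinel2_indices(bands):
    """Red-edge and water indices from mean band reflectances.

    B8A is used as NIR throughout, B5/B6/B7 as red edge and B11/B12 as SWIR,
    following the band assignment described in the text.
    """
    nir = bands["B8A"]
    return {
        "NDRE1": normalized_difference(nir, bands["B5"]),
        "NDRE2": normalized_difference(nir, bands["B6"]),
        "NDRE3": normalized_difference(nir, bands["B7"]),
        "NDWI1": normalized_difference(nir, bands["B11"]),
        "NDWI2": normalized_difference(nir, bands["B12"]),
    }


# Invented mean surface reflectances for one field and one acquisition date.
bands = {"B8A": 0.40, "B5": 0.20, "B6": 0.30, "B7": 0.35, "B11": 0.20, "B12": 0.10}
indices = sentinel2_indices(bands)
```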

3.2.2. Trial Segregation and Cleaning

Images were collected for all the test trials, mapped in the dataset and then visually analyzed. Across all 71 test trials, the behavior of NDVI was observed during and around the disease observation period. Based on this analysis, the trials are grouped into five categories:

Decreasing NDVI trend at disease;
Decreasing NDVI trend post disease observation;
High NDVI at late growth stage;
Increasing NDVI trend;
No major changes.

The count of test trials in each category can be found in Table 5. After this categorization, leaf-level severity was checked on the first and the final day of observation. The objective of this analysis is to see whether the trials in the first category (decreasing NDVI trend at disease) have flag leaf level severity on the first day of observation. This would support the hypothesis that the presence of yellow rust on the leaf layer has a significant impact on the NDVI. The behavior of flag leaf level yellow rust severity can be observed in the histograms in Figure 10 and Figure 11. Some abnormalities were observed in the third category (high NDVI at late growth stage). Based on the images, it was found that the geo-coordinates of some of these trials lie in urban areas or on the edge of the fields, which created abnormalities in the NDVI values. Trials with such abnormalities were removed, as were trials without any valid Sentinel-2 images around the observation period. In the end, a total of 41 trials remained to test this approach.

3.3. Onset Detection

The onset-detection pipeline is summarized by the pseudocode in Algorithm 1. The predictions ($\hat{y}$) from the aforementioned XGBoost model with weather, phenological and in situ features are gathered. The date at which the model spike occurs in the predictions is found using the detect model spike function. The method convolves a differential operator over $\hat{y}$, which gives the gradients of all predictions. The point of spike is the point with the maximum gradient. With this approach, the spike in the model predictions can be located. Figure 12 shows the spike produced by the model along with the detected period and the other parameters. This spike-detection method was applied across all 41 test trials to estimate the average time interval at which the model spikes in relation to the first actual observation. The histogram in Figure 13 shows that most of the model spikes occur between 10 and 20 days before the first disease observation, and there were a few cases where the model spiked after the first observation.
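The spike-detection step can be sketched as follows. This is a minimal illustration, assuming evenly spaced (for example daily) predictions and a two-point difference kernel as the differential operator; the exact kernel used in the study is not stated, and the function name `detect_model_spike` is only modelled on the name used in Algorithm 1.

```python
import numpy as np

def detect_model_spike(y_hat):
    """Return the index at which the prediction series rises most steeply.

    Assumes `y_hat` holds evenly spaced severity predictions.
    """
    y_hat = np.asarray(y_hat, dtype=float)
    if y_hat.size < 2:
        raise ValueError("need at least two predictions to compute a gradient")
    # np.convolve flips the kernel, so [1, -1] in "valid" mode yields
    # gradient[i] = y_hat[i + 1] - y_hat[i].
    gradient = np.convolve(y_hat, [1.0, -1.0], mode="valid")
    # Report the later of the two samples spanning the largest jump
    # (an assumption; the source does not say which endpoint is used).
    return int(np.argmax(gradient)) + 1
```

If the predictions carry dates, the returned index can be used to look up the spike date in the corresponding date array.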

Algorithm 1: Pseudocode for yellow rust onset detection

Require: » $\mathit{WindowSize}\ (WS),\ \mathit{WindowLocation}\ (WL)$
» $\hat{y} \leftarrow F(X_{Weather}, X_{phenological}, X_{in\text{-}situ})$
» $\mathit{spike\_date} \leftarrow \mathit{detect\_model\_spike}(\hat{y})$
» $\frac{d(\hat{y})}{dt}, \frac{d(X_{RemoteSensing})}{dt} \leftarrow \mathit{detect\_trend}(\mathit{spike\_date}, \hat{y}, WS, WL, X_{RemoteSensing})$
if $\frac{d(\hat{y})}{dt} > 0 \land \frac{d(X_{RemoteSensing})}{dt} < 0$ then
» Signal Disease Onset

Once this spike is detected, a temporal window (in days) of size WS at a location WL with respect to the model spike is fixed. The detect trend function returns the slopes of $\hat{y}$ and $X_{RemoteSensing}$ within the window, and the disease onset is signalled if the slopes satisfy the condition in Equation (5).
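The rule can be illustrated with a short Python sketch. It assumes that both series are evenly spaced, gap-free (for example already interpolated) and aligned on the same day index, and that the slope is estimated by an ordinary least-squares line fit; the source does not specify how the slope is computed, so this choice and the helper names `window_slope` and `signal_onset` are assumptions.

```python
import numpy as np

def window_slope(values, start, end):
    """Least-squares slope of values[start:end] against the sample index."""
    segment = np.asarray(values[start:end], dtype=float)
    if segment.size < 2:
        raise ValueError("window must contain at least two samples")
    days = np.arange(segment.size)
    slope, _intercept = np.polyfit(days, segment, deg=1)
    return float(slope)

def signal_onset(y_hat, remote_sensing_index, start, end):
    """Apply the rule of Algorithm 1 inside the window [start, end).

    Onset is signalled when the predicted severity rises while the
    remote-sensing index (e.g., NDVI or NDRE1) falls.
    """
    severity_slope = window_slope(y_hat, start, end)
    index_slope = window_slope(remote_sensing_index, start, end)
    return severity_slope > 0 and index_slope < 0
```

A line fit over the whole window is less sensitive to a single noisy acquisition than a difference between the window endpoints, which is why it is used in this sketch.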

Experimentation

A pipeline for yellow rust onset detection was proposed in the previous section, and an experiment was designed around two parameters: window size (WS) and window location (WL). Window sizes of 20, 31 and 42 days were used, and the window locations were chosen as follows:

The window lies before the model spike, to identify the drop in the remote-sensing indices even before the model spike occurs. This could be a case where the field has already been infested before the model has spiked.
There is a window with the point of spike in the middle, a case where the remote-sensing indices started dropping a little before the model spike.
The window lies after the model spike, to identify the drop in remote-sensing indices after the model has spiked. This is a case where the model has detected the spike even before the drop in the remote-sensing indices.
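The three window placements listed above can be expressed as index ranges relative to the spike. The sketch below is illustrative only: it assumes one sample per day, treats windows as half-open ranges, and clips them to the available series. Whether the spike day itself belongs to the "before" or the "after" window is not stated in the source, and the function name `window_bounds` and the location labels are hypothetical ("centered" corresponds to the "Before and After" location in Table 6).

```python
def window_bounds(spike_idx, window_size, location, n_samples):
    """Return (start, end) indices of the half-open window [start, end).

    location: "before", "centered" or "after" relative to the spike index.
    """
    if location == "before":
        start, end = spike_idx - window_size, spike_idx
    elif location == "centered":
        half = window_size // 2
        start, end = spike_idx - half, spike_idx - half + window_size
    elif location == "after":
        start, end = spike_idx, spike_idx + window_size
    else:
        raise ValueError(f"unknown window location: {location!r}")
    # Clip to the extent of the time series.
    return max(start, 0), min(end, n_samples)
```

For example, with a spike at index 50 in a 200-day series, a 42-day window after the spike covers indices 50 to 91.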

The number of trials that passed the condition in Equation (5) was noted for all the test cases. From Table 6, it can be observed that about 80% of the test trials cleared the rule with a window located after the model spike and a size of 42 days. Therefore, an ideal procedure is to alert the farmer when the model spike occurs and then observe the remote-sensing-based indices after the spike. If there is a drop in the remote-sensing indices, then mitigation strategies have to be applied in accordance with the infestation levels. Reference [35] shows that most disease models can be grouped into three categories: models built on weather data, models built on imagery data and models built on a combination of different sources for the forecasting of diseases. References [13,33,36] showed that a combination of different data sources improves model performance. Reference [36] points out that adding crop phenological information may help in crop disease prediction at a regional scale. However, this has not been explored.

In this study, a combination of weather data, phenological data and remote-sensing imagery data for prediction of yellow rust has been explored and an effective remote-sensing-based monitoring model has been established to detect an early onset of yellow rust. The rapid and efficient monitoring of plant diseases helps to reduce the pressure on the farmers. This can also help to achieve the goals of sustainable development as well as promote healthy growth of agriculture.

4. Conclusions and Future Works

In this study, multispectral satellite imagery along with in situ measurements, weather data and phenology data were used to predict a time window for the early onset of wheat yellow rust disease. A yellow rust-forecasting model overlaid with remote-sensing indices as auxiliary features drives this prediction. Among CatBoost, Random Forest and XGBoost, the XGBoost regression model was found to be optimal for yellow rust forecasting and predicts disease severity in the range of 0–1 with an MAE of 0.1568, MSE of 0.0433 and RMSE of 0.2081. Since the data are highly imbalanced, with more disease observations reporting the presence of disease at higher severity, the model performance is sub-optimal for early onset detection. Therefore, remote-sensing indices are introduced as auxiliary features along with the model prediction to derive disease progression windows at the early stages of disease development. A rule-based approach is proposed that combines the two feature types. The proposed rule-based method was able to estimate the occurrence of yellow rust in about 80% of the test trials even before the first observation. This offers an opportunity to reduce the impact of the disease through timely management strategies, and thus to reduce its impact on global food security as well.

In the future, the proposed method can be evaluated on other similar diseases. There is scope to extend the work by incorporating other relevant vegetation indices such as the Leaf Area Index (LAI) and disease-specific indices, which were not considered in this study. Beyond disease-relevant parameters, other parameters such as soil type, planting procedure, topography and other geomorphological parameters could also be included in the models to assess performance improvements. Leaf wetness has been shown to be an important feature for yellow rust, since the corresponding pathogen Puccinia striiformis thrives at cooler temperatures and prolonged leaf wetness accelerates the progression of the infestation. Such specific features can also be derived from remote sensing and included to improve the models. However, disease in the field can result from a combination of multiple factors, and the inclusion of more disease-relevant parameters will help in better identification of the disease. The interplay between multiple diseases in the field is difficult to capture without the proper use of in situ measurements and thus requires further exploration.

Author Contributions

Conceptualization, P.S. and S.M.; Data Curation, P.S.; Formal Analysis, V.T.S.; Methodology, S.M. and P.S.; Validation, V.T.S.; Visualization, V.T.S.; Writing—original draft, V.T.S., P.S. and S.M.; Writing—review & editing, V.T.S., P.S. and S.M. All authors have read and agreed to the published version of the manuscript.

Funding

The APC for this article was funded by the Open Access Publication Fund of Magdeburg University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

We acknowledge support for the Book Processing Charge by the Open Access Publication Fund of Magdeburg University. We would also like to acknowledge our colleagues at BASF Digital Farming GmbH for their consistent support throughout this work.

Conflicts of Interest

The authors declare no conflict of interest.

References

Oerke, E. Crop losses to pests. J. Agric. Sci. 2006, 144, 31.
Strange, R.N.; Scott, P.R. Plant disease: A threat to global food security. Annu. Rev. Phytopathol. 2005, 43, 83–116.
Sarkozi, A. New Standards to Curb the Global Spread of Plant Pests and Diseases; Food and Agriculture Organization of the United Nations: Rome, Italy, 2019.
Carvajal-Yepes, M.; Cardwell, K.; Nelson, A.; Garrett, K.A.; Giovani, B.; Saunders, D.; Kamoun, S.; Legg, J.; Verdier, V.; Lessel, J.; et al. A global surveillance system for crop diseases. Science 2019, 364, 1237–1239.
Savary, S.; Ficke, A.; Aubertot, J.N.; Hollier, C. Crop Losses due to Diseases and Their Implications for Global Food Production Losses and Food Security. Food Sec. 2012, 4, 519–537.
Figueroa, M.; Hammond-Kosack, K.E.; Solomon, P.S. A review of wheat diseases—A field perspective. Mol. Plant Pathol. 2018, 19, 1523–1536.
Hovmøller, M.S.; Walter, S.; Justesen, A.F. Escalating Threat of Wheat Rusts. Science 2010, 329, 369.
Beddow, J.M.; Pardey, P.G.; Chai, Y.; Hurley, T.M.; Kriticos, D.J.; Braun, H.J.; Park, R.F.; Cuddy, W.S.; Yonow, T. Research investment implications of shifts in the global geography of wheat stripe rust. Nat. Plants 2015, 1, 15132.
Beest, D.; Paveley, N.; Shaw, M.; Bosch, F. Disease–Weather Relationships for Powdery Mildew and Yellow Rust on Winter Wheat. Phytopathology 2008, 98, 609–617.
Kalischuk, M.; Paret, M.L.; Freeman, J.H.; Raj, D.; Da Silva, S.; Eubanks, S.; Wiggins, D.; Lollar, M.; Marois, J.J.; Mellinger, H.C.; et al. An improved crop scouting technique incorporating unmanned aerial vehicle–assisted multispectral crop imaging into conventional scouting practice for gummy stem blight in watermelon. Plant Dis. 2019, 103, 1642–1650.
Franke, J.; Menz, G. Multi-temporal wheat disease detection by multispectral remote sensing. Precis. Agric. 2007, 8, 161–172.
Mukhopadhaya, S. Land use and land cover change modelling using CA-Markov Case study: Deforestation Analysis of Doon Valley. J. Agroecol. Nat. Resour. Manag. 2016, 3, 1–5.
Xu, W.; Wang, Q.; Chen, R. Spatio-temporal prediction of crop disease severity for agricultural emergency management based on recurrent neural networks. GeoInformatica 2018, 22, 363–381.
Oerke, E.C.; Mahlein, A.K.; Steiner, U. Proximal sensing of plant diseases. In Detection and Diagnostics of Plant Pathogens; Springer: Berlin/Heidelberg, Germany, 2014; pp. 55–68.
Mahlein, A.K. Plant disease detection by imaging sensors–parallels and specific demands for precision agriculture and plant phenotyping. Plant Dis. 2016, 100, 241–251.
Mukhopadhaya, S. Hyperspectral remote-sensing data processing and classification: A tutorial. J. Basic Appl. Eng. Res. 2016, 3, 831–837.
Weier, J.; Herring, D. Measuring vegetation (ndvi & evi). NASA Earth Obs. 2000, 20, 2.
Barnes, E.; Clarke, T.; Richards, S.; Colaizzi, P.; Haberland, J.; Kostrzewski, M.; Waller, P.; Choi, C.; Riley, E.; Thompson, T.; et al. Coincident detection of crop water stress, nitrogen status and canopy density using ground based multispectral data. In Proceedings of the Fifth International Conference on Precision Agriculture, Bloomington, MN, USA, 16–19 July 2000; Volume 1619, p. 6.
Gao, B.C. NDWI—A normalized difference water index for remote sensing of vegetation liquid water from space. Remote Sens. Environ. 1996, 58, 257–266.
Ennouri, K.; Kallel, A. Remote sensing: An advanced technique for crop condition assessment. Math. Probl. Eng. 2019, 2019, 9404565.
Pohl, C.; Van Genderen, J. Remote Sensing Image Fusion: A Practical Guide; CRC Press: Boca Raton, FL, USA, 2016.
Gebbers, R.; Adamchuk, V.I. Precision agriculture and food security. Science 2010, 327, 828–831.
West, J.S.; Bravo, C.; Oberti, R.; Lemaire, D.; Moshou, D.; McCartney, H.A. The potential of optical canopy measurement for targeted control of field crop diseases. Annu. Rev. Phytopathol. 2003, 41, 593–614.
Boiarskii, B.; Hasegawa, H. Comparison of NDVI and NDRE indices to detect differences in vegetation and chlorophyll content. J. Mech. Contin. Math. Sci. 2019, 4, 20–29.
Ortiz, B.; Shaw, J.; Fulton, J. Basics of Crop Sensing. Alabama Cooperative Extension System. 2011. Available online: https://www.aces.edu/wp-content/uploads/2019/03/ANR-1398-Basics-of-Crop-Sensing_061319La.pdf (accessed on 1 August 2022).
Pérez-Bueno, M.L.; Pineda, M.; Barón, M. Phenotyping plant responses to biotic stress by chlorophyll fluorescence imaging. Front. Plant Sci. 2019, 10, 1135.
Zhang, J.; Huang, Y.; Pu, R.; Gonzalez-Moreno, P.; Yuan, L.; Wu, K.; Huang, W. Monitoring plant diseases and pests through remote sensing technology: A review. Comput. Electron. Agric. 2019, 165, 104943.
Coakley, S.M.; Line, R.F.; McDaniel, L.R. Predicting stripe rust severity on winter wheat using an improved method for analyzing meteorological and rust data. Phytopathology 1988, 78, 543–550.
Coakley, S.M.; Scherm, H.; Chakraborty, S. Climate change and plant disease management. Annu. Rev. Phytopathol. 1999, 37, 399–426.
Zhang, X.; Han, L.; Dong, Y.; Shi, Y.; Huang, W.; Han, L.; González-Moreno, P.; Ma, H.; Ye, H.; Sobeih, T. A deep learning-based approach for automated yellow rust disease detection from high-resolution hyperspectral UAV images. Remote Sens. 2019, 11, 1554.
Krishna, G.; Sahoo, R.; Pargal, S.; Gupta, V.; Sinha, P.; Bhagat, S.; Saharan, M.; Singh, R.; Chattopadhyay, C. Assessing wheat yellow rust disease through hyperspectral remote sensing. Int. Arch. Photogramm. Remote Sens. Spat. Inform. Sci. 2014, XL-8, 1413–1416.
Zheng, Q.; Huang, W.; Cui, X.; Dong, Y.; Shi, Y.; Ma, H.; Liu, L. Identification of wheat yellow rust using optimal three-band spectral indices in different growth stages. Sensors 2019, 19, 35.
Newlands, N.K. Model-based forecasting of agricultural crop disease risk at the regional scale, integrating airborne inoculum, environmental, and satellite-based monitoring data. Front. Environ. Sci. 2018, 6, 63.
Zheng, Q.; Huang, W.; Cui, X.; Shi, Y.; Liu, L. New spectral index for detecting wheat yellow rust using Sentinel-2 multispectral imagery. Sensors 2018, 18, 868.
Fenu, G.; Malloci, F.M. Forecasting plant and crop disease: An explorative study on current algorithms. Big Data Cogn. Comput. 2021, 5, 2.
Zheng, Q.; Ye, H.; Huang, W.; Dong, Y.; Jiang, H.; Wang, C.; Li, D.; Wang, L.; Chen, S. Integrating spectral information and meteorological data to monitor wheat yellow rust at a regional scale: A case study. Remote Sens. 2021, 13, 278.
Roelfs, A.P.; Bushnell, W.R. The Cereal Rusts; Academic Press: Orlando, FL, USA, 1985; Volume 2.
Introduction to Foliar Disease Management in Cereals; Agriculture and Horticulture Development: Warwickshire, UK, 2019. Available online: https://ahdb.org.uk/knowledge-library/introduction-to-foliar-disease-management-in-cereals (accessed on 1 August 2022).
Shankar, P.; Johnen, A.; Morales Cepeda, D.A.; Janssen, O. Method and System for Determining a Plant Protection Treatment Plan of an Agricultural Plant; World Intellectual Property Organization WO2021180925A1 (2021-09-16); WIPO: Geneva, Switzerland, 2021.
Meier, U.; Bleiholder, H.; Buhr, L.; Feller, C.; Hack, H.; Heß, M.; Lancashire, P.D.; Schnock, U.; Stauß, R.; Van Den Boom, T.; et al. The BBCH system to coding the phenological growth stages of plants–history and publications. J. Für Kult. 2009, 61, 41–52.
Gottwald, T.R.; Timmer, L.W.; McGuire, R.G. Analysis of Disease Progress of Citrus Canker in Nurseries in Argentina. Phytopathology 1989, 79, 1276–1283.
McKinney, W. Data Structures for Statistical Computing in Python. In Proceedings of the 9th Python in Science Conference, Austin, TX, USA, 28–30 June 2010; pp. 56–61.
Plotly Technologies Inc. Collaborative Data Science. Montréal, QC, Canada. 2015. Available online: https://plotly.com/ (accessed on 1 August 2022).
Cha, G.W.; Moon, H.J.; Kim, Y.C. Comparison of Random Forest and Gradient Boosting Machine Models for Predicting Demolition Waste Based on Small Datasets and Categorical Variables. Int. J. Environ. Res. Public Health 2021, 18, 8530.
Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282.
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
Mitchell, R.; Adinets, A.; Rao, T.; Frank, E. XGBoost: Scalable GPU Accelerated Learning. arXiv 2018, arXiv:1806.11248.
Demajo, L.M.; Vella, V.; Dingli, A. An Explanation Framework for Interpretable Credit Scoring. Int. J. Artif. Intell. Appl. (IJAIA) 2021, 12, 19–38.
Wang, F.; Tian, Y.C.; Zhang, X.; Hu, F. An ensemble of Xgboost models for detecting disorders of consciousness in brain injuries through EEG connectivity. Expert Syst. Appl. 2022, 198, 116778.
Dorogush, A.V.; Gulin, A.; Gusev, G.; Kazeev, N.; Prokhorenkova, L.O.; Vorobev, A. Fighting biases with dynamic boosting. arXiv 2017, arXiv:1706.09516.
Guo, A.; Huang, W.; Dong, Y.; Ye, H.; Ma, H.; Liu, B.; Wu, W.; Ren, Y.; Ruan, C.; Geng, Y. Wheat yellow rust detection using UAV-based hyperspectral technology. Remote Sens. 2021, 13, 123.
Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.

Figure 1. Time series plot of NDVI maps from a trial location in the UK during 2018 with reported observations of increasing yellow rust disease severity.

Figure 2. A choropleth plot representing the distribution of all the field trials within Europe that were used in this study.

Figure 3. An example trial representing the complete lifecycle of the cropping season, with growth stage (GS) represented in BBCH standard and disease severity information. The red zone in this example between May 2017 represents a window where the yellow rust severity observations are collected from field trials.

Figure 4. An extension of Figure 3 which represents the leaf-level yellow rust severity values collected on field trials. FL represents the flag leaf level, i.e., the topmost leaf of the canopy, and the subsequent labels represent the leaves below the flag leaf, i.e., FL-1 represents one level below the flag leaf level, and so on.

Figure 5. True NDVI points from the Sentinel-2 (S2 NDVI) image source with missing values are interpolated by a moving average (S2 NDVI interpolated). The red-zone in this plot represents the observation of yellow rust disease in the field.

Figure 6. Flowchart describing the Rule-Based Onset-Detection pipeline.

Figure 7. Categorical distribution of the target feature.

Figure 8. Model feature importance and Shapley feature importance of all three algorithms. Columns 1 and 2 represent the tree and Shapley feature importance scores, respectively. Rows (a–c) represent the XGBoost Regressor, CatBoost Regressor and Random Forest Regressor models, respectively. In all the feature-importance plots, the growth stage (GS) and the days after sowing (Daysaftersowing) act as the most important features for predicting disease severity. However, the interpolated remote-sensing-based features (ndvi_rolling_average, ndwi1_rolling_average) supersede the weather-related features such as wind speed (windSpeedMSAvg), maximum air temperature (airtempCMax) and so on.

Figure 9. Time series plots of different remote-sensing indices: NDVI (S2 NDVI), NDWI (S2 NDWI1, S2 NDWI2) and NDRE (S2 NDRE1, S2 NDRE2, S2 NDRE3) calculated using the Sentinel-2 images along with the interpolated values on a sample trial.

Figure 10. Flag leaf level yellow rust severity on the first day of disease observation. The horizontal axis represents the intensity of the severity and the vertical axis represents the count of trials in each category, where (a) Decreasing NDVI trend at disease, (b) Decreasing NDVI trend post disease observation, (c) High NDVI at late growth stage, (d) Increasing NDVI trend, (e) No major changes.

Figure 11. Flag leaf level yellow rust severity on the last day of disease observation. The horizontal axis represents the intensity of the severity and the vertical axis represents the count of trials in each category, where (a) Decreasing NDVI trend at disease, (b) Decreasing NDVI trend post disease observation, (c) High NDVI at late growth stage, (d) Increasing NDVI trend, (e) No major changes.

Figure 12. The predictions from weather feature-based XGBoost model (XGB predictions) showing a spike in the predicted yellow rust severity. The spike in this trial occurs around 15 days before the first actual in situ observation. The NDVI (S2 NDVI interpolation) also starts dropping after the model spike.

Figure 13. Histogram of the difference in days between the XGBoost weather model spike and the first actual disease observation. A negative value represents a spike occurring after the first disease observation.

Table 1. Multispectral image sources ordered based on the priority of mapping in the dataset.

S.No	Source	Resolution (meters)	Revisit	Available Bands
1	Sentinel-2 A/B	10–60 m	5 days	Blue, Green, Red, NIR, red-edge, SWIR
2	Airbus OneAtlas Pléiades	0.5 m	1 year	Blue, Green, Red, NIR
3	Airbus OneAtlas SPOT	1.5 m	1 year	Blue, Green, Red, NIR
4	Landsat-8 (OLI)	15–30 m	16 days	Blue, Green, Red, NIR, SWIR
5	Landsat-7 (ETM+)	15–30 m	16 days	Blue, Green, Red, NIR, SWIR

Table 2. Derived Index Description.

Index	Description	Formula	Range
NDVI	Normalized Difference Vegetation Index	NDVI $= \frac{NIR - Red}{NIR + Red}$	• Range: −1 to 1 • 0.1—Outliers (clouds, water) • 0.2–0.5 Sparse Vegetation • 0.6 Dense Vegetation
NDWI	Normalized Difference Water Index	NDWI $= \frac{NIR - SWIR}{NIR + SWIR}$	• Range: −1 to 1 • −1 to 0—No Vegetation or No water content • +1 Very High Leaf moisture content
NDRE	Normalized Difference Red Edge Index	NDRE $= \frac{NIR - Red edge}{NIR + Red edge}$	• Range: −1 to 1 • 0.1—Outliers (clouds, water) • 0.2–0.5 Sparse Vegetation • 0.6 Dense Vegetation

Table 3. Regression model metrics—using remote sensing features.

Model	MAE	MSE	RMSE
XGBoost	0.1613	0.0408	0.2021
CatBoost	0.1636	0.0416	0.2039
Random Forest	0.1654	0.0415	0.2037

Table 4. Regression model metrics—without using remote-sensing features.

Model	MAE	MSE	RMSE
XGBoost	0.1568	0.0433	0.2081
CatBoost	0.1597	0.0465	0.2153
Random Forest	0.1667	0.0466	0.2177

Table 5. Trial categorization after manual observation of NDVI trend around the disease observation.

	Category	Count
(a)	Decreasing NDVI trend at disease	30
(b)	Decreasing NDVI trend post disease observation	15
(c)	High NDVI at late Growth Stage	8
(d)	Increasing NDVI trend	16
(e)	No major changes	2

Table 6. Experiment results representing the number of trials that passed the condition mentioned in Equation (5) in relation to the window size.

Window Location (WL)	Window Size (days)	No. of Trials Passed Rule (NDVI)	No. of Trials Passed Rule (NDRE)
Before	20	6	5
Before	31	7	6
Before	42	7	6
Before and After	20	15	11
Before and After	31	17	18
Before and After	42	16	18
After	20	18	20
After	31	29	31
After	42	34	34
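The pass rates implied by Table 6 can be reproduced with a few lines of Python. The sketch assumes that each count is out of the 41 trials retained after cleaning, which is consistent with the "about 80%" quoted in the text; the dictionary layout and the helper name `pass_rate` are illustrative.

```python
N_TRIALS = 41  # trials retained after cleaning (assumed denominator)

# (window location, window size in days) -> (passed with NDVI, passed with NDRE)
passed = {
    ("Before", 20): (6, 5),
    ("Before", 31): (7, 6),
    ("Before", 42): (7, 6),
    ("Before and After", 20): (15, 11),
    ("Before and After", 31): (17, 18),
    ("Before and After", 42): (16, 18),
    ("After", 20): (18, 20),
    ("After", 31): (29, 31),
    ("After", 42): (34, 34),
}

def pass_rate(count, total=N_TRIALS):
    """Percentage of trials that satisfied the rule."""
    return 100.0 * count / total

# Configuration with the most trials passing the NDVI-based rule.
best = max(passed, key=lambda key: passed[key][0])
print(best, f"{pass_rate(passed[best][0]):.1f}%")
```

Under this assumption, the 42-day window after the spike gives 34 of 41 trials, i.e., roughly 83%, for both NDVI and NDRE.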

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

