Local Field-Scale Winter Wheat Yield Prediction Using VENµS Satellite Imagery and Machine Learning Techniques

Chiu, Marco Spencer; Wang, Jinfei

doi:10.3390/rs16173132

Marco Spencer Chiu ¹,* and Jinfei Wang ¹,²

¹ Department of Geography and Environment, The University of Western Ontario, London, ON N6G 3K7, Canada

² The Institute for Earth and Space Exploration, The University of Western Ontario, London, ON N6A 3K7, Canada

* Author to whom correspondence should be addressed.

Remote Sens. 2024, 16(17), 3132; https://doi.org/10.3390/rs16173132

Submission received: 11 July 2024 / Revised: 21 August 2024 / Accepted: 22 August 2024 / Published: 25 August 2024

(This article belongs to the Special Issue New Insights in Crop Monitoring and Management Using Remote Sensing Data)


Abstract

Reliable and accurate crop yield prediction at the field scale is critical for meeting the global demand for reliable food sources. In this study, we tested the viability of VENμS satellite data as an alternative to other popular, publicly available multispectral satellite data for predicting winter wheat yield and producing a yield prediction map for a field in southwestern Ontario, Canada, in 2020. Random forest (RF) and support vector regression (SVR) were the two machine learning techniques employed. Our results indicate that machine learning models paired with vegetation indices (VIs) derived from VENμS imagery can accurately predict winter wheat yield one to two months before harvest, with the most accurate predictions achieved during the early fruit development stage. While both machine learning approaches were viable, SVR produced the most accurate prediction, with an R² of 0.86 and an RMSE of 0.3925 t/ha, using data collected from tillering to the early fruit development stage. NDRE-1, NDRE-2, and REP from various growth stages ranked among the top seven variables in terms of importance for the prediction. These findings provide valuable insights into using high-resolution satellites as tools for non-destructive yield potential analysis.

Keywords:

precision agriculture; yield prediction; VENμS; machine learning; vegetation index; winter wheat

1. Introduction

The growing global population has heightened the need for reliable food sources and food security, underscoring the importance of advancing efficient and sustainable agricultural practices. The agriculture industry today faces substantial challenges, including rising global food demand, crop diseases, pest outbreaks, limited arable land, and the impacts of climate change. Addressing these issues is vital for ensuring a resilient and productive agricultural sector. Research by Tan and Reynolds indicates that in southwestern Ontario, water supply and demand pose the greatest challenge to the agricultural sector [1]. Interestingly, farmers in this region are less concerned about climate change compared to those in areas more frequently affected by extreme weather events [2]. The agriculture and agri-food sectors contributed approximately 7% to Canada’s gross domestic product (GDP) and accounted for one in every nine jobs in 2022 [3]. While climate change may not present an immediate threat to the Canadian agricultural industry, it is wise to stay informed and proactively prepare for potential future climate variations.

Precision agriculture (PA) employs advanced technologies and data analysis techniques to optimize crop yields while minimizing resource use. This approach evaluates quantified spatial and in situ plant data to guide agricultural inputs such as water, labor, and fuel, thereby reducing costs and preventing excessive waste, including pesticide and nutrient losses. PA integrates various spatial technologies, such as geographic information systems (GIS), handheld ground-based data collection devices, and remote sensing from ground-based or aerial platforms, to develop and implement efficient agricultural strategies [3]. Because these strategies depend on extensive data collection, remote sensing techniques are used in crop management to acquire, manage, and predict crop data for analysis. Accurate crop yield prediction is crucial for helping farmers address production challenges and mitigate the effects of climate variability and change on crop yield [4].

Among the various platforms for collecting surface spectral data in PA, space-borne satellites are among the most stable [5,6,7,8,9,10]. A key advantage of optical satellite imagery is the ability to obtain spectral data over large land areas in a single acquisition. Traditionally, however, researchers have been limited by the relatively low spatial resolution of optical satellite images compared with ground-collected data [11], which has confined research to regional rather than local, field-scale studies. For example, Landsat 8, launched by the United States in 2013, carries the Operational Land Imager (OLI), whose multispectral bands have a spatial resolution of 30 m [12]. Similarly, Sentinel-2, launched in 2015, provides 13 multispectral bands at spatial resolutions of 10 m, 20 m, and 60 m, and a revisit time of 5 days with its constellation of twin satellites [13]. In contrast, VENμS's VSSC (Vegetation and Environment Monitoring on a New Micro-Satellite Super-Spectral Camera) captures optical images at a resolution as high as 5.3 m. VENμS also has a revisit time of 2 days, compared with 16 days for Landsat 8 [12,14]. These advantages in both spatial and temporal resolution make VENμS a superior choice for detailed crop monitoring and analysis, providing more frequent and more precise data for agricultural applications.
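A quick calculation makes the resolution difference concrete. The sketch below estimates how many pixels would cover a 53.7 ha field, the size of the field studied here (Section 2.1), at each sensor's nominal resolution. It assumes square pixels and ignores field shape, partial edge pixels, and resampling, so the counts are approximate.

```python
# Approximate number of square pixels covering a 53.7 ha field at each nominal resolution.
FIELD_AREA_M2 = 53.7 * 10_000  # 53.7 ha expressed in square metres

RESOLUTION_M = {
    "VSSC 5.3 m": 5.3,        # VENμS native resolution
    "L2A 5 m": 5.0,           # VENμS L2A product grid
    "Sentinel-2 10 m": 10.0,  # finest Sentinel-2 bands
    "Landsat 8 30 m": 30.0,   # OLI multispectral bands
}

def pixels_per_field(area_m2, resolution_m):
    """Field area divided by the area of one square pixel."""
    return area_m2 / resolution_m ** 2

counts = {name: pixels_per_field(FIELD_AREA_M2, res) for name, res in RESOLUTION_M.items()}
for name, n in counts.items():
    print(f"{name}: ~{n:,.0f} pixels")
```

At the 5 m L2A grid the field contains four times as many pixels as at 10 m and 36 times as many as at 30 m, which is the basis of the field-scale argument above.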

Ease of access to satellite data is a significant advantage over the other major remote sensing approaches, ground-level and UAV-based sensing. Many satellite datasets, such as those from VENμS, the Landsat series, Sentinel-2, MODIS, and the SPOT series, are publicly available. VENμS imagery can be downloaded free of charge over its predefined sites, reducing both the labor and monetary costs of research compared with ground sampling and UAV flight operations. Although crop monitoring has traditionally relied on satellite imagery, UAV-based systems, with their superior spatial and temporal resolution, increasingly challenge its usefulness. Crop growth stages can change from week to week, making some satellite images unsuitable for timely analysis. For instance, Sentinel-2 data have yielded unsatisfactory crop yield predictions because of cloud coverage and lower temporal resolution [15]. VENμS addresses this issue by providing higher spatial resolution than most satellites while maintaining frequent 2-day revisits and offering a wide range of multispectral bands [14].

Furthermore, UAV operations are often constrained by weather conditions; clear skies and low wind speeds are typically required to collect high-quality data. While UAVs offer flexible planning and scheduling, VENμS can achieve similar advantages by compensating for poor coverage on any single date with its short revisit period. Nevertheless, UAVs give researchers greater control over the location and timing of data collection than satellites do. On the other hand, UAV flights carrying payloads such as multispectral cameras are frequently restricted by aviation regulations, and in most countries additional procedures or certifications are required for flights within or near regulated aerodromes. For instance, Transport Canada mandates the registration of any remotely piloted aircraft (RPA) weighing between 250 g and 25 kg, a range that includes most commercially available UAVs capable of carrying spectral sensors [16]. In addition, pilots operating these RPAs must hold a class of pilot certificate that depends on where the flight takes place. In contrast, satellite data can often be obtained online free of charge and without any operational requirements, making them widely accessible. VENμS thus combines the advantages of satellite and UAV systems, offering high spatial resolution, frequent temporal coverage, and easy data access.

With the spectral data collected from remote sensing imagery, vegetation index (VI) calculations become feasible. VIs are mathematical transformations of spectral bands widely used in agricultural research to determine specific plant properties, such as leaf area index (LAI), chlorophyll content, and nutrient levels [17,18,19]. Consequently, VIs are commonly employed for crop growth and health monitoring, including yield prediction [19,20]. For instance, vegetation indices that performed well in the study by Fu et al. were derived using the red absorption portion of the spectrum [21]. On multispectral cameras, this typically includes the red band and red-edge bands. Indices such as the normalized difference vegetation index (NDVI), normalized difference red edge (NDRE), and soil-adjusted vegetation index (SAVI) have been previously studied as effective indices in winter wheat yield monitoring [11,22]. VENμS is specifically designed for vegetation monitoring, offering more bands in the red-edge and near-infrared range than most publicly available satellite data. This enhanced spectral capability improves its ability to detect vegetation properties. Therefore, it is important to further explore VENμS’s yield prediction potential using a more diverse range of vegetation indices that may be unavailable from other satellites.

Recently, machine learning regression methods such as Random Forest (RF) and Support Vector Regression (SVR) have been extensively investigated for biomass and yield estimation [23,24,25,26]. These methods can capture complex patterns and relationships in the data that traditional methods might miss, and they have been shown to be viable for yield prediction [27,28]. RF, for example, can handle a large number of input variables and is less likely to overfit because of its ensemble nature. Hunt et al. successfully mapped winter wheat yield using Sentinel-2 data and RF regression models, achieving a relatively low root mean square error of 0.66 t/ha [29]. This work suggests that higher spatial resolution data combined with a common machine learning algorithm can capture within-field yield variability.

SVR, conversely, focuses on optimizing a margin around a hyperplane, which can result in better generalization on unseen data. While traditional regression methods are straightforward and easier to interpret, machine learning regression methods like RF and SVR offer significant advantages in terms of handling complexity, scalability, and adaptability, making them suitable for a wide range of modern data-driven applications.

Compared with most publicly available satellites, such as Sentinel-2, VENμS offers additional bands in the red-edge and near-infrared ranges, which are particularly advantageous for vegetation monitoring, as well as relatively higher spatial and temporal resolution. Despite these benefits, VENμS has rarely been studied in yield estimation research. To make a well-informed prediction of winter wheat yield at a local, field scale using VENμS data, an appropriate prediction model is therefore essential. The objectives of this study are (i) to investigate the relationships between yield and VIs at different growth stages, (ii) to evaluate the effectiveness of RF and SVR models in predicting yield, (iii) to determine the optimal combinations of dates (growth stages) for yield prediction in a winter wheat field located in southwestern Ontario, (iv) to gain insights from the ranked importance of VIs from different growth stages, and (v) to produce a yield prediction map.

2. Materials and Methods

2.1. Study Area and Data Collection

The study site is in Strathroy-Caradoc, Ontario, Canada, near the village of Mount Brydges, about 23 km southwest of the urban center of London, Ontario (Figure 1). The study period ran from May to early July 2020, during which the average recorded temperature was 22 °C and the average relative humidity was 73%. The climate is classified as warm-summer humid continental (Dfb) under the Köppen climate classification system. The area is predominantly agricultural cropland, and its major field crops include winter wheat, corn, and soybeans [30]. Winter wheat was selected as the focus of this study, and a 53.7 ha winter wheat field in this region was designated as the study area.

The cultivar in the study field was soft red winter wheat, planted in October 2019. In southwestern Ontario, winter wheat typically lies dormant over the winter after planting, resumes growth in late April of the following year, and is harvested from early to mid-July. VENμS images were acquired at each successive growth stage: tillering, stem elongation, booting, heading, flowering, early fruit development, and ripening. The growth stages were verified in the field on the satellite overpass dates by assessing the plants' physical characteristics on the Biologische Bundesanstalt, Bundessortenamt and CHemical industry (BBCH) scale (Table 1). In total, 8 cloud-free VENμS images were acquired, two of them at tillering (tillering-1 and tillering-2).

2.2. VENμS Satellite Imagery and Preprocessing

The data used in this research were collected by VENμS (Vegetation and Environment Monitoring on a New MicroSatellite), which was launched in August 2017. This satellite marks the first Earth observation collaboration between France and Israel, led by the Centre National d’Etudes Spatiales (CNES) and the Israeli Space Agency (ISA). The mission aims to monitor plant growth and health status, providing valuable insights into the impacts of environmental factors, human activities, and climate change on Earth’s land surface [14]. Since 2017, the VENμS VM1 mission has provided multispectral data from its 12 different bands, featuring a spatial resolution of 5.3 m, a revisiting period of 2 days, and operating at an altitude of 720 km above sea level (Table 2). As implied by its name, VENμS excels in monitoring Earth’s surface vegetation, which is facilitated by its extensive red-edge and near-infrared bands.

The imagery was level 2A (L2A) surface reflectance data, with each scene covering an area ranging from 27 km × 27 km to 27 km × 54 km at a spatial resolution of 5 m × 5 m. The imagery was processed and distributed by Theia MUSCATE (MUlti SATellite, multi-CApteurs, for multi-TEmporelles data), a component of the Theia Land Data Centre, a French inter-agency organization that provides satellite data and value-added products for scientific communities and public policy actors. MUSCATE processes and distributes large volumes of satellite imagery, particularly from VENμS, Sentinel-2, and Landsat, including atmospheric correction and the creation of cloud-free surface reflectance syntheses. The processed data are used in applications including agriculture, forestry, urban planning, and environmental monitoring. Each VENμS L2A product contained two versions of surface reflectance data for the 12 bands, B01 to B12. The first, denoted SRE.DBL (Surface Reflectance), is atmospherically corrected. The second, denoted FRE.DBL (Flat Reflectance), consists of the SRE.DBL data further corrected for slope effects. This correction suppresses apparent reflectance variations caused by the orientation of slopes relative to the sun, making the image appear as if the land surface were flat. The FRE.DBL raster files were used in this study.

The L2A surface reflectance rasters are stored as 16-bit signed integers, so the pixel values of each band were divided by 1000 to obtain surface reflectance before any further processing. This preprocessing was conducted in Python 3.9.19 using packages such as “rasterio”, “gdal”, and “numpy” to extract the surface reflectance values of each band at the study site. The 12-band reflectance values were then normalized to a range between 0 and 1 for use in later calculations.
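The conversion described above can be sketched in NumPy as follows. The divisor of 1000 comes from the text; the no-data value of -10000 is an assumption that should be checked against the product metadata, and min-max scaling is one plausible reading of "normalized to a range between 0 and 1", not necessarily the exact procedure used. In practice the integer array would be read from an FRE band file, for example with `rasterio.open(path).read(1)`; a toy 2 × 2 array stands in for it here.

```python
import numpy as np

SCALE = 1000.0   # divisor stated in the text: reflectance = DN / 1000
NODATA = -10000  # ASSUMED no-data flag; confirm against the product metadata

def dn_to_reflectance(dn):
    """Convert a 16-bit signed L2A band to float reflectance, masking no-data as NaN."""
    dn = np.asarray(dn)
    refl = dn.astype(np.float64) / SCALE
    refl[dn == NODATA] = np.nan
    return refl

def minmax_normalize(band):
    """Rescale valid pixels to [0, 1]; NaN (no-data) pixels stay NaN."""
    lo, hi = np.nanmin(band), np.nanmax(band)
    if hi == lo:  # constant band: avoid division by zero
        return np.where(np.isnan(band), np.nan, 0.0)
    return (band - lo) / (hi - lo)

# Toy 2x2 band standing in for an FRE raster read from disk
dn = np.array([[100, 300], [500, NODATA]], dtype=np.int16)
refl = dn_to_reflectance(dn)   # 0.1, 0.3, 0.5, NaN
norm = minmax_normalize(refl)  # 0.0, 0.5, 1.0, NaN
```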

2.3. Vegetation Indices

Vegetation indices (VIs) were used in this study as predictors of final harvested yield. The VIs were calculated as raster products from the 12 VENμS bands in Python 3.9.19, using the same packages as in satellite image preprocessing. Several VIs that use spectral information in the red-edge and near-infrared wavelengths, which are well represented in VENμS data, have demonstrated strong correlations with crop growth, health, and yield [17,31,32]. A total of 21 VIs were tested, including 8 variations of existing VIs based on their original formulas (Table 3). These variations were made possible by substituting VENμS's narrow bands into VI formulas originally developed for legacy sensors and satellites. For instance, NDVI was developed using the Landsat-1 Multispectral Scanner, whose NIR band 7 spanned 800 to 1100 nm. Both VSSC bands 11 and 12 fall within this NIR range, allowing variations of such VIs to be included in the analysis.
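As an illustration of how such indices are computed per pixel, the sketch below implements the normalized-difference form shared by NDVI and NDRE, and one common red-edge position formulation (four-point linear interpolation). The band-to-index assignments are hypothetical: Table 3 defines the actual NDRE-1, NDRE-2, and REP formulas, and the nominal 670/700/740/780 nm wavelengths in the REP formula are only approximated by VENμS bands 7 to 10. The functions work equally on scalars or whole band rasters.

```python
import numpy as np

def normalized_difference(a, b):
    """(a - b) / (a + b), the form shared by NDVI and NDRE; NaN where a + b == 0."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(a + b != 0, (a - b) / (a + b), np.nan)

def rep_linear(r670, r700, r740, r780):
    """Red-edge position (nm) by four-point linear interpolation:
    REP = 700 + 40 * ((R670 + R780) / 2 - R700) / (R740 - R700)."""
    r670, r700, r740, r780 = (np.asarray(x, dtype=float) for x in (r670, r700, r740, r780))
    inflection = (r670 + r780) / 2.0  # reflectance at the assumed inflection point
    with np.errstate(divide="ignore", invalid="ignore"):
        return 700.0 + 40.0 * (inflection - r700) / (r740 - r700)

# Hypothetical reflectances for one pixel, keyed by VENμS band
# (approximate centres: B07 red ~667 nm, B08 ~702, B09 ~742, B10 ~782, B11 NIR ~865)
bands = {"B07": 0.05, "B08": 0.15, "B09": 0.35, "B10": 0.45, "B11": 0.50}

ndvi = normalized_difference(bands["B11"], bands["B07"])
ndre = normalized_difference(bands["B10"], bands["B08"])  # band pairing is illustrative only
rep = rep_linear(bands["B07"], bands["B08"], bands["B09"], bands["B10"])  # about 720 nm
```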

2.4. Yield Dataset

The yield data were collected at harvest on 25 July 2020 by a combine harvester equipped with a 10 m-wide, 1.5 m-long header. Yield was recorded approximately every second at the center of the harvester's track and exported as a point shapefile. To ensure accuracy, potential outliers located at the edges of the field were removed. The shapefile was then interpolated into a raster with 5 m × 5 m resolution using inverse distance weighted (IDW) interpolation in QGIS 3.22, matching the VENμS imagery and the derived vegetation indices (VIs). This approach was adopted to take full advantage of the high resolution of VENμS data and to produce a detailed yield prediction map.
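IDW estimates each grid cell as a distance-weighted average of the surrounding yield points, with weights proportional to 1/d^p. The NumPy sketch below is a simplified stand-in for the QGIS tool, not its implementation: it uses every point for every cell (memory grows with cells × points, so a neighbor search would be needed for a full yield-monitor file), and the power p = 2 is an assumed default because the text does not state the parameter.

```python
import numpy as np

def idw(points_xy, values, grid_xy, power=2.0):
    """Inverse distance weighted interpolation.

    points_xy: (n, 2) yield-monitor coordinates in a projected CRS (metres)
    values:    (n,) yield values (t/ha)
    grid_xy:   (m, 2) target cell centres, e.g. of a 5 m grid
    power:     distance exponent (ASSUMED 2; not stated in the text)
    """
    points_xy = np.asarray(points_xy, float)
    values = np.asarray(values, float)
    grid_xy = np.asarray(grid_xy, float)
    # Distance from every grid cell to every observation: shape (m, n)
    d = np.linalg.norm(grid_xy[:, None, :] - points_xy[None, :, :], axis=2)
    out = np.empty(len(grid_xy))
    exact = d == 0
    hit = exact.any(axis=1)
    # A cell that coincides with an observation takes that observation's value
    out[hit] = values[exact[hit].argmax(axis=1)]
    # Elsewhere: weighted mean with weights 1 / d**power
    w = 1.0 / d[~hit] ** power
    out[~hit] = (w * values).sum(axis=1) / w.sum(axis=1)
    return out

# Two toy yield points and four cell centres along a transect
yield_grid = idw([[0, 0], [10, 0]], [2.0, 4.0], [[0, 0], [5, 0], [10, 0], [2.5, 0]])
```

At x = 2.5 m the nearer point receives nine times the weight of the farther one, so the estimate (2.2 t/ha) is pulled toward 2 t/ha.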

2.5. Machine Learning Regression Modelling and Cross-Validation

In machine learning, regression models predict continuous outcomes from input variables. Two notable techniques are Random Forest (RF) regression and Support Vector Regression (SVR), both of which offer robust solutions for complex regression problems. A further advantage of machine learning regression is its ability to learn from data automatically without being explicitly programmed for each specific task. Because the regression models in this study were based on pixel-level analysis, machine learning regression methods were well suited to our needs, as they excel at handling large datasets. Three metrics were used to evaluate the performance of the regression models: mean absolute error (MAE), R-squared (R²), and root mean squared error (RMSE). These metrics were used both during cross-validation and in the calibration and validation of the final models, to provide a comprehensive assessment of model accuracy and reliability.

RF is an ensemble learning method that constructs multiple decision trees during calibration and outputs the mean of their predictions. Averaging many trees reduces overfitting, a common issue in single decision tree models. Each tree is built from a bootstrap sample of the data, and a random subset of features is considered at each node when choosing the split. This randomness makes the model more resilient to noise and outliers. RF can handle large, high-dimensional datasets, and it provides measures of feature importance that help explain the contribution of each variable to the prediction.
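The two sources of randomness described above, bootstrap sampling and random feature subsets, together with the averaging of tree outputs, can be seen in the toy forest below. It is a deliberately minimal sketch, not the randomForest package's algorithm: each tree is a single split (a stump) rather than a fully grown tree, so drawing the mtry candidate features once per tree is equivalent to drawing them at the tree's only node. The function names and toy data are hypothetical.

```python
import numpy as np

def fit_stump(X, y, candidate_features):
    """Depth-1 regression tree: the split (among candidate features) that
    minimises the summed squared error of the two child nodes."""
    best = None
    for j in candidate_features:
        values = np.unique(X[:, j])
        for t in (values[:-1] + values[1:]) / 2:  # midpoints between sorted unique values
            left = X[:, j] <= t
            yl, yr = y[left], y[~left]
            sse = ((yl - yl.mean()) ** 2).sum() + ((yr - yr.mean()) ** 2).sum()
            if best is None or sse < best[0]:
                best = (sse, j, t, yl.mean(), yr.mean())
    if best is None:  # no split possible: predict the node mean everywhere
        return (0, np.inf, y.mean(), y.mean())
    return best[1:]  # (feature, threshold, left mean, right mean)

def fit_forest(X, y, n_trees=200, mtry=1, seed=0):
    rng = np.random.default_rng(seed)
    n, p = X.shape
    forest = []
    for _ in range(n_trees):
        rows = rng.integers(0, n, size=n)                # bootstrap sample (with replacement)
        feats = rng.choice(p, size=mtry, replace=False)  # random feature subset of size mtry
        forest.append(fit_stump(X[rows], y[rows], feats))
    return forest

def predict_forest(forest, X):
    """Regression forest output: the mean of the individual tree predictions."""
    per_tree = [np.where(X[:, j] <= t, left, right) for j, t, left, right in forest]
    return np.mean(per_tree, axis=0)

X = np.arange(20, dtype=float).reshape(-1, 1)  # one toy predictor
y = np.where(X[:, 0] >= 10, 3.0, 1.0)          # step-shaped toy "yield"
forest = fit_forest(X, y)
pred = predict_forest(forest, X)
```

With real VI predictors, mtry would be tuned by cross-validation; the randomForest package defaults to one third of the predictors for regression.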

SVR, on the other hand, extends the concepts of Support Vector Machines (SVMs) from classification to regression. Like RF, SVR is generally robust to overfitting, a result of margin maximization, kernel functions, the epsilon-insensitive loss function, and its reliance on a subset of the training data (the support vectors). Unlike traditional methods that minimize the error between predicted and observed values, SVR only penalizes errors that exceed a certain threshold (ε). It fits a function, a hyperplane in a multidimensional feature space, that is as flat as possible while keeping the observations within an ε-wide tube around it where possible; deviations smaller than ε are ignored, and larger ones are penalized linearly, with the trade-off controlled by a regularization parameter. This makes SVR particularly useful when a margin of tolerance can be specified for the predictions. Through kernel functions, SVR handles non-linear relationships effectively, making it adaptable to various types of data [46].
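The ε-insensitive loss and the flatness/error trade-off can be written down directly. The sketch below evaluates the primal objective of a linear SVR, 0.5‖w‖² + C Σ max(0, |y_i − f(x_i)| − ε); it only scores a given (w, b) and does not train anything. Kernel SVR, as fitted by e1071 or kernlab, solves the equivalent dual problem with f(x) expressed through a kernel, so this illustrates the objective rather than those packages' solvers. The C and ε values are arbitrary.

```python
import numpy as np

def eps_insensitive(residuals, eps):
    """Epsilon-insensitive loss: zero inside the tube, linear outside it."""
    return np.maximum(0.0, np.abs(residuals) - eps)

def linear_svr_objective(w, b, X, y, C=1.0, eps=0.1):
    """Primal linear SVR objective: flatness term + C * total loss outside the tube."""
    w = np.asarray(w, float)
    y = np.asarray(y, float)
    residuals = y - (X @ w + b)
    return 0.5 * w @ w + C * eps_insensitive(residuals, eps).sum()

losses = eps_insensitive(np.array([0.05, -0.05, 0.3, -0.5]), eps=0.1)  # 0, 0, 0.2, 0.4

X = np.array([[1.0], [2.0]])
obj = linear_svr_objective(w=[1.0], b=0.0, X=X, y=[1.0, 2.5], C=10.0, eps=0.1)
# 0.5 * 1**2 + 10 * (0 + 0.4) = 4.5
```

Raising C makes errors outside the tube more expensive relative to flatness, which is the "regularization parameter" tuned during cross-validation.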

In this study, the data collected over the 8 dates were randomly divided into a 70% calibration set and a 30% validation set. Ten-fold cross-validation was then applied to the calibration set to assess the robustness and generalizability of the machine learning regression models. The calibration data were split into 10 subsets (folds); each fold was held out once for validation while the remaining nine were used for training, so each training run used 90% of the calibration data and each evaluation used the remaining 10%. This kept the training sets large enough to fit the models effectively while leaving enough held-out data for a reliable evaluation. MAE, R², and RMSE were computed on each held-out fold and served as the key indicators of model performance. MAE gives the average magnitude of the prediction errors, regardless of their direction. RMSE penalizes large errors more heavily because of its squared term and therefore indicates how well the model handles large deviations from the observed values. R² is the proportion of variance explained by the model, with values closer to 1 indicating a better fit. Averaging these metrics across all folds gave a robust estimate of model performance and its variability and reduced the risk of overfitting or underfitting to particular subsets of the data. For the RF models, the lowest cross-validated RMSE determined the optimal model, that is, the optimal number of variables randomly sampled as split candidates at each tree node (mtry), and the MAE of that model represents its average prediction error. MAE served the same purpose for the SVR models, where the lowest RMSE determined the optimal regularization (cost) parameter. The equations are as follows:

R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}}

(1)

where y_{i} is the observed value, \hat{y}_{i} is the predicted value, and \bar{y} is the mean of the observed values; and

RMSE = \sqrt{\frac{\sum_{i=1}^{n} (\hat{y}_{i} - y_{i})^{2}}{n}}

(2)

MAE = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_{i} - y_{i}|

(3)

where \hat{y}_{i} is the predicted yield (t/ha), y_{i} is the observed yield (t/ha), n is the total number of observations, and i is the summation index.
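Equations (1) to (3) translate directly into code. The NumPy sketch below is an illustration rather than the study's R code. One caveat when comparing with caret output: caret's default resampling summary reports R² as the squared Pearson correlation between predictions and observations, which can differ from the 1 − SS_res/SS_tot form of Eq. (1).

```python
import numpy as np

def r2(y_obs, y_pred):
    """Eq. (1): 1 - sum of squared residuals / total sum of squares."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_obs, y_pred):
    """Eq. (2): square root of the mean squared error (t/ha for yields in t/ha)."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_obs) ** 2)))

def mae(y_obs, y_pred):
    """Eq. (3): mean absolute error (t/ha for yields in t/ha)."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_pred - y_obs)))

observed = [1.0, 2.0, 3.0]   # toy yields, t/ha
predicted = [1.0, 2.0, 4.0]
scores = {"R2": r2(observed, predicted), "RMSE": rmse(observed, predicted), "MAE": mae(observed, predicted)}
```

A single 1 t/ha miss among three pixels gives MAE = 1/3 but RMSE = √(1/3) ≈ 0.577 t/ha, showing how the squared term weights large errors more heavily.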

After cross-validation, the final model was trained on the entire calibration dataset and then evaluated on both the calibration and validation datasets using the same metrics. In the calibration stage, RMSE assessed the model's fit to the data it was trained on, while R² measured how well the model captured the underlying patterns in the data. High R² values coupled with low RMSE values suggest a good fit; however, these metrics must be compared with those from the validation stage. The validation stage assessed the model on unseen data, indicating its ability to generalize. Consistent performance across the calibration and validation sets, characterized by similar R² and RMSE values, indicates a robust model.
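The splitting scheme (a random 70/30 calibration/validation split, followed by 10 folds within the calibration set) can be sketched with NumPy index arrays. The function names and the fixed seed are hypothetical; on the R side, caret's `trainControl(method = "cv", number = 10)` builds comparable folds.

```python
import numpy as np

def calibration_validation_split(n, cal_fraction=0.7, seed=0):
    """Randomly split n pixel indices into calibration and validation sets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_cal = int(round(cal_fraction * n))
    return idx[:n_cal], idx[n_cal:]

def k_fold(indices, k=10, seed=0):
    """Yield (train, held_out) index pairs; every index is held out exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(indices), k)
    for i in range(k):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]

cal, val = calibration_validation_split(100)  # 70 calibration, 30 validation pixels
folds = list(k_fold(cal, k=10))               # each fold: 63 for training, 7 held out
```

The validation set never enters the folds; it is used only once, to evaluate the final model trained on all of `cal`.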

Figure 2 displays the workflow of the methodology. The modeling was carried out in the R programming language in RStudio, using the “randomForest” and “e1071” packages for RF and SVR, respectively. In both models, the independent variables were the VIs. Models were first run on the data from each of the 8 dates individually; the dates were then grouped into “pre-heading” and “post-heading” datasets. For each dataset, 10-fold cross-validation was performed using the “caret” and “kernlab” packages. The yield prediction raster was created using the “raster”, “sp”, and “rasterVis” packages in RStudio.
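Producing the prediction map amounts to applying the fitted model to every pixel's predictor values and writing the results back onto the image grid. The sketch below shows that reshaping step in NumPy under stated assumptions: the predictor rasters are already co-registered on the 5 m grid, pixels with any missing predictor are left empty, and a fixed linear function stands in for a fitted RF or SVR model. It is comparable in spirit to `predict()` for Raster objects in the R raster package, not a reproduction of the authors' script.

```python
import numpy as np

def predict_to_raster(feature_rasters, predict, nodata=np.nan):
    """Apply a fitted model pixel by pixel and return a 2-D prediction raster.

    feature_rasters: list of 2-D arrays, one per predictor, all the same shape
    predict:         callable mapping an (n_pixels, n_features) array to n_pixels values
    Pixels with NaN in any predictor are set to `nodata`.
    """
    stack = np.stack(feature_rasters, axis=-1)  # (rows, cols, n_features)
    rows, cols, n_feat = stack.shape
    table = stack.reshape(-1, n_feat)            # one row per pixel
    valid = ~np.isnan(table).any(axis=1)
    out = np.full(rows * cols, nodata, dtype=float)
    out[valid] = predict(table[valid])
    return out.reshape(rows, cols)

# Toy predictors on a 2x2 grid; the lambda is a hypothetical stand-in for a fitted model
ndre = np.array([[0.2, 0.4], [np.nan, 0.6]])
rep = np.array([[715.0, 720.0], [725.0, 730.0]])
yield_map = predict_to_raster([ndre, rep], lambda X: 10 * X[:, 0] + 0.0 * X[:, 1])
```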

3. Results

3.1. Cross-Validation of Regression Models

In our study, we performed 10-fold cross-validation on a total of 13 datasets. These datasets were divided into two categories: 8 individual growth stages and three groups of growth stages. The growth stage groups included 3 combinations of growth stages from pre-heading, 2 combinations of growth stages from post-heading, and all data. The pre- and post-heading groups were formed because winter wheat undergoes a transition during which leaf growth slows owing to shifts in developmental priorities and physiological changes. As the plant enters the reproductive phase, its focus shifts from vegetative to reproductive growth, including the formation and maturation of the inflorescence, and the leaves turn yellow. Ripening stage data were excluded from the post-heading and all-data groups because model performance dropped sharply after the early fruit development stage.
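The group sizes follow from stacking one predictor per VI per date. The sketch below assumes, consistent with the grouping described here and in Section 3.2, that tillering-1, tillering-2, stem elongation, and booting form the pre-heading group and that heading, flowering, and early fruit development form the post-heading group, with ripening excluded. The VI names are placeholders for the 21 indices in Table 3.

```python
# Placeholder VI names (hypothetical): Table 3 lists the real 21 indices.
VIS = [f"VI{i:02d}" for i in range(1, 22)]

PRE_HEADING = ["tillering-1", "tillering-2", "stem elongation", "booting"]
POST_HEADING = ["heading", "flowering", "early fruit development"]
ALL_DATA = PRE_HEADING + POST_HEADING  # ripening excluded

def feature_names(stages, vis=VIS):
    """One predictor column per (growth stage, VI) pair, e.g. 'VI03@booting'."""
    return [f"{vi}@{stage}" for stage in stages for vi in vis]

for group, stages in [("pre-heading", PRE_HEADING),
                      ("post-heading", POST_HEADING),
                      ("all data", ALL_DATA)]:
    print(group, len(feature_names(stages)))
```

The all-data group yields 7 × 21 = 147 predictors, which matches the number of VI variables ranked in the importance analysis.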

The mean of each evaluation metric across all 10 folds was used to assess the models' generalizability to unseen data. Among the individual growth stages, the early fruit development stage performed best and tillering-1 performed worst, with both machine learning models behaving similarly. As seen in Figure 3, mean R² increased and mean RMSE and mean MAE decreased as the growth stages progressed from tillering-1 to early fruit development. Model performance then dropped sharply for both RF and SVR at the ripening stage. The RF model (mean R² = 0.78, mean RMSE = 0.4832 t/ha, mean MAE = 0.3362 t/ha) explained the variance slightly better than the SVR model (mean R² = 0.78, mean RMSE = 0.4834 t/ha, mean MAE = 0.3330 t/ha). However, the SVR model was slightly less sensitive to outliers than the RF model. Overall, the RF and SVR models within each growth stage were similarly stable, as shown by the sizes of the whiskers in Figure 3.

In contrast, for the models using all data and the combined pre- and post-heading datasets, explanatory power increased as more data were added to the regression. Models using post-heading datasets were more robust, with lower mean RMSE values than models using pre-heading datasets. However, the models using all data were the least stable, as indicated by the higher standard deviation of their RMSE across folds. Overall, both models generalized best to unseen data when all data were combined. The SVR model (mean R² = 0.86, mean RMSE = 0.3899 t/ha, mean MAE = 0.2475 t/ha) explained more of the variance and had a lower prediction error than the RF model (mean R² = 0.84, mean RMSE = 0.4185 t/ha, mean MAE = 0.2800 t/ha) in cross-validation.
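Fold-to-fold stability, as discussed above, is typically summarized by the standard deviation of a metric across the 10 folds alongside its mean (caret reports, for example, RMSE and RMSESD). The per-fold values below are invented for illustration and are not results from this study.

```python
import statistics

def summarize_folds(fold_scores):
    """Mean and sample standard deviation of a per-fold metric (e.g. RMSE in t/ha)."""
    return statistics.mean(fold_scores), statistics.stdev(fold_scores)

# Hypothetical per-fold RMSE (t/ha) for two models with the same mean
stable = [0.40, 0.41, 0.39, 0.40, 0.40, 0.41, 0.39, 0.40, 0.40, 0.40]
unstable = [0.30, 0.50, 0.35, 0.45, 0.40, 0.30, 0.50, 0.35, 0.45, 0.40]

mean_s, sd_s = summarize_folds(stable)
mean_u, sd_u = summarize_folds(unstable)
```

Both toy models have a mean RMSE of 0.40 t/ha, but the second varies far more between folds and would be judged less stable.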

3.2. Yield Prediction Using Regression Models

Table 4 shows the calibration and validation performance of the RF and SVR models for each of the eight tested growth stages individually. Overall, both models explained the most variance when using early fruit development stage data. Calibration R² ranged from 0.54 to 0.96, and calibration RMSE from 0.2039 to 0.7057 t/ha. The RF model had a much higher calibration R² (0.96) than the SVR model (0.79), indicating that RF fit the training data considerably more closely. This pattern held throughout the analysis and is expected given RF's ensemble nature, which excels at capturing complex patterns. In validation, R² ranged from 0.50 to 0.77 and RMSE from 0.5008 to 0.7421 t/ha. Consistent with calibration, both RF and SVR achieved their highest validation R² with data from the early fruit development stage. RF (R² = 0.77, RMSE = 0.5008 t/ha) slightly outperformed SVR (R² = 0.77, RMSE = 0.5039 t/ha), making it the best-performing prediction model among those using individual stages.
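The size of the calibration-to-validation drop is a simple indicator of overfitting. Using the calibration and validation R² values reported in this paragraph, the calculation below shows a drop of 0.19 for RF against 0.02 for SVR; the helper name is hypothetical.

```python
# Calibration and validation R² as reported in the paragraph above
reported = {
    "RF": {"calibration": 0.96, "validation": 0.77},
    "SVR": {"calibration": 0.79, "validation": 0.77},
}

def generalization_gap(scores):
    """Drop in R² from calibration to validation; a large gap suggests overfitting."""
    return scores["calibration"] - scores["validation"]

gaps = {model: round(generalization_gap(s), 2) for model, s in reported.items()}
print(gaps)  # {'RF': 0.19, 'SVR': 0.02}
```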

The analysis was then extended to combinations of growth-stage datasets. Table 5 shows the calibration and validation performance of the RF and SVR models with the pre-heading, post-heading, and all-data groups. Collectively, models using the dataset groups outperformed models using the individual tillering-1, tillering-2, stem elongation, and booting datasets, all of which belong to the pre-heading group. In calibration, R² ranged from 0.75 to 0.98 and RMSE from 0.1640 to 0.5189 t/ha. Both ranges were markedly narrower than their counterparts for individual growth stages, reflecting the improved performance of SVR. Calibration R² was consistently higher for RF than for SVR. However, the validation statistics show that SVR outperformed RF in yield prediction, with higher R² and lower RMSE for each of the three dataset groups. In validation, R² ranged from 0.72 to 0.86 and RMSE from 0.3925 to 0.5465 t/ha. The best yield prediction model was the SVR model using all data from tillering-1 to the early fruit development stage (R² = 0.86, RMSE = 0.3925 t/ha). Although the corresponding RF model performed better in calibration, it predicted yield slightly less accurately (R² = 0.83, RMSE = 0.4257 t/ha).

3.3. Ranked Importance of Vegetation Indices from Different Growth Stages

RF modeling, which uses many decision trees, was employed to generate a variable importance plot in RStudio using the “varImpPlot()” function. The plot shows the increase in node purity (IncNodePurity) on the x-axis; for regression forests, this is the total decrease in residual sum of squares from splits on a variable, averaged over all trees, and it represents the importance of each variable (each VI at a given date) in predicting yield. A higher IncNodePurity value indicates a more important predictor, which helps identify the key predictors in the models. The plot revealed that NDRE-1 and NDRE-2 from the heading, flowering, and early fruit development stages were among the most important predictors of yield, with NDRE-1 from the flowering stage ranked first (Figure 4). REP from the flowering and early fruit development stages ranked 4th and 7th, respectively. NDRE-1, NDRE-2, REP, and ARVI from multiple growth stages made up 17 of the top 20 ranked VI variables. Beyond the top 7, the IncNodePurity values of the remaining variables were relatively similar and decreased gradually across the 147 VI variables tested in total.
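IncNodePurity accumulates, for each variable, the reduction in residual sum of squares achieved by every split on that variable, averaged over all trees. The sketch below computes the contribution of a single split; it is an illustrative helper with hypothetical names, not code from the randomForest package.

```python
import numpy as np

def rss(y):
    """Residual sum of squares around the node mean (regression-tree impurity)."""
    y = np.asarray(y, float)
    return float(((y - y.mean()) ** 2).sum()) if y.size else 0.0

def impurity_decrease(y_parent, left_mask):
    """RSS reduction from one split. Summing this over all splits on a variable,
    then averaging over trees, gives that variable's IncNodePurity."""
    y_parent = np.asarray(y_parent, float)
    left_mask = np.asarray(left_mask, bool)
    return rss(y_parent) - rss(y_parent[left_mask]) - rss(y_parent[~left_mask])

node_yields = [1.0, 1.0, 3.0, 3.0]  # toy yields (t/ha) reaching one node
good_split = impurity_decrease(node_yields, [True, True, False, False])  # separates low/high: 4.0
useless_split = impurity_decrease(node_yields, [True, False, True, False])  # mixes them: 0.0
```

A VI that repeatedly produces splits like the first one accumulates a large IncNodePurity, which is why top-ranked indices are interpreted as strong yield predictors.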

3.4. Visualization of Predicted Yield

A yield prediction map helps visualize yield variation within a field, and VENμS's higher spatial resolution makes it possible to identify clearly where the prediction is inaccurate. Figure 5 shows that the prediction generally captured the yield variation across the entire field, consistent with the evaluation metrics. However, the prediction did not capture the extreme values in the observed yield. For instance, the northern borders of the field showed extremely low observed yields, but the predicted map did not reach such low values. Similarly, in areas of extremely high yield, the prediction missed the highest values and in places predicted values that differed markedly from the observed yield.

4. Discussion

Implications of Model Performance on Yield Prediction with VENμS Imagery

PA uses advanced technologies and data analysis to optimize agricultural practices and support management decisions, with the goal of minimizing inputs while maximizing output and efficiency. We proposed VENμS imagery for yield prediction as an alternative to other publicly available satellite data because of its higher spatial and temporal resolution. The performance differences between the machine learning regression models were discussed in detail above. Although their prediction performance was similar, SVR was the better model overall when data from more growth stages were combined, whereas RF was marginally more accurate for most individual stages. Our findings align with the study by Han et al. and suggest that both RF and SVR are high-performing techniques for yield prediction [27,28]. However, the RF models showed signs of overfitting even after careful tuning: their calibration R² (0.94 to 0.98) was consistently well above their validation R² (Tables 4 and 5). Although this gap narrowed as data from more growth stages were added, RF models can be prone to overfitting, especially with complex data. This limitation prompted the inclusion of SVR in the study; SVR exhibited less overfitting and proved to be the more reliable algorithm in this context.
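The overfitting described above can be read directly from Tables 4 and 5 as the gap between calibration and validation R². The short sketch below tabulates that gap from the published values; the dictionary layout is only an illustrative choice.

```python
# (calibration R², validation R²) copied from Table 4 (individual stages)
# and Table 5 (all data, ripening excluded).
r2 = {
    "Tillering-1": {"RF": (0.94, 0.50), "SVR": (0.54, 0.50)},
    "Tillering-2": {"RF": (0.94, 0.53), "SVR": (0.55, 0.53)},
    "Stem Elongation": {"RF": (0.94, 0.61), "SVR": (0.63, 0.61)},
    "Booting": {"RF": (0.95, 0.64), "SVR": (0.66, 0.65)},
    "Heading": {"RF": (0.96, 0.74), "SVR": (0.77, 0.74)},
    "Flowering": {"RF": (0.96, 0.75), "SVR": (0.77, 0.75)},
    "Early Fruit Development": {"RF": (0.96, 0.77), "SVR": (0.79, 0.77)},
    "Ripening": {"RF": (0.95, 0.61), "SVR": (0.61, 0.59)},
    "All Data": {"RF": (0.98, 0.83), "SVR": (0.89, 0.86)},
}

# Calibration minus validation R²: a large positive gap signals overfitting.
gaps = {
    stage: {model: round(cal - val, 2) for model, (cal, val) in models.items()}
    for stage, models in r2.items()
}
for stage, g in gaps.items():
    print(f"{stage:<24} RF gap {g['RF']:.2f}   SVR gap {g['SVR']:.2f}")
```

The RF gap shrinks from 0.44 (Tillering-1) to 0.15 when all stages are combined, while the SVR gap stays at or below 0.04 throughout.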

The regression models identified the early fruit development stage as the best single growth stage for predicting yield. This finding agrees with the study by Hassan et al., in which yield prediction accuracy increased as the growth stages progressed [47]. Overall, the highest accuracy in our study was achieved by incorporating all data from tillering to the early fruit development stage. Among all the tested VIs, NDRE was the most important predictor of yield, which is consistent with previous studies [11]. As reported in Section 3.3, NDRE-1, NDRE-2, REP, and ARVI from multiple growth stages made up 17 of the 20 most important variables for predicting final yield, with NDRE-1, NDRE-2, and REP being the major contributors. What these three VIs have in common is that they are computed primarily from VENμS bands 8, 9, and 10, with central wavelengths of 702, 741.1, and 782.2 nm, respectively. These bands span the red-edge to NIR region of the spectrum, and VIs based on this wavelength range have previously proven effective at predicting grain yield [30].

Compared with studies based on other satellite data, our results showed similar or higher prediction accuracy, often with lower errors [8,10]. Because grouped-stage datasets performed better in the prediction models, we conducted an additional test with Sentinel-2 data, applying the same methodology and using overpass dates as close as possible to those of the VENμS data. Table 6 presents the regression statistics of the yield prediction models using Sentinel-2 data with grouped growth stages. Of the 21 VIs tested, 20 could be recreated from Sentinel-2 bands. Data from the Tillering-2 stage could not be included in the analysis because of cloud cover. The best prediction, obtained with SVR using all data, had an R² of 0.79 and an RMSE of 0.5147 t/ha.
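In relative terms, the best VENμS model (SVR, all data, Table 5) and the best Sentinel-2 model (SVR, all data, Table 6) differ in validation RMSE by roughly a quarter:

```latex
\frac{\mathrm{RMSE}_{\mathrm{S2}} - \mathrm{RMSE}_{\mathrm{VEN\mu S}}}{\mathrm{RMSE}_{\mathrm{S2}}}
= \frac{0.5147 - 0.3925}{0.5147}
= \frac{0.1222}{0.5147}
\approx 0.237
```

That is, about 24% lower validation error with VENμS, alongside a higher validation R² (0.86 vs. 0.79).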

With Sentinel-2 data, prediction accuracy plateaued (R² = 0.79) once post-heading stage data were used, and adding more data only slightly decreased RMSE. Moreover, the prediction error with Sentinel-2 data remained considerably higher than that of the best VENμS model (RMSE of 0.5147 vs. 0.3925 t/ha). This may be attributable to Sentinel-2’s lower spatial resolution, as VENμS provides four times as many pixels as Sentinel-2 over the same area. Although we were able to build a robust and accurate winter wheat yield prediction model with VENμS data at a local, field scale, the approach is not without drawbacks. Unlike most publicly available satellites, which provide frequent coverage of Earth’s land surface, VENμS does not cover the entire land surface. Instead, it focuses on specific sites of interest and revisits these sites frequently. This means that a researcher’s site of interest may not be covered, even though VENμS data are publicly available; researchers must apply for VENμS coverage of their location of interest.
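The factor of four follows from the ground sampling distances, assuming VENμS Level-2 products at 5 m and the 10 m Sentinel-2 bands (blue, green, red, and broad NIR). Sentinel-2's red-edge bands are at 20 m, so for red-edge indices the pixel ratio, before any resampling, would be larger still:

```latex
\left(\frac{10\,\mathrm{m}}{5\,\mathrm{m}}\right)^{2} = 4
\qquad\text{and}\qquad
\left(\frac{20\,\mathrm{m}}{5\,\mathrm{m}}\right)^{2} = 16
```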

5. Conclusions

This study evaluated the effectiveness of VENμS multispectral imagery in predicting winter wheat yield in southwestern Ontario using machine learning methods. A total of 21 VIs, including eight variations of existing VIs based on their original formulas, were tested. The best prediction result demonstrated a high correlation between VENμS data and observed yield, with an R² of 0.86 and an RMSE of 0.3925 t/ha using an SVR model. According to our results, a reliable prediction of yield can be achieved two months before harvest using the combined pre-heading stage data, and the best result can be obtained 39 days before harvest using all data from the pre- and post-heading stages. The findings suggest that VENμS data can offer higher yield prediction accuracy than other publicly available satellite data such as Sentinel-2, and could potentially serve as a viable alternative to UAV data for local, field-scale studies.
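These lead times can be cross-checked against the overpass dates in Table 1. The harvest date is not stated in this section, so the sketch below infers it, as an assumption, from the reported 39-day lead of the early fruit development image, and then computes the lead time of the last pre-heading (booting) image.

```python
from datetime import date, timedelta

# Overpass dates from Table 1.
overpass = {
    "Booting": date(2020, 5, 25),
    "Early Fruit Development": date(2020, 6, 16),
}

# Assumption: harvest date inferred from the reported 39-day lead time.
harvest = overpass["Early Fruit Development"] + timedelta(days=39)

# Days between each overpass and the inferred harvest date.
lead_days = {stage: (harvest - d).days for stage, d in overpass.items()}
print(harvest.isoformat(), lead_days)
```

Under that assumption the booting image precedes harvest by 61 days, consistent with the "two months" stated for the pre-heading data.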

Although machine learning algorithms are effective at capturing complex patterns among variables, these models are empirical. They rely on existing datasets for training and validation and can only approximate the observed yield, which is verifiable only at harvest. This intrinsic limitation means that predicted and actual outcomes may diverge, and it underscores the need for ongoing calibration and testing of these models under varied agricultural conditions and across different crop cycles to ensure their reliability and accuracy.

Additionally, while k-fold cross-validation and a 70/30 train-test split were employed in this study because of their broad adoption and efficient use of the data, future work could explore spatial splitting as an alternative. Spatial splitting, which divides the dataset by geographic location rather than into random subsets, may provide a more realistic evaluation of the model’s robustness across different parts of the field because it better accounts for spatial autocorrelation. Investigating this approach would clarify how well the model captures spatial variability within the field.
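A minimal sketch of spatial (block) splitting in plain Python: pixel coordinates are binned into square blocks, and whole blocks rather than individual pixels are assigned to the test set, so neighboring, autocorrelated pixels do not straddle the train/test boundary. The block size, 30% test fraction, and grid of pixel centres are hypothetical choices for illustration, not settings from this study.

```python
import random


def block_id(x, y, block_size):
    """Index of the square block containing the point (x, y), in metres."""
    return (int(x // block_size), int(y // block_size))


def spatial_split(points, block_size, test_fraction, seed=0):
    """Assign whole spatial blocks to the test set; return (train_idx, test_idx)."""
    blocks = sorted({block_id(x, y, block_size) for x, y in points})
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * len(blocks)))
    test_blocks = set(rng.sample(blocks, n_test))
    train_idx, test_idx = [], []
    for i, (x, y) in enumerate(points):
        if block_id(x, y, block_size) in test_blocks:
            test_idx.append(i)
        else:
            train_idx.append(i)
    return train_idx, test_idx


# Hypothetical 5 m pixel centres on a 100 m x 100 m patch of a field.
points = [(x + 2.5, y + 2.5) for x in range(0, 100, 5) for y in range(0, 100, 5)]
train_idx, test_idx = spatial_split(points, block_size=25.0, test_fraction=0.3)
print(len(train_idx), len(test_idx))
```

Unlike a random 70/30 split, every test pixel here lies in a block that contains no training pixels, so the validation score reflects prediction into unseen parts of the field. The same block identifiers could also serve as groups for a spatial variant of k-fold cross-validation.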

As mentioned above, VENμS does not provide worldwide coverage, a significant drawback that limits the use of its high-resolution multispectral data. Although this study showed that Sentinel-2 is a less effective alternative, it remains the next best option for predicting yield with publicly available satellite data using our method. With frequent worldwide coverage, Sentinel-2 can produce results approaching those of VENμS at the field scale and potentially similar results at a regional scale. This research highlights the potential of high-resolution multispectral satellite data for yield prediction. Future studies may also consider commercial satellites such as PlanetScope and WorldView-3 as sources of high-resolution multispectral data. Combining these satellite data with yield estimation models could lead to low-labor-cost, non-destructive, and highly accurate yield predictions, providing a more detailed understanding of crop yield potential and distribution.

Author Contributions

Conceptualization, M.S.C. and J.W.; methodology, M.S.C. and J.W.; software, M.S.C.; validation, J.W.; formal analysis, M.S.C.; investigation, M.S.C.; resources, J.W.; data curation, M.S.C.; writing—original draft preparation, M.S.C.; writing—review and editing, M.S.C. and J.W.; visualization, M.S.C.; supervision, J.W.; project administration, J.W.; funding acquisition, J.W. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant (grant number RGPIN-2022-05051), awarded to Wang. Additional funding was provided by the Western Graduate Research Scholarship from The University of Western Ontario, granted to Chiu.

Data Availability Statement

The VENμS data used in this paper are publicly available through Theia at https://theia.cnes.fr/atdistrib/rocket/#/home (accessed on 4 June 2024), and the Sentinel-2 data through the Copernicus Browser at https://browser.dataspace.copernicus.eu/ (accessed on 28 June 2024).

Acknowledgments

The authors would like to thank A&L Canada Laboratories Inc. and the members of Wang’s GITA lab for their invaluable assistance with data collection, lab processing, and overall support. Special thanks go to Bo Shan, Robin Kwik, Naythan Samuda, Chunhua Liao, Jody Yu, and Yang Song for their dedicated help and guidance. Additionally, the authors extend their gratitude to Yu for her expertise and guidance in programming the machine learning analyses, and to James Brackett for conducting preliminary testing on the VENμS datasets. The authors would also like to thank the anonymous reviewers for their time, helpful comments, and feedback on this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Tan, C.S.; Reynolds, W.D. Impacts of Recent Climate Trends on Agriculture in Southwestern Ontario. Can. Water Resour. J. 2003, 28, 87–97.
2. Reid, S.; Smit, B.; Caldwell, W.; Belliveau, S. Vulnerability and Adaptation to Climate Risks in Ontario Agriculture. Mitig. Adapt. Strateg. Glob. Change 2007, 12, 609–637.
3. Agriculture and Agri-Food Canada. Overview of Canada’s Agriculture and Agri-Food Sector. Available online: https://agriculture.canada.ca/en/sector/overview (accessed on 25 May 2024).
4. Hammer, G. Applying Seasonal Climate Forecasts in Agricultural and Natural Ecosystems—A Synthesis. In Applications of Seasonal Climate Forecasting in Agricultural and Natural Ecosystems; Hammer, G.L., Nicholls, N., Mitchell, C., Eds.; Springer Netherlands: Dordrecht, The Netherlands, 2000; pp. 453–462.
5. Shafi, U.; Mumtaz, R.; García-Nieto, J.; Hassan, S.A.; Zaidi, S.A.R.; Iqbal, N. Precision Agriculture Techniques and Practices: From Considerations to Applications. Sensors 2019, 19, 3796.
6. Liao, C.; Wang, J.; Shan, B.; Shang, J.; Dong, T.; He, Y. Near Real-Time Detection and Forecasting of within-Field Phenology of Winter Wheat and Corn Using Sentinel-2 Time-Series Data. ISPRS J. Photogramm. Remote Sens. 2023, 196, 105–119.
7. Yu, J.; Wang, J.; Leblon, B.; Song, Y. Nitrogen Estimation for Wheat Using UAV-Based and Satellite Multispectral Imagery, Topographic Metrics, Leaf Area Index, Plant Height, Soil Moisture, and Machine Learning Methods. Nitrogen 2022, 3, 1–25.
8. Skakun, S.; Franch, B.; Vermote, E.; Roger, J.-C.; Justice, C.; Masek, J.; Murphy, E. Winter Wheat Yield Assessment Using Landsat 8 and Sentinel-2 Data. In IGARSS 2018—2018 IEEE International Geoscience and Remote Sensing Symposium; IEEE: Valencia, Spain, 2018; pp. 5964–5967.
9. Zhang, C.; Marzougui, A.; Sankaran, S. High-Resolution Satellite Imagery Applications in Crop Phenotyping: An Overview. Comput. Electron. Agric. 2020, 175, 105584.
10. Zhao, Y.; Potgieter, A.B.; Zhang, M.; Wu, B.; Hammer, G.L. Predicting Wheat Yield at the Field Scale by Combining High-Resolution Sentinel-2 Satellite Imagery and Crop Modelling. Remote Sens. 2020, 12, 1024.
11. Fu, Z.; Jiang, J.; Gao, Y.; Krienke, B.; Wang, M.; Zhong, K.; Cao, Q.; Tian, Y.; Zhu, Y.; Cao, W.; et al. Wheat Growth Monitoring and Yield Estimation Based on Multi-Rotor Unmanned Aerial Vehicle. Remote Sens. 2020, 12, 508.
12. United States Geological Survey (USGS). Landsat 8 (L8) Data Users Handbook; USGS: Sioux Falls, SD, USA, 2019; pp. 1–93.
13. European Space Agency (ESA). Sentinel-2 User Handbook; ESA: Paris, France, 2015; pp. 1–64.
14. Centre National d’Etudes Spatiales (CNES); Israeli Space Agency (ISA). The VENμS Mission and Products; CNES: Paris, France; ISA: Tel Aviv, Israel, 2023; pp. 1–26.
15. Bukowiecki, J.; Rose, T.; Kage, H. Sentinel-2 Data for Precision Agriculture?—A UAV-Based Assessment. Sensors 2021, 21, 2861.
16. Transport Canada. Flying Your Drone Safely and Legally. Available online: https://tc.canada.ca/en/aviation/drone-safety/learn-rules-you-fly-your-drone/flying-your-drone-safely-legally (accessed on 28 June 2024).
17. Xie, Q.; Huang, W.; Liang, D.; Chen, P.; Wu, C.; Yang, G.; Zhang, J.; Huang, L.; Zhang, D. Leaf Area Index Estimation Using Vegetation Indices Derived from Airborne Hyperspectral Images in Winter Wheat. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 3586–3594.
18. Wu, C.; Niu, Z.; Tang, Q.; Huang, W. Estimating Chlorophyll Content from Hyperspectral Vegetation Indices: Modeling and Validation. Agric. For. Meteorol. 2008, 148, 1230–1241.
19. Yu, J.; Wang, J.; Leblon, B. Evaluation of Soil Properties, Topographic Metrics, Plant Height, and Unmanned Aerial Vehicle Multispectral Imagery Using Machine Learning Methods to Estimate Canopy Nitrogen Weight in Corn. Remote Sens. 2021, 13, 3105.
20. Silleos, N.G.; Alexandridis, T.K.; Gitas, I.Z.; Perakis, K. Vegetation Indices: Advances Made in Biomass Estimation and Vegetation Monitoring in the Last 30 Years. Geocarto Int. 2006, 21, 21–28.
21. Fu, Y.; Yang, G.; Wang, J.; Song, X.; Feng, H. Winter Wheat Biomass Estimation Based on Spectral Indices, Band Depth Analysis and Partial Least Squares Regression Using Hyperspectral Measurements. Comput. Electron. Agric. 2014, 100, 51–59.
22. Panek, E.; Gozdowski, D.; Stępień, M.; Samborski, S.; Ruciński, D.; Buszke, B. Within-Field Relationships between Satellite-Derived Vegetation Indices, Grain Yield and Spike Number of Winter Wheat and Triticale. Agronomy 2020, 10, 1842.
23. Atkinson Amorim, J.G.; Schreiber, L.V.; de Souza, M.R.Q.; Negreiros, M.; Susin, A.; Bredemeier, C.; Trentin, C.; Vian, A.L.; de Oliveira Andrades-Filho, C.; Doering, D.; et al. Biomass Estimation of Spring Wheat with Machine Learning Methods Using UAV-Based Multispectral Imaging. Int. J. Remote Sens. 2022, 43, 4758–4773.
24. Wang, F.; Yang, M.; Ma, L.; Zhang, T.; Qin, W.; Li, W.; Zhang, Y.; Sun, Z.; Wang, Z.; Li, F.; et al. Estimation of Above-Ground Biomass of Winter Wheat Based on Consumer-Grade Multi-Spectral UAV. Remote Sens. 2022, 14, 1251.
25. van Klompenburg, T.; Kassahun, A.; Catal, C. Crop Yield Prediction Using Machine Learning: A Systematic Literature Review. Comput. Electron. Agric. 2020, 177, 105709.
26. Chlingaryan, A.; Sukkarieh, S.; Whelan, B. Machine Learning Approaches for Crop Yield Prediction and Nitrogen Status Estimation in Precision Agriculture: A Review. Comput. Electron. Agric. 2018, 151, 61–69.
27. Han, J.; Zhang, Z.; Cao, J.; Luo, Y.; Zhang, L.; Li, Z.; Zhang, J. Prediction of Winter Wheat Yield Based on Multi-Source Data and Machine Learning in China. Remote Sens. 2020, 12, 236.
28. Nigam, A.; Garg, S.; Agrawal, A.; Agrawal, P. Crop Yield Prediction Using Machine Learning Algorithms. In 2019 Fifth International Conference on Image Information Processing (ICIIP); IEEE: Shimla, India, 2019; pp. 125–130.
29. Hunt, M.L.; Blackburn, G.A.; Carrasco, L.; Redhead, J.W.; Rowland, C.S. High Resolution Wheat Yield Mapping Using Sentinel-2. Remote Sens. Environ. 2019, 233, 111410.
30. Ontario Ministry of Agriculture, Food and Rural Affairs. Census Farm Data Collection; Ontario Data Catalogue, Canada. 2022. Available online: https://data.ontario.ca/dataset/census-farm-data-collection (accessed on 26 May 2024).
31. Zhang, Y.; Qin, Q.; Ren, H.; Sun, Y.; Li, M.; Zhang, T.; Ren, S. Optimal Hyperspectral Characteristics Determination for Winter Wheat Yield Prediction. Remote Sens. 2018, 10, 2015.
32. Cao, Q.; Miao, Y.; Shen, J.; Yu, W.; Yuan, F.; Cheng, S.; Huang, S.; Wang, H.; Yang, W.; Liu, F. Improving In-Season Estimation of Rice Yield Potential and Responsiveness to Topdressing Nitrogen Application with Crop Circle Active Crop Canopy Sensor. Precision Agric. 2016, 17, 136–154.
33. Kaufman, Y.J.; Tanre, D. Atmospherically Resistant Vegetation Index (ARVI) for EOS-MODIS. IEEE Trans. Geosci. Remote Sens. 1992, 30, 261–270.
34. Richardson, A.J.; Wiegand, C.L. Distinguishing Vegetation from Soil Background Information. Photogramm. Eng. Remote Sens. 1977, 43, 1541–1552.
35. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the Radiometric and Biophysical Performance of the MODIS Vegetation Indices. Remote Sens. Environ. 2002, 83, 195–213.
36. Fernandes, R.; Butson, C.; Leblanc, S.; Latifovic, R. Landsat-5 TM and Landsat-7 ETM+ Based Accuracy Assessment of Leaf Area Index Products for Canada Derived from SPOT-4 VEGETATION Data. Can. J. Remote Sens. 2003, 29, 241–258.
37. Daughtry, C.S.T.; Walthall, C.L.; Kim, M.S.; de Colstoun, E.B.; McMurtrey, J.E. Estimating Corn Leaf Chlorophyll Concentration from Leaf and Canopy Reflectance. Remote Sens. Environ. 2000, 74, 229–239.
38. Qi, J.; Chehbouni, A.; Huete, A.R.; Kerr, Y.H.; Sorooshian, S. A Modified Soil Adjusted Vegetation Index. Remote Sens. Environ. 1994, 48, 119–126.
39. Gitelson, A.; Merzlyak, M.N. Quantitative Estimation of Chlorophyll-a Using Reflectance Spectra: Experiments with Autumn Chestnut and Maple Leaves. J. Photochem. Photobiol. B Biol. 1994, 22, 247–252.
40. Rouse, J.W.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring Vegetation Systems in the Great Plains with ERTS. NASA Spec. Publ. 1974, 351, 309.
41. Rondeaux, G.; Steven, M.; Baret, F. Optimization of Soil-Adjusted Vegetation Indices. Remote Sens. Environ. 1996, 55, 95–107.
42. Roujean, J.-L.; Breon, F.-M. Estimating PAR Absorbed by Vegetation from Bidirectional Reflectance Measurements. Remote Sens. Environ. 1995, 51, 375–384.
43. Guyot, G.; Baret, F. Utilisation de La Haute Resolution Spectrale Pour Suivre L’etat Des Couverts Vegetaux. Spectr. Signat. Objects Remote Sens. 1988, 287, 279–286.
44. Jordan, C.F. Derivation of Leaf-Area Index from Quality of Light on the Forest Floor. Ecology 1969, 50, 663–666.
45. Huete, A.R. A Soil-Adjusted Vegetation Index (SAVI). Remote Sens. Environ. 1988, 25, 295–309.
46. Chang, C.-C.; Lin, C.-J. LIBSVM: A Library for Support Vector Machines. ACM Trans. Intell. Syst. Technol. 2011, 2, 1–27.
47. Hassan, M.A.; Yang, M.; Rasheed, A.; Yang, G.; Reynolds, M.; Xia, X.; Xiao, Y.; He, Z. A Rapid Monitoring of NDVI across the Wheat Growth Cycle for Grain Yield Prediction Using a Multi-Spectral UAV Platform. Plant Sci. 2019, 282, 95–103.

Figure 1. Location of the studied wheat field near Melbourne, ON, Canada, in an ArcGIS Pro Basemap image.

Figure 2. Methodology flowchart of this study.

Figure 3. Mean cross-validation statistics histogram: analysis by growth stage datasets and modelling approach (RF and SVR) using 21 VI variables. The whiskers display the standard deviation of the metrics.

Figure 4. Variable importance plot produced with VIs from all data. Only the top 20 of the 147 VI variables are displayed. Refer to Table 3 for the full names of the variables. The number following each variable’s abbreviation is the acquisition date of the VENμS image.

Figure 5. Comparison map of the observed and predicted yields.

Table 1. Growth stages at the study area with matching VENμS overpass dates.

Growth Stage	VENμS Overpass Date (YYYYMMDD)
Tillering-1	20200503
Tillering-2	20200513
Stem Elongation	20200521
Booting	20200525
Heading	20200606
Flowering	20200612
Early Fruit (Grain) Development	20200616
Late Fruit (Grain) Development	None (cloud cover)
Ripening	20200706

Table 2. Spectral bands of the VENμS super-spectral camera.

Bands	Central Wavelength (nm)	Bandwidth (nm)
1	423.9	40
2	446.9	40
3	491.9	40
4	555	40
5	619.7	40
6	619.5	40
7	666.2	30
8	702	24
9	741.1	16
10	782.2	16
11	861.1	40
12	908.7	20

Table 3. Vegetation indices tested in this study.

VI ¹	Formula ²	Original Authors
ARVI	$\frac{{N I R}_{11} - [{R e d}_{7} - 1 \times ({B l u e}_{3} - {R e d}_{7})]}{{N I R}_{11} + [{R e d}_{7} - 1 \times ({B l u e}_{3} - {R e d}_{7})]}$	Kaufman and Tanre [33]
DVI-1	${N I R}_{11} - {R e d}_{7}$	Richardson and Wiegand [34]
DVI-2	${N I R}_{12} - {R e d}_{7}$
EVI	$\frac{2.5 \times ({N I R}_{11} - {R e d}_{7})}{{N I R}_{11} + 6 \times {R e d}_{7} - 7.5 \times {B l u e}_{2} + 1}$	Huete et al. [35]
ISR-1	$\frac{{R e d}_{7}}{{N I R}_{11}}$	Fernandes et al. [36]
ISR-2	$\frac{{R e d}_{7}}{{N I R}_{12}}$
MCARI	$[({R E}_{8} - {R e d}_{7}) - 0.2 \times ({R E}_{8} - {G r e e n}_{4})] \times {R E}_{8} \div {R e d}_{7}$	Daughtry et al. [37]
MSAVI-1	$[2 \times {N I R}_{10} + 1 - \sqrt{{(2 \times {N I R}_{10} + 1)}^{2} - 8 \times ({N I R}_{10} - {R e d}_{7})}] \div 2$	Qi et al. [38]
MSAVI-2	$[2 \times {N I R}_{11} + 1 - \sqrt{{(2 \times {N I R}_{11} + 1)}^{2} - 8 \times ({N I R}_{11} - {R e d}_{7})}] \div 2$
NDRE-1	$\frac{({N I R}_{10} - {R E}_{8})}{({N I R}_{10} + {R E}_{8})}$	Gitelson and Merzlyak [39]
NDRE-2	$\frac{({N I R}_{10} - {R E}_{9})}{({N I R}_{10} + {R E}_{9})}$
NDVI-1	$\frac{({N I R}_{11} - {R e d}_{7})}{({N I R}_{11} + {R e d}_{7})}$	Rouse et al. [40]
NDVI-2	$\frac{{(N I R}_{12} - {R e d}_{7})}{{(N I R}_{12} + {R e d}_{7})}$
OSAVI	$[1.16 \times ({N I R}_{11} - {R e d}_{7})] \div ({N I R}_{11} + {R e d}_{7} + 0.16)$	Rondeaux et al. [41]
RDVI	$({N I R}_{11} - {R e d}_{7}) \div \sqrt{{N I R}_{11} + {R e d}_{7}}$	Roujean and Breon [42]
REP	$702 + 40 \times \frac{\frac{{R e d}_{7} + {N I R}_{10}}{2} - {R E}_{8}}{{R E}_{9} - {R E}_{8}}$	Guyot and Baret [43]
RVI-1	$\frac{{N I R}_{11}}{{R e d}_{7}}$	Jordan [44]
RVI-2	$\frac{{N I R}_{12}}{{R e d}_{7}}$
SAVI-1	$\frac{{(N I R}_{10} - {R e d}_{7})}{{(N I R}_{10} + {R e d}_{7} + 0.5)} (1.5)$	Huete [45]
SAVI-2	$\frac{{(N I R}_{11} - {R e d}_{7})}{{(N I R}_{11} + {R e d}_{7} + 0.5)} (1.5)$
SAVI-3	$\frac{{(N I R}_{12} - {R e d}_{7})}{{(N I R}_{12} + {R e d}_{7} + 0.5)} (1.5)$

¹ ARVI, atmospherically resistant vegetation index; DVI-1 and 2, difference vegetation index; EVI, enhanced vegetation index; ISR-1 and 2, infrared simple ratio; MCARI, modified chlorophyll absorption in reflectance index; MSAVI-1 and 2, modified soil-adjusted vegetation index; NDRE-1 and 2, normalized difference red edge; NDVI-1 and 2, normalized difference vegetation index; OSAVI, optimized soil-adjusted vegetation index; RDVI, renormalized difference vegetation index; REP, red edge position; RVI-1 and 2, ratio vegetation index; SAVI-1, 2 and 3, soil-adjusted vegetation index. ² Blue, blue reflectance; green, green reflectance; red, red reflectance; RE, red edge reflectance; NIR, near-infrared reflectance. Subscripts are the equivalent VENμS bands.
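To illustrate how the Table 3 formulas map onto VENμS band numbers, the sketch below computes NDVI-1, NDRE-1, NDRE-2, and REP for a single pixel. The reflectance values are hypothetical, and the dictionary keyed by band number is an illustrative data layout, not the study's processing chain.

```python
def ndvi_1(b):
    """NDVI-1: (NIR_11 - Red_7) / (NIR_11 + Red_7)."""
    return (b[11] - b[7]) / (b[11] + b[7])


def ndre_1(b):
    """NDRE-1: (NIR_10 - RE_8) / (NIR_10 + RE_8)."""
    return (b[10] - b[8]) / (b[10] + b[8])


def ndre_2(b):
    """NDRE-2: (NIR_10 - RE_9) / (NIR_10 + RE_9)."""
    return (b[10] - b[9]) / (b[10] + b[9])


def rep(b):
    """REP (nm): 702 + 40 * ((Red_7 + NIR_10) / 2 - RE_8) / (RE_9 - RE_8)."""
    return 702 + 40 * ((b[7] + b[10]) / 2 - b[8]) / (b[9] - b[8])


# Hypothetical surface reflectances for one wheat pixel, keyed by VENμS band number.
pixel = {7: 0.04, 8: 0.10, 9: 0.30, 10: 0.45, 11: 0.48}

for name, fn in [("NDVI-1", ndvi_1), ("NDRE-1", ndre_1), ("NDRE-2", ndre_2), ("REP", rep)]:
    print(f"{name}: {fn(pixel):.4f}")
```

The same functions apply element-wise to every pixel of an image; in practice they would typically be vectorized over whole band arrays rather than looped per pixel.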

Table 4. Calibration and validation statistics: analysis by individual growth stage datasets and modelling approach (RF and SVR) using 21 VI variables ¹.

Growth Stage	Model	Calibration R²	Calibration RMSE (t/ha)	Validation R²	Validation RMSE (t/ha)
Tillering-1	RF	0.94	0.3017	0.50	0.7335
Tillering-1	SVR	0.54	0.7057	0.50	0.7421
Tillering-2	RF	0.94	0.2953	0.53	0.7116
Tillering-2	SVR	0.55	0.6971	0.53	0.7208
Stem Elongation	RF	0.94	0.2727	0.61	0.6510
Stem Elongation	SVR	0.63	0.6358	0.61	0.6539
Booting	RF	0.95	0.2607	0.64	0.6264
Booting	SVR	0.66	0.6032	0.65	0.6177
Heading	RF	0.96	0.2186	0.74	0.5283
Heading	SVR	0.77	0.4989	0.74	0.5319
Flowering	RF	0.96	0.2181	0.75	0.5254
Flowering	SVR	0.77	0.4907	0.75	0.5183
Early Fruit Development	RF	0.96	0.2039	0.77	0.5008
Early Fruit Development	SVR	0.79	0.4696	0.77	0.5039
Ripening	RF	0.95	0.2653	0.61	0.6494
Ripening	SVR	0.61	0.6418	0.59	0.6709

¹ All models are significant at p-value < 0.001.
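The R² and RMSE statistics in Tables 4 to 6 can be computed as below. This sketch assumes R² is the coefficient of determination, 1 − SS_res/SS_tot; the tables do not state the definition, and some studies instead report the squared Pearson correlation, which can differ on validation data. The yield values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the units of the inputs (here t/ha)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


observed = [5.0, 6.0, 7.0, 8.0]   # hypothetical validation yields (t/ha)
predicted = [5.5, 6.0, 6.5, 8.0]  # hypothetical model predictions (t/ha)
print(f"R² = {r_squared(observed, predicted):.2f}, RMSE = {rmse(observed, predicted):.4f} t/ha")
```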

Table 5. Calibration and validation statistics: analysis by dataset groups and modelling approach (RF and SVR) using 21 VI variables ¹.

Dataset Group	Growth Stage Combinations	Model	Calibration R²	Calibration RMSE (t/ha)	Validation R²	Validation RMSE (t/ha)
Pre-heading Stage	Stem Elongation + Booting	RF	0.96	0.2241	0.72	0.5561
	Stem Elongation + Booting	SVR	0.75	0.5189	0.73	0.5465
	Tillering-2 + Stem Elongation + Booting	RF	0.97	0.2119	0.74	0.5353
	Tillering-2 + Stem Elongation + Booting	SVR	0.79	0.4788	0.75	0.5165
	Tillering-1, 2 + Stem Elongation + Booting	RF	0.97	0.2038	0.75	0.5210
	Tillering-1, 2 + Stem Elongation + Booting	SVR	0.82	0.4431	0.78	0.4917
Post-heading Stage	Flowering + Early Fruit Development	RF	0.97	0.1902	0.79	0.4810
	Flowering + Early Fruit Development	SVR	0.81	0.4462	0.79	0.4814
	Heading + Flowering + Early Fruit Development	RF	0.97	0.1798	0.81	0.4570
	Heading + Flowering + Early Fruit Development	SVR	0.84	0.4116	0.81	0.4507
All Data (Ripening excluded)		RF	0.98	0.1640	0.83	0.4257
All Data (Ripening excluded)		SVR	0.89	0.3437	0.86	0.3925

¹ All models are significant at p-value < 0.001.
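The grouped datasets in Table 5 combine stages by placing each stage's VIs side by side as separate predictors, which is why the "All Data" group has 21 VIs × 7 dates = 147 variables, the count given for Figure 4. A sketch of that feature layout using the Table 1 overpass dates; the `VI_date` naming is a hypothetical convention echoing the Figure 4 labels.

```python
# The 21 VIs of Table 3.
VIS = [
    "ARVI", "DVI-1", "DVI-2", "EVI", "ISR-1", "ISR-2", "MCARI",
    "MSAVI-1", "MSAVI-2", "NDRE-1", "NDRE-2", "NDVI-1", "NDVI-2",
    "OSAVI", "RDVI", "REP", "RVI-1", "RVI-2", "SAVI-1", "SAVI-2", "SAVI-3",
]

# Overpass dates from Table 1, tillering-1 to early fruit development.
STAGES = {
    "Tillering-1": "20200503",
    "Tillering-2": "20200513",
    "Stem Elongation": "20200521",
    "Booting": "20200525",
    "Heading": "20200606",
    "Flowering": "20200612",
    "Early Fruit Development": "20200616",
}

PRE_HEADING = ["Tillering-1", "Tillering-2", "Stem Elongation", "Booting"]
POST_HEADING = ["Heading", "Flowering", "Early Fruit Development"]


def feature_names(stages):
    """One predictor per (VI, stage) pair, named VI_date (hypothetical naming)."""
    return [f"{vi}_{STAGES[s]}" for s in stages for vi in VIS]


all_data = feature_names(PRE_HEADING + POST_HEADING)
print(len(VIS), len(feature_names(PRE_HEADING)), len(feature_names(POST_HEADING)), len(all_data))
```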

Table 6. Calibration and validation statistics: analysis by dataset groups and modelling approach (RF and SVR) using 20 VI variables created using Sentinel-2 bands, matched to equivalent VENμS bands ¹.

Dataset Group	Growth Stage Combinations	Model	Calibration R²	Calibration RMSE (t/ha)	Validation R²	Validation RMSE (t/ha)
Pre-heading Stage	Stem Elongation + Booting	RF	0.96	0.2557	0.70	0.6280
	Stem Elongation + Booting	SVR	0.76	0.5452	0.72	0.5997
	Tillering-1 + Stem Elongation + Booting	RF	0.96	0.2476	0.71	0.6106
	Tillering-1 + Stem Elongation + Booting	SVR	0.78	0.5237	0.74	0.5835
Post-heading Stage	Flowering + Early Fruit Development	RF	0.97	0.2167	0.78	0.5379
	Flowering + Early Fruit Development	SVR	0.82	0.4728	0.79	0.5238
	Heading + Flowering + Early Fruit Development	RF	0.97	0.2121	0.78	0.5310
	Heading + Flowering + Early Fruit Development	SVR	0.83	0.4615	0.79	0.5190
All Data (Ripening excluded)		RF	0.97	0.2091	0.78	0.5287
All Data (Ripening excluded)		SVR	0.84	0.4434	0.79	0.5147

¹ All models are significant at p-value < 0.001.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chiu, M.S.; Wang, J. Local Field-Scale Winter Wheat Yield Prediction Using VENµS Satellite Imagery and Machine Learning Techniques. Remote Sens. 2024, 16, 3132. https://doi.org/10.3390/rs16173132
