1. Introduction
Wheat is one of the most widely grown cereal crops, covering about 220.62 million hectares (ha) worldwide in 2022/23 [1,2,3]. During this period, the yield increased steadily, providing about 789.02 million metric tons globally [4]. Wheat provides between 20% and 36% of calories for the world’s population [5,6]. The Food and Agriculture Organization (FAO) emphasises that the rapidly growing population and escalating demands for cereal production would require a 70% increase in cereal supply by 2050 [1,7,8]. Monitoring wheat growth is essential for meeting future food demands and ensuring food security, which promotes sustainable agricultural management and enhances yields. However, variations in soil properties, agro-ecosystems, topography, and crop growth conditions within fields impact crop growth [1,7]. Furthermore, wheat growth is affected by variations in the intra-field soil properties; biological, physical, and chemical factors; and management practices [7]. Accurate in situ measurements and establishing the distribution of soil properties within planted areas are crucial for understanding their impact on intra-field crop growth and promoting sustainable agricultural management [9,10,11].
Soil physical and chemical properties regulate soil productivity, which influences crop development [12]. Concurrently, infertile acidic soils are detrimental to crop development. Infertile soils are characterised by high aluminium (Al) toxicity, low pH (acidic), low microbial activity, low soil organic carbon, and a lack of essential chemical properties that hinder wheat growth at the early development stages [5,13]. These soil conditions and characteristics result in problems such as reduced root branching, deformed root tips, lodging, and the discolouration of leaf tissue with shades of yellow and purple [13,14]. Furthermore, wheat cultivated in infertile acidic soil has a reduced protein content, a slower growth rate, and lower yields, which result in reduced profits. Soil nutrients such as phosphorus (P), potassium (K), magnesium (Mg), calcium (Ca), sodium (Na), and nitrogen (N), together with soil pH, are vital for crop growth, and these nutrients often exist in low concentrations in arid and semi-arid environments [15,16,17,18]. Deficiencies of N, P, and K in soil affect wheat growth and yield drastically [19]. Intra-field variation in soil physicochemical properties and meteorological conditions are key factors in crop development across the various crop stages. Other detrimental effects on wheat growth include abiotic stresses such as droughts, frost, waterlogging, salinity, high temperatures, and other natural calamities [20,21]. Biotic factors, which include diseases, competing weeds, and pests, are also common challenges for crop development [22,23,24,25].
Various vegetation indices are derived from the red and near-infrared (NIR) bands and aid in the understanding of vegetation absorption and reflectance properties. These vegetation indices are commonly used in monitoring crop development, growth, and associated stresses during various phenological stages of the crop for timely interventions in farm management [26,27,28]. In addition to vegetation indices derived from satellite products, indices derived from unmanned aerial vehicles (UAVs) can also aid in detecting the intra-field spatial variability of wheat crop growth with a higher spatial resolution and accuracy compared to most satellite products. The existing conventional methods for monitoring crop growth variation (i.e., scouting and automated observation systems using computer vision) do not accommodate vegetation indices to model and predict intra-field crop growth. Furthermore, traditional methods are time-consuming, labour-intensive, and unrealistic for the time-series modelling required by large-scale farms. They are also prone to inaccuracies associated with human survey errors [29]. Recent developments of UAVs in remote sensing provide an efficient, non-destructive, and rapid alternative approach that can provide cost-effective time-series data of vegetation indices for modelling crop growth variability [29,30]. However, the reflectance can be greatly affected by the surface temperature, atmospheric distortions, water content, saturation, landscape heterogeneity, and vegetation type, which can affect the modelling accuracies of actual crop growth [27,31]. Moreover, coarse spatial resolution satellite imagery limits the regression model estimation accuracy due to spectral mixing of different classes [32]. Combining high-resolution UAV-derived vegetation indices with in situ soil property data can enhance crop growth modelling. For example, a Belgian case study using machine learning methods with UAV imagery confirmed that soil properties account for 15 to 26% of the wheat growth variance [7]. A case study in Southwest Montana in the USA successfully predicted soil properties and wheat growth variation using machine learning algorithms and vegetation indices derived from UAV imagery [13]. The integration of UAV-based imagery and elevation data improved modelling accuracies based on machine learning methods for wheat height growth and above-ground biomass in Fengling Reservoir fields in China [33]. Nevertheless, there is still a lack of methods that integrate the multiple factors influencing plant growth and quantify their importance in modelling.
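As a brief illustration of how indices are built from red, red-edge, and NIR reflectance, the sketch below computes the normalised difference vegetation index (NDVI) and a red-edge variant (RENDVI). The RENDVI form used here, (NIR - red edge) / (NIR + red edge), is a commonly used definition and is an assumption, since this section does not give the exact formula or band wavelengths. The reflectance values are hypothetical and are not data from the study.

```python
def normalized_difference(band_a, band_b):
    """Return (band_a - band_b) / (band_a + band_b), or None if the sum is zero."""
    total = band_a + band_b
    if total == 0:
        return None
    return (band_a - band_b) / total


def ndvi(nir, red):
    """Normalised difference vegetation index from NIR and red reflectance."""
    return normalized_difference(nir, red)


def rendvi(nir, red_edge):
    """Red-edge NDVI (assumed form): the red band is swapped for the red-edge band."""
    return normalized_difference(nir, red_edge)


# Hypothetical per-pixel reflectances (fractions between 0 and 1).
pixels = [
    {"red": 0.05, "red_edge": 0.20, "nir": 0.45},  # dense green canopy
    {"red": 0.20, "red_edge": 0.25, "nir": 0.30},  # sparse canopy or bare soil
]

for pixel in pixels:
    print(
        "NDVI:", round(ndvi(pixel["nir"], pixel["red"]), 3),
        "RENDVI:", round(rendvi(pixel["nir"], pixel["red_edge"]), 3),
    )
```

Both indices are bounded between -1 and 1, with higher values generally indicating denser or more vigorous vegetation.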
The common modelling approaches include parametric and non-parametric regression for crop biophysical parameter estimation. These include partial least squares regression (PLSR), random forest (RF), support vector machine (SVM), extreme gradient boosting (XGBoost), conditional inference forest (CI-forest), artificial neural network (ANN), least squares linear regression (LSLR), multiple linear regression (LR), neural network (NN), decision tree (DT), regression tree (RegT), K-nearest neighbour (KNN), boost tree (BST), and bagging tree (BagT) ensemble learning algorithms [7,18,30,34,35,36]. PLSR provides a high level of interpretability and can overcome problems of collinearity in modelling, enhancing the accuracy of the model [9,17]. However, other studies suggest that PLSR is not always adequate for modelling the relationship between soil properties and crop height because this relationship is not always linear [37,38,39]. This limitation has contributed to the rising need for exploring the use of nonlinear machine learning algorithm (MLA) methods and other models. RF has the capabilities to classify and handle complex data with continuous values, but it is not robust and is sensitive to outliers, which can cause overfitting or poor generalisation, and it does not address collinearity when applied with large or small input data [40,41,42]. SVM has similar merits and demerits to RF, except that it uses kernel-based functions for mapping input features to a higher-dimensional space and exploits support vectors for regression fitting [43]. In general, several MLAs such as NNs, RF, SVM, KNN, RegT, and XGBoost often experience black box problems, among others [44,45,46]. Meanwhile, Gaussian process regression (GPR) has the capability of overcoming the black box challenges by employing kernel functions, which offer uncertainty estimates for model predictions across a spectrum of data inputs, ranging from simple to highly complex [45,46]. Kernel-based regression algorithms such as GPR are superior to several MLAs in terms of modelling accuracy [47,48]. Few studies have reported the feasibility of kernel-based methods in modelling wheat biophysical variables such as crop height using time-series vegetation index data for an entire season [49]. A multispectral Sentinel-2 dataset has shown potential for estimating crop biophysical variables such as the plant height, leaf area index (LAI), leaf chlorophyll content (LCC), fraction of absorbed photosynthetically active radiation (FAPAR), fraction of vegetation cover (FVC), and canopy chlorophyll content (CCC) using random forest tree bagger (RFTB), BagT, LSLR, PLSR, and GPR [46,49,50,51]. However, studies focusing on soil properties and UAV datasets that have a high spatial, spectral, and temporal resolution are lacking. UAVs have a high potential for estimating field-scale wheat growth.
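To make the point about kernel functions and uncertainty estimates concrete, the following is a minimal GPR sketch in plain Python. It is an illustration under stated assumptions, not the implementation used in this study: the squared-exponential (RBF) kernel, the hyperparameter values, the zero prior mean, and the single-feature training data are all hypothetical.

```python
import math


def rbf_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return signal_var * math.exp(-0.5 * sq_dist / length_scale ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def gpr_predict(train_x, train_y, query, noise_var=1e-4, **kernel_args):
    """Posterior mean and variance at one query point (zero prior mean)."""
    n = len(train_x)
    k_train = [
        [rbf_kernel(train_x[i], train_x[j], **kernel_args)
         + (noise_var if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]
    k_query = [rbf_kernel(x, query, **kernel_args) for x in train_x]
    alpha = solve(k_train, train_y)   # (K + noise * I)^-1 y
    beta = solve(k_train, k_query)    # (K + noise * I)^-1 k_query
    mean = sum(k * a for k, a in zip(k_query, alpha))
    prior_var = rbf_kernel(query, query, **kernel_args)
    variance = prior_var - sum(k * b for k, b in zip(k_query, beta))
    return mean, max(variance, 0.0)


# Hypothetical data: one vegetation index value -> crop height in metres.
train_x = [[0.10], [0.20], [0.30], [0.40]]
heights = [0.15, 0.35, 0.60, 0.80]
mean_height = sum(heights) / len(heights)
centred = [h - mean_height for h in heights]  # matches the zero prior mean

for query in ([0.25], [0.90]):
    mean, var = gpr_predict(train_x, centred, query,
                            length_scale=0.1, signal_var=0.1)
    print(query, "height:", round(mean + mean_height, 3),
          "std:", round(math.sqrt(var), 3))
```

The predictive standard deviation grows as the query moves away from the training inputs, which is the kind of uncertainty estimate the paragraph above refers to.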
This study addresses a gap in the existing literature by focusing on the integration of high-resolution UAV-derived vegetation indices with in situ soil property measurements [7,13]. This could contribute to a more comprehensive understanding of the factors influencing wheat growth and enhance the modelling accuracy. Furthermore, this study aims to investigate machine learning regressions such as GPR, ER, DT, and SVM for predicting wheat crop height using a combination of UAV-derived vegetation indices and soil properties. By considering multiple factors simultaneously, the research aims to fill a gap related to the absence of holistic approaches in previous studies that often focused on individual aspects of wheat growth. Additionally, while previous studies have explored UAV imagery and soil properties, this study specifically aims to address the gap in research focusing on field-scale wheat growth variability [7,14,35]. This may involve considering the spatial, spectral, and temporal resolution of data to provide more detailed insights into wheat height variability patterns. The main objectives of this study were to (1) investigate and understand the contribution of soil properties and vegetation indices in modelling crop height of heterogeneous winter wheat planted in a dryland environment, and (2) assess how the prediction accuracy for wheat crop height changes between a vegetation-index-only scenario and a scenario combining vegetation indices with soil properties. Although experiments were conducted in South Africa, the techniques developed in this study can be tested in other semi-arid regions as well. An example is Australia, which is a major producer of wheat and is also facing a decline in wheat production [52]. India is also a significant wheat-producing country with diverse agro-climatic zones and unique challenges related to smallholder farming, decreasing soil nutrients, and issues in water resource management [53].
4. Discussion
According to descriptive statistics, the soil pH of both farms is acidic, ranging from 3.5 to 6.94. The range of the pH conforms with previous findings that indicate a pH of about 5.5 in the study area [105]. This scenario is anticipated in dryland wheat production within the study area. Low pH is detrimental to wheat growth [106,107]. Furthermore, results revealed that soil properties, particularly Ca, Mg, K, and clay, have a moderate positive correlation with wheat crop height. Similar findings from other studies have revealed that an abundance of soil chemical properties such as K, Mg, and Ca influences the wheat crop height throughout the growth period [7,21,59]. Other studies also demonstrated that the clay content, silt%, and pH values are more significant factors influencing plant growth [8,104]. Vegetation indices showed a weak positive correlation with wheat crop height. In contrast, other studies found a strong correlation between vegetation indices and wheat height and yields [108,109]. Additionally, the correlation increases as the winter wheat grows [108]. Ordinary kriging is widely known for its ability to generate spatial interpolation maps in precision agriculture applications [110,111,112]. This study confirmed that ordinary kriging is a robust method to produce soil property maps for both farms based on the low cross-validation RMSE of semi-variogram models.
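The sketch below illustrates how ordinary kriging interpolates a soil property and how a leave-one-out cross-validation RMSE can be computed for it. It is a simplified illustration only: the exponential semi-variogram, its sill and range values, the sample coordinates, and the pH values are all assumptions, and the semi-variogram models actually fitted in this study are not reproduced here.

```python
import math


def exponential_variogram(h, sill=1.0, range_param=50.0, nugget=0.0):
    """Semivariance at lag distance h (assumed exponential model)."""
    if h == 0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-h / range_param))


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def ordinary_kriging(points, values, target, variogram=exponential_variogram):
    """Return the kriged estimate at target and the kriging weights."""
    n = len(points)
    # Semivariances between samples, bordered by the unbiasedness constraint.
    system = [
        [variogram(math.dist(points[i], points[j])) for j in range(n)] + [1.0]
        for i in range(n)
    ]
    system.append([1.0] * n + [0.0])
    rhs = [variogram(math.dist(p, target)) for p in points] + [1.0]
    weights = solve(system, rhs)[:n]  # last entry is the Lagrange multiplier
    return sum(w * z for w, z in zip(weights, values)), weights


def loo_rmse(points, values):
    """Leave-one-out cross-validation RMSE of the kriging predictor."""
    sq_errors = []
    for i in range(len(points)):
        rest_points = points[:i] + points[i + 1:]
        rest_values = values[:i] + values[i + 1:]
        predicted, _ = ordinary_kriging(rest_points, rest_values, points[i])
        sq_errors.append((predicted - values[i]) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical soil pH samples at (x, y) positions in metres.
points = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0), (50.0, 50.0)]
values = [4.8, 5.6, 5.1, 6.2, 5.4]

estimate, weights = ordinary_kriging(points, values, (25.0, 75.0))
print("kriged pH:", round(estimate, 2), "LOO RMSE:", round(loo_rmse(points, values), 2))
```

The weights sum to one because of the unbiasedness constraint, and with a zero nugget the predictor reproduces the measured value at each sample location.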
Four predictive machine learning models were evaluated. Results show that the GPR model outperformed ER, DT, and SVM models when predicting crop height at the wheat farms. The GPR model prediction accuracy results ranged between 65% and 75% for wheat height in the entire season. These results are better than findings from previous studies that obtained a 13% to 84% prediction accuracy for monitoring winter wheat growth using the PLSR model during the entire growing season [113]. Other studies have found 68%, 88%, and 90% prediction accuracy of field-scale wheat biophysical variables, wheat yield, and wheat plant nitrogen density using the GPR model [44,45,114]. These findings are similar to previous studies that showed the higher capabilities of GPR modelling performance compared to algorithms such as LR, RF, PLSR, LSLR, BagT, KNN, DT, NNs, ANN, and RegT when estimating different crop biophysical parameters [34,44,46,114]. Furthermore, the results showed that the GPR model has a lower prediction accuracy when soil data are fused with UAV-derived data than in the scenario using UAV vegetation indices only. In contrast, previous research showed that hyperspectral UAV and soil data fusion improves GPR modelling precision while providing more accurate results with vegetation indices for estimating wheat above-ground biomass [42,115]. The improved performance of GPR can be linked to its use of kernel functions when dealing with input [46,47,82]. Furthermore, GPR is flexible and reduces the potential of overfitting with highly dimensional observations in crop parameter estimation [42,50,116,117]. In contrast, other studies show that PLSR and SVM achieved the highest prediction modelling accuracy compared to GPR for wheat crop height, above-ground biomass, and wheat yield [35,44]. Additionally, ANN and RF have outperformed the GPR model for plant height and biomass estimation in previous studies [118,119]. However, the robustness of the MLA model depends on the amount of input data and its features to calibrate nonlinear and complex data structures [42,45,46]. Despite the advantages of the GPR model, such as its use of kernel functions when dealing with the input training data, it cannot be concluded that GPR always performs better than other machine learning models.
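A comparison between the vegetation-index-only scenario and the combined scenario can be sketched with two standard regression metrics. The snippet assumes that "prediction accuracy" is expressed as the coefficient of determination (R²), which this section does not state explicitly, and all height values below are hypothetical rather than results from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    sq_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical crop heights in metres and two sets of model predictions.
observed = [0.20, 0.35, 0.55, 0.70, 0.85]
scenarios = {
    "vegetation indices only": [0.24, 0.33, 0.50, 0.72, 0.80],
    "vegetation indices + soil": [0.30, 0.30, 0.48, 0.78, 0.75],
}

for name, predicted in scenarios.items():
    print(name, "R2:", round(r_squared(observed, predicted), 3),
          "RMSE:", round(rmse(observed, predicted), 3))
```

Reporting both metrics is useful because R² describes the share of variance explained, while RMSE stays in the units of crop height.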
The GPR model variable importance analysis indicates that RENDVI is vital for predicting wheat crop height. A similar study revealed that vegetation indices such as the enhanced vegetation index (EVI) performed better than soil properties in modelling crop height [44,113]. These findings showed that the vegetation indices, especially those using the red-edge band, are superior for forecasting crop growth. Several studies have concluded that red-edge bands tend to rank higher in variable importance when predicting crop growth because of their higher sensitivity to crop changes [31,120]. Moreover, the wheat crop height changes throughout the season could have influenced the top ranking of RENDVI computed with red-edge bands in the current study. Meanwhile, this study showed that soil properties play a lesser role when estimating wheat crop height. Among the soil properties used to estimate crop growth, pH had a low ranking. All other soil properties such as sand, clay, Na, Mg, Ca, K, and P showed no contribution to the GPR variable importance. However, previous studies highlighted contrasting findings that pH and K are top-ranking soil properties [7,14]. In addition, random forest variable importance has revealed that the Ca_Mg ratio ranked highly compared to other soil properties and vegetation indices when modelling soil organic carbon content [121]. It is worth noting that clay plays a very important role in growing crops, whereas sandy soil is not an ideal environment for growing crops [104]. The differences among variable importance findings are attributed to differences in the model input predictor variables. Understanding the different growth stages helps farmers plan and implement appropriate agricultural practices, such as timing irrigation, fertilisation, and harvesting. The techniques developed in this study can be used in other semi-arid regions facing challenges related to optimising crop yield, resource management, and sustainable agriculture practices [52].
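This section does not state how the GPR variable importance was computed. As a stand-in, the sketch below uses permutation importance, a model-agnostic technique that measures how much the RMSE increases when one predictor is shuffled. The toy model, the feature names, and the data are hypothetical and serve only to show the mechanics.

```python
import math
import random


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    sq_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))


def permutation_importance(predict, rows, targets, n_repeats=10, seed=0):
    """Mean increase in RMSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = rmse(targets, [predict(row) for row in rows])
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted.
            permuted = [row[:col] + [value] + row[col + 1:]
                        for row, value in zip(rows, shuffled)]
            error = rmse(targets, [predict(row) for row in permuted])
            increases.append(error - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical fitted model: height depends on RENDVI only, not on pH."""
    rendvi_value, _soil_ph = row
    return 0.2 + 1.0 * rendvi_value


feature_names = ["RENDVI", "pH"]
rows = [[0.10 + 0.03 * i, 4.0 + 0.1 * i] for i in range(20)]
targets = [toy_model(row) for row in rows]

scores = permutation_importance(toy_model, rows, targets)
for name, score in zip(feature_names, scores):
    print(name, round(score, 4))
```

A predictor that the model ignores receives an importance of zero, which mirrors the situation described above where most soil properties showed no contribution.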
This study highlights the importance of vegetation indices and soil properties to predict crop height, which provides valuable information about basic crop management. However, the limitations of this research include high fieldwork costs that resulted in one visit per month for data collection at different crop development stages. This study focused on time-series modelling but may not fully capture the temporal dynamics of wheat growth. The effects of short-term environmental fluctuations and seasonal variations on crop growth may not be adequately addressed. This study acknowledges that the reflectance underlying vegetation indices can be affected by various factors such as surface temperature, atmospheric distortions, ambient light, water content, and vegetation type. These factors could introduce uncertainties in the accuracy of the models. We recommend incorporating climate data, soil indices, and environmental variables for a holistic understanding of crop growth while optimising model estimation accuracy. Furthermore, we recommend investigating the benefits of fusing data from multiple sensors, such as thermal imaging, LiDAR, and hyperspectral sensors. This can provide a more comprehensive characterisation of crop health and growth status.