**1. Introduction**

Water stress in vineyards is common in the Western Cape of South Africa during the summer [1]. Water stress promotes stomatal closure [2], which inhibits photosynthesis and transpiration, leading to an increase in vine leaf temperature [3,4]. Reduced water availability affects vine health and productivity, and ultimately grape quality [5]. Additionally, under projected climate change scenarios, greater drought periods may be experienced in the near future [6], with this strain on water resources further inhibiting the development of grapes [5]. There is consequently a pressing need for the real-time monitoring of water stress in vineyards.

Remote sensing provides a fast and cost-effective method for detecting vineyard water stress [4], and can thereby help alleviate devastating losses in crop production [7] and safeguard high-quality grape yields [8]. Several studies, for example [7,9], have modelled water stress in vineyards using spectral remote sensing techniques. Plant leaves reflect most of the near-infrared (NIR) spectrum, whereas most of the visible (VIS) spectrum, i.e., 400–680 nm, is absorbed by plant chlorophyll pigments [3]. Water stress changes the spectral signatures of plants due to decreased photosynthetic absorbance [3], resulting in decreased NIR reflectance [10]. This phenomenon is known as the "blue-shift", where the red-edge (680–730 nm) shifts toward the VIS end of the spectrum [11]. The red-edge position has consequently been used to detect water stress in plants [10].
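
As a minimal illustration of the blue-shift described above, the sketch below estimates the red-edge position as the wavelength of maximum first derivative of reflectance within 680–730 nm. This is one common convention and is an assumption here; the cited studies may define the red-edge position differently. The spectra are synthetic logistic curves, and the inflection points (718 nm and 706 nm) and the helper names `synthetic_reflectance` and `red_edge_position` are hypothetical values chosen for illustration, not measurements.

```python
import math

def synthetic_reflectance(wavelength_nm, inflection_nm):
    """Toy leaf spectrum: low red reflectance rising to a NIR plateau."""
    # Logistic curve centred on an assumed red-edge inflection point.
    return 0.05 + 0.45 / (1.0 + math.exp(-(wavelength_nm - inflection_nm) / 8.0))

def red_edge_position(wavelengths, reflectance, lo=680.0, hi=730.0):
    """Return the wavelength of maximum first derivative within [lo, hi]."""
    best_wl, best_slope = None, float("-inf")
    for i in range(1, len(wavelengths) - 1):
        wl = wavelengths[i]
        if not lo <= wl <= hi:
            continue
        # Central finite difference approximates dR/d(lambda).
        slope = (reflectance[i + 1] - reflectance[i - 1]) / (
            wavelengths[i + 1] - wavelengths[i - 1]
        )
        if slope > best_slope:
            best_wl, best_slope = wl, slope
    return best_wl

wavelengths = [float(w) for w in range(650, 761)]  # 1 nm sampling
healthy = [synthetic_reflectance(w, 718.0) for w in wavelengths]   # assumed
stressed = [synthetic_reflectance(w, 706.0) for w in wavelengths]  # assumed

rep_healthy = red_edge_position(wavelengths, healthy)
rep_stressed = red_edge_position(wavelengths, stressed)
print("healthy:", rep_healthy, "stressed:", rep_stressed)
```

In this toy case the stressed spectrum yields a red-edge position at a shorter wavelength than the healthy one, which is the direction of shift the text describes.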

The high spectral resolution of hyperspectral (spectroscopy) data allows for a more detailed analysis of plant properties [11], and provides a non-destructive approach for assessing vineyard water stress [12]. Consequently, the application of hyperspectral remote sensing techniques to model vineyard water stress is becoming common practice in precision viticulture [8]. For example, De Bei et al. [12] used NIR field spectroscopy to predict the water status of vines using leaf spectral signatures and in-field leaf water potential measurements. Similar studies were conducted by [13,14]. All three studies found that wavebands in the 1000–2500 nm range were ideal for detecting water stress in vines. Alternatively, studies conducted by Zarco-Tejada et al. [7] and Pôças et al. [15] successfully demonstrated the viability of the VIS and red-edge regions of the electromagnetic (EM) spectrum, i.e., 400–730 nm, for predicting water stress in vines.

Moreover, advances in remote sensing technology in recent years have increased the availability of hyperspectral imaging (imaging spectroscopy) sensors. Hyperspectral imaging integrates spectroscopy with the advantages of digital imagery [16]. Each image provides contiguous, narrow-band (typically 10 nm) data collected across the ultraviolet (UV), VIS, NIR, and shortwave infrared (SWIR) spectrum (typically 350–2500 nm), coupled with high spatial resolutions (typically 1 mm–2 m) [16,17]. A major limitation to the application of hyperspectral data is the inherent "curse of dimensionality" [18], which gives rise to the Hughes effect [19] in a classification framework [20]. High dimensionality can result in reduced classification accuracies [21], as the number of wavebands (*p*) is often many times greater than the number of training samples (*n*), i.e., *p* > *n* [22]. However, using variable importance (VI) to create an optimised feature space, i.e., an optimal subset of input features, has been shown to be effective in reducing the effects of high dimensionality [23]. For example, Pedergnana et al. [20] exploited the Random Forest (RF) mean decrease Gini (MDG) measure of VI to reduce the dimensionality of AVIRIS hyperspectral imagery. The study found that the subset selected based on RF VI produced an increase in accuracy of approximately 1.0%. Alternatively, Abdel-Rahman et al. [23] utilised the RF mean decrease accuracy (MDA) measure to rank the waveband importance of an AISA Eagle hyperspectral image dataset. The subset produced using MDA VI resulted in a 3.5% increase in accuracy. Contrary to this, Abdel-Rahman et al. [23] and Corcoran et al. [24] also utilised RF MDA values to create an optimal subset of features but observed a 4.0% decrease in accuracy. However, both studies concluded that RF VI could effectively be utilised to increase classification efficiency.

Machine learning algorithms, such as RF [25], have proven to be particularly adept at mitigating the Hughes effect (for example, see [22,26,27]). RF is an ensemble of weak decision trees used for classification and regression [22]. It uses bagging (i.e., bootstrap aggregation) and random variable selection to grow a multitude of unpruned trees from randomly selected training samples [25]. RF classification has recently gained significant recognition for its applications in precision viticulture. For example, Sandika et al. [28] used RF and digital terrestrial imagery to classify Anthracnose, Powdery Mildew, and Downy Mildew diseases within vine leaves. The study found that RF produced the highest accuracy (82.9%), outperforming Probabilistic Neural Network (PNN), Back Propagation Neural Network (BPNN), and Support Vector Machine (SVM) models. Similar results were found by Knauer et al. [29] using RF and terrestrial hyperspectral imaging. RF produced an overall accuracy of 87% for modelling Powdery Mildew on grapes. Additionally, Knauer et al. [29] found that dimensionality reduction led to an increase in classification accuracy.
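
The idea behind ranking wavebands by a mean-decrease-accuracy style importance can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure used in the cited studies: a nearest-centroid classifier stands in for RF to keep the example short, importance is measured on the training samples rather than on out-of-bag samples, and the data are synthetic with *p* = 30 bands and *n* = 24 samples. The band indices 7 and 21 and all function names are hypothetical.

```python
import random

def fit_centroids(X, y):
    """Nearest-centroid classifier (stand-in for RF): one mean vector per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)

def permutation_importance(centroids, X, y, rng, repeats=20):
    """Mean drop in accuracy when one band is shuffled across samples."""
    baseline = accuracy(centroids, X, y)
    scores = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        scores.append(sum(drops) / repeats)
    return scores

rng = random.Random(0)
n_per_class, n_bands = 12, 30   # p = 30 bands > n = 24 samples
informative = {7, 21}           # hypothetical stress-sensitive bands
X, y = [], []
for label in (0, 1):            # 0 = non-stressed, 1 = stressed (synthetic)
    for _ in range(n_per_class):
        row = [rng.gauss(0.0, 0.5) for _ in range(n_bands)]
        for j in informative:
            row[j] += 2.0 * label   # only these bands separate the classes
        X.append(row)
        y.append(label)

model = fit_centroids(X, y)
scores = permutation_importance(model, X, y, rng)
ranked = sorted(range(n_bands), key=lambda j: scores[j], reverse=True)
top_bands = sorted(ranked[:2])
print("highest-ranked bands:", top_bands)
```

Retaining only the highest-ranked bands gives a reduced feature space of the kind the studies above evaluated; whether accuracy rises or falls after such a reduction is an empirical question, as the mixed results in the literature show.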

More recently, another tree-based classifier known as Extreme Gradient Boosting (XGBoost) [30] has shown considerable promise in various applications (for example, see [31–33]). XGBoost is an optimised implementation of gradient boosting [34], designed to be fast, scalable, and highly efficient [35]. Gradient boosting (or boosted trees) combines multiple pruned trees of low accuracies, or weak learners, to create a more accurate model [36]. The difference between RF and XGBoost is the way the tree ensemble is constructed. RF grows trees that are independent of one another [25], whereas XGBoost grows trees that are dependent on the feedback information provided by the previously grown tree [30]. Essentially, each tree in an XGBoost ensemble learns from previous trees and tries to reduce the error produced in subsequent iterations.
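
The sequential dependence between trees can be illustrated with a bare-bones gradient boosting loop. The sketch below is plain gradient boosting with squared-error loss and single-split regression stumps on a synthetic one-dimensional problem; it is not XGBoost itself and omits XGBoost's regularisation and other optimisations. The data, the learning rate of 0.5, and the function names are assumptions made for illustration.

```python
import math

def fit_stump(x, residuals):
    """Best single-split regression stump (a weak learner) on 1-D inputs."""
    best = None
    for threshold in sorted(set(x))[:-1]:   # keep the right leaf non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_rounds=20, learning_rate=0.5):
    """Each stump is fitted to the errors left by the ensemble so far."""
    base = sum(y) / len(y)
    prediction = [base] * len(y)
    stumps, history = [], []
    for _ in range(n_rounds):
        # Residuals are the negative gradient of the squared-error loss.
        residuals = [yi - pi for yi, pi in zip(y, prediction)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        prediction = [pi + learning_rate * stump(xi)
                      for xi, pi in zip(x, prediction)]
        mse = sum((yi - pi) ** 2 for yi, pi in zip(y, prediction)) / len(y)
        history.append(mse)
    return base, stumps, history

x = [i / 10 for i in range(30)]        # synthetic inputs
y = [math.sin(xi) for xi in x]         # synthetic target
base, stumps, history = boost(x, y)
print("training MSE after round 1: ", history[0])
print("training MSE after round 20:", history[-1])
```

Because every stump is trained on the residuals of the current ensemble, the training error cannot increase from one round to the next in this setting. An RF-style ensemble would instead fit each tree independently on a bootstrap sample and average or vote over the results.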

Mohite et al. [37] is the only known study to have employed XGBoost classification in precision viticulture. The study used hyperspectral data to detect pesticide residue on grapes. Four classifiers were compared, i.e., XGBoost, RF, SVM, and artificial neural network (ANN). The study also investigated the utility of LASSO and Elastic Net feature selection. Results indicated that RF produced the most accurate classification models when using both the LASSO and Elastic Net selected wavebands.

A review of the literature indicated that no study to date has investigated the use of terrestrial hyperspectral imaging to model vineyard water stress. Furthermore, no study has utilised RF or XGBoost classification to detect leaf-level water stress in the precision viticulture domain. The aim of the present study was to develop a combined remote sensing and machine learning framework to model water stress in a Shiraz vineyard. The specific objectives of the study were to evaluate the utility of terrestrial hyperspectral imaging to discriminate stressed and non-stressed Shiraz vines, and to investigate the efficacy of the RF and XGBoost algorithms for modelling vineyard water stress.

#### **2. Materials and Methods**
