**4. Discussion**

Ensemble classifiers, like RF and XGBoost, have been widely used to address the classification challenges inherent in high dimensional data [52]. The present study evaluated the use of terrestrial hyperspectral imaging to model vineyard water stress. More specifically, we tested the utility of two tree-based ensemble classifiers, namely RF and XGBoost, to model water stress in a Shiraz vineyard. The experimental results are discussed in further detail in the following sections.

#### *4.1. Efficacy of the Savitzky-Golay Filter*

The Savitzky-Golay filter has become a popular algorithm for smoothing spectroscopic data (for example, see [43,47,60]). In this study, the filter proved adept at smoothing the hyperspectral signature without significantly altering the input data. However, the results showed that the filter negatively impacted classification accuracy, reducing the accuracy of RF (−1.6%) and XGBoost (−3.3%). The decrease in classification accuracy may be attributed to the specific parameter values used to implement the filter. This study was only intended to test the functionality of the Savitzky-Golay filter; the filter was therefore implemented using the hyperparameter values recommended by [47]. Consequently, the recommended values may not be optimal for the specific dataset used in this study.

Carvalho et al. [61] utilised the Savitzky-Golay filter to smooth magnetic flux leakage (MFL) signals. Similar to our study, the authors found that using the smoothed data with an ANN classifier resulted in reduced classification accuracies. It is therefore evident that careful consideration is needed when applying the Savitzky-Golay filter.
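To illustrate why the filter parameters matter, the following minimal sketch applies a Savitzky-Golay smoother with a 5-point window and a quadratic polynomial, using the standard tabulated convolution coefficients (−3, 12, 17, 12, −3)/35. The window length and polynomial order are assumptions chosen purely for illustration; they are not the values recommended by [47], and the function name `savgol_smooth_5` is hypothetical.

```python
# Standard 5-point quadratic/cubic Savitzky-Golay smoothing coefficients.
# Window length (5) and polynomial order (2) are illustrative assumptions,
# not the parameter values used in the study.
SG_COEFFS_5 = (-3 / 35, 12 / 35, 17 / 35, 12 / 35, -3 / 35)


def savgol_smooth_5(signal):
    """Smooth a 1-D sequence with a 5-point Savitzky-Golay filter.

    The two samples at each edge are returned unchanged, because a full
    window is not available there.
    """
    values = list(signal)
    half = len(SG_COEFFS_5) // 2
    smoothed = values[:]
    for i in range(half, len(values) - half):
        window = values[i - half : i + half + 1]
        smoothed[i] = sum(c * v for c, v in zip(SG_COEFFS_5, window))
    return smoothed


# A smooth (quadratic) signature passes through the filter unchanged,
# whereas an isolated narrow peak is attenuated.
quadratic = [x * x for x in range(-3, 4)]
spike = [0.0, 0.0, 1.0, 0.0, 0.0]
smoothed_quadratic = savgol_smooth_5(quadratic)
smoothed_spike = savgol_smooth_5(spike)
```

The second case hints at how smoothing could suppress narrow spectral features that a classifier might otherwise exploit, which is one possible reading of the reduced accuracies. In practice, a library routine such as `scipy.signal.savgol_filter` would typically be used, with the window length and polynomial order tuned to the dataset.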

#### *4.2. Classification Using All Wavebands*

Both tree-based ensemble classifiers tested in our study successfully demonstrated their efficiency for analysing hyperspectral data. However, our analysis found the RF bagging ensemble to outperform the boosting-based XGBoost ensemble when using all wavebands (*p* = 176).

Published comparisons between RF and boosting classifiers, similar to XGBoost, have reported mixed results. For example, Miao et al. [62] found that RF (93.5%) and AdaBoost (95.3%) produced similar overall accuracies when classifying ecological zones using multi-temporal and multi-sensor data. This is contrary to [62,63], which reported that RF outperformed boosting ensemble classifiers when classifying RADARSAT-1 imagery. Moreover, when directly comparing RF and XGBoost, within the context of spectroscopic classification, our findings contradict the results reported by [31]. Their study reported that XGBoost (96.0%) yielded significantly better results than RF (87.0%) when classifying supernovae. However, it should be noted that their study optimised RF and XGBoost parameters. More specifically, within viticulture, the results of our study compare favourably to those reported by [37]. The authors found that RF (87.8%) produced an improved accuracy compared to XGBoost (81.6%) when using hyperspectral data in combination with feature selection. A review by [50] concluded that RF generally achieves greater accuracies compared with boosting methods when used for the classification of high dimensional data such as hyperspectral imagery.

When comparing the utility of both algorithms, a key advantage shared between them is that RF and XGBoost effectively prevent overfitting [25,30]. However, RF grows trees independently (i.e., parallel to one another), whereas XGBoost grows trees sequentially; RF is therefore less complex and less computationally intensive. Furthermore, RF requires the optimisation of only two parameters [25], whereas XGBoost has various parameters that could be optimised for a given dataset [30].

#### *4.3. Classification Using Subset of Important Wavebands*

Dimensionality reduction of hyperspectral data using machine learning has been extensively researched (for example, see [20,22,23]). The results of our study indicate that the VI ranking provided by RF and XGBoost can successfully be used to select a subset of wavebands for classification. This was evident from the increased accuracies obtained for both RF and XGBoost.

Our results compare favourably to those reported by [20,23], who demonstrated the feasibility of VI to reduce the high dimensionality of hyperspectral data and improve the classification accuracy. We, therefore, attribute the improved classification performance to the subset of most important wavebands. Although the subset of important wavebands did not result in large accuracy gains (the accuracy increase ranged from 1.7% to 3.3% for RF and from 0.7% to 3.3% for XGBoost), it did improve classification accuracy using only 10% of the original data. The majority of important wavebands, for RF (*p* = 9) and XGBoost (*p* = 10), were located in the green region of the EM spectrum (Table 2). The selected wavebands correspond to similar wavebands reported by [7,15]. The green region (i.e., between 500–600 nm) is highly sensitive to plant chlorophyll absorption [15]. Moreover, water stress in plants is closely related to lowered leaf chlorophyll concentrations [15], which offers a possible explanation for the selection of these wavebands.
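The waveband selection step discussed above can be sketched as follows. This is a minimal illustration, assuming the VI scores are available as a plain list with one score per waveband; the function name `select_top_wavebands` and the choice to round the subset size up are assumptions, not details taken from the study.

```python
import math


def select_top_wavebands(importances, fraction=0.10):
    """Return the indices of the highest-ranked wavebands.

    importances: one variable importance (VI) score per waveband.
    fraction: share of wavebands to keep (10% mirrors the subset size
              discussed in the text; rounding up is an assumption).
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in the interval (0, 1]")
    n_keep = max(1, math.ceil(fraction * len(importances)))
    # Rank waveband indices from most to least important.
    ranked = sorted(
        range(len(importances)), key=lambda i: importances[i], reverse=True
    )
    # Return the kept indices in spectral (ascending) order.
    return sorted(ranked[:n_keep])


# Hypothetical VI scores for ten wavebands, keeping the top half.
example_scores = [0.1, 0.5, 0.2, 0.9, 0.0, 0.3, 0.05, 0.4, 0.6, 0.7]
example_subset = select_top_wavebands(example_scores, fraction=0.5)
```

With a fitted scikit-learn `RandomForestClassifier`, the scores could be taken from its `feature_importances_` attribute; the classifier would then be retrained on the selected columns only.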

Moreover, Shimada et al. [10] reported the use of the blue (490 nm) and red wavebands (620 nm) as indicators of plant water stress, and these wavebands correspond to similar wavebands present in the XGBoost subset (484.06 nm and 630.23 nm). In this study, only three red-edge wavebands (Table 2) were selected by XGBoost with none selected by RF. These results contradict those reported by [4], which found that wavebands in the red-edge region (695–730 nm) were ideal for early water stress detection in vineyards. However, given the overlapping wavebands that occur in the blue and green regions, and the results of our study, we can conclude that the red-edge wavebands may not be important for discriminating between stressed and non-stressed Shiraz vines. The results of this study subsequently demonstrate the feasibility of VIS wavebands to model water stress in a Shiraz vineyard.
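The grouping of selected wavebands by spectral region can be sketched as below. The green (500–600 nm) and red-edge (695–730 nm) limits follow the ranges quoted in the text; the blue and red limits are assumptions chosen so that the ranges are contiguous, and the function names are hypothetical.

```python
# (name, lower bound in nm, upper bound in nm); lower bound is inclusive,
# upper bound is exclusive.
REGIONS = [
    ("blue", 400.0, 500.0),      # assumed limits
    ("green", 500.0, 600.0),     # range quoted in the text
    ("red", 600.0, 695.0),       # assumed limits
    ("red-edge", 695.0, 730.0),  # range quoted in the text for [4]
]


def spectral_region(wavelength_nm):
    """Return the name of the region containing the wavelength, else 'other'."""
    for name, lower, upper in REGIONS:
        if lower <= wavelength_nm < upper:
            return name
    return "other"


def count_by_region(wavelengths_nm):
    """Count how many wavebands fall into each spectral region."""
    counts = {}
    for wavelength in wavelengths_nm:
        region = spectral_region(wavelength)
        counts[region] = counts.get(region, 0) + 1
    return counts


# The two XGBoost wavebands mentioned in the text, plus one hypothetical
# green waveband (550.0 nm) added for illustration.
example_counts = count_by_region([484.06, 630.23, 550.0])
```

Under these assumed limits, the 484.06 nm and 630.23 nm wavebands fall in the blue and red regions respectively, consistent with the comparison to Shimada et al. [10].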

Various aspects of the current research lend themselves to operationalisation within precision viticulture. For instance, the developed remote sensing-machine learning framework can be readily applied to model vegetative water stress. Furthermore, the identification of important wavebands can potentially lead to the construction of custom multispectral sensors that are less expensive and application specific.
