**3. Results**

#### *3.1. Spectral Smoothing Using the Savitzky-Golay Filter*

Figure 4 shows the results of smoothing the spectral data using the Savitzky-Golay filter. It is evident that the Savitzky-Golay filter produced smoothed spectra without changing the shape of the original spectra. Additionally, the filter successfully preserved the original reflectance values, with the mean difference in reflectance values being less than 0.3% with a standard deviation of 0.003 across all wavebands. All spectra (*n* = 120) were subsequently smoothed, and the smoothed spectra were used as the input for classification.
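
To make this preprocessing step concrete, the following is a minimal sketch of Savitzky-Golay smoothing using SciPy. The window length, polynomial order, and the random placeholder spectra are assumptions for illustration; the paper's exact filter parameters are not restated here.

```python
import numpy as np
from scipy.signal import savgol_filter

# Placeholder reflectance matrix: 120 samples x 176 wavebands, matching
# the dimensions reported in the text (values here are random).
rng = np.random.default_rng(0)
spectra = rng.random((120, 176))

# Smooth each spectrum along the waveband axis. The window length and
# polynomial order are assumed values, not taken from the paper.
smoothed = savgol_filter(spectra, window_length=11, polyorder=2, axis=1)

# Quantify how well the filter preserves the original reflectance values,
# analogous to the mean difference and standard deviation reported above.
diff = np.abs(smoothed - spectra)
print(f"mean difference: {diff.mean():.4f}, std: {diff.std():.4f}")
```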

**Figure 4.** Spectra comparison before (red) and after (black) applying Savitzky-Golay filter.

#### *3.2. Important Waveband Selection*

The top 10% (*p* = 18) of important wavebands as determined by RF MDA and XGBoost gain are shown in Figure 5A,B, respectively. The results in Table 2 show that RF selected wavebands across the blue and green (473.92–585.12 nm) regions of the EM spectrum. In comparison, XGBoost selected wavebands across the VIS (473.92–646.04 nm) and red-edge (686.69–708.32 nm) regions. It is evident from Figure 5 that the locations of the wavebands selected by RF and XGBoost are markedly different. We attribute the difference in waveband location to the different variable importance (VI) measures used for RF and XGBoost. Nevertheless, as illustrated in Figure 5C, there were common wavebands selected by both RF and XGBoost. The overlapping wavebands (*p* = 6) were located across the blue and green (473.92–585.12 nm) regions. Consequently, those wavebands may be the most important for discriminating between stressed and non-stressed Shiraz vines.
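
A sketch of how the two importance rankings and their overlap could be derived is shown below, assuming scikit-learn and xgboost. RF MDA is approximated here with scikit-learn's permutation importance; the hyperparameters and placeholder data are assumptions, not the study's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from xgboost import XGBClassifier

# X: (n_samples, 176) smoothed spectra; y: stressed (1) / non-stressed (0).
# Hypothetical stand-ins for the study's data.
rng = np.random.default_rng(0)
X, y = rng.random((120, 176)), rng.integers(0, 2, 120)

p = 18  # top 10% of 176 wavebands

# RF importance via mean decrease in accuracy (permutation importance).
rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
mda = permutation_importance(rf, X, y, scoring="accuracy", random_state=0)
rf_top = set(np.argsort(mda.importances_mean)[::-1][:p])

# XGBoost importance via average gain per feature.
xgb = XGBClassifier(n_estimators=500, random_state=0).fit(X, y)
gain = xgb.get_booster().get_score(importance_type="gain")
xgb_top = {int(f[1:]) for f in sorted(gain, key=gain.get, reverse=True)[:p]}
# (default feature names are "f0", "f1", ..., hence the int(f[1:]) parsing)

# Wavebands selected by both ensembles (p = 6 in the study).
overlap = rf_top & xgb_top
```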

**Figure 5.** The important wavebands as determined by RF (**A**); XGBoost (**B**); and the overlap between the two (**C**). The grey bars represent the important wavebands selected by RF and XGBoost, respectively. The red bars indicate the overlapping wavebands. The mean spectral signature of a sample is shown as a reference.


**Table 2.** Location of the RF and XGBoost selected important wavebands in the EM spectrum.

#### *3.3. Classification Using Random Forest and Extreme Gradient Boosting*

The classification results for RF and XGBoost are shown in Table 3. Training accuracies for all models were above 80.0%, with test accuracies ranging from 77.6% to 83.3% (KHAT values ranging from 0.60 to 0.87). Overall, the results indicate that RF outperformed XGBoost, producing the highest accuracies for all the classification models.


**Table 3.** Classification accuracies of both the RF and XGBoost models constructed using all the wavebands and the subset of important wavebands.
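
As a sketch of how these accuracies and KHAT values could be reproduced, the snippet below trains both ensembles and scores them with scikit-learn. The train/test split ratio, hyperparameters, and placeholder data are assumptions, not taken from the paper; KHAT corresponds to Cohen's kappa.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Hypothetical stand-ins for the smoothed spectra (120 samples x 176 bands).
rng = np.random.default_rng(0)
X, y = rng.random((120, 176)), rng.integers(0, 2, 120)

# Assumed 50/50 stratified split; the paper's actual split is not restated here.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, stratify=y, random_state=0
)

for name, model in [
    ("RF", RandomForestClassifier(n_estimators=500, random_state=0)),
    ("XGBoost", XGBClassifier(n_estimators=500, random_state=0)),
]:
    model.fit(X_train, y_train)
    for split, Xs, ys in [("train", X_train, y_train), ("test", X_test, y_test)]:
        pred = model.predict(Xs)
        # KHAT is Cohen's kappa computed from the confusion matrix.
        print(name, split,
              f"accuracy={accuracy_score(ys, pred):.3f}",
              f"KHAT={cohen_kappa_score(ys, pred):.2f}")
```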

Using all wavebands (*p* = 176), RF yielded a training accuracy of 90.0% (KHAT = 0.80) and a test accuracy of 83.3% (KHAT = 0.67). In comparison, XGBoost produced lower accuracies, i.e., a training accuracy of 85.0% (KHAT = 0.70) and a test accuracy of 78.3% (KHAT = 0.57). These results indicate that the XGBoost ensemble produced accuracies approximately 5.0% lower than RF when using all wavebands to classify stressed and non-stressed Shiraz leaves.

Using the subset of important wavebands (*p* = 18) resulted in an overall improvement in classification accuracies for both RF and XGBoost. The training accuracy for RF increased by 3.3% to 93.3% (KHAT = 0.87); however, the test accuracy remained unchanged. Although XGBoost produced less accurate results overall, it experienced a greater increase in training accuracy (5.0%), reaching 90.0% (KHAT = 0.80). The greater increase in accuracy may be attributed to the red-edge wavebands that were only present in the XGBoost subset. Moreover, the XGBoost subset also produced a slight increase (1.7%) in test accuracy (80.0%, KHAT = 0.60). We attribute the superior performance of the RF algorithm to its use of bootstrap sampling [25], which improves model stability, and to its robustness to noise [50].

Classification using the Savitzky-Golay smoothed spectra resulted in reduced accuracies overall, with decreases ranging from 0.7% to 3.3% across all models. Furthermore, according to the McNemar's test results, the differences in classifier performance were not statistically significant: for all the classification models, the χ² values were below the critical value of 3.84, ranging from 0.14 to 1.29.
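
The test computation itself is not shown in the paper; the following is a sketch of how McNemar's test could be run with statsmodels, using hypothetical prediction vectors purely for illustration.

```python
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical test labels and predictions from the two models being compared.
y_true = np.array([0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1])
pred_a = np.array([0, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1])
pred_b = np.array([1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1])

a_ok, b_ok = pred_a == y_true, pred_b == y_true
# 2x2 table of agreement/disagreement between the two classifiers.
table = [
    [np.sum(a_ok & b_ok), np.sum(a_ok & ~b_ok)],
    [np.sum(~a_ok & b_ok), np.sum(~a_ok & ~b_ok)],
]
# Chi-squared form of McNemar's test; the statistic is compared against the
# critical value 3.84 (chi-squared distribution, df = 1, alpha = 0.05).
result = mcnemar(table, exact=False, correction=True)
print(f"chi2 = {result.statistic:.2f}, p = {result.pvalue:.3f}")
```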
