*3.2. Clustering Interpretation*

Based on the mosaic plot results, the accuracy in estimating vigour classes and plant health shifts was assessed with confusion matrices (Figure 4). Both clustering methods performed similarly on the multispectral data, with K-means yielding slightly better statistics than hierarchical clustering. Dead plants (V1) were the class with the highest predictive statistics for every number of clusters tested with both methods, with the best scores obtained when four clusters were used. Reducing the number of clusters lowered the precision for this class to a minimum of 60%, while the sensitivity rose markedly (98%). The same behaviour was observed for highly vigorous plants (V4), which were correctly assigned to clusters KM4 or HM4 and KM3 or HM3 when the data were grouped into four or three clusters, respectively.

**Figure 4.** Confusion matrices of unsupervised clustered data for the assessment of disease spreading (plant vigour) and the prediction of future outbreaks (plant health shifts). K-means (**a**–**c**) and hierarchical clustering (**e**–**g**) performance in assessing disease spreading using 4, 3 or 2 clusters of multispectral data. (**d**) and (**h**), performance of thermal clusters in predicting future outbreaks using 2 clusters, for K-means and hierarchical clustering, respectively. In (**b**–**d**) and (**f**–**h**), reference data were merged when the number of clusters was lower than the number of reference classes. In (**b**) and (**f**), a class was created by aggregating weakened and diseased plants (V3 and V2). In (**c**) and (**g**), the three classes with leaves (V4, V3 and V2) were grouped together. In (**d**) and (**h**), plants that remained asymptomatic until 2018 (left) were separated from those that showed wilting symptoms in 2018 (right). Abbreviations: Sens, sensitivity; Prec, precision; F1, F1-score; Acc, accuracy; KM1–KM4 and KT1–KT2, multispectral and thermal clusters.
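The per-class statistics reported in these matrices (sensitivity, precision, F1-score and overall accuracy) follow directly from the confusion-matrix counts. A minimal sketch with hypothetical counts (not the study's data; the 2×2 example is chosen only to illustrate how a precision near 60% can coexist with a sensitivity of 98%):

```python
import numpy as np

def cluster_metrics(cm):
    """Per-class sensitivity, precision and F1 from a confusion matrix.

    cm[i, j] = number of plants of reference class i assigned to the
    cluster associated with class j (true positives on the diagonal).
    """
    tp = np.diag(cm).astype(float)
    sensitivity = tp / cm.sum(axis=1)   # recall per reference class
    precision = tp / cm.sum(axis=0)     # per associated cluster
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = tp.sum() / cm.sum()      # overall agreement
    return sensitivity, precision, f1, accuracy

# Hypothetical two-class case (dead plants vs. plants with leaves).
cm = np.array([[49, 1],
               [33, 50]])
sens, prec, f1, acc = cluster_metrics(cm)
# For the first class: sensitivity 49/50 = 0.98, precision 49/82 ≈ 0.60.
```

The same routine generalises to the 4×4 and 3×3 cases once each cluster has been associated with one reference class.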

For the K-means method, four clusters gave the best association with plant vigour: each cluster was associated with a single class with an accuracy above 73% (Figure 4a). In particular, the extreme classes (V1 and V4) were precisely identified (precision above 78%), with few false positives. Misclassification errors occurred mostly between neighbouring classes, mainly as false positives in the association of cluster KM3 with the vigour class V3. When the number of clusters was reduced to three, the central cluster KM2 was associated with weakened plants (V3 and V2) with an accuracy of 73% and a precision above 82% (Figure 4b). Finally, with only two clusters it was still possible to distinguish dead plants with no canopy from plants with leaves (Figure 4c).
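The class merging applied when the cluster count drops below the number of vigour classes (e.g., pooling V3 and V2 into a single "weakened" class before scoring a three-cluster solution) can be sketched as a relabelling step. The labels, cluster names and cluster-to-class mapping below are illustrative placeholders, not the study's data:

```python
import numpy as np

# Hypothetical reference vigour labels and cluster assignments per plant.
ref = np.array(["V1", "V2", "V3", "V2", "V4", "V3", "V1", "V4"])
clusters = np.array(["KM1", "KM2", "KM2", "KM2", "KM3", "KM1", "KM1", "KM3"])

# Merge V3 and V2 into one "weakened" reference class (as in Figure 4b).
merged = np.array(["V3+V2" if r in ("V2", "V3") else r for r in ref])

# Assumed association of each cluster with a merged reference class.
mapping = {"KM1": "V1", "KM2": "V3+V2", "KM3": "V4"}
pred = np.array([mapping[c] for c in clusters])
accuracy = (pred == merged).mean()  # fraction of plants correctly associated
```

Scoring against the merged classes rather than the original four is what makes a three- or two-cluster solution comparable in Figure 4b–d.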

Hierarchical clustering was more sensitive than K-means clustering for highly vigorous plants (V4), especially when four clusters were used (Figure 4e). The classification of the middle classes, V3 and V2, was the main source of error. Indeed, with four clusters, accuracy fell to 68% and 67% when the V3 and V2 classes were associated with HM3 and HM2, respectively (Figure 4e). When the number of clusters was reduced to three (Figure 4f), the association between weakened plants (V3+V2) and the middle cluster HM2 persisted (precision 81%), but accuracy was reduced because some plants of class V3 were assigned to the HM1 cluster, which was also strongly associated with dead plants (V1) (Figures 2e and 4f). Finally, the predictive statistics obtained with two clusters were almost identical to those obtained with K-means (Figure 4g).
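The two-method comparison can be reproduced in outline with scikit-learn: cluster the spectral features with each algorithm, associate every cluster with its majority reference class (as done here for the KM and HM clusters), and score the agreement. The feature values and class separations below are synthetic placeholders, not the study's multispectral data:

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering

rng = np.random.default_rng(0)
# Synthetic two-band features for four vigour levels (V1..V4); means and
# spread are arbitrary, chosen only to produce separable groups.
X = np.vstack([rng.normal(loc=m, scale=0.05, size=(30, 2))
               for m in (0.1, 0.3, 0.5, 0.7)])
y = np.repeat([0, 1, 2, 3], 30)  # reference vigour class per plant

def majority_vote_accuracy(labels, y):
    """Map each cluster to its majority reference class, then score."""
    pred = np.empty_like(y)
    for c in np.unique(labels):
        mask = labels == c
        vals, counts = np.unique(y[mask], return_counts=True)
        pred[mask] = vals[np.argmax(counts)]
    return (pred == y).mean()

km_labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
hc_labels = AgglomerativeClustering(n_clusters=4,
                                    linkage="ward").fit_predict(X)
acc_km = majority_vote_accuracy(km_labels, y)
acc_hc = majority_vote_accuracy(hc_labels, y)
```

On such well-separated synthetic data both methods score near perfectly; the differences reported above emerge only when classes overlap, as V3 and V2 do in the real measurements.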

The predictive capability of thermal data was tested on health shifts occurring in plants that were highly vigorous in 2017 (Figure 4d,h). The best predictive capability for future outbreaks was obtained when clustering was performed with only two clusters, which allowed discrimination between plants that later showed wilting and those that did not. The two methods performed similarly, with K-means slightly outperforming hierarchical clustering (Figure 4d,h).
