*2.5. Cluster Interpretation*

Finally, the clustering results were compared with the reference classes of plant vigour in 2017 and with the plant health shifts between 2017 and 2018 in a two-step analysis. First, mosaic plots were used to visualise and explore possible correlations between clusters and reference data; second, predictive statistics were computed from confusion matrices, taking the results of the correlation analysis into account.

Mosaic plots display the relationship between categorical variables using rectangles whose areas represent the proportion of cases for each combination of categories in the multivariate data [50]. When paired with residual analysis, such as Pearson χ² residuals, they allow the significance of such a correlation to be estimated [51]. A cluster can be considered representative of a reference class if it is positively correlated with only one class and negatively correlated with all the others.
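
The residual analysis behind a shaded mosaic plot can be illustrated with a minimal sketch. It assumes a contingency table of reference classes (rows) by clusters (columns) and computes Pearson residuals as (observed − expected)/√expected. The counts, the function names and the cut-off of 2 are illustrative assumptions and are not taken from the analysis described here. In this sketch, "negatively correlated" is read simply as a residual below zero.

```python
from math import sqrt


def pearson_residuals(table):
    """Pearson residuals for a contingency table given as a list of rows.

    Rows are reference classes and columns are clusters. Every row and
    column total must be non-zero, otherwise an expected count is zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        res_row = []
        for j, observed in enumerate(row):
            # Expected count under independence of class and cluster.
            expected = row_totals[i] * col_totals[j] / n
            res_row.append((observed - expected) / sqrt(expected))
        residuals.append(res_row)
    return residuals


def chi_square(table):
    """Pearson chi-square statistic: the sum of squared Pearson residuals."""
    return sum(r * r for row in pearson_residuals(table) for r in row)


def representative_class(residuals, cluster, threshold=2.0):
    """Return the index of the class a cluster represents, or None.

    Assumed rule: exactly one class has a residual above `threshold`
    and every other class has a negative residual.
    """
    column = [row[cluster] for row in residuals]
    positive = [i for i, r in enumerate(column) if r > threshold]
    if len(positive) != 1:
        return None
    others = [r for i, r in enumerate(column) if i != positive[0]]
    return positive[0] if all(r < 0 for r in others) else None


# Hypothetical counts: 3 reference classes (rows) x 3 clusters (columns).
table = [
    [30, 5, 5],
    [5, 30, 5],
    [5, 5, 30],
]
residuals = pearson_residuals(table)
matches = [representative_class(residuals, j) for j in range(3)]
print(matches, round(chi_square(table), 2))
```

A stricter reading would require the residuals of all other classes to fall below the negative threshold; the rule can be tightened by changing the final comparison.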

Predictive statistics (sensitivity, accuracy, precision and F1 score) were then evaluated for the combinations (reference × cluster) whose correlation was significant (*p* < 0.05) and meaningful (Figure 4). Accordingly, vigour classes were compared with the clusters obtained from multispectral data, while thermal data were used for disease outbreak prediction. If the number of clusters was lower than the number of reference classes, the latter was reduced to match the former by aggregating classes of the reference data (Figure 4). Reclassification was performed by merging the neighbouring classes reported in Table 2 only if they met the following criteria: (i) they were correlated (or at least highly represented) with the same cluster, (ii) the overall accuracy of the confusion matrix increased after class aggregation, and (iii) the new aggregated classes preserved an internal logic. The four reference classes for plant vigour (Table 2 and Figure 4) were thus reduced to three by merging the V3 and V2 classes (weakened plants), and to two by merging the V4, V3 and V2 classes (plants with leaves). The new aggregated reference data for vigour were then compared with the multispectral clustering obtained using three or two clusters, respectively (Figure 4).
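
As an illustration of how the four predictive statistics follow from a confusion matrix, the sketch below computes them for one class in a one-vs-rest fashion. It assumes a square matrix in which cluster *j* has already been matched to reference class *j*. The counts and function names are hypothetical, and reporting accuracy per class rather than overall is an assumption of this sketch.

```python
def class_metrics(confusion, k):
    """One-vs-rest statistics for class k.

    confusion[i][j] is the number of plants of reference class i that
    fell in the cluster matched to class j.
    """
    n = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                  # class k, other cluster
    fp = sum(row[k] for row in confusion) - tp   # other class, cluster k
    tn = n - tp - fn - fp
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "accuracy": (tp + tn) / n,
        "f1": f1,
    }


def overall_accuracy(confusion):
    """Share of plants on the diagonal of the confusion matrix."""
    n = sum(sum(row) for row in confusion)
    return sum(confusion[i][i] for i in range(len(confusion))) / n


# Hypothetical two-class example (rows: reference, columns: clusters).
confusion = [
    [40, 10],
    [5, 45],
]
metrics = class_metrics(confusion, 0)
print({name: round(value, 3) for name, value in metrics.items()})
print(overall_accuracy(confusion))
```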

The same criteria adopted for the aggregation of vigour classes were also applied to the plant health shifts, which were compared with two clusters derived from temperature data. The plants that remained asymptomatic without changing their vigour in 2018 (S1) were merged with those displaying slightly reduced vigour without showing symptoms on the leaves (S2), creating a new class of plants that remained unharmed. The plants that died in 2018 (S4) or were heavily compromised (S3) were grouped in a new class composed of plants that most likely were already diseased in 2018 (Figure 4).
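
The class aggregation described above can be sketched as a relabelling of the reference data, followed by a check of criterion (ii), the change in overall accuracy. The class codes V1 to V4 and S1 to S4 follow the text. The merged label names, the sample labels, the cluster-to-class assignments and the function names are hypothetical.

```python
# Merging schemes described in the text; merged label names are made up.
VIGOUR_3 = {"V1": "V1", "V2": "V2+V3", "V3": "V2+V3", "V4": "V4"}
VIGOUR_2 = {"V1": "V1", "V2": "V2+V3+V4", "V3": "V2+V3+V4", "V4": "V2+V3+V4"}
HEALTH_2 = {"S1": "S1+S2", "S2": "S1+S2", "S3": "S3+S4", "S4": "S3+S4"}


def aggregate(labels, mapping):
    """Relabel reference classes according to a merging scheme."""
    return [mapping[label] for label in labels]


def accuracy(reference, clusters, assignment):
    """Share of plants whose cluster maps to their reference class.

    `assignment` maps each cluster id to the class it represents.
    """
    hits = sum(assignment[c] == r for r, c in zip(reference, clusters))
    return hits / len(reference)


# Hypothetical plants: reference health shift and thermal cluster id.
reference = ["S1", "S2", "S1", "S3", "S4", "S4", "S2", "S3"]
clusters = [0, 0, 0, 1, 1, 1, 0, 0]

# Before merging, two clusters can represent at most two of four classes.
before = accuracy(reference, clusters, {0: "S1", 1: "S4"})

# After merging, each cluster is compared with one aggregated class.
merged = aggregate(reference, HEALTH_2)
after = accuracy(merged, clusters, {0: "S1+S2", 1: "S3+S4"})

print(before, after)
```

In this sketch a merge would be retained only if `after` exceeds `before` and criteria (i) and (iii) also hold; those two criteria are judged from the mosaic plots and from the meaning of the classes and are not encoded here.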

All the analyses regarding decorrelation, segmentation, clustering and statistics of the clusters were performed with RStudio [52].
