2.4.2. Accuracy Assessment

Accuracy was assessed separately for each canopy extraction method on the datasets from 21 July and 12 August. For each image, the final extraction layer was used to reclassify all orchard pixels into two categories: pure-canopy and non-canopy pixels. Synchronization between the final extraction layer (thermal or other) and the RGB ground truth image was verified. One hundred sample points were divided equally between these two categories (50 points each). A separate set of 100 sample points was distributed for each of the four methods on each of the two dates, giving a total of 800 sample points in the analysis. The ground truth class of each sample point was determined visually from the original RGB image of the respective date (21 July or 12 August). The resulting confusion matrix comprised four counts: sample points correctly classified as canopy pixels (true positive, TP); sample points classified as canopy that were actually non-canopy pixels (false positive, FP); sample points correctly classified as non-canopy pixels (true negative, TN); and sample points classified as non-canopy that were actually canopy pixels (false negative, FN). From these counts, the following parameters were calculated to evaluate canopy extraction quality: overall accuracy (Equation (2)); precision (Equation (3)); recall (Equation (4)); and F1-score, which is the harmonic mean of precision and recall [33] (Equation (5)):

$$\text{Overall accuracy} = (TP + TN) / (TP + TN + FP + FN) \tag{2}$$

$$\text{Precision} = TP / (TP + FP) \tag{3}$$

$$\text{Recall} = TP / (TP + FN) \tag{4}$$

$$\text{F1-score} = 2 \times (\text{Precision} \times \text{Recall}) / (\text{Precision} + \text{Recall}) \tag{5}$$
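
As an illustration of Equations (2)–(5), the following minimal Python sketch computes the four parameters from one confusion matrix. The names `ConfusionCounts` and `canopy_metrics` are hypothetical, and the counts in the usage example are invented for demonstration only; they are not results from this study. The example assumes one set of 100 sample points, with 50 points in each classified category (TP + FP = 50 and TN + FN = 50).

```python
from dataclasses import dataclass
from math import nan


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from one confusion matrix (one method, one date)."""
    tp: int  # classified canopy, actually canopy
    fp: int  # classified canopy, actually non-canopy
    tn: int  # classified non-canopy, actually non-canopy
    fn: int  # classified non-canopy, actually canopy


def canopy_metrics(c: ConfusionCounts) -> dict:
    """Return overall accuracy, precision, recall and F1-score (Equations (2)-(5))."""
    total = c.tp + c.tn + c.fp + c.fn
    if total == 0:
        raise ValueError("confusion matrix contains no sample points")

    overall_accuracy = (c.tp + c.tn) / total                       # Equation (2)
    # Ratios with an empty denominator are undefined and reported as NaN.
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else nan     # Equation (3)
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else nan        # Equation (4)
    if precision + recall > 0:                                     # False for NaN as well
        f1 = 2 * (precision * recall) / (precision + recall)       # Equation (5)
    else:
        f1 = nan

    return {
        "overall_accuracy": overall_accuracy,
        "precision": precision,
        "recall": recall,
        "f1_score": f1,
    }


# Hypothetical counts for illustration only (not data from the study).
example = ConfusionCounts(tp=45, fp=5, tn=47, fn=3)
metrics = canopy_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, overall accuracy is 92/100 = 0.92, precision is 45/50 = 0.90, recall is 45/48 ≈ 0.938, and the F1-score is 45/49 ≈ 0.918. Because the sample points are split equally between the classified categories, the denominator of precision (TP + FP) is fixed at 50 by design, whereas the denominator of recall (TP + FN) depends on how many sample points are actually canopy.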
