2.3.4. Classification

Five different training sets were independently used to train CNNs with the structure described in Table 2: first, a binary training set for the identification of non-vegetation objects; second, a multispectral training set with the four identified vegetation units for *G*<sub>1</sub>*T*<sub>3</sub>; third, a multitemporal training set for *G*<sub>1</sub>; fourth, a multispectral training set with the three vegetation units for *G*<sub>2</sub>*T*<sub>3</sub>; and last, a multitemporal training set for *G*<sub>2</sub>. For the monotemporal classification, both *G*<sub>1</sub>*T*<sub>3</sub> and *G*<sub>2</sub>*T*<sub>3</sub> were chosen, as they are closest to the harvest date in each growth and therefore most relevant for agricultural purposes.

The models trained on vegetation units were used to classify the whole orthomosaic via a moving-window approach that selects and classifies square subimages. For both monotemporal models, each subimage was first classified with the object-identification model to exclude misclassifications and then classified by the monotemporal model. The subimages of the multitemporal models were not pre-classified with the object-identification model, since it was assumed that misclassifications of objects that appear only at a specific date are avoided by the multitemporal features. The classification results of the subimages were aligned and rasterized with *n* channels. This workflow is depicted in Figure 2.
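The two-stage moving-window step can be sketched as follows. Note that this is only an illustrative sketch: the two model functions below are dummy callables standing in for the trained CNNs, and the window size, stride, and band count are assumed values, not those used in the study.

```python
import numpy as np

# Dummy stand-ins for the trained CNNs (illustration only, not the study's models).
# Each maps a (win, win, bands) subimage to a vector of class probabilities.
def object_model(subimage):
    p_veg = float(subimage.mean())            # toy "vegetation" probability
    return np.array([1.0 - p_veg, p_veg])     # [non-vegetation, vegetation]

def vegetation_model(subimage):
    probs = np.full(4, 0.1)                   # four vegetation units
    probs[int(subimage.mean() * 4) % 4] = 0.7 # toy peaked distribution
    return probs

def classify_orthomosaic(mosaic, win=32, stride=32, threshold=0.5):
    """Slide a window over the orthomosaic; pre-screen each subimage with the
    object model, then assign the most probable vegetation unit if its
    probability reaches the threshold. Unassigned cells are marked -1."""
    h, w, _ = mosaic.shape
    rows = (h - win) // stride + 1
    cols = (w - win) // stride + 1
    out = np.full((rows, cols), -1, dtype=int)
    for i, y in enumerate(range(0, h - win + 1, stride)):
        for j, x in enumerate(range(0, w - win + 1, stride)):
            sub = mosaic[y:y + win, x:x + win]
            if object_model(sub)[1] < threshold:   # non-vegetation: skip
                continue
            probs = vegetation_model(sub)
            if probs.max() >= threshold:           # 50% probability threshold
                out[i, j] = int(probs.argmax())
    return out

mosaic = np.random.rand(128, 128, 5)  # synthetic 5-band orthomosaic
labels = classify_orthomosaic(mosaic)
```

The rasterized label grid corresponds to one output channel; repeating the classification per class probability would yield the *n*-channel raster mentioned above.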

**Figure 2.** Schematic workflow of preprocessing, training, validation, and classification.

### 2.3.5. Validation Metrics

For the evaluation of the classification models and the generated maps, both dependent test data (the 25% of the augmented samples set aside prior to training) and independent data (the resampled spectral information of the observed plots) were used. The numbers of true positives (*tp*), true negatives (*tn*), false negatives (*fn*), and false positives (*fp*) were calculated from confusion matrices for each classification and for both the dependent and independent test data. The threshold for class probability was set to 50%; classification results below this threshold were counted as misclassifications. The following metrics were used to estimate the performance of the models [54]:

$$Precision = \frac{t_p}{t_p + f_p} \tag{1}$$

$$Recall = \frac{t_p}{t_p + f_n} \tag{2}$$

$$Overall\ Accuracy = \frac{t_p + t_n}{t_p + t_n + f_p + f_n} \tag{3}$$
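Equations (1)–(3) can be computed directly from binary ground-truth and predicted labels; a minimal sketch (the function name and the toy label vectors are illustrative, not from the study):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Precision, recall, and overall accuracy from binary labels,
    following Equations (1)-(3)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)      # true positives
    tn = np.sum(~y_true & ~y_pred)    # true negatives
    fp = np.sum(~y_true & y_pred)     # false positives
    fn = np.sum(y_true & ~y_pred)     # false negatives
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, accuracy

# Toy example: tp=2, fn=1, fp=1, tn=1
p, r, a = binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
# precision = 2/3, recall = 2/3, overall accuracy = 3/5
```

For the multiclass vegetation-unit case, the same quantities would be taken per class from the confusion matrix in one-vs-rest fashion.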
