##### *2.3.2. Data Normalization*

Plant geometry can cause severe spectral distortions owing to varying leaf angles, varying distances between the leaves and the camera, and specular reflections on particular parts of the leaves. The standard normal variate (SNV) [33] was developed to compare reflectance characteristics while eliminating such additive and multiplicative factors. It removes scaling effects caused by varying distance or leaf angle, as well as additive effects such as specular reflection, e.g., on leaf tips. The normalization was applied to both the ground canopy data and the field data. The SNV representation was calculated per spectrum *S* and emphasizes the shape of the spectral curve:

$$\mathrm{SNV}(S) = \frac{S - \operatorname{mean}(S)}{\operatorname{std}(S)}.\tag{2}$$
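Equation (2) can be illustrated with a minimal NumPy sketch. It assumes that the spectra are stored as the rows of a 2-D array of shape (pixels, bands) and that the population standard deviation (NumPy's default `ddof=0`) is used; the estimator actually used in the study is not stated. The function name `snv` is illustrative, and a perfectly flat spectrum (zero standard deviation) is not handled.

```python
import numpy as np


def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate, applied to each spectrum (row) separately.

    Assumption: population standard deviation (ddof=0); a flat spectrum
    with zero standard deviation would cause a division by zero.
    """
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)  # mean(S) of each spectrum
    std = spectra.std(axis=1, keepdims=True)    # std(S) of each spectrum
    return (spectra - mean) / std


# A hypothetical spectrum and a distorted copy: multiplied by 0.6 (e.g. a
# different leaf distance or angle) and shifted by 0.1 (e.g. a constant
# specular offset). Both map to the same SNV curve.
s = np.array([[0.05, 0.08, 0.12, 0.45, 0.50]])
s_distorted = 0.6 * s + 0.1
print(np.allclose(snv(s), snv(s_distorted)))  # True
```

Subtracting the mean cancels any constant offset, and dividing by the standard deviation cancels any positive scaling factor, which is the invariance the normalization relies on.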

#### *2.4. Prediction Algorithms*

Many algorithms can predict a class or a continuous value from the features of a sample; in general, they take a vector representation of the sample as input. In this study, two classifiers, the spectral angle mapper (SAM) and the support vector machine (SVM), and one regression algorithm, support vector regression (SVR), were applied to the ground canopy data (acquired with the phytobike). To train and evaluate the models, four images from one measuring day were annotated as training data, and four images were annotated as test data. The number of annotated pixels differed between images owing to natural heterogeneity in the crop stand: pixel counts ranged from at least several thousand per class to several hundred thousand for all classes in one image. Given this large number of annotated pixels, the models were trained on a subsampled data set, both to keep training computationally feasible and to rebalance the classes. After subsampling, every class except the water class was represented by 1000 training samples.

The SAM was chosen because it has been reported to work robustly under inhomogeneous illumination [34], and developing the classification model was simple and fast. The SVM was chosen because, in principle, it can be trained on the whole data set, treating the spectrum of each pixel as a training example. Vegetation indices (VIs) were used because various published works have focused on VIs as a tool for disease detection; VIs can be regarded as established proxies for the optical measurement of plant parameters. The models were trained on three data representations: full spectra, SNV-normalized spectra, and 20 spectral VIs. The results were compared with a SAM, which served as the baseline accuracy. The comparison was performed on the YR test data from 23 May 2018.
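The class-balanced subsampling and the SAM classification described above can be sketched as follows. This is a hypothetical illustration, not the study's implementation: it assumes that classes with fewer than 1000 pixels are kept complete and that each class is represented in the SAM by its mean training spectrum, whereas the reference spectra actually used are not specified here. The names `subsample_per_class`, `spectral_angles`, and `SAMClassifier`, as well as the toy spectra, are invented for illustration.

```python
import numpy as np


def subsample_per_class(X, y, n_per_class=1000, seed=0):
    """Randomly keep at most `n_per_class` samples of each class.

    Assumption: classes with fewer samples are kept complete.
    """
    rng = np.random.default_rng(seed)
    keep = []
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)
        if idx.size > n_per_class:
            idx = rng.choice(idx, size=n_per_class, replace=False)
        keep.append(idx)
    keep = np.concatenate(keep)
    return X[keep], y[keep]


def spectral_angles(X, references):
    """Angle in radians between each spectrum (row of X) and each reference."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Rn = references / np.linalg.norm(references, axis=1, keepdims=True)
    cos = np.clip(Xn @ Rn.T, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    return np.arccos(cos)


class SAMClassifier:
    """Assign each spectrum to the class reference with the smallest angle.

    Assumption: each class reference is the mean of its training spectra.
    """

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.references_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict(self, X):
        angles = spectral_angles(X, self.references_)
        return self.classes_[np.argmin(angles, axis=1)]


# Toy data (hypothetical 5-band spectra): every pixel is a class spectrum
# multiplied by a random brightness factor, mimicking uneven illumination.
rng = np.random.default_rng(1)
healthy = np.array([0.05, 0.08, 0.05, 0.45, 0.50])
disease = np.array([0.10, 0.15, 0.20, 0.30, 0.32])
X = np.vstack([
    healthy * rng.uniform(0.5, 1.5, size=(3000, 1)),
    disease * rng.uniform(0.5, 1.5, size=(500, 1)),
])
y = np.array(["healthy"] * 3000 + ["disease"] * 500)

X_train, y_train = subsample_per_class(X, y, n_per_class=1000)
model = SAMClassifier().fit(X_train, y_train)
pred = model.predict(X)
print(X_train.shape, (pred == y).mean())  # (1500, 5) 1.0
```

Because the spectral angle depends only on the direction of a spectrum and not on its magnitude, all randomly scaled toy pixels are assigned correctly, which illustrates why SAM tolerates brightness differences caused by uneven illumination.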
The evaluation of the different feature representations showed a small advantage for SNV normalization, which was therefore used as the standard representation in the following. As performance measures, we used the overall accuracy over six classes, with the background and old leaves/straw classes combined into one. In addition, we evaluated the F1 score (Table 1) for the disease class, which combines precision and recall as their harmonic mean. The F1 score reflects how reliably the pixels of one class are assigned to that class, penalizing both missed pixels and pixels wrongly assigned to it, and is computed as 2 × (precision × recall) ÷ (precision + recall). The two performance measures followed the same trend; however, the F1 score decreased faster, because the large number of background pixels stabilized the overall accuracy.
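The relationship between the overall accuracy and the F1 score of the disease class can be illustrated with a small worked example. The confusion matrices below are invented for illustration (three classes instead of six, with hypothetical pixel counts); rows are true classes and columns are predictions. Moving 50 disease pixels into the healthy class lowers the overall accuracy by only about half a percentage point but lowers the disease F1 score by more than ten points, because the dominant background class keeps the overall accuracy high.

```python
import numpy as np

# Hypothetical confusion matrices (rows: true class, columns: predicted class).
# Class order: 0 = background, 1 = healthy, 2 = disease.
cm = np.array([
    [9000, 60, 40],
    [40, 850, 110],
    [10, 90, 200],
])
# Same data, but 50 disease pixels are now misclassified as healthy.
cm_worse = np.array([
    [9000, 60, 40],
    [40, 850, 110],
    [10, 140, 150],
])


def overall_accuracy(cm):
    """Share of all pixels that lie on the diagonal (correctly classified)."""
    return np.trace(cm) / cm.sum()


def f1_for_class(cm, k):
    """F1 score of class k: harmonic mean of precision and recall."""
    tp = cm[k, k]
    precision = tp / cm[:, k].sum()  # correct among pixels predicted as class k
    recall = tp / cm[k, :].sum()     # correct among pixels that truly are class k
    return 2 * (precision * recall) / (precision + recall)


for name, m in [("baseline", cm), ("worse", cm_worse)]:
    print(f"{name}: OA = {overall_accuracy(m):.3f}, F1(disease) = {f1_for_class(m, 2):.3f}")
# baseline: OA = 0.966, F1(disease) = 0.615
# worse: OA = 0.962, F1(disease) = 0.500
```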

**Table 1.** Comparison of evaluation measures obtained on the test data for different data representations and prediction algorithms at the ground scale: support vector machine (SVM) and spectral angle mapper (SAM).

