2.4.2. Image Data

In past studies, k-means clustering has been applied successfully to the analysis of soil data (e.g., [11,12,62]). In this study, clustering was carried out in the spectral domain using the k-means algorithm of Hartigan and Wong [67]. To reduce computation time, the partition into clusters was derived from a random sample of 10% of the pixels of the respective raster stack [45,55]. The maximum number of iterations was set to 50, and three random sets of seed points were generated. The resulting partition was then used to predict the cluster membership of the pixels of the full raster stack via the clue package for R [68].
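The sample-then-predict workflow can be illustrated with a minimal NumPy sketch. Note the assumptions: it uses the simpler Lloyd iteration rather than the Hartigan and Wong algorithm cited above, keeps the lowest-error run of the three random starts, and treats "prediction" as assigning every pixel to its nearest cluster center. The function names (`kmeans_lloyd`, `assign`, `cluster_raster`) are hypothetical and do not reproduce the clue package.

```python
import numpy as np


def assign(x, centers):
    """Index of the nearest center (squared Euclidean distance) for each row of x."""
    d = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def kmeans_lloyd(x, k, max_iter=50, n_start=3, seed=0):
    """Lloyd-style k-means (not Hartigan-Wong); returns centers of the lowest-SSE start."""
    rng = np.random.default_rng(seed)
    best_centers, best_sse = None, np.inf
    for _ in range(n_start):
        # Seed points: k observations drawn at random (without replacement) from x.
        centers = x[rng.choice(len(x), size=k, replace=False)].astype(float)
        for _ in range(max_iter):
            labels = assign(x, centers)
            new_centers = centers.copy()
            for j in range(k):
                members = x[labels == j]
                if len(members) > 0:  # an empty cluster keeps its previous center
                    new_centers[j] = members.mean(axis=0)
            converged = np.allclose(new_centers, centers)
            centers = new_centers
            if converged:
                break
        # Total within-cluster sum of squares of this start.
        sse = ((x - centers[assign(x, centers)]) ** 2).sum()
        if sse < best_sse:
            best_centers, best_sse = centers, sse
    return best_centers


def cluster_raster(stack, k=15, sample_frac=0.10, max_iter=50, n_start=3, seed=0):
    """Fit k-means on a random pixel sample, then label every pixel of the stack.

    stack: array of shape (rows, cols, bands); returns (rows, cols) cluster IDs.
    """
    rows, cols, bands = stack.shape
    pixels = stack.reshape(-1, bands).astype(float)
    rng = np.random.default_rng(seed)
    # 10% sample by default, but never fewer pixels than clusters.
    n_sample = min(len(pixels), max(k, int(sample_frac * len(pixels))))
    sample = pixels[rng.choice(len(pixels), size=n_sample, replace=False)]
    centers = kmeans_lloyd(sample, k, max_iter=max_iter, n_start=n_start, seed=seed)
    # Prediction step: assign each pixel of the full stack to its nearest center.
    return assign(pixels, centers).reshape(rows, cols)


if __name__ == "__main__":
    # Synthetic two-region stack with three bands, for illustration only.
    demo = np.zeros((20, 20, 3))
    demo[:, 10:] = 10.0
    print(cluster_raster(demo, k=2))
```

Because only the sample enters the iterative fitting, its cost scales with the sample size, while the final assignment is a single pass over all pixels. The `assign` helper builds a pixels × clusters × bands array, so for large rasters the prediction step would need to be processed in chunks.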

Following Haburaj et al. [12], multiple combinations of the processed image data were used as input for the k-means clustering: all image derivatives were combined with one another to assess the influence of each processing step on image noise and cluster quality. All combinations examined are listed in the supplementary material (Online Resource 1 in Supplementary Materials).
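One way to generate such inputs systematically is sketched below. It assumes that "all combinations" means every non-empty subset of the derivatives and that a combination is formed by stacking the derivatives' bands; the exact set used in the study is the one given in Online Resource 1. The derivative names and helper functions are hypothetical.

```python
from itertools import combinations

import numpy as np


def all_combinations(names):
    """Every non-empty subset of the derivative names, smallest subsets first."""
    return [
        combo
        for r in range(1, len(names) + 1)
        for combo in combinations(names, r)
    ]


def stack_derivatives(derivatives, combo):
    """Concatenate the bands of the selected derivatives into one raster stack.

    derivatives: dict mapping a name to an array of shape (rows, cols, bands).
    """
    return np.concatenate([derivatives[name] for name in combo], axis=2)


if __name__ == "__main__":
    # Hypothetical derivatives, for illustration only.
    derivatives = {
        "rgb": np.zeros((4, 4, 3)),
        "pca": np.zeros((4, 4, 2)),
        "median_filtered": np.zeros((4, 4, 3)),
    }
    combos = all_combinations(list(derivatives))
    print(len(combos))  # 7 = 2**3 - 1
    print(stack_derivatives(derivatives, ("rgb", "pca")).shape)  # (4, 4, 5)
```

With n derivatives this yields 2^n − 1 inputs, so the number of clustering runs grows quickly as further processing steps are added.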

The number of clusters was set to 15, compared with the six clusters of the sedimentological data, so that potential disturbances and transitional layers could be captured. This comparatively high number of clusters necessitates a subsequent manual grouping of the clusters according to the on-site stratigraphic documentation, which was conducted in QGIS (v2.18). As argued by Haburaj et al. [12], manual grouping should be carried out carefully, as it allows for a more transparent documentation of where and how boundaries between layers are drawn.
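In the study the grouping was done interactively in QGIS. A scripted analogue, shown here only as a sketch, reduces the step to a lookup table that maps each of the cluster IDs to a layer group chosen by the analyst. Writing the mapping down explicitly also records where boundaries were drawn. The function `group_clusters` and the example mapping to six groups are hypothetical.

```python
import numpy as np


def group_clusters(labels, mapping, unassigned=-1):
    """Reclassify cluster IDs into layer groups via a lookup table.

    labels: integer array of cluster IDs (0 .. k-1).
    mapping: dict {cluster_id: group_id}, defined by the analyst.
    Cluster IDs missing from the mapping receive `unassigned`.
    """
    size = max(int(labels.max()), max(mapping, default=0)) + 1
    lut = np.full(size, unassigned, dtype=int)
    for cluster_id, group_id in mapping.items():
        lut[cluster_id] = group_id
    # Fancy indexing applies the lookup table to every pixel at once.
    return lut[labels]


if __name__ == "__main__":
    labels = np.array([[0, 1, 2], [5, 4, 14]])
    # Hypothetical assignment of some of the 15 clusters to layer groups 1-6.
    mapping = {0: 1, 1: 1, 2: 2, 3: 3, 4: 3, 14: 6}
    print(group_clusters(labels, mapping))  # [[ 1  1  2] [-1  3  6]]
```

Marking unmapped clusters with a sentinel value rather than silently merging them keeps clusters that have not yet been interpreted visible in the output.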

The image clustering results were evaluated manually by visually interpreting the homogeneity of the resulting clusters and their conformity with the delineation of stratigraphic layers depicted in Figures 2 and 5.
