5.1. Classification of the LiDAR Points
A Random Forest classifier was used to classify LiDAR ground points either as ‘vLE’ or as ‘other non-ground’. The Random Forest classifier was chosen because it has proven successful at classifying LiDAR points as belonging to landscape features [27,40,55]. In the testing phase, the classifier achieved an overall accuracy between 0.92 and 0.97 across the three study areas. This is comparable to the accuracy of 0.97 reported by Lucas et al. [27], who used a Random Forest model to distinguish vegetation LiDAR points from other LiDAR points.
A recall of the vLE class between 0.86 and 0.91 was calculated. False negative vLE LiDAR points mainly occurred at the outer ends of a vLE object. A possible explanation is the time lag between the recording of the LiDAR data and the ground measurements, which could have resulted in reference 2D objects that are larger than they would have been had the reference data been recorded simultaneously with the LiDAR acquisition. Further, the values of the neighborhood features of points at the outer ends of a vLE object are influenced by spatially nearby non-vLE ground points. These points can therefore be falsely classified as non-vLE, since the algorithm assumes that the class labels of neighboring points are correlated [41]. Other false negative vLE LiDAR points were observed when few LiDAR data points were located within the vLE reference object. An example of this is a field with 50 young trees in SA1 (Figure 4B). LiDAR points of only ten of these 50 trees were classified as vLE points. The average LiDAR point count of these ten objects is noticeably higher than the average point count of the objects whose LiDAR points were not correctly classified (212 compared to 77). This suggests that points are classified more reliably when the vegetation object has a denser point cloud, and therefore that classification results would improve if the LiDAR data were acquired during the leaf-on period.
A precision of the vLE class between 0.94 and 1.00 was calculated. False positive vLE LiDAR points mainly occurred at field margins. Field margins are typically characterized by a strip of weeds, especially margins with a fence or terrace slope. Removing LiDAR points in fields with fruit tree orchards increased the precision of the vLE class from 0.10 to 0.94 in SA1, as 15.45% of this area is covered by fruit tree orchards.
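The effect of removing orchard points on precision can be illustrated with a small sketch. The point counts below are hypothetical and were chosen only to show the mechanism; they are not the study's confusion matrix. The sketch also assumes that masking the orchards removes false positives only, so recall is unaffected.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) of the positive class from raw counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical point counts for the vLE class (not taken from the study).
tp, fn = 900, 100
fp_with_orchards = 8100     # orchard points wrongly labelled as vLE (assumed)
fp_without_orchards = 57    # false positives left after masking orchards (assumed)

before = precision_recall(tp, fp_with_orchards, fn)
after = precision_recall(tp, fp_without_orchards, fn)
print(f"with orchards:    precision={before[0]:.2f}, recall={before[1]:.2f}")
print(f"without orchards: precision={after[0]:.2f}, recall={after[1]:.2f}")
```

Because precision depends on the number of false positives while recall does not, a single land-cover type that produces many false positives can dominate the precision of the class.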
5.2. Clustering and Segmentation of the vLE Points
The LiDAR points classified as vLE points by the Random Forest classifier in the testing phase were grouped into clusters using the density-based clustering algorithm DBSCAN [48]. This algorithm has proven successful in clustering LiDAR point data in a number of applications [56,57,58]. These clusters were subsequently converted into 2D objects (i.e., the modelled objects) by applying the alpha-shape algorithm [51] to each cluster of points, with the index α set to 0.10. This resulted in 18, 34, and 12 modelled 2D objects for SA1, SA2, and SA3 respectively, while 62, 36, and 40 reference objects were identified in SA1, SA2, and SA3 respectively. In SA1, the difference between the number of modelled and reference objects can be explained by the Random Forest algorithm incorrectly classifying vLE LiDAR points as other non-ground points. When overlaying the modelled and reference objects, 46.43% of the total area of the reference objects of SA1 is missing in the modelled objects. For SA2 and SA3, the surface area of the reference objects missing in the modelled object dataset is only 14.95% and 6.16% respectively. The difference between the number of reference and modelled objects in SA3 is primarily the result of several reference objects being merged into one larger modelled object.
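The clustering step can be sketched with a minimal, brute-force DBSCAN in pure Python. This is an illustration of the general algorithm, not the implementation used in the study; the function name `dbscan`, the parameters `eps` and `min_pts`, and the toy coordinates are assumptions. The alpha-shape step is not covered here.

```python
import math

def dbscan(points, eps, min_pts):
    """Label 2D points with a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Brute-force region query; the result includes point i itself.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        cluster += 1                   # point i is a core point: new cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # noise reachable from a core point
            if labels[j] is not UNVISITED:
                continue               # already assigned to a cluster
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
    return labels

# Toy planimetric coordinates (hypothetical): two groups and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

With a search radius that is large relative to the gap between neighboring vegetation objects, DBSCAN joins their points into a single cluster, which is one way several reference objects can end up as one modelled object.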
In SA1, the general flow direction runs from southwest to northeast and the modelled 2D objects are mostly positioned along this flow direction. In SA2, the general flow direction runs from northeast to southwest. The modelled 2D objects are positioned both along the flow direction and perpendicular to it. In SA3, the general flow direction runs from west to east. The modelled 2D objects are positioned both along the flow direction and perpendicular to it, creating a natural border between higher positioned agricultural parcels and the stream flow path. The position of the vLE objects along these flow paths indicates that they can be of importance in the catchment’s hydrology and that it is meaningful to distinguish between different types of vLEs that can have a distinct effect on the hydrological cycle. Previously developed applications do not make this distinction between different types of vLEs [27,59].
5.3. Classification of the 2D Objects
A set of features was calculated for each 2D vLE object (Table 4). These features are based on the geometry of the 2D object, the geometric and radiometric characteristics of the LiDAR points within the 2D object, and the 3D distribution of the LiDAR points within the 2D object. A Logistic Regression model was trained to calculate the probability of a 2D object being a shrub element. Objects not classified as shrub elements were classified as tree elements. A probability threshold value of 0.5 was applied to distinguish the two classes. The Logistic Regression model was applied using either all features or a subset of features selected with the SP_70 and RFE feature selection models. The selected features varied between the two feature selection models and the three study areas (Table 6). The features Z_Pmax and Z_10/Z_2 were selected in all three study areas using both feature selection methods, while the overall point density (P_Dens) was only selected using the RFE feature selection model in SA1 and SA3. This shows the importance of the point distribution in 3D space when distinguishing between tree and shrub objects. The same is observed when trying to distinguish different tree species [60]. The feature h_min was also selected in all three study areas using both feature selection models, while h_max was omitted in all cases. This indicates that there is no clear difference between the height of shrub and tree objects, while the height at which foliage starts to grow differs between the two object types.
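The thresholding step described above can be sketched as follows. The coefficients, intercept, and feature values are hypothetical (they are not the values in Table 6), the features are assumed to be standardized, and treating a probability of exactly 0.5 as a shrub element is an assumption of this sketch.

```python
import math

def shrub_probability(features, coefficients, intercept):
    """Logistic model: probability that a 2D object is a shrub element."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def classify(features, coefficients, intercept, threshold=0.5):
    """Label an object as shrub or tree element using a probability threshold."""
    p = shrub_probability(features, coefficients, intercept)
    return "shrub element" if p >= threshold else "tree element"

# Hypothetical coefficients for two standardized features.
coefficients = {"h_min": -1.2, "Z_Pmax": 0.8}
intercept = 0.0

low_canopy_base = {"h_min": -1.0, "Z_Pmax": 0.5}    # foliage starts near the ground
high_canopy_base = {"h_min": 1.5, "Z_Pmax": -0.5}   # foliage starts well above ground

print(classify(low_canopy_base, coefficients, intercept))   # shrub element
print(classify(high_canopy_base, coefficients, intercept))  # tree element
```

In this sketch, the negative coefficient on h_min lowers the shrub probability as the minimum normalized height increases, which mirrors the role this feature plays in separating the two classes.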
To assess the performance of the classification, the classes of the overlapping segments of the reference and modelled 2D objects were analyzed (Table 7). In SA1, the Logistic Regression classification performed best when only the features selected by the RFE feature selection model were used. However, for all three models, the precision for the ‘Tree element’ class is low (0.21), meaning that a high rate of false positive tree elements was modelled. Because of the large portion of shrub elements in SA1 (90.56% of the total surface area of all reference 2D objects in SA1), this did not translate into a low recall of the ‘Shrub element’ class or a low overall accuracy. A Logistic Regression model using all features or the features selected using the SP_70 feature selection model resulted in a slightly lower overall accuracy (0.83 compared to 0.90). In SA2, the Logistic Regression model performed best when all features were used, with an overall accuracy of 0.60. The overall accuracy decreased to 0.32 when features were selected using the SP_70 or RFE feature selection models. In SA2, a low precision for the ‘Shrub element’ class and a low recall for the ‘Tree element’ class were observed when all features were used in the Logistic Regression model (0.38 and 0.47 respectively), meaning that a high rate of false positive shrub elements and a high rate of false negative tree elements were modelled. The majority of the reference objects used for training of the tree element class in SA2 (50 out of 64 tree element objects) were young trees that were planted shortly before the LiDAR data were acquired, as part of an erosion mitigation project in SA1. As these trees have different characteristics compared to the tree elements located in SA2, this likely influences the classification accuracy of the tree objects in SA2. This finding confirms the importance of a reference dataset that is representative of all objects that are being identified and classified [61]. In SA3, no difference in the performance and accuracy of the Logistic Regression model was found when all or only a selection of the features were used. The overall accuracy of the Logistic Regression models was calculated to be 0.95. The precision for the ‘Tree element’ class was low (0.28), meaning that a high rate of false positive tree elements was modelled in SA3. Because of the large portion of shrub elements in SA3 (92.27% of the total surface area of all reference 2D objects), this did not translate into a low recall of the ‘Shrub element’ class or a low overall accuracy, as was also observed for SA1.
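The interplay between class imbalance and the reported metrics can be made concrete with a small worked example. The overlap areas below are hypothetical and only mimic a strongly shrub-dominated study area; the sketch assumes that the metrics are weighted by the area of the overlapping segments.

```python
# Hypothetical overlap areas in m^2, keyed by (reference class, modelled class).
area = {
    ("shrub", "shrub"): 860.0,
    ("shrub", "tree"): 40.0,
    ("tree", "tree"): 10.0,
    ("tree", "shrub"): 90.0,
}

def precision(cls):
    """Share of the area modelled as `cls` that is `cls` in the reference."""
    modelled = sum(a for (ref, mod), a in area.items() if mod == cls)
    return area[(cls, cls)] / modelled

def recall(cls):
    """Share of the reference area of `cls` that is modelled as `cls`."""
    reference = sum(a for (ref, mod), a in area.items() if ref == cls)
    return area[(cls, cls)] / reference

correct = area[("shrub", "shrub")] + area[("tree", "tree")]
overall_accuracy = correct / sum(area.values())

print(f"tree precision:   {precision('tree'):.2f}")
print(f"shrub recall:     {recall('shrub'):.2f}")
print(f"overall accuracy: {overall_accuracy:.2f}")
```

In this example a small shrub area misclassified as tree is enough to push the tree precision down to 0.20, yet it barely affects the shrub recall or the overall accuracy because shrub elements make up 90% of the reference area.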
For each Logistic Regression model, the regression coefficients of the predictor variables were calculated (Table 6). The absolute value of a regression coefficient gives an indication of the relative importance of the associated predictor variable. A higher value for a predictor variable with a negative regression coefficient indicates a lower probability of the object being a shrub element, and therefore a higher probability of the object being a tree element. Conversely, a higher value for a predictor variable with a positive regression coefficient indicates a higher probability of the object being a shrub element, and therefore a lower probability of the object being a tree element. In general, the sign of the regression coefficients is identical in the three study areas, especially for predictor variables whose regression coefficient has a high absolute value. The absolute values of the regression coefficients, and therefore the importance of the predictor variables, differ between the three study areas, however. In SA1, the predictor variables ‘Z_Pmax’, ‘h_mean’, and ‘Z_7/Z_1’ have the coefficients with the highest absolute values. In SA2, these are the predictor variables ‘Z_Pmin’, ‘h_min’, and ‘Z_10/Z_2’. In SA3, these are the predictor variables ‘Z_Pmax’, ‘h_min’, and ‘h_mean’. The differences in importance of the predictor variables between the three study areas can be explained by the differences in the characteristics of the vLE objects in the study areas. In SA1, for example, the vLE objects are less established and are therefore smaller and less dense. The predictor variables ‘h_min’, ‘h_mean’, and ‘Z_Pmin’ have a negative regression coefficient for all Logistic Regression models in the three study areas. The negative regression coefficients for the predictor variables ‘h_min’ and ‘h_mean’ are in line with the expected correlation in Table 4, as shrub elements are expected to have lower minimum and mean normalized height values. The negative regression coefficient for the predictor variable ‘Z_Pmin’ is, however, not in line with the expected correlation in Table 4. It was hypothesized that the relative height at the lowest point density would be lower for tree elements, as they are characterized by a leafless trunk and therefore have a low point density near the ground surface. The negative relationship between the predictor variable ‘Z_Pmin’ and the probability of the object being a shrub element can be explained by the high biomass density of shrub elements. LiDAR pulses are attenuated more strongly when penetrating vLE objects with a high biomass density, so the laser beam might not reach the lower portion of the object. This has previously been described in the context of potential errors in LiDAR-derived DEMs [62,63]. This explanation is supported by the average LiDAR ground point density underneath the reference objects, which is 13.07 points/m² for shrub elements and 20.78 points/m² for tree elements, suggesting that the laser pulse is less likely to penetrate the object down to the ground surface for shrub elements.
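The sign interpretation used above follows directly from the standard form of a logistic regression model. The notation below is introduced here for illustration only: p is the probability of an object being a shrub element, x_j are the predictor variables, and β_j their regression coefficients.

```latex
\[
  \ln\frac{p}{1-p} = \beta_0 + \sum_{j} \beta_j x_j ,
  \qquad
  p = \frac{1}{1 + e^{-\left(\beta_0 + \sum_{j} \beta_j x_j\right)}}
\]
\[
  \frac{\partial p}{\partial x_j} = \beta_j \, p \,(1 - p)
\]
```

Since p(1 − p) is always positive, the sign of β_j determines whether an increase in x_j raises or lowers the shrub probability. Comparing absolute coefficient values as a measure of importance is most meaningful when the predictor variables are on comparable scales (for example after standardization), which is assumed in this reading.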
The predictor variables ‘Z_Pmax’, ‘area’, and ‘P_Dens’ have a positive regression coefficient for all Logistic Regression models in the three study areas. The positive regression coefficients for the predictor variables ‘area’ and ‘P_Dens’ are in line with the expected correlation in Table 4, as shrub elements are expected to have a larger area and 3D point density. The positive regression coefficient for the predictor variable ‘Z_Pmax’ is, however, not in line with the expected correlation in Table 4. It was hypothesized that the relative height at the highest point density would be lower for shrub elements, since tree elements are characterized by a leafless trunk and therefore have a low point density near the ground surface. The positive regression coefficient for ‘Z_Pmax’ could again be explained by the dense biomass of shrub elements, which is difficult to penetrate with the laser pulse [62,63]. The sign of the regression coefficient for the predictor variables ‘h_max’ and ‘n_mean’ was inconsistent across the three study areas. A negative regression coefficient for ‘h_max’ was calculated in SA1 and SA3, which is in line with the expected correlation in Table 4, while a positive regression coefficient was calculated in SA2. A positive regression coefficient for ‘n_mean’ was calculated in SA1 and SA2, which is in line with the expected correlation in Table 4, while a negative regression coefficient was calculated in SA3.
For the delineation and classification of vLE objects in areas with variations in vLE characteristics, the training dataset should be large enough to be representative of the whole area. The Logistic Regression model needs to be recalibrated when it is applied to a new area. This study helps demonstrate the relevance of distinguishing between different types of vLE objects with unique structural characteristics.