*3.1. Classes of Land and Sea Covers Investigated*

A total of 21 classes of land and sea covers was defined based on on-site observations and photo-interpretation. These classes are listed in Table 1.


**Table 1.** Classes of land and sea covers identified in the study area.


#### *3.2. Data Pre-Processing*

The classification algorithm developed for this research was designed to process bispectral lidar data. Two PCs were therefore used, but full waveforms were available only for the green wavelength; IR data were thus incorporated as reflected intensities only. Due to the sensor's configuration, the IR and green lasers of the topobathymetric lidar are not co-focal, and they differ in swath, density, and precision. The two resulting PCs are consequently different, and their points do not share identical locations. A pre-processing step was therefore required to obtain the IR backscattered intensity at each green waveform's location. The green waveforms PC was kept as the reference: each waveform was synthesized as a point located at the coordinates of its cover component, yielding a green waveforms PC. For each of these points, the median IR intensity of its 10 nearest neighbors in the IR PC was computed and assigned to the waveform as an attribute. To this end, we used the PC processing software "CloudCompare" [43], in which the neighborhood search is performed by computing the 3D Euclidean distance between each point of the reference PC and the points of the compared PC. The IR PC was cleaned manually beforehand to ensure that all noise points significantly above the surface were removed from the data.

The median intensity of the 10 closest neighbors in the IR PC was chosen for two reasons. First, a neighborhood of 10 points was appropriate considering the difference between the two lasers' spot sizes and the resulting densities of the PCs. Second, the median intensity was better suited to the task than the mean intensity, as it avoids outlier artefacts in neighborhood spheres located at the interface of two habitats.
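The median-of-10-neighbors assignment described above can be sketched in Python (the language used for the rest of the pipeline). This is an illustrative reimplementation with SciPy's `cKDTree`, not the CloudCompare workflow actually used; the array names `green_xyz`, `ir_xyz`, and `ir_intensity` are hypothetical:

```python
import numpy as np
from scipy.spatial import cKDTree

def assign_ir_intensity(green_xyz, ir_xyz, ir_intensity, k=10):
    """Assign to each green-waveform point the median IR intensity
    of its k nearest neighbors in the IR point cloud."""
    tree = cKDTree(ir_xyz)                       # 3D Euclidean search structure
    _, idx = tree.query(green_xyz, k=k)          # indices of the k nearest IR points
    return np.median(ir_intensity[idx], axis=1)  # median over each neighborhood

# Tiny synthetic example: two green points queried against a random IR cloud
rng = np.random.default_rng(0)
ir_xyz = rng.uniform(0, 1, size=(100, 3))
ir_intensity = rng.uniform(0, 255, size=100)
green_xyz = np.array([[0.5, 0.5, 0.5], [0.1, 0.9, 0.2]])
med = assign_ir_intensity(green_xyz, ir_xyz, ir_intensity)
```

The 3D Euclidean neighborhood search mirrors the one performed by CloudCompare between the reference and compared PCs.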

#### *3.3. Training and Testing Datasets*

Two main datasets were designed to perform the data classifications and assess their quality: the training dataset and the test dataset. We collected 1000 training and 500 testing samples per class. To do so, we first drew polygons over representative areas of the 21 land and water covers, based on the ground-truth data, making sure that none of the polygons intersected. These polygons were used to segment the PC and extract the points located within representative areas of each class. For each class, we then randomly selected 1000 points for the training dataset and 500 others for the test dataset, which resulted in training and test datasets of 21,000 and 10,500 items, respectively. This draw without replacement ensured that no training and testing points were the same and that they were all independent. We chose to draw training and testing points from the same areas in order to account for the widest possible diversity in both samples. The resulting training and test points have a mean distance of 1.6 m; their repartition is presented in Figure 4.
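The per-class draw without replacement can be sketched as follows (a minimal illustration assuming a NumPy array `labels` of class indices; the polygon-based segmentation that precedes it is not shown):

```python
import numpy as np

def split_train_test(labels, n_train=1000, n_test=500, seed=0):
    """Per-class draw without replacement: n_train training indices and
    n_test disjoint testing indices for each class label."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        # One draw without replacement guarantees train/test disjointness
        picked = rng.choice(idx, size=n_train + n_test, replace=False)
        train_idx.append(picked[:n_train])
        test_idx.append(picked[n_train:])
    return np.concatenate(train_idx), np.concatenate(test_idx)

# Example with 3 classes of 2000 candidate points each
labels = np.repeat([1, 2, 3], 2000)
tr, te = split_train_test(labels)
```

Drawing the training and testing indices in a single `choice` call per class is what guarantees that no point appears in both datasets.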

**Figure 4.** Repartition of the training and test data over the study area (datum: WGS 84; projection: UTM 30N). (**a**) Training data distribution; (**b**) test data distribution. S. = submerged, Ev. = evergreen, Dec. = deciduous. The size of the points in the illustration may give a false impression of overlapping, but all points have distinct locations.

Each array of coordinates, IR intensities, and waveform features was associated with one integer label between 1 and 21, forming a dataset with the structure illustrated in Figure 5.


**Figure 5.** Structure of the dataset obtained after waveform and infrared intensities processing.

The point-based classification method is described in the following paragraphs. It relied on the interpolated IR intensities and on spectral features extracted from the green waveforms.

#### *3.4. Waveform Processing Method*

The waveform processing steps are presented in Figure 6 and detailed in the following paragraphs. All these steps were performed using tailored scripts developed with Python 3.8.8.

**Figure 6.** Flowchart of the overall methodology.

The base concept consists in extracting tailored features from the relevant part of the waveforms. Here, we considered this part to be any return detected after the water-surface component in submerged domains, and the part of the waveform encompassing the backscattered signal in terrestrial zones. In the latter, the sole processing was to isolate the useful part of the green waveform by identifying where the backscatter begins and the noise ends. This was done by evaluating the mean noise level observable at the beginning and at the end of the waveform and extracting the part of the wave where the intensity was above this noise level.
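For terrestrial waveforms, the noise-level truncation described above can be sketched as follows. This is an illustrative version only: the exact threshold used by the authors' scripts is not specified, so the mean-plus-one-standard-deviation form is an assumption.

```python
import numpy as np

def truncate_waveform(wave, n_edge=10, k=1.0):
    """Keep the part of a terrestrial waveform whose intensity exceeds the
    noise level estimated from its first and last n_edge samples.
    The mean + k*std threshold is an assumption, not the authors' exact rule."""
    noise = np.concatenate([wave[:n_edge], wave[-n_edge:]])
    level = noise.mean() + k * noise.std()
    above = np.flatnonzero(wave > level)
    if above.size == 0:
        return wave[:0]                  # nothing rises above the noise
    return wave[above[0]:above[-1] + 1]  # span from first to last useful sample

# Synthetic pulse on a noisy floor
rng = np.random.default_rng(1)
x = np.arange(100, dtype=float)
wave = 10.0 * np.exp(-((x - 50.0) ** 2) / 18.0) + rng.normal(0.2, 0.05, 100)
useful = truncate_waveform(wave)
```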

To distinguish marine and terrestrial waveforms, we relied on a flag attributed by the Leica survey software Leica LIDAR Survey Studio (LSS) to the points in the PC that correspond to an interpolated water surface, under which refraction correction is applied. Since the waveform files integrate this information, it was possible to use it to differentiate submerged and emerged areas. This step is a pre-classification made by LSS before the data were delivered to us.

In submerged areas, further processing was required to detect the different peaks and isolate the parts of the signal corresponding to the water-surface and water-column components. All green waveforms were first denoised using the Savitzky–Golay piecewise polynomial filtering method. As explained in [29], a threshold was then applied to the first derivative of the smoothed waveforms to bring out the peaks. This step was well suited to the detection of the most significant peaks; however, depending on local conditions affecting the reflection of light, some bottom returns may be weaker and harder to expose. We therefore included a second thresholding step: when only one peak was identified with the first threshold, another derivative thresholding was introduced to try to detect peaks after the water surface (i.e., the peak already detected). This second threshold had a lower value, which would exacerbate noise if applied to the whole waveform, but which was adapted to the detection of more attenuated returns when restricted to the underwater part of the waveform. If no additional return was identified with this second thresholding, we concluded that no seabed was retrieved and discarded the waveform, since there was no return to compute features on.
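The smoothing and first-derivative thresholding steps can be sketched as follows (an illustrative version built on SciPy's Savitzky–Golay filter; the threshold values, window length, and exact peak-selection rule are assumptions):

```python
import numpy as np
from scipy.signal import savgol_filter

def detect_peaks(wave, threshold, window=9, order=3):
    """Smooth a waveform (Savitzky-Golay) and report local maxima whose
    rising edge exceeded `threshold` in the first derivative."""
    smooth = savgol_filter(wave, window_length=window, polyorder=order)
    d = np.diff(smooth)
    peaks, armed = [], False
    for i in range(1, len(d)):
        if d[i] > threshold:
            armed = True                  # a sufficiently steep rising edge
        if armed and d[i - 1] > 0 >= d[i]:
            peaks.append(i)               # derivative sign change = local maximum
            armed = False
    return smooth, peaks

# Synthetic waveform: strong surface return at sample 30, weak bottom at 70
x = np.arange(120, dtype=float)
wave = 10.0 * np.exp(-((x - 30) ** 2) / 18) + 3.0 * np.exp(-((x - 70) ** 2) / 18)
smooth, peaks = detect_peaks(wave, threshold=0.1)
```

Lowering `threshold` on the underwater part of the waveform only, as described above, detects attenuated bottom returns without amplifying noise elsewhere.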

The same correction of the signal's attenuation in water as in [29] was applied to bathymetric waveforms; it relied on fitting an exponential function to the water-column component to compensate for the effects of absorption and diffusion in the water on the bottom return. This was based on the estimation of the diffuse attenuation coefficient [21,22] through the evaluation of the intensity gradient at the end of the surface return. However, this approach has mathematical limitations in very shallow water: the optimization is impossible on a water-column component containing fewer than two waveform samples (one sample corresponds to 0.063 m in this scenario), so no correction was applied at depths under 0.125 m. In places where depths were smaller than 0.125 m and over land, the attenuation was fixed at 0.
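The exponential fit at the heart of this correction can be illustrated on synthetic water-column samples. The model form $a\,e^{-kt}$, parameter names, and the final amplification step are illustrative assumptions; the actual coefficient estimation in [29] relies on the intensity gradient at the end of the surface return.

```python
import numpy as np
from scipy.optimize import curve_fit

def water_column_decay(t, a, k):
    """Illustrative exponential model of the water-column return, I(t) = a*exp(-k*t)."""
    return a * np.exp(-k * t)

# Synthetic water-column samples decaying with attenuation k = 0.5 per sample
t = np.arange(12, dtype=float)
column = water_column_decay(t, 4.0, 0.5)

# Fit the exponential to recover the attenuation coefficient
(a_hat, k_hat), _ = curve_fit(water_column_decay, t, column, p0=(1.0, 0.1))

# A bottom return n samples below the surface would be amplified accordingly
n_bottom = 20
correction_factor = np.exp(k_hat * n_bottom)
```

With fewer than two water-column samples, this two-parameter fit is under-determined, which is why the paper disables the correction at depths under 0.125 m.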

The waveform processing is summarized and illustrated in Figure 7.

**Figure 7.** Waveform processing method flowchart and illustration on two different waveforms: one acquired over the sea, the other over land.

#### *3.5. Waveform Features' Extraction*

Once all waveform components corresponding to ground or seabed covers were isolated, these intensity time series were converted into pseudo-reflectance series by dividing them by the emitted laser pulse intensity. This removed the potential bias induced by slight variations in the emitted laser intensity. Statistical parameters were then computed on these truncated and normalized waveforms. They were selected based on [27,29,44] and are described in Table 2.
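The normalization and a few of the statistical descriptors can be sketched as follows (an illustrative subset only; Table 2 defines the actual 16 features used):

```python
import numpy as np
from scipy import stats

def waveform_features(wave, emitted_intensity):
    """Convert a truncated waveform to pseudo-reflectance and compute a few
    statistical descriptors (illustrative subset of the Table 2 features)."""
    r = np.asarray(wave, dtype=float) / emitted_intensity  # pseudo-reflectance
    return {
        "mean": r.mean(),
        "std": r.std(),
        "skewness": stats.skew(r),
        "kurtosis": stats.kurtosis(r),
        "max": r.max(),
        "sum": r.sum(),   # integrated pseudo-reflectance of the return
    }

# Example on a small truncated return with an emitted intensity of 10
feats = waveform_features([1, 2, 3, 2, 1], 10.0)
```

Because every waveform is divided by its own emitted pulse intensity, the descriptors become comparable across shots despite laser power fluctuations.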

**Table 2.** Name and definition of the features extracted from the green waveforms during processing and used as input variables to the random forest model.


The terrain's elevation was also extracted. For topographic waveforms, it corresponds to the last return's altitude (computed using a traditional time-of-flight range measurement and extracted from the PC). For bathymetric waveforms, it was computed by subtracting the depth of the last return identified by our algorithm from the altitude of the detected surface return, positioned with the PC. The vertical reference was the IGN 1969 system.

The spectral features computed on the truncated green waveforms, the IR intensities associated with the points and the elevations were then used as predictors for random forest classifications of ground covers over the study area. The 21,000 items of the dedicated dataset were used for the algorithm's training, and the 10,500 items of the testing dataset were used to assess the quality of the predictions.

#### *3.6. Random Forest Classification*

Contrary to [29], the data were not rasterized: the features were classified directly to produce a 3D habitat map, so as to avoid information loss. We also relied on a different classifier to predict data labels. A random forest model with 150 trees was used for the classification step. Considering that we wished to apply it, after fitting, to a dataset containing 24.5 million items, we chose a high number of trees to populate the forest, knowing that more trees theoretically yield better classification accuracy and that the number of trees needs to be adapted to the complexity of the dataset. We also based our choice on the observation made in [45] on several different datasets that, past 128 trees in the forest, the accuracy gain of each additional tree becomes negligible compared to its computational cost. The maximum tree depth was not fixed, so that nodes expanded until the leaves were pure. We relied on the Gini impurity criterion to determine whether a node had to be split, calculated with the following formula:

$$\text{Gini index} = 1 - \sum_{j} p_{j}^{2},\tag{1}$$

where $p_j$ is the probability of class *j*. This criterion is close to 0 when the split is optimal.
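As a quick illustration of Equation (1), the impurity of a node can be computed directly from its class labels:

```python
def gini(labels):
    """Gini impurity of a node: 1 - sum_j p_j^2 over its class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

# A pure node has impurity 0; an evenly mixed two-class node has impurity 0.5
pure = gini([1, 1, 1, 1])
mixed = gini([1, 1, 2, 2])
```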

We controlled the generalizability and over-fitting of the model by monitoring the generalization score obtained on random out-of-bag samples at each fitting step. The random forest implementation of the Python library *scikit-learn* was used.
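The corresponding *scikit-learn* configuration can be sketched as follows. The 150 trees, unlimited depth, Gini criterion, and out-of-bag monitoring come from the text; the synthetic sanity-check data are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Settings taken from the text
clf = RandomForestClassifier(
    n_estimators=150,   # 150 trees in the forest
    max_depth=None,     # nodes expand until the leaves are pure
    criterion="gini",   # Gini impurity splitting criterion (Equation (1))
    oob_score=True,     # generalization score on out-of-bag samples
    n_jobs=-1,
    random_state=0,
)

# Illustrative sanity check on a synthetic 4-class dataset
X, y = make_classification(n_samples=400, n_features=16, n_informative=8,
                           n_classes=4, random_state=0)
clf.fit(X, y)
oob = clf.oob_score_
```

The out-of-bag score provides an unbiased generalization estimate without touching the test dataset, which is how over-fitting was monitored.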

#### *3.7. Comparative Study*

Random forest classifications were launched on several sets of predictors, clustered based on their conceptual similarity. The performance metrics of each group of features were then retrieved, which allowed us to evaluate the contribution of each family of features to the classification accuracy. The feature sets were the following:


We also performed a variable importance analysis by dropping the green waveform features one by one and computing the difference in classification accuracy between each reduced set of 15 predictors and the full set of 16 attributes.
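The drop-one analysis can be sketched as follows (an illustrative version; the dataset and the reduced tree count used in the example are placeholders):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def drop_one_contributions(X_train, y_train, X_test, y_test, n_trees=150, seed=0):
    """Accuracy difference between the full feature set and each reduced set
    obtained by dropping one feature; positive values mean the feature helps."""
    def acc(cols):
        clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed,
                                     n_jobs=-1)
        clf.fit(X_train[:, cols], y_train)
        return accuracy_score(y_test, clf.predict(X_test[:, cols]))
    all_cols = list(range(X_train.shape[1]))
    full = acc(all_cols)
    return {j: full - acc([c for c in all_cols if c != j]) for j in all_cols}

# Illustrative run on a small synthetic dataset with 6 features
X, y = make_classification(n_samples=300, n_features=6, n_informative=4,
                           random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
contrib = drop_one_contributions(Xtr, ytr, Xte, yte, n_trees=25)
```

A negative contribution, as observed for 7 of the 16 green waveform features (Figure 8), means the classifier is more accurate without that feature.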

#### *3.8. Results' Quality Assessment*

Classification performances were assessed by considering the values obtained on the test dataset by the random forest classifier for the following metrics: overall accuracy (OA, ratio of correct predictions, best when its value is 1), precision (fraction of correct predictions among each predicted class, best when its value is 1), recall (fraction of correctly retrieved items for each ground-truth class, best when its value is 1), and F-score (combination of precision and recall, best when its value is 1). The precision, recall, and F-score were computed for each label, and their unweighted means were used to compare the results obtained. Confusion matrices presenting the producer's accuracies (PA) were also created to analyze the classification performance on each of the 21 classes.
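These metrics can be computed with *scikit-learn* as follows (an illustrative sketch; the unweighted means correspond to macro averaging, and the producer's accuracies to the row-normalized diagonal of the confusion matrix):

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

def evaluate(y_true, y_pred):
    """OA, unweighted (macro) means of precision, recall and F-score, and
    per-class producer's accuracies from the confusion matrix."""
    cm = confusion_matrix(y_true, y_pred)
    pa = cm.diagonal() / cm.sum(axis=1)  # producer's accuracy per true class
    return {
        "OA": accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall": recall_score(y_true, y_pred, average="macro"),
        "F-score": f1_score(y_true, y_pred, average="macro"),
        "PA": pa,
    }

# Tiny two-class example: one item of class 0 is misclassified as class 1
m = evaluate([0, 0, 1, 1], [0, 1, 1, 1])
```

Macro averaging weights all 21 labels equally, which is consistent with the balanced class design of the training and test datasets.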

#### *3.9. Spatialization of the Random Forest Predictions*

Although coordinates were not used during classification, the arrays of features were geolocated with the position of the waveform's last echo, as illustrated in Figure 5. To visualize our classification results as PCs, we associated the vector of predictions with these coordinates, which produced a classified PC. Not rasterizing the data had the advantage of preserving the spatial density, and therefore the spatial resolution, while also avoiding the critical issue of mixed pixels [46].

Each waveform was localized with the planar coordinates of its last return using the PC coordinates. For bathymetric waveforms, this ensured that the effects of refraction of the laser beam on the air/water interface were considered, since the green PC was corrected before data delivery.

#### **4. Results**

#### *4.1. Random Forest Classifications' Performances on the Test Dataset*

Ten random forest models were trained on the training dataset—one for each configuration of predictors defined in Section 3.7. Their performances were evaluated using four metrics computed on the predictions made on the test dataset. Four of them were then used to predict labels for the complete study area and produce habitat maps in the form of PCs.

Fitting the models to the 21,000 items training dataset took on average 0.4 s. Predictions on the test dataset (which contains 10,500 items) required a mean computing time of 0.2 s. The complete area, representing a total of more than 24.5 million items, was covered in 17 to 18 min. All computations were made on a machine equipped with an AMD Ryzen 9 5900X 12-Core CPU and an NVIDIA GeForce RTX 3080 GPU.

The classification metrics obtained on each set of features defined in Section 3.7 are presented in Table 3. The very close values observed between the four criteria for each feature set are due to the averaging of each metric's value over the classification labels. They are also explained by the fact that all classes are balanced. The scores obtained show that our method does not have a systematic tendency to over- or under-estimate some classes.

When using only subsets of green waveform features, the random forest predictions were more often false than correct. However, grouping the waveform variable families improved the accuracy by at least 4%. The best configuration, composed of peak-shape features and lidar-return features, provided a classification accuracy of 69%. Globally, coupling statistical features with peak-shape features tended to result in accuracy loss, while combinations where they were not mixed resulted in classification performance gains. Indeed, the complete set of features obtained through green waveform processing predicted the habitat type with an OA of 56%, while the classification of statistical and lidar-return features was correct in 67% of cases (see Table 3).

The addition of IR intensity values to the full set of green waveform parameters improved the OA by 13%, which is 5% above the best OA obtained on a subset of green waveform attributes. The contribution of the elevations to the classification accuracy was even higher: there was a 31.5% gain in OA between the classification of the green spectral features alone and the classification of green spectral features and elevations. Gathering the three sources of information produced the most accurate result, with an OA of 90.5% (Table 3).


**Table 3.** Performance metrics obtained by each random forest experiment.

#### *4.2. Green Waveform Features' Contribution to the Classification Accuracy*

Figure 8 illustrates the contribution of each predictor to the accuracy of the classification of green waveform features, shedding light on the added value of each attribute computed on the truncated waveforms. This assessment reveals that 9 of the 16 features extracted from each waveform contributed positively to the classification performance, while the 7 others had a negative impact on the quality of the random forest predictions. The latter were mostly parameters of the "statistical features" set, although each type of waveform parameter (statistical, peak shape-related, or lidar return-related) was represented among the nine positively contributing attributes.

**Figure 8.** Predictors' contribution to the green waveform features' classification accuracy (in fraction of accuracy).

An in-depth analysis of the four last random forest experiments is provided in the following sections.

#### *4.3. Land-Water Continuum Habitat 3D Modelling*

By running the waveform processing algorithm and the classifiers on the whole study area, we obtained a 21-class semantic segmentation of our complete bathymetric lidar dataset.

As expected, when dealing with PCs and not rasters, the results were noisy, but some areas had lower speckle than others. The observable ratio between information and noise gradually improved with the addition of IR intensity and elevations. The best output was obtained by combining green waveform features, IR intensities, and elevations; it is presented in Figure 9. The other results are presented in Appendix A (Figures A1–A3).

**Figure 9.** Projected 3D map of the habitats obtained with the predictions of a random forest classifier on green spectral features, infrared intensities, and elevation values; orthoimage of the study area. The orthoimage was captured in 2014, while lidar data are not contemporaneous as they date from 2019. S. = submerged, Ev. = evergreen, Dec. = deciduous.

The main strength of this result was the distinction between submerged and emerged domains: except for boat false-positives on the water surface, marine classes were rarely detected over land, and vice versa. Globally, this map showed a better definition of the considered habitats than the others (see Figures A1–A3), and fewer classification errors, although improvements on some classes coexisted with poorer performance on others. Figures 10–13 give a more detailed insight into the classification in specific areas.

**Figure 10.** Urban area and sand beach classification: Extract of the projected 3D map of the habitats obtained with the predictions of a random forest classifier on green spectral features, infrared intensities, and elevation values, and extract of an orthoimage of the same area. The orthoimage was captured in 2014, while lidar data are not contemporaneous as they date from 2019. S. = submerged, Ev. = evergreen, Dec. = deciduous.

**Figure 11.** Salt marsh classification: Extract of the projected 3D map of the habitats obtained with the predictions of a random forest classifier on green spectral features, infrared intensities, and elevation values, and extract of an orthoimage of the same area. The orthoimage was captured in 2014, while lidar data are not contemporaneous as they date from 2019. S. = submerged, Ev. = evergreen, Dec. = deciduous.

**Figure 12.** Seagrass meadow classification: Extract of the projected 3D map of the habitats obtained with the predictions of a random forest classifier on green spectral features, infrared intensities, and elevation values, and extract of an orthoimage of the same area. The orthoimage was captured in 2014, while lidar data are not contemporaneous as they date from 2019. S. = submerged, Ev. = evergreen, Dec. = deciduous.


**Figure 13.** Confusion matrix obtained by the random forest classification of the green waveforms' features, infrared intensities, and elevations. The three highest and three lowest class accuracy values are highlighted in green and orange, respectively. S. = submerged, Ev. = evergreen, Dec. = deciduous.

#### 4.3.1. Urban Area and Sand Beach Classification

As Figure 10 shows, the detection of planar, highly reflective surfaces such as tar and concrete was accurate, but their specific type was sometimes misidentified. Although the pier in the north of Figure 10 was correctly mapped as concrete, there was confusion between soil and tar on the sandy dune (southeast of the same figure).

The classification of roofs was satisfactory both west and east of the area presented in Figure 10 but showed cases of overdetection. In the urban area developed on the sandy dune (southeast of Figure 10), tar, roofs, trees, and shrubs were distinguished precisely, but sandy dune vegetation, shrubs, and trees were sometimes confused with roofs or tar.

A weaker aspect of this result was the classification of the sand beaches: patches of wet sand were well defined, but the borders between wet and dry sand featured many false detections of pebble. Confusion between pebble and sand was high, as shown by the wide area of sand beach classified as pebble in the northeast of Figure 10.
