#### *2.2. Full-Waveform Airborne Topobathymetric Lidar*

Airborne lidar is an active remote sensing technique that uses the backscatter of laser light to compute ranges to the ground cover and, given the absolute position of the sensor, to produce 3D maps of the environment. Topobathymetric lidar relies on two lasers with distinct wavelengths: a green laser is added to the usual IR laser to detect the seabed or riverbed [21,22,39]. It exploits the fact that green light penetrates the water surface, whereas IR light does not. A lidar waveform is the recording of the full signal backscattered by the surveyed environment; it consists of samples of backscattered intensity recorded over time. Since laser light is reflected by objects standing in its path, each element illuminated by the laser backscatters a fraction of the emitted beam, which results in a peak in the waveform. In theory, the intensity of a peak depends on the object's albedo, its geometry, and the laser incidence angle [40]. Because the environment contains several layers of cover (tree canopy, tree branches, bushes, and soil, for example), a single waveform can contain several peaks, each corresponding to a different layer. A typical topographic waveform (i.e., resulting from a laser beam hitting the land) has as many peaks as there are objects at different elevations in its path [28,39]. A bathymetric waveform usually has three main components: the first is the water surface return, the second is the water column return (originating from the backscatter of photons by particles suspended in the water, such as sediments or nutrients), and the third is the return originating from the reflection on the seabed or riverbed [21,22,39]. Since all objects present in the cone illuminated by the laser reflect some light towards the sensor, they all contribute to the shape of the recorded waveform [28].
Each element of the landscape or seascape is therefore characterized by the shape of its component in the waveform, which can be used for land or sea cover detection and classification [27,29,34,41]. In this work, backscattered intensities are converted into pseudo-reflectances by dividing them by the emitted pulse's intensity. Examples of a typical bathymetric waveform and a typical topographic waveform are presented in Figure 2.
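
The conversion to pseudo-reflectance and the notion of one peak per illuminated layer can be illustrated with a minimal sketch. It is not the processing chain used in this work: the synthetic waveform, the emitted intensity of 1000, the height threshold, and the helper names (`pseudo_reflectance`, `find_peaks`, `gaussian`) are all assumptions made for illustration.

```python
import math


def pseudo_reflectance(waveform, emitted_intensity):
    """Divide each backscattered sample by the emitted pulse intensity."""
    if emitted_intensity <= 0:
        raise ValueError("emitted_intensity must be positive")
    return [sample / emitted_intensity for sample in waveform]


def find_peaks(signal, min_height):
    """Return the indices of local maxima that reach min_height."""
    peaks = []
    for i in range(1, len(signal) - 1):
        is_local_max = signal[i - 1] < signal[i] >= signal[i + 1]
        if is_local_max and signal[i] >= min_height:
            peaks.append(i)
    return peaks


def gaussian(n, center, amplitude, width):
    """Gaussian-shaped return sampled on n waveform samples."""
    return [amplitude * math.exp(-0.5 * ((i - center) / width) ** 2)
            for i in range(n)]


# Synthetic bathymetric-like waveform (made-up values): a strong surface
# return, an exponentially decaying water column return, and a weaker
# bottom return.
n_samples = 120
surface = gaussian(n_samples, center=30, amplitude=800.0, width=2.0)
bottom = gaussian(n_samples, center=80, amplitude=200.0, width=2.5)
column = [150.0 * math.exp(-(i - 30) / 15.0) if i >= 30 else 0.0
          for i in range(n_samples)]
raw = [s + c + b for s, c, b in zip(surface, column, bottom)]

# Assumed emitted pulse intensity, in the same arbitrary units as `raw`.
reflectance = pseudo_reflectance(raw, emitted_intensity=1000.0)
peaks = find_peaks(reflectance, min_height=0.1)
print(peaks)  # sample indices of the surface and bottom returns
```

In this synthetic case, the water column return does not create a peak of its own; it appears as the decaying tail between the surface and bottom peaks, which is why the shape of each component, and not only the peak positions, carries information.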

**Figure 1.** Study area (datum: WGS 84; projection: UTM 30N).

**Figure 2.** Examples of typical lidar waveforms (for a green laser with a wavelength of 515 nm). (**a**) Bathymetric waveform acquired in a coastal area; (**b**) topographic waveform acquired in a vegetated area. A sample corresponds to 556 picoseconds.

#### *2.3. Datasets*

The lidar data [42] used for this research were acquired over the coast of Sables d'Or les Pins in September 2019 by the Shom as part of the Litto3D® project [38], using a Leica HawkEye 4X sensor. The HawkEye 4X produces laser pulses at wavelengths of 515 nm and 1064 nm on three different channels. A dedicated shallow green channel covers depths under 10 m, while a more powerful green laser, the deep channel, is used to detect deeper seabed. These two channels provide PCs with densities of at least five points per m<sup>2</sup> and one point per m<sup>2</sup>, and they have laser spot diameters of 1.8 m and 3.4 m, respectively. The IR laser has a spot diameter of 0.2 m and a point density of at least 10 points per m<sup>2</sup>. For the green channels only, backscattered intensities are recorded at a sampling frequency of 1.8 GHz, providing waveforms with one sample every 556 picoseconds. Waveforms are not available for the IR laser.
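
As a worked check of these figures, the short sketch below derives the sample interval from the 1.8 GHz sampling frequency and converts it into the along-beam distance represented by one sample. The refractive index of water (1.33) is an assumed typical value that is not given in the text, air is approximated as vacuum, and the function names are illustrative.

```python
SPEED_OF_LIGHT_M_S = 299_792_458.0   # speed of light in vacuum
SAMPLING_FREQUENCY_HZ = 1.8e9        # waveform sampling frequency stated in the text
WATER_REFRACTIVE_INDEX = 1.33        # assumed typical value, not given in the source


def sample_interval_ps(frequency_hz):
    """Time between two consecutive waveform samples, in picoseconds."""
    return 1e12 / frequency_hz


def range_per_sample_m(frequency_hz, refractive_index=1.0):
    """One-way along-beam distance represented by one sample.

    The pulse travels to the target and back, hence the division by two.
    """
    speed_in_medium = SPEED_OF_LIGHT_M_S / refractive_index
    return speed_in_medium / frequency_hz / 2.0


dt_ps = sample_interval_ps(SAMPLING_FREQUENCY_HZ)
air_step_m = range_per_sample_m(SAMPLING_FREQUENCY_HZ)
water_step_m = range_per_sample_m(SAMPLING_FREQUENCY_HZ, WATER_REFRACTIVE_INDEX)

print(f"sample interval: {dt_ps:.1f} ps")
print(f"range per sample in air: {air_step_m * 100:.1f} cm")
print(f"range per sample in water: {water_step_m * 100:.1f} cm")
```

Under these assumptions, one sample spans roughly 8.3 cm along the beam in air and 6.3 cm in water. These are slant distances, not vertical depths, since the beam is generally not vertical.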

The survey was conducted with a constant laser amplification. Due to the power needed to penetrate several meters of water, the shallow laser's backscattered intensities tend to be saturated over highly reflective land surfaces, but they are still usable. The deep channel's returned intensities, however, are systematically saturated and do not provide usable information for land cover classification. In this study, only shallow full waveforms were used, considering the range of depths in the selected area. Green waveforms were available for every shallow green laser shot. Over the studied area, their average density was 3.75 waveforms per m<sup>2</sup>. Reanalyzed echo PCs for both the shallow green and IR wavelengths were also used: the IR PC brought additional spectral information, while the green PC was used to accurately position the raw waveforms, since this PC underwent refraction correction before delivery. The effects of refraction were not corrected in the raw waveform files.

To document the environment on site, ground truth data (presented in Figure 3) were acquired in the form of photoquadrats and UAV imagery. They were used to label the lidar data for habitat classification. UAV imagery was acquired in March and April 2021 over five smaller areas of interest, each representing typical coastal habitats, using an RGB DJI Phantom 4 Pro V2 and a Parrot Sequoia+ that includes a nadir-looking near-IR sensor (770 nm to 810 nm) and a zenith-looking irradiance sensor. These flights were calibrated with a total of 55 ground control points. A set of 150 photoquadrats was acquired with RGB cameras and georeferenced to document the ecological diversity of the study area. Over marine parts of the study site, a PowerVision unmanned surface vehicle (USV) was used to document the seabed cover; these underwater images were acquired in September 2021. An RGB orthoimage acquired in 2014 over the whole area was also used to provide additional information on the habitats present on site five years before the September 2019 lidar survey.

**Figure 3.** Ground truth data spatial coverage (datum: WGS 84; projection: UTM 30N).

#### **3. Methodology**

The algorithm developed in this study was first introduced in [29] to identify marine habitats using green full-waveform spectral features. In [29], only two classes (seagrasses and sediments) were classified, at depths of a few meters. We significantly improved this algorithm and adapted it to the classification of 21 habitats across the land-water interface, to test its abilities in supratidal and intertidal environments.

The enhanced version presented here was tailored to the identification of land and sea covers: in the presence of water, the seabed or riverbed type was considered, while over terrestrial areas we focused on the surface cover (i.e., if there were two layers of cover, such as trees with grass beneath them, the land cover was labelled as tree). It used a supervised point-based classification algorithm trained on various sets of input features and evaluated on a test dataset. Classified PCs of the whole area were also produced to assess the ability of each predictor set to produce a map of the habitats in the study area using this approach.
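
The labelling rule and the comparison of predictor sets described above can be sketched as follows. This is an illustration only: the nearest-centroid classifier is a simple stand-in and not the classifier used in this study, and the feature values, class names, predictor set names, and helper functions are all hypothetical.

```python
from collections import defaultdict


def habitat_label(layers, under_water):
    """Pick the label of a point from its (elevation_m, cover) layers.

    Under water the lowest layer (seabed or riverbed) is kept;
    on land the highest layer (surface cover) is kept.
    """
    pick = min if under_water else max
    return pick(layers, key=lambda layer: layer[0])[1]


def fit_centroids(features, labels):
    """Stand-in classifier: one mean feature vector per class."""
    sums, counts = {}, defaultdict(int)
    for x, y in zip(features, labels):
        previous = sums.get(y, [0.0] * len(x))
        sums[y] = [s + v for s, v in zip(previous, x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids, x):
    """Assign x to the class with the closest centroid."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda y: sq_dist(centroids[y]))


def accuracy(centroids, features, labels):
    hits = sum(predict(centroids, x) == y for x, y in zip(features, labels))
    return hits / len(labels)


def select(features, columns):
    """Keep only the feature columns of one predictor set."""
    return [[row[c] for c in columns] for row in features]


# Made-up points: column 0 = a green waveform feature, column 1 = IR intensity.
train_x = [[0.80, 0.10], [0.78, 0.12], [0.30, 0.11],
           [0.32, 0.09], [0.31, 0.70], [0.29, 0.72]]
train_y = ["sand", "sand", "grass", "grass", "tree", "tree"]
test_x = [[0.79, 0.11], [0.33, 0.10], [0.30, 0.71], [0.32, 0.69], [0.29, 0.12]]
test_y = ["sand", "grass", "tree", "tree", "grass"]

predictor_sets = {"green_only": [0], "green_plus_ir": [0, 1]}
results = {}
for name, columns in predictor_sets.items():
    model = fit_centroids(select(train_x, columns), train_y)
    results[name] = accuracy(model, select(test_x, columns), test_y)

print(habitat_label([(12.0, "tree"), (0.2, "grass")], under_water=False))
print(results)
```

The same training and test points are reused for every predictor set, so that differences in test accuracy can be attributed to the input features and not to the sampling of the data.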
