1. Introduction
High-resolution bathymetry is valuable for many underwater applications. Traditionally, a multi-beam echo sounder (MBES) is used to collect bathymetric data, since its two perpendicular arrays allow it to return three-dimensional measurements. Sidescan sonar (SSS), with its single linear transducer array, cannot be used directly to reconstruct the seabed’s geometry. However, its returns can be well approximated by a Lambertian model, which makes it possible to use shape-from-shading (SFS) techniques [
1] to reconstruct bathymetry from sidescan images. Reconstructing bathymetry from a single SSS line is a well-known ill-posed problem, but combining multiple SSS lines and leveraging recent advances in deep learning makes the problem tractable under certain assumptions. A further challenge is that the returned intensity is not only a function of the surface normal but is also affected by the seabed reflectivity. One approach is to classify the seabed sediments from the collected sidescan data (via objective or subjective analysis [
2]), while another is to estimate the seabed reflectivity directly without determining the sediment types explicitly [
3,
4].
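As a minimal illustration of the Lambertian approximation mentioned above, the following sketch models the return as proportional to the cosine of the incidence angle (the function name and the plain-cosine form are our assumptions for illustration; exact exponents and normalizations vary between SFS formulations):

```python
import numpy as np

def lambertian_intensity(normal, incident_dir, albedo=1.0):
    """Approximate a sidescan return with a simple Lambertian model:
    intensity proportional to the albedo times the cosine of the angle
    between the unit surface normal and the direction back to the sonar.
    (Illustrative sketch; exact forms vary between formulations.)"""
    n = np.asarray(normal, dtype=float)
    d = np.asarray(incident_dir, dtype=float)
    n = n / np.linalg.norm(n)
    d = d / np.linalg.norm(d)
    cos_phi = np.clip(np.dot(n, d), 0.0, None)  # no return from back-facing slopes
    return albedo * cos_phi

# A flat seabed ensonified from straight above returns the full albedo:
print(lambertian_intensity([0, 0, 1], [0, 0, 1]))  # 1.0
```

This captures why the same seabed patch returns less energy at grazing angles, and why the albedo (reflectivity) must be estimated alongside the geometry.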
Recent advances in phase-differencing sonars, also known as interferometric sonars (e.g., the EdgeTech 6205 from EdgeTech, United States, and the Klein HydroChart 3500 from KLEIN, MIND Technology, United States), make it possible to acquire bathymetric measurements that could be of great help to many applications [
5], but such sonars also have restrictions. For example, the EdgeTech 6205 is mainly suitable for shallow water, less than 35 m deep [
6]. Another restriction is that interferometric SSS measurements are noisier, containing many outliers that must be filtered out [
7]. In this work, we focus on non-interferometric SSS for the following reasons: such sonars are ubiquitous in the AUV community and provide a more affordable solution that requires less power.
Many previous works have proposed different methods to reconstruct bathymetry from sidescan [
3,
4,
8,
9,
10,
11,
12,
13,
14,
15,
16,
17]. They can be categorized into groups according to different criteria; see
Table 1. In terms of whether external bathymetric data are required, refs. [
8,
9,
10,
11,
12,
13,
14] need external bathymetric data, either from sparse direct bathymetric measurements [
8,
10,
11,
12,
13,
14] or coarse multi-beam data [
9]. On the other hand, refs. [
3,
4,
15,
16,
17] do not require external bathymetric data. Among these, ref. [
15] assumes the altitude of the towfish is known and reconstructs the relative shape of the seafloor with a linear method using simulated sidescan. Refs. [
3,
4] use a flat-seafloor assumption as the initialization, while [
16,
17] use the first bottom return in the sidescan waterfall images to obtain the altitude and combine it with pressure-sensor data to obtain sparse bathymetric information. Note that both [
15,
17] use actual seabed data and simulated sidescan from the actual seabed to assess their proposed methods. In terms of how to model the scattering process, most of the previous works use the Lambertian model [
8,
9,
10,
12,
14,
15,
16,
17] while [
11,
13] use data-driven approaches, i.e., deep neural networks for the scattering modeling.
This work also uses the Lambertian model for scattering. Its advantage over a data-driven model is that it does not require “ground truth” bathymetry to create training data, as was needed in [
11,
13]. In addition, the accurate registration between the ground truth bathymetry formed from MBES data and sidescan is crucial for a data-driven model, where less accurate registration would result in less accurate estimates [
11,
13]. That registration can never be perfect in practice, as it is impossible to remove small timing issues or to know the true sound velocity profile experienced by each sonar beam. This work is an extension of our previous work [
12]. The extension is that we remove the requirement for external bathymetric data by modeling the vertical sidescan sonar beam pattern near the nadir. Intuitively, knowing the beam pattern and observing the range of the first return closest to the nadir allows a bathymetric estimate that can replace the altimeter values used in our previous work. Our proposed method can therefore be used in situations where only sidescan data are available. The altitude of the sonar is implicitly modeled from the nadir of the waterfall images and, similar to [
17], is optimized directly by minimizing the loss on sidescan intensities. The mathematical model in this work is similar to [
3,
4], but we use a neural network as the bathymetric representation instead of a grid, as in most previous work, or triangle meshes [
17]. Unlike [
15,
17], the method proposed in this work was evaluated on real sidescan data instead of simulated data. In addition, unlike [
16], where the reconstructed bathymetry is assessed by comparison to a linear interpolation of single-beam echo sounder (SBES) data, in this work we evaluate our method by comparing the reconstructed map to high-precision MBES bathymetry.
The past few years have seen rising interest in implicit neural representations (INRs), or coordinate-based representations, for representing a scene, rather than explicit representations such as voxels, point clouds, and triangle meshes. One benefit of INRs is that the memory requirement is independent of the spatial resolution, since the neural network representation is continuous. Different implicit representations have been proposed, such as Occupancy Networks [
18], Deep Signed Distance Function (DeepSDF) [
19], Scene Representation Networks (SRNs) [
20], Neural Radiance Fields (NeRF) [
21], and Sinusoidal Representation Networks (SIRENs) [
22]. Among these, Occupancy Networks learn a binary classifier and use its decision boundary to represent the scene. NeRF is more suitable for applications such as novel views synthesizing. SIRENs outperform ReLU-based Multi-layer Perceptrons (MLPs) such as DeepSDF and SRNs in terms of capturing details when modeling large scenes [
22], since the sine activation functions enable SIRENs to represent high-quality gradients of the underlying signals.
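As a minimal illustration of a SIREN-style height field, the following NumPy sketch implements the forward pass and the weight-initialization scheme from [22] (biases are omitted for brevity, and the layer sizes are arbitrary; the actual method trains such a network with backpropagation):

```python
import numpy as np

rng = np.random.default_rng(0)

def siren_init(in_f, out_f, omega_0=30.0, is_first=False):
    """SIREN weight initialization [22]: uniform in +-1/in_f for the first
    layer and +-sqrt(6/in_f)/omega_0 for later layers, so pre-activations
    stay well-scaled through the sine nonlinearities."""
    bound = 1.0 / in_f if is_first else np.sqrt(6.0 / in_f) / omega_0
    return rng.uniform(-bound, bound, size=(in_f, out_f))

def siren_forward(x, weights, omega_0=30.0):
    """Forward pass of a SIREN MLP: sine activations on all hidden
    layers, linear output layer (here, the predicted seafloor height)."""
    h = x
    for W in weights[:-1]:
        h = np.sin(omega_0 * (h @ W))
    return h @ weights[-1]

# A tiny height field: (easting, northing) -> height
weights = [siren_init(2, 64, is_first=True), siren_init(64, 64), siren_init(64, 1)]
xy = rng.uniform(-1, 1, size=(8, 2))   # 8 normalized query coordinates
print(siren_forward(xy, weights).shape)  # (8, 1)
```

Because the representation is a smooth function of the input coordinates, its spatial gradient (and hence the surface normal) is available analytically everywhere, which is what makes it attractive for the shading model.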
The overview of the proposed method (we plan to release the code in the future) is shown in
Figure 1. The estimated bathymetry is parameterized by a neural network that takes Euclidean easting and northing coordinates as input and predicts the corresponding seafloor height estimates. Given a sidescan ping, the sonar’s position and orientation are used to calculate the position of echoes on the seafloor. The surface normal at each such point can be calculated from the gradient of the seafloor and then used in a Lambertian scattering model to approximate the returned intensity. By comparing the measured and approximated intensities, the gradient of the intensity loss can be backpropagated to update the neural bathymetry. Note that the loss is calculated on all the intensities, including the nadir area, which indicates the altitude of the sonar, allowing the constraints from the sidescan itself to estimate the relative seafloor height. Combining these constraints with readings from the pressure sensor, we can estimate the absolute seafloor height from sidescan without external bathymetric data.
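The render-and-compare step above can be sketched as follows. This is a simplified, hypothetical illustration: the helper names, the height-gradient-to-normal formula, and the plain mean-squared-error loss are our assumptions, not necessarily the paper's exact formulation.

```python
import numpy as np

def seafloor_normal(grad_e, grad_n):
    """Unit surface normal of a height field h(e, n) from its spatial
    gradient (dh/de, dh/dn): proportional to (-dh/de, -dh/dn, 1)."""
    n = np.array([-grad_e, -grad_n, 1.0])
    return n / np.linalg.norm(n)

def render_ping_loss(measured, predicted):
    """Per-ping intensity loss: mean squared error between measured and
    modelled sidescan intensities (nadir bins included, since they
    constrain the sonar altitude)."""
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean((measured - predicted) ** 2))

# A flat seafloor has zero height gradient, so its normal points straight up:
print(seafloor_normal(0.0, 0.0))
print(round(render_ping_loss([1.0, 0.5], [0.9, 0.5]), 6))  # 0.005
```

In the actual method, the gradient of this loss is backpropagated through the scattering model and the network to update the neural bathymetry.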
The main contribution is that the proposed method can produce high-quality bathymetry from sidescan data alone, opening the possibility of using smaller AUV platforms with relatively simple equipment to carry out high-quality bathymetric surveys. We also validate that our neural rendering method is capable of generating such maps given good navigation and a sidescan sonar alone.
4. Results
We assess our proposed method by comparing the estimated bathymetry to the high-precision MBES bathymetry with a resolution of 0.5 m. We use several metrics to evaluate the quality of the reconstructed bathymetry: the mean absolute error (MAE); the maximum, minimum, and standard deviation of the errors of the bathymetric map; and the cosine similarity [
25] (bounded in [−1, 1], with 1 being identical) of the gradients of the bathymetric map, similar to our previous work [
12].
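A plausible implementation of these metrics is sketched below, assuming both maps are gridded at the same resolution; the cosine similarity here is computed over the flattened gradient vectors, and the paper's exact formulation may differ (e.g., per-cell averaging):

```python
import numpy as np

def bathy_metrics(est, gt):
    """Error metrics comparing an estimated bathymetric grid against a
    reference (e.g., MBES) grid of the same shape: mean absolute error,
    max/min signed error, error standard deviation, and cosine
    similarity of the map gradients (1 = identical slopes)."""
    err = est - gt
    ge = np.stack(np.gradient(est)).ravel()
    gg = np.stack(np.gradient(gt)).ravel()
    cos = float(ge @ gg / (np.linalg.norm(ge) * np.linalg.norm(gg)))
    return {
        "mae": float(np.mean(np.abs(err))),
        "max": float(err.max()),
        "min": float(err.min()),
        "std": float(err.std()),
        "cosine": cos,
    }

gt = np.outer(np.linspace(9, 25, 32), np.ones(32))  # synthetic 9-25 m slope
m = bathy_metrics(gt + 0.2, gt)                     # constant 20 cm bias
print(round(m["mae"], 2), round(m["cosine"], 2))    # 0.2 1.0
```

Note how a constant bias leaves the gradient-based cosine similarity at 1.0: the gradient metric isolates how well the terrain shape is captured, independent of absolute depth offsets.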
We begin with the qualitative results of the estimated bathymetry.
Figure 7 shows the comparison between the ground truth bathymetry and the one estimated from sidescan. In addition, the gradients of the bathymetric maps are displayed in
Figure 8, which shows that the proposed method manages to reproduce not only large-scale features such as the hill and ridges but also some of the smaller features such as rocks. However, many small rocks are not reconstructed.
To assess the estimated bathymetry further we calculate the errors on the bathymetry for the proposed method and compare them with previous works [
12,
16], see
Table 4. The estimated bathymetry of the proposed method has a bias of around 10 cm and an average error of 20 cm with a standard deviation of 23 cm. However, the maximum and minimum errors are much larger than the mean absolute error of 20 cm. The probability density function (PDF) curve of the error is shown in
Figure 9. We also compare the performance using our previous method [
12] and method used in [
16] in
Table 4. Note that our previous method [
12] used the MBES data to simulate an altimeter providing external bathymetric measurements, so the mean absolute error and standard deviation are at the centimeter level. In addition, the results from [
16] are compared with a Digital Elevation Model (DEM) constructed from SBES data, and the surveyed area there has less terrain variation (10–13 m depth), while [
12] and this work assess the proposed methods against high-resolution MBES data from an area at 9–25 m depth.
The albedo and beamform are estimated jointly with the bathymetric map. The estimated albedo (see
Figure 10) seems to have higher coefficients in the hill and rocky areas, which is expected. However, the upper-left corner area also has higher albedo coefficients, which is likely related to the artifacts in the same area in
Figure 8.
The estimated beamform is displayed in
Figure 11 and
Figure 12, where we can see the estimated beam profile matches the analytical model [
26] using a linear phased array with a tilt angle θ_tilt. The theoretical beam pattern is calculated as:

Φ(θ) = sinc²(k(θ − θ_tilt)),

where k = 1.39156/θ_h for a (one-way) 3 dB beam width of θ_3dB, and θ_h = θ_3dB/2 is half of the beam width.
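A sinc²-shaped beam profile for a linear array can be sketched as follows; this is a hypothetical helper consistent with a (one-way) 3 dB beam width, and the exact analytical model in [26] may differ in normalization:

```python
import numpy as np

def beam_pattern(theta_deg, tilt_deg, width_3db_deg):
    """Sinc^2 beam profile for a linear array: drops to half power
    (-3 dB) when the angle off the tilt direction equals half the
    (one-way) 3 dB beam width, using sinc^2(x) = 0.5 at x ~ 1.39156.
    Angles are in degrees."""
    half_w = np.radians(width_3db_deg) / 2.0
    k = 1.39156 / half_w
    x = k * np.radians(np.asarray(theta_deg, dtype=float) - tilt_deg)
    return np.sinc(x / np.pi) ** 2  # np.sinc(t) = sin(pi t)/(pi t)

# Half power exactly half a beam width away from the tilt angle
# (illustrative numbers: 40 deg tilt, 60 deg beam width):
print(round(beam_pattern([70.0], 40.0, 60.0)[0], 3))  # 0.5
```

Note that NumPy's `sinc` is the normalized variant, hence the division by π to recover sin(x)/x.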
When estimating the beamform, we set the beamform corresponding to angles below a certain threshold to a constant value C. Doing so considerably helps the convergence of the optimization. We found that using different values for the constant
C did not affect the quality of the gradient maps, as shown in
Figure 13a. The qualitative results for different
C are displayed in
Appendix A, where the results are barely distinguishable from one another. However, as one can see from
Table 5, the absolute height map can deviate from the actual bathymetry, as also shown in
Figure 13b. This shows that the proposed method can estimate the relative height well from the sidescan data alone; providing even a few absolute seafloor height points (or just one) could “pin down” the seafloor and reduce the absolute error.
Given the estimated bathymetry, the estimated albedo R, and the estimated beam profile, we can calculate simulated sidescan intensities given the sonar’s attitude. An example (more in
Appendix B) is displayed in
Figure 14, where the left shows the measured waterfall image downsampled to 64 bins per side and the right shows the simulated SSS. The simulated SSS can be georeferenced using the estimated bathymetry, as shown in
Figure 15. Note also that the Lambertian model produces an image with much less noise. This suggests that speckle-noise filtering of the measured SSS at a higher resolution (or even without downsampling) might further improve the estimated bathymetry, especially for small features such as rocks.