1. Introduction
Nearshore water depth is a crucial geophysical parameter of the coastal environment and is important for scientific research, navigation management, coastal zone protection, and coastal disaster mitigation [1,2,3]. Traditional methods for acquiring water depth data rely primarily on single-beam and multibeam sonar soundings and on airborne LiDAR measurements. Although these methods are highly accurate, they are constrained by the prevailing weather conditions at the measurement sites, require substantial human and material resources, and cannot efficiently cover large areas. Moreover, regions that are inaccessible to ships or airborne platforms, such as hazardous reef waters or islands involved in political disputes, are difficult to survey with traditional methods [4,5,6,7]. In recent years, water depth inversion from satellite remote sensing has attracted significant attention. It offers extensive coverage, low cost, rapid data acquisition, and readily available imagery. It also allows observation of areas that are inaccessible to ships or airborne platforms, which gives it promising prospects for development and application [8,9]. Previous studies have made considerable progress in estimating shallow water depth from satellite remote sensing images. The approaches can be broadly categorized into semi-analytical models, empirical models, and machine learning models [10,11,12].
A semi-analytical model is based on the physical radiative transfer theory of visible light in water. It establishes analytical expressions that relate spectral reflectance to bottom reflectance, water depth, and inherent optical properties (IOPs). Water depth and IOPs are then inverted by minimizing the discrepancy between observed and simulated spectral reflectance. For example, Lee et al. used the Hyperspectral Optimization Process Exemplar (HOPE) algorithm and Hyperion hyperspectral data to estimate water depth and IOPs in the Florida Keys region, and they reported a relative error of 11% in the inverted water depth compared with LiDAR measurements [13]. Similarly, Wei et al. performed water depth inversion using ocean color satellite data and the temporal variation in water column IOPs between two satellite measurements. Their results showed that the method estimated depth accurately within the range of 0–30 m [14]. Xia et al. developed the L-S model by integrating the log-ratio model with the semi-analytical model. They used remote sensing data from four bands to estimate the water depth around Ganquan Island in the South China Sea. Notably, the L-S model yielded results comparable to those of the log-ratio model without using measured depth data [15]. Zhang et al. improved the HOPE algorithm and applied it to Hyperion hyperspectral images to estimate water depth around Saipan Island and Zhongye Island. The improvement significantly enhanced the inversion results and addressed the overestimation of water depth in low-reflectance areas by the traditional HOPE algorithm [16]. These studies show that semi-analytical models have robust physical interpretability and do not require measured depth data. However, they involve more than five unknown parameters related to bottom reflectance and IOPs. Hyperspectral images, with their high spectral resolution across the visible wavelength range, are better suited than multispectral images to resolving these unknowns. Nonetheless, acquiring suitable hyperspectral images of a target region remains a challenge, and the available images are often limited in spatial resolution.
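The inversion step shared by the semi-analytical models discussed above can be written schematically as a spectral fitting problem. The form below is only an illustrative sketch: the sum-of-squares misfit and the symbols H (water depth), θ (the vector of IOP and bottom reflectance parameters), R_obs, R_sim, and N (number of bands) are notation assumed here, and specific algorithms such as HOPE may define the discrepancy differently.

```latex
\[
  (\hat{H}, \hat{\boldsymbol{\theta}})
  = \arg\min_{H,\,\boldsymbol{\theta}}
    \sum_{i=1}^{N}
    \left[ R_{\mathrm{obs}}(\lambda_i)
         - R_{\mathrm{sim}}(\lambda_i;\, H, \boldsymbol{\theta}) \right]^{2}
\]
```

Written this way, the preference for hyperspectral data is easy to see: with more than five unknowns in (H, θ), the number of bands N must exceed the number of unknowns for the fit to be well constrained at each pixel.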
Empirical models require in situ water depth measurements to establish a statistical relationship between spectral reflectance and actual water depth. For example, Lyzenga used empirical regression analysis to derive expressions relating image reflectance to measured water depth, enabling remote sensing-based water depth inversion. The study demonstrated the model’s good applicability to shallow water depth inversion [17]. Paredes proposed a two-band log-linear model based on the assumption that the ratio of reflectance between two spectral bands is constant across different seafloor substrates. This model was later extended to multiple bands and is widely regarded as superior to single-band models [18]. Stumpf further developed the log-transformed ratio model, which showed improved stability and higher inversion accuracy in turbid waters than linear models [19]. These empirical models are relatively simple to establish and require only a modest number of measured depth points to extrapolate depth over large marine areas. However, their performance is strongly influenced by the complexity of the aquatic environment within the study area.
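To make the empirical approach concrete, the following is a minimal sketch of a multi-band log-linear regression of the kind associated with Lyzenga, depth = a0 + Σ a_i ln(R_i − R∞,i), fitted by ordinary least squares. It is not the implementation used in any of the cited studies. The function names, the deep-water reflectance argument (assumed to be zero when omitted), and the synthetic exponential-attenuation data are all assumptions made for illustration.

```python
import numpy as np


def fit_log_linear(reflectance, depth, deep_water=None):
    """Least-squares fit of depth = a0 + sum_i a_i * ln(R_i - R_inf_i).

    reflectance: (n_samples, n_bands) array; depth: (n_samples,) array.
    deep_water: per-band deep-water reflectance R_inf (assumed 0 if omitted).
    """
    reflectance = np.asarray(reflectance, dtype=float)
    if deep_water is None:
        deep_water = np.zeros(reflectance.shape[1])
    log_signal = np.log(reflectance - deep_water)
    # Design matrix: an intercept column followed by one log-signal column per band.
    design = np.column_stack([np.ones(len(log_signal)), log_signal])
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(depth, dtype=float), rcond=None)
    return coeffs


def predict_log_linear(coeffs, reflectance, deep_water=None):
    """Apply fitted coefficients to new reflectance samples."""
    reflectance = np.asarray(reflectance, dtype=float)
    if deep_water is None:
        deep_water = np.zeros(reflectance.shape[1])
    log_signal = np.log(reflectance - deep_water)
    return coeffs[0] + log_signal @ coeffs[1:]


# Synthetic demonstration (all values hypothetical, not from the study area).
rng = np.random.default_rng(0)
true_depth = rng.uniform(1.0, 20.0, size=200)
attenuation = np.array([0.10, 0.20])    # assumed per-band attenuation (1/m)
bottom_signal = np.array([0.30, 0.25])  # assumed bottom signal at zero depth
noise = 1.0 + rng.normal(0.0, 0.01, size=(200, 2))
reflectance = bottom_signal * np.exp(-np.outer(true_depth, attenuation)) * noise

# Fit on the first 150 samples and evaluate on the remaining 50.
coeffs = fit_log_linear(reflectance[:150], true_depth[:150])
predicted = predict_log_linear(coeffs, reflectance[150:])
rmse = float(np.sqrt(np.mean((predicted - true_depth[150:]) ** 2)))
print(f"hold-out RMSE on synthetic data: {rmse:.3f} m")
```

The synthetic data follow the idealized attenuation law on which the log-linear form rests, so the fit is nearly exact here. In real scenes, variable bottom types and water quality break this assumption, which is the sensitivity to environmental complexity noted above.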
In recent years, several studies have used shallow machine learning techniques to invert water depth in specific marine regions. For instance, Zheng et al. used WorldView-2 remote sensing images to develop two artificial neural network models, Back Propagation (BP) and Radial Basis Function (RBF), for water depth inversion at Meiji Reef in the South China Sea. The results indicated that the RBF neural network model was more suitable for water depth inversion [20]. Sagawa et al. used multi-temporal Landsat-8 remote sensing images and a random forest model to accurately invert water depth in highly transparent coastal waters [21]. Similarly, Sun et al. applied a support vector regression algorithm and two regression tree models to Gaofen-6 remote sensing images to invert shallow water depth in the southern part of the Shandong Peninsula. The two regression tree models achieved superior inversion accuracy [22]. However, the effectiveness of these shallow models is limited by their reliance on manual expertise or feature transformation techniques to extract remote sensing features.
Currently, most semi-analytical, empirical, and shallow machine learning models derive water depth from the relationship between the spectral reflectance of individual pixels and the measured water depth. However, the limited information carried by an individual pixel is affected by sensor noise, imprecise atmospheric correction, and an undulating water surface. Consequently, there is still significant potential for improving nearshore water depth inversion [23,24]. Some studies have used multi-temporal satellite data to build time series, which increases the dimensionality of the data and improves the fitting of the nonlinear relationship between satellite data and water depth [21,25,26]. However, revisit cycles and cloud cover can make it difficult to acquire temporally close, high-quality remote sensing data for the target region, which limits the feasibility of this approach. Extracting feature information relevant to water depth from satellite remote sensing data while accounting for complicated nonlinear factors remains a significant challenge in nearshore water depth inversion research.
Deep learning methods, particularly convolutional neural networks (CNNs), have been used extensively in remote sensing image processing [27,28]. CNNs are proficient at feature extraction and integration, which makes them highly effective in tasks such as image classification, object detection, and change detection [29]. Beyond image processing, CNNs have also been explored for the inversion of surface parameters from remote sensing data [30,31]. Through stacked convolutional and pooling layers, CNNs automatically capture important spectral and spatial features from remote sensing data [32]. Their ability to integrate multiple features through fully connected layers makes them well suited to nearshore water depth inversion. Including the spatial location information of pixels plays a significant role in improving the accuracy of empirical methods and shallow machine learning techniques for water depth inversion [33,34,35,36,37]. However, there is a lack of conclusive evidence regarding the impact of spatial location information on water depth inversion with CNNs, and in many cases this information is disregarded when CNNs are applied to water depth inversion [38,39,40]. Although CNNs inherently capture local spatial correlations among neighboring pixels, including spatial location information could further enhance their water depth inversion performance.
To address the challenge of extracting feature information from remote sensing data for nearshore water depth inversion, this study proposes a convolutional neural network with spatial location integration (CNN-SLI). The CNN-SLI adds two channels that represent pixel locations to the input feature image. The concatenated image data are then fed into the CNN, allowing it to better capture spatial variations. In this way, the proposed method aims to mitigate the loss of spatial information caused by successive convolution and pooling operations and ultimately to improve the accuracy of water depth inversion. The water depth inversion experiment was conducted in the waters near Nanshan Port using a GF-6 remote sensing image and measured water depth data from electronic nautical charts (ENCs). To evaluate the effectiveness of the CNN-SLI model, its water depth inversion results were compared with those of the Lyzenga, multilayer perceptron (MLP), and regular CNN models.
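The idea of attaching pixel locations as two extra input channels can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the locations are taken to be row and column indices normalized to [0, 1] over the whole scene (geographic coordinates would be handled the same way), and the patch is assumed to have an odd size and to lie fully inside the scene.

```python
import numpy as np


def add_location_channels(image):
    """Append normalized row and column channels to an (H, W, C) image.

    Returns an (H, W, C + 2) array whose last two channels hold the row and
    column position of each pixel, scaled to [0, 1] (assumed normalization).
    """
    image = np.asarray(image, dtype=np.float32)
    height, width, _ = image.shape
    rows = np.linspace(0.0, 1.0, height, dtype=np.float32)
    cols = np.linspace(0.0, 1.0, width, dtype=np.float32)
    row_grid, col_grid = np.meshgrid(rows, cols, indexing="ij")
    return np.concatenate(
        [image, row_grid[..., None], col_grid[..., None]], axis=-1
    )


def extract_patch(stacked, row, col, size):
    """Cut a size x size patch centered on (row, col); size is assumed odd
    and the patch is assumed to lie fully inside the scene."""
    half = size // 2
    return stacked[row - half: row + half + 1, col - half: col + half + 1, :]


# Hypothetical 64 x 64 scene with 4 spectral bands.
rng = np.random.default_rng(0)
scene = rng.random((64, 64, 4), dtype=np.float32)
stacked = add_location_channels(scene)   # shape (64, 64, 6)
patch = extract_patch(stacked, 32, 32, 7)  # shape (7, 7, 6), one CNN input
print(stacked.shape, patch.shape)
```

Computing the location channels on the full scene before cropping means that every patch carries its absolute position, which a convolution over spectral bands alone cannot recover once the patch has been cut out.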
The contribution of this article is to incorporate an additional feature, the spatial location information of pixels, when using a CNN to invert water depth. Few studies have examined whether this information benefits CNN-based water depth inversion, because CNNs can already capture the local spatial correlation of adjacent pixels. This study confirms that considering the spatial location information of pixels can effectively improve the water depth inversion performance of CNNs.
5. Conclusions
This paper aims to address the issue of insufficient extraction of feature information from remote sensing data in nearshore water depth inversion research. To overcome this, the paper proposes a novel method called the CNN-SLI model, which integrates spatial location information into the convolutional neural network. By integrating the normalized spatial location information of pixels as additional channels, the model effectively captures deep features of remote sensing data in the spatial dimension, enabling accurate estimation of nearshore water depth. An experiment was conducted using GF-6 remote sensing images and measured water depth points from ENCs in the waters near Nanshan Port. The inversion results obtained using the CNN-SLI model were compared with those of the Lyzenga, MLP, and CNN models. The main conclusions are as follows:
(1) The overall accuracy evaluation demonstrates that the CNN-SLI model outperforms the other models, with significantly improved overall inversion accuracy and regression fit. The CNN-SLI model achieves an RMSE of 1.34 m, an MAE of 0.94 m, and a coefficient of determination R² of 0.97.
(2) The CNN-SLI model achieves the lowest RMSE and MAE in all water depth intervals, outperforming the traditional models (Lyzenga and MLP) and the CNN model. This confirms the superiority of the CNN-SLI model for water depth inversion in both shallow and deep waters.
(3) Comparative analysis with Kriging demonstrates that the water depth inverted by the CNN-SLI model is closer to the actual water depth than that inverted by the Lyzenga, MLP, and CNN models.
(4) In this study area, when the number of training samples is no less than 250, the CNN-SLI model fully exploits its performance advantages and exhibits the best inversion accuracy among all the models.
(5) Accuracy evaluation on an independent dataset shows that the CNN-SLI model generalizes better than the Lyzenga, MLP, and CNN models under different conditions and is therefore more suitable for water depth inversion in both deep and shallow waters.
This study highlights the effectiveness of considering the spatial location information of pixels in convolutional neural networks for improving water depth inversion performance.
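For reference, the three accuracy measures reported in the conclusions can be computed as in the sketch below. The function name is hypothetical, and the coefficient of determination is taken here as 1 − SS_res/SS_tot, which is an assumption: some studies instead report the squared correlation between measured and inverted depths.

```python
import numpy as np


def depth_metrics(measured, inverted):
    """Return (RMSE, MAE, R^2) between measured and inverted water depths."""
    measured = np.asarray(measured, dtype=float)
    inverted = np.asarray(inverted, dtype=float)
    residual = inverted - measured
    rmse = float(np.sqrt(np.mean(residual ** 2)))
    mae = float(np.mean(np.abs(residual)))
    # Coefficient of determination, assumed here to be 1 - SS_res / SS_tot.
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((measured - measured.mean()) ** 2))
    r2 = 1.0 - ss_res / ss_tot
    return rmse, mae, r2


# Hypothetical depths in meters, for illustration only.
rmse, mae, r2 = depth_metrics([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0])
print(rmse, mae, r2)
```

RMSE penalizes large errors more heavily than MAE, so reporting both, as the study does, helps to separate a uniformly small bias from a few large outliers.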