Article

A Multi-Spatial Scale Ocean Sound Speed Prediction Method Based on Deep Learning

1 College of Underwater Acoustic Engineering, Harbin Engineering University, Harbin 150001, China
2 Qingdao Innovation and Development Center of Harbin Engineering University, Qingdao 266400, China
3 Sanya Nanhai Innovation and Development Base of Harbin Engineering University, Sanya 572000, China
4 Wuhan R&D Center, Raisecom Technology Co., Ltd., Wuhan 430000, China
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2024, 12(11), 1943; https://doi.org/10.3390/jmse12111943
Submission received: 18 September 2024 / Revised: 22 October 2024 / Accepted: 28 October 2024 / Published: 31 October 2024
(This article belongs to the Special Issue Machine Learning Methodologies and Ocean Science)

Abstract

As sound speed is a fundamental parameter of ocean acoustic characteristics, its prediction is a central focus of underwater acoustics research. Traditional numerical and statistical forecasting methods often exhibit suboptimal performance under complex conditions, whereas deep learning approaches demonstrate promising results. However, these methodologies fall short in adequately addressing multi-spatial coupling effects and spatiotemporal weighting, particularly in scenarios characterized by limited data availability. To investigate the interactions across multiple spatial scales and to achieve accurate predictions, we propose the STA-ConvLSTM framework that integrates spatiotemporal attention mechanisms with convolutional long short-term memory neural networks (ConvLSTM). The core concept involves accounting for the coupling effects among various spatial scales while extracting temporal and spatial information from the data and assigning appropriate weights to different spatiotemporal entities. Furthermore, we introduce an interpolation method for ocean temperature and salinity data based on the KNN algorithm to enhance dataset resolution. Experimental results indicate that STA-ConvLSTM provides precise predictions of sound speed. Specifically, relative to the measured data, it achieved a root mean square error (RMSE) of approximately 0.57 m/s and a mean absolute error (MAE) of about 0.29 m/s. Additionally, when compared to single-dimensional spatial analysis, incorporating multi-spatial scale considerations yielded superior predictive performance.

1. Introduction

Underwater acoustic technology finds extensive application across various domains, including marine environmental monitoring, resource exploration, intelligence gathering, and underwater communication [1,2]. As a fundamental parameter that characterizes marine acoustics, the speed of sound in oceanic environments plays a crucial role in determining the propagation characteristics of acoustic waves. It serves as a prerequisite for advancing research and applications in underwater acoustic technology. Accurately obtaining information on ocean sound speed is vital for enhancing both the precision and efficiency of underwater detection and communication systems, as well as for improving the operational performance of weapons and equipment.
Currently, the acquisition of ocean sound speed primarily relies on two methodologies: observation and prediction. In terms of observation, the sound speed profile of seawater can be directly measured using a sound speed profiler, while sound speed information at various locations can also be derived from underwater measurements of temperature, salinity, and depth data [3]. However, oceanic observations are costly and constrained by in situ measurement conditions. Given the vastness of the marine environment, the density of measurement points remains insufficient [4].
Forecasting is primarily categorized into two types: numerical forecasting and statistical forecasting. Ocean numerical model forecasting relies on kinetic theory as well as physical and chemical processes to simulate the evolutionary dynamics of the ocean [5]. However, owing to the inherent complexity of ocean processes, the accuracy of predictions is significantly influenced by the computational resources employed [6]. Statistical forecasting methods utilize statistical analysis of historical data to investigate the temporal and spatial distribution patterns of ocean sound speed, thereby enabling predictions regarding sound speed information. While this approach is considered highly reliable, it may prove less effective in scenarios characterized by complex correlations [7,8]. Phase space reconstruction (PSR), a technique employed for the analysis of nonlinear time series data, is extensively utilized in numerous fields. Nevertheless, the selection of appropriate parameters can be rather challenging [9].
In recent years, researchers have increasingly employed deep learning techniques to address the challenges of ocean sound speed prediction, aiming to mitigate the limitations inherent in statistical forecasting methods when confronted with nonlinear and complex phenomena. Relevant studies have demonstrated promising predictive outcomes, thereby affirming the applicability of deep learning approaches. However, these methodologies often fall short in adequately accounting for multi-spatial coupling effects and spatiotemporal weights, particularly in contexts characterized by limited data availability.
To investigate the multi-spatial scale interactions of sound speed and achieve accurate predictions with limited data, we propose the STA-ConvLSTM framework along with a multi-spatial scale sound speed prediction method that integrates the coupling of spatial structures. This approach entails processing initial data through ConvLSTM and subsequently passing the results to spatiotemporal attention modules to extract relevant feature information. The outputs from both ConvLSTM and spatiotemporal attention modules are then concatenated to facilitate the integration of original features with attention-weighted features. It is important to note that when predicting sound speed in multi-spatial scale structures, the design of the STA-ConvLSTM adapts according to variations in data dimensions.
The contributions of this paper can be summarized as follows:
  • To achieve enhanced prediction accuracy with few-shot data, we propose an interpolation method for ocean temperature and salinity data based on the KNN algorithm to improve dataset resolution.
  • To address the inadequacies in accounting for multi-spatial coupling effects and spatiotemporal weights in ocean sound speed prediction, we introduce the STA-ConvLSTM framework along with a multi-spatial scale sound speed prediction method that integrates spatial structure coupling.
  • To validate the efficacy of STA-ConvLSTM, we conducted experiments to assess the model’s accuracy in predicting ocean sound speed using the BOA_Argo dataset.
The remainder of the paper is structured as follows: In Section 2, we provide a brief review of related works on ocean sound speed predictions. Section 3 begins with an introduction to the data sources and context, followed by a presentation of the interpolation method for ocean temperature and salinity data based on the KNN algorithm, culminating in our proposed multi-scale sound speed prediction method utilizing STA-ConvLSTM. In Section 4, we present the experimental results pertaining to both temperature and salinity data interpolation as well as sound speed prediction. Section 5 offers an analytical discussion that thoroughly validates the feasibility and effectiveness of both KNN and STA-ConvLSTM methods. Finally, conclusions are drawn in Section 6.

2. Related Work

Over the past four decades, researchers have conducted extensive investigations into various methods for ocean sound speed inversion and prediction. Given the complex nature of the ocean environment, accurately predicting ocean sound speed presents a significant challenge [10]. Traditionally, approaches to the spatiotemporal prediction of marine environmental variables have relied on numerical simulations, which are often hindered by substantial computational demands, resulting in predictive inefficiency.
Matched field processing (MFP) and compressed sensing (CS) are established methods for sound speed profile (SSP) inversion [11]. Among these, MFP was the pioneering technology developed for SSP inversion. It fundamentally relies on the determination of the optimal SSP by matching the measured sound field with its simulated counterpart. As early as 1991, Tolstoy et al. [12] proposed an MFP framework that combined empirical orthogonal function (EOF) decomposition, providing an effective solution for SSP inversion. Subsequently, Taroudakis et al. [13] integrated MFP with modal phase inversion to enhance SSP inversion techniques. Yu et al. [14] developed an MFP method for SSP inversion utilizing a genetic algorithm. Due to the computational complexity associated with MFP, CS has been progressively developed to expedite the inversion process [15]. The fundamental principle of CS is to utilize a sparse representation of the ocean sound speed field in conjunction with minimal measurement data for reconstruction purposes. Gerstoft et al. [16] introduced a method for inverting sound speed profiles based on compressive sensing that achieved high-resolution estimations of small-scale sound speed variations. In contrast to MFP, the CS framework reduces computational complexity and thereby enhances SSP calculation efficiency. However, this increased efficiency may come at the expense of inversion accuracy.
PSR is a widely employed approach for constructing model input data and an instrumental technique in the study of chaotic dynamics [17,18]. It reconstructs the phase space structure of a system from its time series data, enabling the system's dynamics to be represented in the reconstructed phase space [19]. PSR is beneficial in ocean sound speed prediction and offers greater interpretability. For example, the integrate-and-differentiate approach [20] proves useful in analyzing the complex physical processes involved in sound propagation in the ocean. By integrating various physical factors such as temperature, salinity, and pressure, and subsequently differentiating to determine their effects on sound speed, this method can offer a more comprehensive understanding of the underlying mechanisms. Structural and parametric identification methods [21] can also be of great value. These methods enable the identification of the specific structural characteristics and parameters that influence sound speed in the ocean. By analyzing the available data and employing these methods to determine the relevant structural elements and parameter values, we can develop more precise predictive models. However, the performance of PSR largely hinges on two key parameters, namely the delay dimension and the phase scale, which must be set meticulously. Although numerous methods for selecting the delay dimension and phase scale have been put forward, determining appropriate values for these parameters remains a challenging and unresolved issue [9].
Machine learning applications in the field of acoustics have advanced rapidly in recent years, offering new ideas and methods for ocean sound speed prediction. For instance, Yu et al. [22] proposed a sound speed inversion method using radial basis function neural networks. Zhang et al. [23] introduced a prediction model based on LSTM neural networks for sea surface temperature forecasting, achieving higher accuracy than traditional regression methods. Ali et al. [24] compared the effects of deep learning and traditional statistical methods on the prediction of sea surface temperature, significant wave height, and other marine parameters; the results show that the prediction performance of the deep learning models is much better than that of the statistical models. Li et al. [25] developed a marine sound speed model based on ConvLSTM, which can capture the temporal and spatial characteristics of historical data. Ou et al. [26,27] proposed an SSP inversion algorithm based on an ensemble learning model using random forest and a method for reconstructing SSPs using the extreme gradient boosting model. Piao et al. [28] proposed an orthogonal representation of SSPs considering background field variations; based on the statistical characteristics of time series SSPs, high-precision SSP prediction is realized by LSTM. Wu et al. [29] introduced a data-fusion-driven multi-input multi-output convolutional regression neural network, integrating satellite-based real-time remote sensing of sea surface temperature, historical SSP feature vectors, and corresponding spatial coordinate information; the model removes the dependence on sonar observation data and can be applied over a wider spatial region. Gao et al. [30] proposed a round-by-round training approach to avoid being trapped in poor local optima. The results indicate that the proposed Neural ODE sound speed forecasting model is more effective in long-term forecasting than traditional models and can accurately predict sound speed at any time.
The relevant studies have demonstrated promising prediction results, thereby validating the applicability of machine learning methods. However, these approaches fall short in adequately accounting for multi-spatial coupling effects and spatiotemporal weights, which adversely impact predictive effectiveness.
ConvLSTM is a specialized neural network architecture integrating LSTM’s ability to process time series data and convolutional operations for extracting spatial features [31]. When handling time series data, it retains long-term dependencies and responds dynamically to sequence changes [32]. This characteristic makes it effective in tasks like weather forecasting, traffic flow prediction, and demand analysis, which require consideration of the temporal dimension [33,34]. The spatiotemporal attention mechanism focuses on significant features in both dimensions, helping the model identify changes in research objects at different times and locations [35]. By assigning weights, the model can selectively emphasize critical input data to enhance performance and effectiveness [36]. Incorporating this mechanism into ConvLSTM enhances its focus on key information. It enables a more accurate identification of historical information relevant to the current task when processing long sequences of sound speed data, improves the ability to capture long-term dependencies, and reduces the impact of noise and irrelevant information to enhance the model’s accuracy and robustness.
In summary, Table 1 presents a comparison of sound speed predictions obtained through various methods.

3. Data and Methods

In this section, we outline the data sources, introduce the KNN regression model for temperature–salinity data interpolation, and present the modeling concepts and workflow of the STA-ConvLSTM framework for sound speed prediction.

3.1. Data

The Argo Global Ocean Observing Network provides extensive temperature and salinity profile data across the global ocean, collected by profiling buoys. We chose to utilize historical data from the Global Ocean Argo Grid Dataset (BOA_Argo). In this study, we focused on the region near the Nansha Islands in the South China Sea (110.5° E~117.5° E, 9.5° N~11.5° N), as illustrated in Figure 1.
The depth range extended from 0 to 1500 m, and the time span encompassed 216 months, from January 2004 to December 2021. The data types included year, month, longitude, latitude, depth, temperature, and salinity, among others. The initial dataset had a horizontal resolution of 1.0° × 1.0°. In the vertical direction, the segmentation was as follows: for the range of 0–180 m, layers were divided every 10 m; for depths of 180–500 m, layers were divided every 20 m; for depths of 500–1300 m, layers were divided every 50 m; and for depths of 1300–1500 m, layers were divided every 100 m—resulting in a total of 53 layers.
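As an illustration, the 53-layer vertical grid described above can be reconstructed in a few lines of Python. The exact placement of the boundary levels (e.g., whether 180 m belongs to the 10 m or 20 m spacing) is an assumption, chosen here so that the total number of levels equals 53.

```python
import numpy as np

# One consistent reading of the vertical layering described above:
# 0-180 m every 10 m, 180-500 m every 20 m, 500-1300 m every 50 m,
# and 1300-1500 m every 100 m, giving 53 standard levels in total.
depth_levels = np.concatenate([
    np.arange(0, 181, 10),       # 19 levels: 0, 10, ..., 180
    np.arange(200, 501, 20),     # 16 levels: 200, 220, ..., 500
    np.arange(550, 1301, 50),    # 16 levels: 550, 600, ..., 1300
    np.arange(1400, 1501, 100),  # 2 levels: 1400, 1500
])
assert depth_levels.size == 53
```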

3.2. KNN Regression Model

Ocean temperature and salinity data form the foundation for calculating sound speed in the ocean. Because field observations are influenced by multiple factors, temperature and salinity data typically exhibit a high degree of spatial sparsity, rendering them susceptible to issues such as data loss and low resolution, which in turn affects the prediction of ocean sound speed [37,38,39]. The interpolation of temperature and salinity data constitutes a regression problem that can address data gaps, enhance spatial resolution, and provide effective support for multi-scale analyses of ocean sound speed and other characteristics.
The KNN algorithm is frequently employed to address regression problems due to its simplicity and ease of implementation. It does not require training and offers strong interpretability [40]. Additionally, it demonstrates good adaptability to nonlinear issues and is suitable for multi-class classification tasks as well as datasets with missing values. Consequently, we selected it for interpolation, aligning with our research requirements.
The KNN algorithm is a supervised learning method grounded in examples. Its fundamental principle involves making predictions based on the proximity of input samples within the feature space [40,41]. For a sample requiring prediction, the KNN algorithm identifies the k nearest neighbors from the training set and predicts the output for this sample based on the labels of these k neighbors. In regression tasks, the KNN algorithm computes the average of the target values of these k nearest neighbors as the prediction for the sample, typically in the form of a weighted average, as shown in Equation (1) [42,43].
$\hat{y} = \dfrac{\sum_{i=1}^{K} w_i y_i}{\sum_{i=1}^{K} w_i}$ (1)
where y ^ represents the predicted value for the sample in question; wi represents the weight of the i-th nearest neighbor sample; and yi signifies the label value of the i-th nearest neighbor sample.
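For concreteness, a minimal NumPy sketch of the inverse-distance-weighted prediction in Equation (1) is given below; the function name and the small constant added to avoid division by zero are illustrative choices rather than part of the original method.

```python
import numpy as np

def knn_predict(x_query, X_train, y_train, k=5, eps=1e-12):
    """Inverse-distance-weighted KNN regression following Equation (1)."""
    # Euclidean distance from the query point to every training sample.
    dists = np.linalg.norm(X_train - x_query, axis=1)
    # Indices of the k nearest neighbors.
    nearest = np.argsort(dists)[:k]
    # Weights inversely proportional to distance (eps guards against zero distance).
    w = 1.0 / (dists[nearest] + eps)
    # Weighted average of the neighbors' target values.
    return np.sum(w * y_train[nearest]) / np.sum(w)
```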
In this study, the KNN algorithm was employed for temperature–salinity data interpolation. When constructing the KNN regression model for temperature and salinity data, the input vector xi = {x1i, x2i, x3i, x4i, x5i} was used to represent the year, month, latitude, longitude, and depth, respectively. The output yi denoted the corresponding temperature or salinity value.
Attention must be given to both the weight and k value. The two common weighting methods in KNN regression are uniform weighting and distance weighting. Given that the original dataset in this study was not evenly distributed spatially, we selected distance weighting. Specifically, the weights were set to be inversely proportional to the distance between each nearest neighbor data point and the new data point. This implied that closer data points exerted a greater influence on prediction results.
The magnitude of the k value directly impacts classification or regression outcomes within the KNN algorithm. Smaller values of k may render the model overly sensitive to local variations in data, resulting in overfitting. Conversely, larger values of k may lead to excessive smoothing of predictions, causing underfitting. In this model, potential k values were drawn from {2, 3, 4, 5, 6, 7, 8, 9}, with optimal selection determined through a grid search method. It is a parameter tuning approach that iterates through various combinations of specified parameters to identify optimal settings.
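A hedged sketch of this setup using scikit-learn is shown below (an assumption: the paper does not state which library was used). Feature standardization is an added assumption, included so that the distance metric is not dominated by the depth coordinate; the candidate k values, distance weighting, and 4-fold cross-validation match the settings described in this paper.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# X: (n_samples, 5) features = (year, month, latitude, longitude, depth)
# y: temperature (or salinity) values at those points.
model = make_pipeline(
    StandardScaler(),                          # assumed preprocessing step
    KNeighborsRegressor(weights="distance"),   # inverse-distance weighting
)
param_grid = {"kneighborsregressor__n_neighbors": [2, 3, 4, 5, 6, 7, 8, 9]}
search = GridSearchCV(model, param_grid, cv=4, scoring="neg_mean_squared_error")
# search.fit(X_train, y_train)
# best_k = search.best_params_["kneighborsregressor__n_neighbors"]
```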

3.3. Sound Speed Prediction Model

We propose a multi-scale ocean sound speed prediction method based on the STA-ConvLSTM framework, which integrates ConvLSTM with temporal and spatial attention mechanisms. By leveraging existing ocean sound speed data, this method enables predictions at various scales, including sound speed profile, section, and structure.

3.3.1. ConvLSTM Model

LSTM is a specific type of Recurrent Neural Network (RNN). The internal structure of its unit is illustrated in Figure 2 [44]. It consists of the input xt at the current time step, the output ht−1 from the previous time step, the memory unit ct−1 from the previous time step, as well as the output ht and memory unit ct at the current time step, along with the gating information of the LSTM unit.
Unlike RNN, LSTM introduces an information transfer channel known as the "cell state", which traverses the entire network and facilitates the passage of information between different time steps. By incorporating a gating structure to regulate the input, output, and retention of information, LSTM can effectively propagate information across extremely long time series, thereby addressing the vanishing and exploding gradients that RNN encounters when processing long sequences [23,45]. At each time step, the cell state moves through the network, transmitting information from one time step to another. During this process, the input gate, forget gate, and output gate determine what information should be added to, removed from, and read from the cell state. The outputs of these gates are governed by an activation function and range from 0 to 1. When a gate's output value approaches 0, the relevant information is blocked; conversely, when it approaches 1, the relevant information is allowed to pass. Through these mechanisms, LSTM efficiently learns long-term dependencies in time series data [46].
ConvLSTM is a neural network that integrates Convolutional Neural Networks (CNNs) with LSTM [47]. It introduces convolutional operations to traditional LSTMs, enabling the model to simultaneously process both sequential and spatial information. In ConvLSTM, the computations for input, forget, and output gates, as well as the updating of cell states and hidden states, are all performed through convolutional operations. This architecture allows ConvLSTM to capture local spatial features in the input data while preserving the long-term dependency processing capabilities inherent in LSTM [25,48].
Compared to traditional LSTMs, a key characteristic of ConvLSTM is its replacement of fully connected operations with convolutional ones [49]. This modification ensures that when computing gated information and state updates, ConvLSTM retains the spatial structure of the input data. Furthermore, due to the translation invariance afforded by convolutional operations, ConvLSTM can more effectively manage spatial information and local features. Consequently, it demonstrates superior performance when processing sequential data with an underlying spatial structure [50]. As illustrated in Figure 3, during the training phase, ConvLSTM receives a series of input data characterized by spatial features and learns spatiotemporal dependencies among these inputs [51]. During the prediction phase, given one or more initial inputs, ConvLSTM generates predictions for future matrix states based on learned spatiotemporal characteristics.
The ConvLSTM network determines the future state of a cell within the grid by utilizing the past states of adjacent cells, as illustrated in Equations (2)–(6) [52].
$i_t = \sigma(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i)$ (2)
$f_t = \sigma(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f)$ (3)
$C_t = f_t \circ C_{t-1} + i_t \circ \tanh(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c)$ (4)
$o_t = \sigma(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o)$ (5)
$H_t = o_t \circ \tanh(C_t)$ (6)
where i, f, c, and o denote the input gate, forget gate, control unit, and output gate, respectively; σ represents the nonlinear activation function; Xt signifies the input at time t; Wxi, Whi, Wci, Wxf, Whf, Wcf, Wxc, Whc, Wxo, Who, and Wco are the weight matrices; '$*$' denotes the convolution operator; '$\circ$' indicates the Hadamard product; Ht represents the output value at time t; and ot signifies the gating information in the output gate.
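To make Equations (2)–(6) concrete, the following minimal single-channel NumPy sketch performs one ConvLSTM step; in practice a framework layer such as Keras ConvLSTM2D is used, and the kernel shapes, bias handling, and helper names here are illustrative assumptions.

```python
import numpy as np
from scipy.signal import convolve2d

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convlstm_step(X_t, H_prev, C_prev, W, b):
    """One single-channel ConvLSTM step following Equations (2)-(6).

    X_t, H_prev, C_prev: 2D arrays of identical spatial shape.
    W: dict of 2D convolution kernels ('xi', 'hi', ...) and peephole weights
    ('ci', 'cf', 'co'); b: dict of biases. '*' in the equations is the
    convolution below; the Hadamard product is element-wise '*' in NumPy.
    """
    conv = lambda A, K: convolve2d(A, K, mode="same")
    i_t = sigmoid(conv(X_t, W["xi"]) + conv(H_prev, W["hi"]) + W["ci"] * C_prev + b["i"])
    f_t = sigmoid(conv(X_t, W["xf"]) + conv(H_prev, W["hf"]) + W["cf"] * C_prev + b["f"])
    C_t = f_t * C_prev + i_t * np.tanh(conv(X_t, W["xc"]) + conv(H_prev, W["hc"]) + b["c"])
    o_t = sigmoid(conv(X_t, W["xo"]) + conv(H_prev, W["ho"]) + W["co"] * C_t + b["o"])
    H_t = o_t * np.tanh(C_t)
    return H_t, C_t
```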

3.3.2. Attention Mechanism

The attention mechanism is a technique that emulates the allocation of human attention within deep learning models [53,54]. Through this approach, the model can automatically learn to assign varying weights to different segments of the input data, allowing it to focus more on information pertinent to the current task, thereby enhancing performance when addressing complex challenges [55,56]. The fundamental principles are as follows:
  • Feature extraction from the input data: Initially, the model encodes the input data to derive a set of hidden representations. These representations may take the form of vectors, matrices, or tensors that encapsulate information regarding the characteristics of the input data.
  • Computation of attention weights: The model calculates attention weights for each hidden representation in accordance with current task requirements. These weights reflect the degree of attention assigned by the model to each component during processing. The computation typically relies on the contextual information pertaining to both the input data and the target task.
  • Weighted summation of the hidden representations: The model applies attention weights to these hidden representations to perform a weighted summation. This step can be viewed as aggregating the different components of the input data based on their significance.
  • Output generation: The model produces output derived from the representation obtained through weighted summation. This result may manifest as a predicted value, vector, or complex data structure.
In this paper, both temporal and spatial attention mechanisms are utilized in the prediction of ocean sound speed across various spatial scales to effectively capture significant features in space and time. For the aforementioned ConvLSTM model structure, the temporal attention module and the spatial attention module are designed, taking the prediction of the sound speed profile and the sound speed section as examples.
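One possible realization of these two modules in TensorFlow/Keras is sketched below. The paper does not specify the internal equations of its attention modules, so the average pooling, Dense scoring, and sigmoid spatial map used here are assumptions, and the helper names are illustrative.

```python
from tensorflow.keras import layers

def temporal_attention(x):
    """Reweight the time steps of a (batch, time, H, W, C) ConvLSTM output."""
    s = layers.TimeDistributed(layers.GlobalAveragePooling2D())(x)  # (B, T, C)
    scores = layers.Dense(1)(s)                                     # (B, T, 1)
    alpha = layers.Softmax(axis=1)(scores)                          # weights over time
    alpha = layers.Reshape((-1, 1, 1, 1))(alpha)                    # (B, T, 1, 1, 1)
    return layers.Multiply()([x, alpha])

def spatial_attention(x):
    """Reweight the spatial locations of a (batch, time, H, W, C) tensor."""
    att = layers.TimeDistributed(
        layers.Conv2D(1, (7, 7), padding="same", activation="sigmoid"))(x)
    return layers.Multiply()([x, att])
```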

3.3.3. STA-ConvLSTM Framework

The temporal attention module and the spatial attention module are integrated into the ConvLSTM network, resulting in a network referred to as STA-ConvLSTM (spatiotemporal-attention-based ConvLSTM) [57,58,59]. The overall structure of the model is illustrated in Figure 4.
Taking the prediction of the sound speed section as an example, the layers of the STA-ConvLSTM model are briefly introduced as follows:
  • Initially, the model receives the original sound speed sequence data through the input layer, which serves as input for subsequent layers. The input shape is defined as “samples, time, height, width, channels”, representing sequentially the number of samples, time step, height of the input 2D matrix, width of the input 2D matrix, and number of channels.
  • Subsequently, two ConvLSTM layers are connected following the input layer. The first ConvLSTM layer utilizes 64 filters and 7 × 7 convolution kernels to capture local features and temporal correlations within the input data through convolution in both time and space. The ReLU activation function is employed to introduce nonlinearity into the model. Padding is set to ‘same’ to ensure that the output size matches the input size. ‘return_sequences’ is set to True to retain the output for all time steps. The second ConvLSTM layer mirrors the first one by also employing 64 filters and 7 × 7 convolutional kernels. This layer further extracts features from the input data, enhancing the model’s ability to capture temporal and spatial information.
  • Then, the output of the ConvLSTM layer serves as the input of the temporal attention module, which can focus on assessing the significance of different time steps. A spatial attention module is then connected following the temporal attention module, enhancing its ability to capture information from key spatial locations.
  • After extracting spatiotemporal features, a Concatenate layer is introduced. This layer concatenates the output from both the original ConvLSTM layer and the spatial attention module along the channel dimension. The objective is to merge original features with those weighted by attention, allowing the model to retain essential information while also capturing critical insights through this mechanism.
  • Finally, the concatenated feature map is mapped to the predicted sound speed section using a two-dimensional convolutional layer (with 1 filter and a 7 × 7 convolutional kernel) as the output layer. The activation function of this output layer defaults to linear, with its value directly representing the prediction. Padding is set to 'same' in order to maintain an output size consistent with the input. Additionally, 'data_format' is configured as 'channels_last' to preserve the channel order in alignment with the input data.
It is important to note that when predicting the three-dimensional sound speed structure, the input dimension is represented as a 6D tensor. Furthermore, the dimensions of depth, latitude, and longitude must be summed and weighted, introducing an additional dimension compared to predictions of both the sound speed section and the sound speed profile. Additionally, in order to predict the three-dimensional sound speed structure, ConvLSTM 2D should be replaced with ConvLSTM 3D, and Conv 2D should be substituted with Conv 3D.
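The layer stack described above can be sketched in Keras as follows, building on the temporal_attention and spatial_attention helpers sketched in Section 3.3.2. The attention internals, and the assumption that the 2D output convolution is applied across the retained time dimension (supported by TF 2.x Conv2D for inputs with extra leading dimensions), are illustrative choices rather than a reproduction of the authors' implementation.

```python
from tensorflow.keras import Model, layers

def build_sta_convlstm(time_steps=12, height=1501, width=3, channels=1):
    """Sketch of the STA-ConvLSTM stack for sound speed section prediction."""
    inputs = layers.Input(shape=(time_steps, height, width, channels))
    # Two stacked ConvLSTM2D layers: 64 filters, 7x7 kernels, ReLU, 'same' padding,
    # return_sequences=True so that all time steps are kept.
    x = layers.ConvLSTM2D(64, (7, 7), padding="same", activation="relu",
                          return_sequences=True)(inputs)
    x = layers.ConvLSTM2D(64, (7, 7), padding="same", activation="relu",
                          return_sequences=True)(x)
    # Temporal then spatial attention (helpers sketched in Section 3.3.2).
    att = temporal_attention(x)
    att = spatial_attention(att)
    # Concatenate original and attention-weighted features along the channel axis.
    merged = layers.Concatenate(axis=-1)([x, att])
    # 2D convolutional output layer (1 filter, 7x7 kernel, linear activation);
    # in TF 2.x, Conv2D treats the time axis as an extra batch dimension.
    outputs = layers.Conv2D(1, (7, 7), padding="same",
                            data_format="channels_last")(merged)
    return Model(inputs, outputs)

# For the 3D sound speed structure, ConvLSTM2D/Conv2D would be replaced by
# ConvLSTM3D/Conv3D and the input gains an extra spatial dimension.
```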

4. Experiments and Results

4.1. Interpolation Experiments of Temperature and Salinity

4.1.1. Dataset Preprocessing

According to the study area (110.5° E~117.5° E, 9.5° N~11.5° N) defined in Section 3.1 of this paper, the dataset was divided into a training subset and a test set in a 4:1 ratio. The training subset underwent 4-fold cross-validation, with 25% of it being used as the validation set in each fold. As a result, the overall ratio of the training subset, validation set, and test set was 3:1:1.
The three-dimensional grid data of temperature and salinity in the study area were extracted from the global datasets and concatenated along the time dimension. Subsequently, the model was trained on the sample data to develop a multi-time-span interpolation model using Python version 3.8. After interpolation, the spatial resolution of the data was 0.5° × 0.5° in the horizontal direction, with layers divided every meter in the vertical direction.
To better analyze the impact of the time span on the regression model's interpolation performance, temperature and salinity data from March and September 2020, as well as June and December 2021, were selected to construct single-time-span interpolation models. The results for each corresponding month were compared against those obtained from the multi-time-span (2004~2021) interpolation model. The evaluation indexes were the mean square error (MSE) and the mean absolute error (MAE).
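A brief sketch of the data split and error metrics described above is given below; the placeholder arrays and random seeds are illustrative, and the real features and targets come from the BOA_Argo extraction.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import KFold, train_test_split

# Placeholders standing in for the (year, month, lat, lon, depth) features
# and the temperature/salinity targets extracted from BOA_Argo.
X = np.random.rand(1000, 5)
y = np.random.rand(1000)

# 4:1 split into training subset and test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# 4-fold cross-validation on the training subset: each fold holds out 25% of it,
# so the overall training/validation/test proportion is 3:1:1.
kf = KFold(n_splits=4, shuffle=True, random_state=0)
for train_idx, val_idx in kf.split(X_tr):
    pass  # fit the KNN model on X_tr[train_idx], validate on X_tr[val_idx]

def report_errors(y_true, y_pred):
    """MSE and MAE, the evaluation indexes used in Section 4.1."""
    return mean_squared_error(y_true, y_pred), mean_absolute_error(y_true, y_pred)
```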

4.1.2. Temperature Interpolation Results and Analysis

Regarding temperature interpolation, the interpolation results of the different time-span models are shown in Figure 5. The abscissa represents the true value, the ordinate the predicted value, and the blue dashed line indicates where the true and predicted values are equal. The results show that, in different months, the interpolation results of the multi-time-span model and the single-time-span models are highly consistent and are distributed closely along the blue dashed line, with a small number of data points scattered on both sides of the line. Moreover, the degree of deviation is small and mainly concentrated in the range of 20 °C to 30 °C.
The MSE and MAE of each model's predictions relative to the true values are presented in Table 2 and Figure 6. It can be observed that, under the conditions of this paper, the MSE of the multi-time-span model interpolation lies between those of the single-time-span model interpolations in different months, and the difference is not significant. The MAE of each model tends to be consistent, at approximately 0.3 °C.

4.1.3. Salinity Interpolation Results and Analysis

Regarding salinity interpolation, the interpolation results of models with different time spans are shown in Figure 7. The results indicate that in different months, the interpolation results of the multi-time span model and single-time span models are highly consistent and are generally distributed along the blue dotted line. Compared with other months, in September 2020 and December 2021, the interpolation results of the two models differed more and deviated more from the true value.
The MSE and MAE of each model's predictions relative to the true values are presented in Table 3 and Figure 8. It can be observed that the MSE of the multi-time-span model interpolation lies between those of the single-time-span model interpolations in different months, and the difference is not significant. The MAE of each model tends to be consistent, at approximately 0.0220‰.

4.2. Prediction Experiments of Sound Speed

Utilizing the measured ocean temperature and salinity data, this study employed the multi-scale ocean sound speed prediction method based on the STA-ConvLSTM model, implemented in Python version 3.8, to predict the ocean sound speed at multiple spatial scales, namely the sound speed profile, sound speed section, and sound speed structure. Subsequently, the predictions were compared with on-site observations to analyze and verify the validity and accuracy of the prediction method.

4.2.1. Dataset and Model Preprocessing

Based on the study area defined by coordinates (110.5° E~117.5° E, 9.5° N~11.5° N) as outlined in Section 3.1 of this paper, point O (114.0° E, 10.5° N) was selected for sound speed profile prediction. Points A (114.0° E, 10.0° N), O (114.0° E, 10.5° N), and B (114.0° E, 11.0° N) were selected to form a sound speed section for sound speed section prediction. Points O (114.0° E, 10.5° N), B (114.0° E, 11.0° N), C (114.5° E, 11.0° N), and D (114.5° E, 10.5° N) were selected to form a sound speed structure for sound speed structure prediction, as shown in Figure 9.
Based on a total of 216 months of temperature and salinity data from January 2004 to December 2021 in the BOA_Argo dataset, the KNN-based spatial interpolation method for temperature and salinity data was employed to interpolate and supplement the data. Subsequently, the ocean sound speed values at different spatial locations were calculated through the simplified Del Grosso sound speed expression [60], as shown in Equation (7).
$C = 1449.2 + 4.6T - 0.055T^2 + 0.000029T^3 + (1.34 - 0.01T)(S - 35) + 0.016Z$ (7)
where C represents the speed of sound, T represents temperature, S represents salinity, and Z represents depth.
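For reference, Equation (7) translates directly into a small helper. The units assumed below (°C for temperature, ‰ for salinity, meters for depth, m/s for sound speed) follow standard usage for this class of simplified formulas; the coefficients follow Equation (7) as given in the text.

```python
def sound_speed(T, S, Z):
    """Sound speed in m/s from Equation (7).

    T: temperature (degrees Celsius), S: salinity (per mille), Z: depth (m).
    """
    return (1449.2 + 4.6 * T - 0.055 * T**2 + 0.000029 * T**3
            + (1.34 - 0.01 * T) * (S - 35.0) + 0.016 * Z)
```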
Using the dataset division for the sound speed profile as an example, the calculated sound speed data were concatenated, resulting in an increase in dimensions. The dimensions of the sound speed profile sequence data were (216, 1501, 1, 1). To maximize the model's utilization of the dataset, a sliding window approach was employed to traverse it. The window width was set to 12 months and the sliding stride to 1 month. Consequently, both the feature and label sets for the sound speed profile sequence data were obtained, each having dimensions of (193, 12, 1501, 1, 1). Herein, '193' indicates that there are 193 sets of data; '12' signifies that each set contains profiles spanning a period of 12 months; '1501' denotes that the predicted depths range from 0 to 1500 m across a total of 1501 points; the first '1' represents that one sound speed profile is predicted; and the second '1' indicates that there is one characteristic dimension corresponding to the sound speed value.
Taking the last time series group as the test set allowed the ocean sound speed from January to December 2021 to be predicted using data collected from January 2004 to December 2020. Comparative analysis showed that optimal validation results were obtained with one validation group. Thus, the dataset division was as follows: 191 groups in the training dataset with dimensions of (191, 12, 1501, 1, 1); one group in the validation set with dimensions of (1, 12, 1501, 1, 1); and one group in the test set with dimensions of (1, 12, 1501, 1, 1).
In predicting the sound speed section, the preprocessing and division of the data were largely consistent with those applied to the sound speed profile; only the model input size differed slightly. After dataset division, the training dataset comprised 191 groups with dimensions of (191, 12, 1501, 3, 1), while the validation set and the test set each consisted of one group with dimensions of (1, 12, 1501, 3, 1). For the prediction of the three-dimensional ocean sound speed structure, the dataset division yielded a training dataset of 191 groups with dimensions of (191, 12, 1501, 2, 2, 1), while the validation and test sets each contained one group with dimensions of (1, 12, 1501, 2, 2, 1).
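The sliding-window construction described above can be sketched as follows; the 12-month prediction horizon is inferred from the stated label dimensions, and the placeholder array stands in for the sound speed values computed with Equation (7).

```python
import numpy as np

def sliding_windows(series, window=12, horizon=12, stride=1):
    """Build feature/label pairs by sliding a window along the time axis.

    series: array of shape (months, ...), e.g. (216, 1501, 1, 1) for the
    sound speed profile sequence. Returns arrays of shape
    (n_groups, window, ...) and (n_groups, horizon, ...).
    """
    X, Y = [], []
    for start in range(0, series.shape[0] - window - horizon + 1, stride):
        X.append(series[start:start + window])
        Y.append(series[start + window:start + window + horizon])
    return np.stack(X), np.stack(Y)

# Profile case: (216, 1501, 1, 1) -> features and labels of (193, 12, 1501, 1, 1).
profiles = np.zeros((216, 1501, 1, 1), dtype=np.float32)  # placeholder sound speeds
features, labels = sliding_windows(profiles)
assert features.shape == (193, 12, 1501, 1, 1) and labels.shape == (193, 12, 1501, 1, 1)
```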
The batch size, activation function, convolutional kernel size, number of filters, and number of stacked ConvLSTM layers have a strong impact on the learning ability of the network model. It was therefore necessary to test different network structures to determine the optimal structural parameters for this experiment and to select the best ConvLSTM structure for the prediction of the sound speed profile. The candidate values were empirical values drawn from previous applications of ConvLSTM and from our experiments. The parameter setting that performed best on the studied experimental data, in terms of prediction accuracy, was taken as the final network structure for sound speed prediction. After preliminary experimental analysis, we determined the parameters shown in Table 4.
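Structurally, this selection amounts to evaluating each candidate configuration on the validation set and keeping the best one, as in the sketch below; the candidate grids are hypothetical, since the paper reports only the finally selected settings in Table 4.

```python
from itertools import product

# Hypothetical candidate grids (the paper lists only the final choices).
grid = {
    "batch_size": [4, 8, 16],
    "filters": [32, 64],
    "kernel_size": [(3, 3), (5, 5), (7, 7)],
    "stacked_layers": [1, 2, 3],
}

results = []
for combo in product(*grid.values()):
    cfg = dict(zip(grid.keys(), combo))
    # Build and train a ConvLSTM with this configuration, then record its
    # validation RMSE, e.g.:
    #   model = build_model(**cfg)                    # hypothetical builder
    #   results.append((cfg, validation_rmse(model)))  # hypothetical evaluator

# best_cfg = min(results, key=lambda item: item[1])[0]
```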

4.2.2. Sound Speed Profile Prediction

To explore the role of the spatiotemporal attention mechanisms, the ConvLSTM and STA-ConvLSTM were each employed to predict the ocean sound speed profile at point O (114.0° E, 10.5° N). The datasets, data division, and prediction methods of the two models were consistent. Root mean square error (RMSE) and MAE were utilized as the evaluation indexes for the prediction of the sound speed profile. The prediction results are shown in Figure 10. The blue line represents the RMSE of the model, corresponding to the left coordinate axis; the red line represents the MAE of the model, corresponding to the right coordinate axis.
As can be seen from Figure 10, under the conditions of this paper, the errors of the ConvLSTM and STA-ConvLSTM in predicting the ocean sound speed profile at point O are relatively close, and the error trend is consistent across the months. Overall, the errors are small from April to August and from November to December. Between the two models, the error of the ConvLSTM is smaller from January to March and from September to December, while the error of the STA-ConvLSTM is smaller from April to August. The variation is more pronounced within the upper 100 m of the water column.
The temporal attention and spatial attention weights in the prediction process of the STA-ConvLSTM are shown in Figure 11. It can be found that temporal attention shows a trend of first rising and then falling, maintaining a large weight from March to August. Spatial attention shows significant variation within a range of 100 m and is relatively uniform thereafter.

4.2.3. Sound Speed Section Prediction

The ConvLSTM and STA-ConvLSTM were each employed to predict the ocean sound speed section formed by points A (114.0° E, 10.0° N), O (114.0° E, 10.5° N), and B (114.0° E, 11.0° N). The datasets, data division, and prediction methods of the two models were consistent. RMSE and MAE were utilized as the evaluation indexes for the prediction of the sound speed section. The prediction results are shown in Figure 12. The blue line represents the RMSE of the model, corresponding to the left coordinate axis; the red line represents the MAE of the model, corresponding to the right coordinate axis.
As can be seen from Figure 12, under the conditions of this paper, the errors predicted by the ConvLSTM and STA-ConvLSTM for the ocean sound speed section formed by points A, O, and B are relatively close, and the errors follow a consistent monthly trend. Overall, the errors are small from April to August and from November to December, and the errors of the STA-ConvLSTM are smaller than those of the ConvLSTM.
The temporal attention and spatial attention weights in the process of STA-ConvLSTM prediction are shown in Figure 13. It can be found that the temporal attention weights were relatively high in the first few time steps and then showed a gradual downward trend, maintaining a low level from April to December. Spatial attention shows significant variation within a range of 100 m and is relatively uniform thereafter.

4.2.4. Sound Speed Structure Prediction

The ConvLSTM and STA-ConvLSTM were each employed to predict the three-dimensional ocean sound speed structure formed by points O (114.0° E, 10.5° N), B (114.0° E, 11.0° N), C (114.5° E, 11.0° N), and D (114.5° E, 10.5° N). The datasets, data division, and prediction methods of the two models were consistent. RMSE and MAE were utilized as the evaluation indexes for the prediction of the sound speed structure. The prediction results are shown in Figure 14. The blue line represents the RMSE of the model, corresponding to the left coordinate axis; the red line represents the MAE of the model, corresponding to the right coordinate axis.
As can be seen from Figure 14, under the conditions presented in this paper, the errors predicted by the ConvLSTM and STA-ConvLSTM for the ocean sound speed structure formed by points O, B, C, and D are relatively close, and the errors follow a consistent monthly trend. Overall, the errors are small from April to December, and the errors of the STA-ConvLSTM are smaller than those of the ConvLSTM.
The temporal attention and spatial attention weights in the prediction process of the STA-ConvLSTM are shown in Figure 15. It can be found that temporal attention exhibits a trend of first decreasing and then increasing, maintaining a low level from March to December. Spatial attention shows significant variation within a range of 100 m and is relatively uniform thereafter, and there is a certain gap between different points.

5. Discussion

5.1. Interpolation of Temperature and Salinity

In this study, we propose a spatial interpolation method for temperature and salinity data utilizing the KNN algorithm. As detailed in Section 4.1, this approach effectively generated interpolated data for both temperature and salinity, accurately reflecting the spatial distribution of seawater properties while generally achieving high predictive accuracy. Compared to single-time-span models, multi-time-span models exhibit enhanced stability. Thus, when computational resources allow, the multi-time-span model is preferred. Nonetheless, certain notable phenomena persisted during the temperature and salinity interpolation process.
The predominant source of error in temperature interpolation was concentrated within the range of 20 °C to 30 °C, which represents the high-temperature phase of the dataset. This phenomenon arose because elevated seawater temperatures are primarily found at surface levels and are significantly influenced by environmental factors such as air temperature, sunlight exposure, wind conditions, and wave activity [61]. Consequently, these variables contributed to greater variability in measurements compared to lower-temperature phases.
In comparison with other months, September 2020 and December 2021 exhibited more pronounced discrepancies between the two models’ interpolation results relative to the actual values. This observation underscores distinct seasonal characteristics. Our target marine area experiences its rainy season from September through December, with substantial fluctuations in salinity due to varying rainfall patterns [62]. As a result, the interpolation effect of the models was reduced.
Clearly, the method is susceptible to outliers, and its prediction accuracy is sensitive to changes in seawater temperature and salinity. In addition, the computational cost of data interpolation using the KNN algorithm is high. Therefore, subsequent research can address these problems and explore other temperature–salinity interpolation methods.

5.2. Prediction of Sound Speed

Based on ConvLSTM and the fusion of temporal and spatial attention mechanisms, this paper proposes a multi-scale ocean sound speed prediction method based on the STA-ConvLSTM model. This method was employed to predict the sound speed profile, sound speed section, and sound speed structure. As shown in Section 4.2, the method exhibited the following characteristics.

5.2.1. Error Comparison and Analysis

We compared the sound speed prediction errors of different methods, as shown in Figure 16. The line in the middle of each box represents the average. The blue part represents the sound speed structure prediction error of our proposed STA-ConvLSTM model, which contains 12 data points corresponding to different months. The other parts represent the sound speed prediction errors of the H-LSTM model, the back propagation (BP) neural network model [22], and the polynomial fitting (PF) method [63], respectively. They each contain 14 data points corresponding to different water depths, as calculated by Lu et al. [64]. Clearly, compared with the other three methods, our proposed STA-ConvLSTM model achieved the lowest mean RMSE of sound speed prediction, and its error distribution is relatively dense. This demonstrates the accuracy and stability of the proposed STA-ConvLSTM model for predicting ocean sound speed.

5.2.2. Multi-Space Coupling Analysis

As the predicted object changed from profile and section to structure, the error curve of sound speed prediction using the STA-ConvLSTM model continued to shift downward, indicating that the prediction accuracy gradually improved, as shown in Figure 17. The figure shows the prediction errors of the STA-ConvLSTM model for the different predicted objects. The blue line represents the RMSE of the model, corresponding to the left coordinate axis; the red line represents the MAE of the model, corresponding to the right coordinate axis. Throughout the year, the errors of the three predicted objects were smaller in April–August and larger in January–March and September–December. This is because the target sea area experiences a warming period from January to March and a rainy season from September to December [65]. On the other hand, affected by the monsoon, the sea area experiences strong winds and waves from September to April of the following year [66]. The combination of these factors causes frequent fluctuations in seawater temperature and salinity, so the changes in ocean sound speed show weak regularity. For the prediction of the sound speed structure, the RMSE and MAE decreased from September to December, reflecting the advantage of structure prediction over sound speed profile and section prediction: it incorporates more spatial information, reducing the impact of local fluctuations. Although the RMSE increased from January to March, the MAE remained at a low level, which is considered to be the effect of outliers.

6. Conclusions

With the rapid advancement of computational power and continuous breakthroughs in deep learning technology, the predictive accuracy of spatiotemporal sequence prediction algorithms for ocean sound speed has significantly improved. In this paper, we propose an interpolation method for ocean temperature and salinity data based on the KNN algorithm to enhance dataset resolution. To investigate multi-spatial scale interactions of sound speed and achieve precise predictions with limited data, we introduce the STA-ConvLSTM framework, which integrates spatiotemporal attention mechanisms with ConvLSTM. We validated the accuracy of our method through experiments utilizing the BOA_Argo dataset. The main conclusions are as follows:
The spatial interpolation method for temperature and salinity data based on the KNN algorithm proposed in this paper can provide effective interpolated data, accurately reflect the spatial distribution of seawater temperature and salinity, and achieve high overall prediction accuracy. Under the verification conditions, the mean square errors of temperature and salinity data interpolation were 0.2003 °C and 0.0039‰, respectively.
The proposed STA-ConvLSTM model has high accuracy in multi-spatial scale ocean sound speed prediction, and the prediction accuracy is further improved by fusing the spatiotemporal attention mechanism. Under the verification conditions, the RMSE of the model for the prediction of the sound speed profile, sound speed section, and sound speed structure reached about 0.57 m/s, and the MAE reached about 0.29 m/s.
The prediction of the sound speed profile, sound speed section, and sound speed structure reflects the sound speed distribution law across the ocean's multiple spatial scales. The ocean sound speed prediction method based on the STA-ConvLSTM model achieves a better prediction effect when the coupling effect of multiple spatial scales is considered. This study lays an effective foundation for the three-dimensional development of marine sound speed prediction technology.
However, at present, our treatment of multi-spatial scale coupling effects is preliminary and cannot accurately describe the interactions between the sound speeds at each spatial location. In addition, the sound speed prediction does not yet consider the dynamic processes linking ocean temperature and salinity.
In future research, we will focus on exploring the effects of multi-dimensional spatial coupling on sound speed prediction, with a view to obtaining spatially continuous sound speed predictions and improving the accuracy and efficiency of prediction. In addition, we plan to combine the STA-ConvLSTM model with physical knowledge of ocean dynamic processes, with the aim of improving the efficiency of STA-ConvLSTM in handling physical relationships in space and time and improving the interpretability of the model.

Author Contributions

Conceptualization, Y.L., B.M. and Z.Q.; data curation, C.W. and C.G.; formal analysis, C.G., Y.C. and M.L.; investigation, Z.Q., C.W. and J.Z.; methodology, Y.L. and C.W.; resources, Y.L. and M.L.; software, C.W., S.Y. and J.Z.; validation, Y.L., B.M. and C.W.; visualization, C.G. and S.Y.; writing—original draft, Y.L., C.W. and Y.C.; writing—review and editing, Y.L. and B.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Taishan Scholars Program, Natural Science Foundation of Shandong Province, China (grant number ZR2024QD082) and the Key Research and Development Program of China (grant number 2021YFC***1105).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data supporting this study’s findings are available from the authors upon reasonable request.

Acknowledgments

We are deeply indebted to the providers of the BOA_Argo dataset for their crucial support of our research. The comprehensive and valuable information provided by this dataset has been instrumental in advancing our study. We express our sincere gratitude for the financial support from the Taishan Scholars Program, the Natural Science Foundation of Shandong Province, China (grant number ZR2024QD082), and the Key Research and Development Program of China (grant number 2021YFC***1105). Their generous funding has made it possible for us to conduct this research and has significantly contributed to the success of our work.

Conflicts of Interest

Authors Benjun Ma and Zhiliang Qin declare that the Sanya Nanhai Innovation and Development Base of Harbin Engineering University, and author Cheng Wang declares that Raisecom Technology Co., Ltd., had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results; and they have no conflicts of interest. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Luo, X.; Chen, L.; Zhou, H.; Cao, H. A Survey of Underwater Acoustic Target Recognition Methods Based on Machine Learning. J. Mar. Sci. Eng. 2023, 11, 384. [Google Scholar] [CrossRef]
  2. Lei, Z.; Lei, X.; Wang, N.; Zhang, Q. Present Status and Challenges of Underwater Acoustic Target Recognition Technology: A review. Front. Phys. 2022, 18, 1044890. [Google Scholar] [CrossRef]
  3. Sun, K.; Cui, W.; Chen, C. Review of Underwater Sensing Technologies and Applications. Sensors 2021, 21, 7849. [Google Scholar] [CrossRef] [PubMed]
  4. Lin, M.; Yang, C. Ocean Observation Technologies: A Review. Chin. J. Mech. Eng. 2020, 33, 32. [Google Scholar] [CrossRef]
  5. Sonnewald, M.; Lguensat, R.; Jones, D.C.; Dueben, P.D.; Brajard, J.; Balaji, V. Bridging Observations, Theory and Numerical Simulation of The Ocean Using Machine Learning. Environ. Res. Lett. 2021, 16, 073008. [Google Scholar] [CrossRef]
  6. Fennel, K.; Mattern, J.P.; Doney, S.C.; Bopp, L.; Moore, A.M.; Wang, B.; Yu, L. Ocean Biogeochemical Modelling. Nat. Rev. Method Prime 2020, 2, 76. [Google Scholar] [CrossRef]
  7. Januschowski, T.; Gasthaus, J.; Wang, Y.; Salinas, D.; Flunkert, V.; Bohlke-Schneider, M.; Callot, L. Criteria for Classifying Forecasting Methods. Int. J. Forecast. 2020, 36, 167–177. [Google Scholar] [CrossRef]
  8. Vanem, E.; Zhu, T.Y.; Babanin, A. Statistical Modelling of The Ocean Environment-A Review of Recent Developments in Theory and Applications. Mar. Struct. 2022, 86, 103297. [Google Scholar] [CrossRef]
  9. Xu, X.F.; Hu, S.T.; Shi, P.M.; Shao, H.S. Natural phase space reconstruction-based broad learning system for short-term wind speed prediction: Case studies of an offshore wind farm. Energy 2022, 262, 125342. [Google Scholar] [CrossRef]
  10. Yuan, H.X.; Liu, Y.; Tang, Q.H.; Li, J.; Chen, G.X.; Cai, W.X. ST-LSTM-SA: A New Ocean Sound Velocity Field Prediction Model Based on Deep Learning. Adv. Atmos. Sci. 2024, 41, 1364–1378. [Google Scholar] [CrossRef]
  11. Huang, W.; Li, D.S.; Zhang, H.; Xu, T.H.; Yin, F. A meta-deep-learning framework for spatio-temporal underwater SSP inversion. Front. Mar. Sci. 2023, 10, 1146333. [Google Scholar] [CrossRef]
  12. Tolstoy, A.; Diachok, O.; Frazer, L. Acoustic tomography via matched field processing. J. Acoust. Soc. Am. 1991, 89, 1119–1127. [Google Scholar] [CrossRef]
  13. Taroudakis, M.I.; Markaki, M.G. On the use of matched-field processing and hybrid algorithms for vertical slice tomography. J. Acoust. Soc. Am. 1997, 102, 885–895. [Google Scholar] [CrossRef]
  14. Yu, Y.X.; Li, Z.L.; He, L. Matched-field inversion of sound speed profile in shallow water using a parallel genetic algorithm. Chin. J. Oceanol. Limnol. 2010, 28, 1080–1085. [Google Scholar] [CrossRef]
  15. Li, Q.Q.; Shi, J.; Li, Z.L.; Luo, Y.; Yang, F.L.; Zhang, K. Acoustic sound speed profile inversion based on orthogonal matching pursuit. Acta Oceanol. Sin. 2019, 38, 149–157. [Google Scholar] [CrossRef]
  16. Bianco, M.J.; Gerstoft, P.; Traer, J.; Ozanich, E.; Roch, M.A.; Gannot, S.; Deledalle, C.A. Machine learning in acoustics: Theory and applications. J. Acoust. Soc. Am. 2019, 146, 3590–3628. [Google Scholar] [CrossRef]
  17. Gao, Z.K.; Jin, N.D. Complex network from time series based on phase space reconstruction. Chaos 2009, 19, 033137. [Google Scholar] [CrossRef]
  18. Jiang, Y.; Bao, X.; Hao, S.n.; Zhao, H.T.; Li, X.Y.; Wu, X.N. Monthly Streamflow Forecasting Using ELM-IPSO Based on Phase Space Reconstruction. Water Resour. Manag. 2020, 34, 3515–3531. [Google Scholar] [CrossRef]
  19. Du, S.S.; Song, S.B.; Wang, H.M.; Guo, T.L. A novel method of nonuniform phase space reconstruction for multivariate prediction of daily runoff. J. Hydrol. 2024, 638, 131510. [Google Scholar] [CrossRef]
  20. Karimov, A.I.; Kopets, E.; Nepomuceno, E.G.; Butusov, D. Integrate-and-Differentiate Approach to Nonlinear System Identification. Mathematics 2021, 9, 2999. [Google Scholar] [CrossRef]
  21. Ding, F.; Xu, L.; Zhang, X.; Zhou, Y.H. Filtered auxiliary model recursive generalized extended parameter estimation methods for Box–Jenkins systems by means of the filtering identification idea. Int. J. Robust Nonlinear Control 2023, 33, 5510–5535. [Google Scholar] [CrossRef]
  22. Yu, X.K.; Xu, T.H.; Wang, J.T. Sound Velocity Profile Prediction Method Based on RBF Neural Network. In Proceedings of the China Satellite Navigation Conference (CSNC) 2020 Proceedings, Chengdu, China, 22–25 November 2020; Volume III, pp. 475–487. [Google Scholar]
  23. Zhang, Q.; Wang, H.; Dong, J.Y.; Zhong, G.; Sun, X. Prediction of Sea Surface Temperature Using Long Short-Term Memory. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1745–1749. [Google Scholar] [CrossRef]
  24. Ali, A.; Fathalla, A.; Salah, A.; Bekhit, M.; Eldesouky, E. Marine Data Prediction: An Evaluation of Machine Learning, Deep Learning, and Statistical Predictive Models. Comput. Intell. Neurosci. 2021, 27, 8551167. [Google Scholar] [CrossRef] [PubMed]
  25. Li, B.Y.; Zhai, J.S. A Novel Sound Speed Profile Prediction Method Based on the Convolutional Long-Short Term Memory Network. J. Mar. Sci. Eng. 2022, 10, 572. [Google Scholar] [CrossRef]
  26. Ou, Z.Y.; Qu, K.; Liu, C. Estimation of sound speed profiles using a random forest model with satellite surface observations. Shock Vib. 2022, 2022, 2653791. [Google Scholar] [CrossRef]
  27. Ou, Z.Y.; Qu, K.; Shi, M.; Wang, Y.F.; Zhou, J.B. Estimation of sound speed profiles based on remote sensing parameters using a scalable end-to-end tree boosting model. Front. Mar. Sci. 2022, 9, 1051820. [Google Scholar] [CrossRef]
  28. Piao, S.C.; Yan, X.; Li, Q.Q.; Li, Z.L.; Wang, Z.W.; Zhu, J.L. Time series prediction of shallow water sound speed profile in the presence of internal solitary wave trains. Ocean Eng. 2023, 283, 115058. [Google Scholar] [CrossRef]
  29. Wu, P.F.; Zhang, H.; Shi, Y.J.; Lu, J.J.; Li, S.J.; Huang, W.; Tang, N.; Wang, S.J. Real-time estimation of underwater sound speed profiles with a data fusion convolutional neural network model. Appl. Ocean Res. 2024, 150, 104088. [Google Scholar] [CrossRef]
  30. Gao, C.; Cheng, L.; Zhang, T.; Li, J.L. Long-term Forecasting of Ocean Sound Speeds At Any Time Via Neural Ordinary Differential Equations. In Proceedings of the OCEANS 2024—Singapore, Singapore, 15–18 April 2024; pp. 1–6. [Google Scholar]
  31. Xu, Y.H.; Hou, J.Y.; Zhu, X.J.; Wang, C.; Shi, H.D.; Wang, J.Y. Hyperspectral Image Super-Resolution With ConvLSTM Skip-Connections. IEEE Trans. Geosci. Remote Sens. 2024, 62, 1–16. [Google Scholar] [CrossRef]
  32. Abbass, M.J.; Lis, R.; Awais, M.; Nguyen, T.X. Convolutional Long Short-Term Memory (ConvLSTM)-Based Prediction of Voltage Stability in a Microgrid. Energies 2024, 17, 1999. [Google Scholar] [CrossRef]
  33. Zheng, L.; Lu, W.S.; Zhou, Q.Y. Weather image-based short-term dense wind speed forecast with a ConvLSTM-LSTM deep learning model. Build. Environ. 2023, 239, 110446. [Google Scholar] [CrossRef]
  34. He, R.; Liu, Y.B.; Xiao, Y.P.; Lu, X.Y.; Zhang, S. Deep spatio-temporal 3D densenet with multiscale ConvLSTM-Resnet network for citywide traffic flow forecasting. Knowl.-Based Syst. 2022, 250, 109054. [Google Scholar] [CrossRef]
  35. Lv, Z.Q.; Ma, Z.B.; Xia, F.Q.; Li, J.B. A transportation Revitalization index prediction model based on Spatial-Temporal attention mechanism. Adv. Eng. Inform. 2024, 61, 102519. [Google Scholar] [CrossRef]
  36. Xu, C.Y.; Xu, C.Q. Local spatial and temporal relation discovery model based on attention mechanism for traffic forecasting. Neural Netw. 2024, 176, 106365. [Google Scholar] [CrossRef]
  37. Song, T.; Wei, W.; Meng, F.; Wang, J.; Han, R.; Xu, D. Inversion of Ocean Subsurface Temperature and Salinity Fields Based on Spatio-Temporal Correlation. Remote Sens. 2022, 14, 2587. [Google Scholar] [CrossRef]
  38. Valdes, P.J.; Scotese, C.R.; Lunt, D.J. Deep Ocean Temperatures Through Time. Clim. Past 2021, 17, 1483–1506. [Google Scholar] [CrossRef]
  39. Pan, S.Y.; Tian, S.Q.; Wang, X.F.; Dai, L. Comparing Different Spatial Interpolation Methods to Predict the Distribution of Fishes: A Case Study of Coilia Nasus in The Changjiang River Estuary. Acta Oceanol. Sin. 2021, 40, 119–132. [Google Scholar] [CrossRef]
  40. Uddin, S.; Haque, I.; Lu, H.H.; Moni, M.A.; Gide, E. Comparative performance analysis of K-nearest neighbour (KNN) algorithm and its different variants for disease prediction. Sci. Rep. 2022, 12, 6256. [Google Scholar] [CrossRef]
  41. Zhang, S.C. Challenges in KNN Classification. IEEE Trans. Knowl. Data Eng. 2022, 34, 4663–4675. [Google Scholar] [CrossRef]
  42. Zhang, S.C.; Li, J.Y. KNN Classification With One-Step Computation. IEEE Trans. Knowl. Data Eng. 2023, 35, 2711–2723. [Google Scholar] [CrossRef]
  43. Pan, Z.B.; Wang, Y.K.; Pan, Y.W. A New Locally Adaptive K-Nearest Neighbor Algorithm Based on Discrimination Class. Knowl.-Based Syst. 2020, 204, 106185. [Google Scholar] [CrossRef]
  44. Wang, H.Y.; Xu, P.D.; Zhao, J.H. Improved KNN Algorithm Based on Preprocessing of Center in Smart Cities. Complexity 2021, 2021, 5524388. [Google Scholar] [CrossRef]
  45. Abbasimehr, H.; Shabani, M.; Yousefi, M. An Optimized Model Using LSTM Network for Demand Forecasting. Comput. Ind. Eng. 2020, 143, 106435. [Google Scholar] [CrossRef]
  46. Zhao, J.Y.; Huang, F.Q.; Lv, J.; Duan, Y.; Qin, Z.; Li, G.; Tian, G. Do RNN and LSTM have long memory? In Proceedings of the 37th International Conference on Machine Learning (ICML’20), Vienna, Austria, 12–18 July 2020; Volume 119, pp. 11365–11375. [Google Scholar]
  47. Lindemann, B.; Müller, T.; Vietz, H.; Jazdi, N.; Weyrich, M. A Survey on Long Short-Term Memory Networks for Time Series Prediction. Procedia CIRP 2021, 99, 650–655. [Google Scholar] [CrossRef]
  48. Agga, A.; Abbou, A.; Labbadi, M.; Houm, Y.E. Short-Term Self Consumption PV Plant Power Production Forecasts Based on Hybrid CNN-LSTM, ConvLSTM Models. Renew. Energy 2021, 177, 101–112. [Google Scholar] [CrossRef]
  49. Moishin, M.; Deo, R.C.; Prasad, R.; Rai, N.; Abdulla, S. Designing Deep-Based Learning Flood Forecast Model With ConvLSTM Hybrid Algorithm. IEEE Access 2021, 9, 50982–50993. [Google Scholar] [CrossRef]
  50. Peng, Y.Q.; Tao, H.F.; Li, W.; Yuan, H.T.; Li, T.J. Dynamic Gesture Recognition Based on Feature Fusion Network and Variant ConvLSTM. IET Image Process. 2020, 14, 2480–2486. [Google Scholar] [CrossRef]
  51. Guo, F.; Yang, J.; Li, H.; Li, G.; Zhang, Z. A ConvLSTM Conjunction Model for Groundwater Level Forecasting in a Karst Aquifer Considering Connectivity Characteristics. Water 2021, 13, 2759. [Google Scholar] [CrossRef]
  52. Jalalifar, R.; Delavar, M.R.; Ghaderi, S.F. SAC-ConvLSTM: A Novel Spatio-Temporal Deep Learning-Based Approach for A Short Term Power Load Forecasting. Expert Syst. Appl. 2024, 237, 121487. [Google Scholar] [CrossRef]
  53. Liu, W.; Wang, Y.Q.; Zhong, D.Y.; Xie, S.; Xu, J.J. ConvLSTM Network-Based Rainfall Nowcasting Method With Combined Reflectance and Radar-Retrieved Wind Field As Inputs. Atmosphere 2022, 13, 411. [Google Scholar] [CrossRef]
  54. Niu, Z.Y.; Zhong, G.Q.; Yu, H. A Review on The Attention Mechanism of Deep Learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
  55. Lai, Q.X.; Khan, S.; Nie, Y.W.; Sun, H.Q.; Shen, J.B.; Shao, L. Understanding More About Human and Machine Attention in Deep Neural Networks. IEEE Trans. Multimed. 2020, 23, 2086–2099. [Google Scholar] [CrossRef]
  56. Gangopadhyay, T.; Tan, S.Y.; Jiang, Z.H.; Meng, R.; Sarkar, S. Spatiotemporal Attention for Multivariate Time Series Prediction and Interpretation. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 3560–3564. [Google Scholar]
  57. Ding, Y.K.; Zhu, Y.L.; Feng, J.; Zhang, P.C.; Cheng, Z.R. Interpretable Spatio-Temporal Attention LSTM Model for Flood Forecasting. Neurocomputing 2020, 403, 348–359. [Google Scholar] [CrossRef]
  58. Li, B.; Tang, B.Q.; Deng, L.; Zhao, M.H. Self-Attention ConvLSTM and Its Application in RUL Prediction of Rolling Bearings. IEEE Trans. Instrum. Meas. 2021, 70, 3518811. [Google Scholar] [CrossRef]
  59. Zhou, G.X.; Chen, J.; Liu, M.; Ma, L.F. A Spatiotemporal Attention-Augmented ConvLSTM Model for Ocean Remote Sensing Reflectance Prediction. Int. J. Appl. Earth Obs. Geoinf. 2024, 129, 103815. [Google Scholar] [CrossRef]
  60. Chen, C.; Qiu, A.; Chen, H.Y.; Chen, Y.J.; Liu, X.; Li, D. Prediction of Pollutant Concentration Based on Spatial-Temporal Attention, ResNet and ConvLSTM. Sensors 2023, 23, 8863. [Google Scholar] [CrossRef]
  61. Shi, J.L.; Xu, N.; Luo, N.N.; Li, S.J.; Xu, J.J.; He, X.D. Retrieval of Sound-Velocity Profile in Ocean by Employing Brillouin Scattering LiDAR. Opt. Express 2022, 30, 16419–16431. [Google Scholar] [CrossRef]
  62. Al-Shehhi, M.R. Uncertainty in Satellite Sea Surface Temperature With Respect to Air Temperature, Dust Level, Wind Speed and Solar Position. Reg. Stud. Mar. Sci. 2022, 53, 102385. [Google Scholar] [CrossRef]
  63. Liu, F.C.; Ji, T.; Zhang, Q.L. Sound Speed Profile Inversion Based on Mode Signal and Polynomial Fitting. Acta Armamentarii 2019, 40, 2283–2295. [Google Scholar] [CrossRef]
  64. Lu, J.J.; Zhang, H.; Li, S.J.; Wu, P.F.; Huang, W. Enhancing Few-Shot Prediction of Ocean Sound Speed Profiles through Hierarchical Long Short-Term Memory Transfer Learning. J. Mar. Sci. Eng. 2024, 12, 1041. [Google Scholar] [CrossRef]
  65. Xu, X.F.; Yu, K.F.; Chen, T.R.; Tao, S.C.; Yan, H.Q.; Chen, T.G. The Responses of Sr/Ca, δ18O, and δ13C in The Porites Coral Skeleton to Extreme Thermal Events in the Nansha Islands. Singap. J. Trop. Geogr. 2022, 42, 1771–1782. [Google Scholar] [CrossRef]
  66. Li, J.; Wang, Y.P.; Gao, S. In Situ Hydrodynamic Observations on Three Reef Flats in The Nansha Islands, South China Sea. Front. Mar. Sci. 2024, 11, 1375301. [Google Scholar] [CrossRef]
Figure 1. Interpolation area for temperature and salinity data.
Figure 2. The internal structure of neurons in LSTM.
Figure 3. Internal structure diagram of ConvLSTM.
Figure 4. Schematic diagram of the STA-ConvLSTM network structure.
Figure 5. Temperature interpolation of models with different time spans.
Figure 6. Histogram of temperature interpolation errors of models with different time spans.
Figure 7. Salinity interpolation of models with different time spans.
Figure 8. Histogram of salinity interpolation errors of models with different time spans.
Figure 9. Schematic diagram of the sound speed prediction range.
Figure 10. Prediction errors of the sound speed profile for different models.
Figure 11. Variation in attention weights during sound speed profile prediction.
Figure 12. Prediction errors of the sound speed section for different models.
Figure 13. Variation in attention weights during sound speed section prediction.
Figure 14. Prediction errors of the sound speed structure for different models.
Figure 15. Variation in attention weights during sound speed structure prediction.
Figure 16. Comparison of prediction errors of different models.
Figure 17. Prediction errors of different predicted objects for the STA-ConvLSTM model.
Table 1. Comparison of methods for sound speed prediction.

| Method | Main Parameters | Advantages | Disadvantages |
|---|---|---|---|
| MFP | Source location, receiver array location, environmental parameters | High accuracy; sensitive to environmental changes | Large amount of computation; large demand for prior information |
| CS | Measurement matrix, basis function, reconstruction algorithm | Small data demand; low computing cost; high flexibility | Poor handling of non-sparse signals |
| RF | Number of decision trees, maximum depth | Suitable for high-dimensional data; good tolerance of outliers and noise | Limited interpretability; long training time |
| LSTM | Number of hidden-layer neurons, learning rate, batch size | Suited to time series and long-term dependencies | Complex model structure; long training time |
| STA-ConvLSTM (proposed) | Convolution kernel size, attention weights | Strong modeling ability for spatiotemporal data; automatically attends to important information | High model complexity; large amount of computation |
Table 2. Temperature interpolation table of models with different time spans.

| Year | Month | MSE (°C) | MAE (°C) |
|---|---|---|---|
| 2020 | 3 | 0.2155 | 0.2861 |
| 2020 | 9 | 0.1752 | 0.2972 |
| 2021 | 6 | 0.2083 | 0.3069 |
| 2021 | 12 | 0.1947 | 0.2984 |
| 2004–2021 | 1–12 | 0.2003 | 0.2989 |
Table 3. Salinity interpolation table of models with different time spans.

| Year | Month | MSE (‰) | MAE (‰) |
|---|---|---|---|
| 2020 | 3 | 0.0035 | 0.0217 |
| 2020 | 9 | 0.0021 | 0.0182 |
| 2021 | 6 | 0.0018 | 0.0184 |
| 2021 | 12 | 0.0049 | 0.0270 |
| 2004–2021 | 1–12 | 0.0039 | 0.0250 |
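For readers who wish to reproduce the kind of evaluation summarized in Tables 2 and 3, the sketch below shows one possible way to build a distance-weighted KNN interpolator for scattered temperature (or salinity) samples and to score it with MSE and MAE against held-out reference points. The scikit-learn regressor, the coordinate ranges, the neighbor count, and the synthetic data are illustrative assumptions and do not reproduce the configuration or results of this study.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Scattered observations: (longitude, latitude, depth) -> temperature (degC);
# the ranges and the simple depth-dependent profile are placeholders.
obs_coords = rng.uniform([110.0, 5.0, 0.0], [118.0, 12.0, 500.0], size=(200, 3))
obs_temp = 28.0 - 0.04 * obs_coords[:, 2] + rng.normal(0.0, 0.2, 200)

# Distance-weighted KNN regressor as the interpolator (k = 5 is an assumption)
knn = KNeighborsRegressor(n_neighbors=5, weights="distance")
knn.fit(obs_coords, obs_temp)

# Held-out reference points used only to score the interpolation
ref_coords = rng.uniform([110.0, 5.0, 0.0], [118.0, 12.0, 500.0], size=(50, 3))
ref_temp = 28.0 - 0.04 * ref_coords[:, 2] + rng.normal(0.0, 0.2, 50)

pred = knn.predict(ref_coords)
mse = np.mean((pred - ref_temp) ** 2)   # analogous to the MSE columns above
mae = np.mean(np.abs(pred - ref_temp))  # analogous to the MAE columns above
print(f"MSE = {mse:.4f}, MAE = {mae:.4f}")
```

The same scoring loop applies to salinity fields by swapping the observed variable; only the units of the reported errors change.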
Table 4. Key parameter setting of ConvLSTM.

| Key Parameters | Value |
|---|---|
| Batch size | 16 |
| Activation function | ReLU |
| Convolutional kernel size | (7, 7) |
| Number of filters | 32 |
| Number of stacked layers | 2 |
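As a point of reference, the following is a minimal sketch, assuming a TensorFlow/Keras implementation, of a ConvLSTM stack configured with the values listed in Table 4. The input shape, output head, optimizer, loss, epoch count, and placeholder tensors are illustrative assumptions rather than settings taken from this work, and the sketch omits the spatiotemporal attention module of STA-ConvLSTM.

```python
import numpy as np
from tensorflow.keras import layers, models

def build_convlstm(time_steps=6, height=16, width=16, channels=1):
    """Two stacked ConvLSTM2D layers with Table 4 values (32 filters, 7x7, ReLU)."""
    model = models.Sequential([
        layers.Input(shape=(time_steps, height, width, channels)),
        layers.ConvLSTM2D(32, kernel_size=(7, 7), padding="same",
                          activation="relu", return_sequences=True),
        layers.ConvLSTM2D(32, kernel_size=(7, 7), padding="same",
                          activation="relu", return_sequences=False),
        # Project the last hidden state onto a single-channel output field
        layers.Conv2D(1, kernel_size=(1, 1), padding="same"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

if __name__ == "__main__":
    model = build_convlstm()
    # Random placeholder tensors standing in for gridded input/target fields
    x = np.random.rand(32, 6, 16, 16, 1).astype("float32")
    y = np.random.rand(32, 16, 16, 1).astype("float32")
    model.fit(x, y, batch_size=16, epochs=1)  # batch size 16 as in Table 4
```

The stacked layers return the full hidden sequence at the first level and only the final state at the second, which is a common choice when a single future field is predicted from a short input sequence.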