1. Introduction
The influence of the external environment on a ship during a sea voyage comes from two sources: the hydrological environment of the ocean and the meteorological environment of the atmosphere. The elements that directly affect a ship include wind, waves, current, fog, ice, and tide [1]. Wind mainly causes the ship to drift and deflect, and the waves generated by wind affect the ship's safety and navigation efficiency [2]. With the progress of science and technology, exchanges between countries have become increasingly frequent. Shipping is an important means of international communication and trade, and sea navigation is tested by waves all the time [3]. The irregularity of sea waves poses a great challenge to maritime navigation safety, marine scientific research, offshore operations, and the exploitation of ocean energy [4,5]. Therefore, high-precision wave height prediction helps users understand wave conditions in advance, helps offshore workers plan ahead, maintains the safety of marine navigation, and guarantees smooth and safe marine transportation and offshore operations.
Wave height prediction supports meteorological and hydrological protection services for ships. Because it takes time for a ship to reach a target area, multi-step wave height predictions covering the next few hours are needed, i.e., short-term predictions [6]. Moreover, the required predictions form a time series for an entire sea area rather than for a single location. The wave height prediction problem is therefore a multi-step spatio-temporal prediction problem.
Numerical wave forecasting methods derive wave characteristics, such as height and period, by solving the wave spectral equations that describe the physical processes occurring in the ocean [7]. The mainstream approach for regional wave height prediction is based on numerical models that simulate the physical processes of wave generation and dissipation. The third-generation wave model SWAN [8,9,10] was developed to accurately simulate wave generation, propagation, and dissipation at various scales, from shallow to deep water, and has been widely applied in wave simulation [11] and wave energy prediction [12,13]. However, these models require significant computational resources and time to solve the wave action balance equations, which limits their efficiency in long-term simulations of large-scale seas. Their development therefore has to balance computational efficiency against simulation accuracy.
Statistical forecasting methods build models that learn the relationship between input and output variables from a large amount of data. They include time series forecasting methods based on traditional parametric models [14,15], on traditional machine learning [16,17,18], and on deep learning [19,20,21]. Among them, traditional parametric methods struggle to capture the nonlinear features in the data, whereas traditional machine learning methods can capture nonlinear features automatically and generalize well on small samples. Deep learning-based spatio-temporal sequence prediction methods can not only mine the effective information in the data and automatically capture hidden linear and nonlinear features but also efficiently handle large-scale spatio-temporal sequence data [22].
Deep learning algorithms achieve high prediction accuracy by using simple neurons to create nonlinear mappings. These models provide explicit solutions and can balance computational efficiency with prediction accuracy, making them suitable for the fast and accurate prediction of large-scale waves [23]. In related work, James et al. [24] proposed a multilayer perceptron (MLP) model for predicting regional significant wave heights and a support vector machine (SVM) model for identifying regional characteristic periods. These machine learning models were developed as accurate and computationally efficient alternatives to the SWAN model and showed strong accuracy in predicting regional significant wave heights and identifying characteristic periods in the computational domain. However, wave prediction depends not only on the input at the current time but also on outputs at previous times. This calls for machine learning methods that can recognize patterns in time series data, such as recurrent neural networks (RNNs) or long short-term memory (LSTM) networks. Feng et al. [25] developed an MLP model to predict significant wave heights and wave periods in Lake Michigan. The model considers topographic factors, such as winter icing, and achieves high prediction accuracy with much less computational time than the SWAN model. However, many existing regional wave prediction models rely on MLPs to convert regional wave information into vectors for prediction, which can lose spatial information and reduce prediction accuracy. Gao et al. [18] developed an LSTM-based model for predicting wave heights at a Bohai Sea hydrographic station. Their results showed that the LSTM model outperformed other models, such as the feedforward neural network (FNN) and support vector regression (SVR). Pirhooshyaran and Snyder [26] combined LSTM networks with Bayesian hyperparameter optimization and elastic net methods to develop a sequence-to-sequence neural network for wave height prediction, which achieved superior validation results compared to other neural network models. Jing et al. [27] proposed a convolutional neural network (CNN)-based regional wave prediction (CNN-RWP) model that uses a CNN to map wind data to wave data. The CNN-RWP model was compared with the SWAN model on a dataset from the Gulf of Mexico: the average absolute error between the CNN-RWP output and the SWAN output was less than 10%, while the computational efficiency was improved by a factor of about 1000.
Considering the needs of ocean navigation, regional wave height prediction is as important as wave height prediction at a single key location. Note that regional wave height prediction does not merely mean taking wave heights at multiple locations as input; it uses inputs from multiple neighboring locations and predicts multiple locations simultaneously. It presupposes that the predicted locations are spatially correlated and that multiple consecutive moments are temporally correlated, so multi-step prediction of regional wave heights is a spatio-temporal prediction problem.
The core of regional wave height prediction is to learn spatial and temporal correlations from a large amount of data, so current spatio-temporal prediction models are mainly based on CNNs and RNNs. Regional wave height prediction still faces the following difficulties.
- (1)
The model needs to output predicted values for multiple locations simultaneously, which is a pixel-level prediction. Accurate pixel-level spatial output requires not only strong spatio-temporal feature extraction but also the ability to decode the extracted deep spatial features back to an output map of the same size as the input. For the regional wave height prediction task, predicting directly from the image representation is not suitable; instead, the deep features should be decoded by network layers with gradually increasing output resolution [28]. Thus, regional wave height prediction places high demands on the model structure.
- (2)
Performing multi-step prediction while guaranteeing pixel-level regional wave height output is challenging. Current regional wave height prediction models, especially CNN-like models, generally perform single-step prediction. Some studies achieve multi-step prediction by modeling each future moment independently, but this approach has difficulty maintaining high accuracy at later time steps [29].
In recent years, spatio-temporal sequence learning has received more attention than pure time series learning because it can effectively represent complex spatio-temporal phenomena. Shi et al. [30] proposed the convolutional LSTM (ConvLSTM) network, which combines a convolutional neural network with a recurrent neural network and was shown to predict rainfall well from radar images. One advantage of ConvLSTM over a CNN is that the former captures correlations across both time and space. However, ConvLSTM has many parameters and can easily overfit the data [19,20].
To address the above problems, this paper combines convolutional and recurrent neural networks and proposes a multi-step spatio-temporal wave height prediction model based on ConvGRU with a multi-input, multi-output, multi-step prediction strategy [31]. The model relies on the encoder-predictor architecture of ConvGRU to map the high-resolution input matrix to an output matrix of the same resolution and thus obtain accurate multi-location predictions.
The rest of the paper is organized as follows. Section 2 presents the data and the multi-step spatio-temporal prediction method used in this study. Section 3 describes in detail the proposed ConvGRU-based encoder-predictor model for regional multi-step wave height prediction. Section 4 evaluates and discusses the model through experiments. Section 5 summarizes the conclusions.
3. Model Building and Experimental Setup
3.1. ConvGRU Network
The traditional recurrent neural network (RNN) cannot handle long-term dependencies well because of the exploding or vanishing gradients generated during training. The gated recurrent unit (GRU) is an improvement on the traditional RNN and a variant of the long short-term memory (LSTM) network that simplifies the LSTM structure to only an update gate and a reset gate [35]. The GRU has fewer parameters and a simpler structure but matches the performance of the LSTM with faster training convergence. It inherits the RNN's ability to explore the intrinsic dependencies of sequence data while mitigating the vanishing gradients, long training times, and overfitting caused by long sequences, and it improves local optimization ability and network generalization [36,37]. The internal structure of an ordinary LSTM or GRU is nearly fully connected, which causes serious information redundancy and ignores the spatial correlation between local pixels in the data. The convolution-based gated recurrent unit (ConvGRU) extends the fully connected GRU to a convolutional structure by replacing the dot product operations with convolution operations, so it can both establish temporal relationships, like a GRU, and capture local spatial features, like a CNN. This paper therefore uses ConvGRU for modeling; its internal structure is shown in Figure 3. An advantage of this design is that all input and output elements are three-dimensional tensors, which preserves spatial information.
With the memory mechanism of the GRU, ConvGRU can preserve the features of historical input image sequences during training, ensure the effective transfer of feature information over longer periods, and improve the accuracy of the prediction results. It is calculated as follows:

R_t = σ(W_xr * X_t + W_hr * H_{t−1} + b_r)
Z_t = σ(W_xz * X_t + W_hz * H_{t−1} + b_z)
H̃_t = tanh(W_xh * X_t + W_hh * (R_t ∘ H_{t−1}) + b_h)
H_t = Z_t ∘ H_{t−1} + (1 − Z_t) ∘ H̃_t

Here, R_t is the reset gate, Z_t is the update gate, H̃_t is the current (candidate) memory, and H_t is the final memory. X_t is the input at the current moment, H_{t−1} is the hidden-layer output at the previous moment, and the W and b terms are the respective weight matrices and biases. The symbol "*" denotes the convolution operator, "∘" denotes the Hadamard product, and "σ" denotes the Sigmoid function. Information is selected through gate structures composed of Sigmoid layers and convolutional operations: whenever a new input arrives, the reset gate controls how much of the previous state is cleared, and the update gate controls the amount of new information entering the state.
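The gate computations above can be sketched in NumPy. This is a minimal, single-channel illustration with a naive same-padding convolution and randomly initialized 3 × 3 kernels; the shapes, kernel size, and parameter layout are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, k):
    """Naive single-channel 2-D cross-correlation with zero 'same' padding."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def convgru_step(x, h_prev, params):
    """One ConvGRU update for a single-channel H x W field."""
    W_xr, W_hr, b_r, W_xz, W_hz, b_z, W_xh, W_hh, b_h = params
    r = sigmoid(conv2d_same(x, W_xr) + conv2d_same(h_prev, W_hr) + b_r)  # reset gate
    z = sigmoid(conv2d_same(x, W_xz) + conv2d_same(h_prev, W_hz) + b_z)  # update gate
    h_cand = np.tanh(conv2d_same(x, W_xh) + conv2d_same(r * h_prev, W_hh) + b_h)
    return z * h_prev + (1.0 - z) * h_cand  # final memory H_t

rng = np.random.default_rng(0)
# 3x3 kernels for the six convolutions, scalar zero biases (toy initialization)
params = tuple(rng.normal(scale=0.1, size=(3, 3)) if i % 3 != 2 else 0.0
               for i in range(9))
x = rng.normal(size=(8, 8))   # current input field
h = np.zeros((8, 8))          # initial hidden state
h = convgru_step(x, h, params)
print(h.shape)                # the hidden state keeps the spatial size: (8, 8)
```

Note how, unlike a fully connected GRU, each output pixel depends only on a local neighborhood of the input and previous hidden state, which is what preserves spatial structure.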
3.2. Model Building
To ensure that the model represents spatio-temporal features well and effectively predicts changes in wave height spatio-temporal sequences, an encoder-predictor structure similar to that of Shi et al. [30] is used. The encoder module consists of two convolutional downsampling layers and three ConvGRU layers; the predictor module consists of two transposed convolutional upsampling layers and three ConvGRU layers.
Using the wave height values and wave directions of the past 24 h, a three-layer Encoder-Forecaster model is built on the ConvGRU framework and trained to establish a spatio-temporal prediction model that predicts the wave height values for the next 12 h.
Figure 4 shows the network structure of the Encoder-Forecaster model. The Encoder module learns image features from low-dimensional to high-dimensional: convolutional downsampling reduces the feature-map size and the ConvGRU units learn the image sequence features, yielding an intermediate representation. This intermediate representation is fed to the Forecaster module, where transposed convolutional upsampling increases the feature-map size, the ConvGRU units learn the sequence features, and the future regional wave height values are output. During training, the network parameters are updated so that the loss function value is continuously reduced.
Figure 5 shows how the feature maps change from the Encoder module to the Forecaster module. In the Encoder phase, the feature maps gradually shrink while the number of channels grows, and the extracted features change from low-dimensional to high-dimensional. In the Forecaster phase, the feature maps gradually grow while the number of channels shrinks, until the output image has the same size as the input image.
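The symmetry between downsampling and upsampling can be checked with the standard size formulas for convolution and transposed convolution. The kernel size 4, stride 2, and padding 1 below are illustrative values chosen so that each encoder layer exactly halves the grid and each forecaster layer exactly doubles it; they are not taken from Table 1.

```python
def conv_out(n, k, s, p):
    """Output size of a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * p - k) // s + 1

def tconv_out(n, k, s, p):
    """Output size of a transposed convolution: (n - 1) * s - 2p + k."""
    return (n - 1) * s - 2 * p + k

n = 64  # hypothetical input grid width
down = conv_out(conv_out(n, 4, 2, 1), 4, 2, 1)    # two stride-2 downsampling layers
up = tconv_out(tconv_out(down, 4, 2, 1), 4, 2, 1)  # two stride-2 upsampling layers
print(down, up)  # 16 64 -- the output recovers the input resolution
```

With these settings the 64-grid shrinks to 16 inside the encoder and is restored to 64 by the forecaster, matching the pixel-level output requirement described above.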
3.3. Loss Function and Model Setup
The regional wave height prediction model uses the squared Frobenius norm of the prediction error as the loss function and performs one gradient computation and parameter update per batch. Compared with a single parameter update over the entire sample set, this strategy improves computational speed and facilitates the search for extreme points on large data sets. The loss on each batch is shown in Equation (9):

L = (1/n) Σ_{i=1}^{n} ||Y_i − Ŷ_i||_F²     (9)

where n denotes the number of samples in the batch and is equal to the batch size. The batch size is an integer multiple of the number of prediction steps, so that a complete set of multi-step prediction samples appears in the same batch. Y_i and Ŷ_i represent the true and predicted values of the target variable (wave height) for sample i, respectively, and both contain the same number of elements as the regional grid.
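The per-batch loss can be written as a few lines of NumPy. The batch and grid shapes are toy values; only the batch size of 12 echoes the hyperparameter setting mentioned below.

```python
import numpy as np

def frobenius_batch_loss(y_true, y_pred):
    """Mean squared Frobenius norm of the per-sample error matrices.

    y_true, y_pred: arrays of shape (n, H, W), where n is the batch size
    and H x W is the regional wave-height grid (illustrative shapes).
    """
    diff = y_true - y_pred
    per_sample = np.sum(diff ** 2, axis=(1, 2))  # ||Y_i - Y_hat_i||_F^2 per sample
    return per_sample.mean()

rng = np.random.default_rng(1)
y = rng.random((12, 4, 4))            # batch size 12, toy 4x4 grid
loss_zero = frobenius_batch_loss(y, y)
print(loss_zero)                      # 0.0 for a perfect prediction
```

Averaging over the batch rather than the whole training set is what gives the mini-batch gradient updates described above.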
The model hyperparameters are a batch size of 12 and a learning rate of 0.001, and the Adam optimizer is used (the parameter settings are explained in Section 4.1). The training process uses an Encoder-Forecaster network consisting of three layers of ConvGRU; the network structure parameters are shown in Table 1. The multi-layer ConvGRU captures information in the wave height data in both temporal and spatial dimensions to better establish temporal relationships. Dropout randomly deactivates some hidden neurons in the network layers at each iteration while keeping the input and output neurons unchanged; after forward propagation, the errors are back-propagated through the modified network. Introducing this randomness improves the model's ability to handle the wave height data.
3.4. Experimental Setup and Evaluation Indicators
The hardware platform is equipped with NVIDIA GeForce RTX 3080, GPU configuration CUDA 11.3 parallel framework, and cuDNN8.2 acceleration library. The model is built based on Tensorflow 2.3.0 and Numpy 1.18.5, and the code is based on Python 3.8.
The first 24 h of data were used to predict the following 12 h. The dataset samples were divided into training, validation, and test sets in a 4:1:1 ratio. Significant wave height, mean wave direction, and mean wave period were normalized to [−1, 1] by max-min normalization, and inverse normalization was performed before evaluating the prediction results. The models were trained on the training set, their hyperparameters tuned on the validation set, and the prediction results derived on the test set. The weight matrices and bias vectors of all deep learning models were initialized from normal distributions. All deep learning models were trained in batches; the maximum number of training epochs for the ConvGRU-based models is 30, and the optimizer is Adam.
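The preprocessing just described, [−1, 1] max-min scaling and 24 h-in / 12 h-out windowing, can be sketched as follows. The hourly grids and their 4 × 4 size are synthetic stand-ins; the 24/12 window lengths follow the experimental setup.

```python
import numpy as np

def minmax_scale(x, lo, hi):
    """Map values from [lo, hi] to [-1, 1]."""
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def minmax_unscale(z, lo, hi):
    """Inverse of minmax_scale, applied before evaluating predictions."""
    return (z + 1.0) / 2.0 * (hi - lo) + lo

def make_windows(series, t_in=24, t_out=12):
    """Slide a (T, H, W) series into (input, target) sample pairs."""
    xs, ys = [], []
    for start in range(len(series) - t_in - t_out + 1):
        xs.append(series[start:start + t_in])
        ys.append(series[start + t_in:start + t_in + t_out])
    return np.stack(xs), np.stack(ys)

hourly = np.random.default_rng(2).random((100, 4, 4))  # toy hourly wave-height grids
scaled = minmax_scale(hourly, hourly.min(), hourly.max())
X, Y = make_windows(scaled)
print(X.shape, Y.shape)  # (65, 24, 4, 4) (65, 12, 4, 4)
```

In practice the scaling bounds would be computed on the training split only, to avoid leaking test statistics into the normalization.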
Both evaluation metrics and comparison images are used to quantify the strengths and weaknesses of the different models' predictions while enabling visual comparison from different perspectives. To evaluate a single prediction moment in regional multi-step forecasting, the results averaged over all locations at that moment are evaluated using the root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE), expressed as shown in Equations (10)-(12):

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)² )     (10)
MAE = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i|               (11)
MAPE = (100%/N) Σ_{i=1}^{N} |y_i − ŷ_i| / y_i     (12)

where y_i and ŷ_i represent the true and predicted wave height values at location i, respectively, and N is the number of locations. The closer the RMSE and MAE are to 0, the lower the prediction error in meters; the closer the MAPE is to 0, the lower the prediction error in percent. To evaluate the multi-step spatio-temporal prediction as a whole, the mean values of these indicators over all prediction time steps are taken as the overall evaluation indicators.
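The three metrics are each a one-liner in NumPy. The two-element arrays are toy values for illustration only.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error, in the units of y (meters for wave height)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean absolute error, in the units of y."""
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent (undefined where y == 0)."""
    return np.mean(np.abs((y - y_hat) / y)) * 100.0

y = np.array([2.0, 4.0])      # true wave heights (m), toy values
y_hat = np.array([2.5, 3.0])  # predicted wave heights (m)
print(rmse(y, y_hat), mae(y, y_hat), mape(y, y_hat))
# errors 0.5 m and 1.0 m give MAE = 0.75 m and MAPE = 25.0%
```

Note that MAPE is sensitive to near-zero true values, which is one reason RMSE and MAE are reported alongside it.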
5. Conclusions
In this paper, we propose a model with ConvGRU as the main body and a multi-input, multi-output, multi-step prediction strategy for the multi-step spatio-temporal prediction of sea surface wave height. The model captures global spatial information and maps it to the desired multi-location output, and it learns samples from different prediction moments simultaneously to achieve accurate spatio-temporal prediction. In addition, this paper improves the ConvGRU model by adding wave direction and wave period as exogenous input variables and by using the Leaky ReLU activation function, and these improvements are shown to be effective. The proposed model makes contributions in several areas. First, it addresses the challenges of multi-location and multi-step wave height prediction, which traditional models cannot handle. Second, it achieves low prediction errors even for longer prediction horizons, which is important for applications such as marine operations and meteorological and hydrological support.
In summary, the paper’s contributions demonstrate the effectiveness of the proposed ConvGRU-based model for wave height prediction in multiple locations and steps, with potential applications in ocean engineering, marine operations, and other related fields.
A limitation of this paper is that it focuses only on predicting waves in regions that are significant for global crude oil transportation routes. In future work, we intend to expand the prediction area and duration beyond these regions.