MuSTC: A Multi-Stage Spatio–Temporal Clustering Method for Uncovering the Regionality of Global SST

Peng, Han; Li, Wengen; Jin, Chang; Yang, Hanchen; Guan, Jihong

doi:10.3390/atmos14091358


by Han Peng †, Wengen Li †, Chang Jin, Hanchen Yang and Jihong Guan *

Department of Computer Science and Technology, Tongji University, Shanghai 201804, China

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Atmosphere 2023, 14(9), 1358; https://doi.org/10.3390/atmos14091358

Submission received: 19 July 2023 / Revised: 16 August 2023 / Accepted: 27 August 2023 / Published: 29 August 2023

(This article belongs to the Section Atmospheric Techniques, Instruments, and Modeling)


Abstract

Sea Surface Temperature (SST) prediction has received considerable attention in recent years. Existing methods for SST prediction usually select one sea area of interest and predict SST by learning the spatial and temporal dependencies and patterns in historical SST data. However, global SST is a unified system with strong regionality, and SST in different sea areas follows different changing patterns due to the influence of various factors, e.g., geographic location, ocean currents and sea depth. Without a good understanding of this regionality, we cannot quantitatively integrate regionality information into SST prediction models to make them adaptive to different SST patterns around the world and thereby improve prediction accuracy. To address this issue, we propose the Multi-Stage Spatio–Temporal Clustering (MuSTC) method to quantitatively identify sea areas with similar SST patterns. First, MuSTC sequentially learns the representation of long-term SST with a deep temporal encoder and calculates the spatial correlation scores between grid ocean regions with self-attention. Then, MuSTC clusters grid ocean regions based on the original SST data, the encoded long-term SST representation and the spatial correlation scores, respectively, to obtain sea areas with similar SST patterns from different perspectives. In experiments in three ocean areas, i.e., the North Pacific Ocean (NPO), the South Atlantic Ocean (SAO) and the North Atlantic Ocean (NAO), the clustering results generally match the distribution of ocean currents, which demonstrates the effectiveness of MuSTC. In addition, we integrate the clustering results into two representative spatio–temporal prediction models, i.e., Spatio–Temporal Graph Convolutional Networks (STGCN) and Adaptive Graph Convolutional Recurrent Network (AGCRN), to conduct SST prediction. Integrating regionality information reduces the Root Mean Square Error (RMSE) by 1.95%, 1.39% and 1.28% in NPO, SAO and NAO, respectively, with the STGCN model, and by 4.94%, 0.74% and 1.43% with the AGCRN model. These results indicate that integrating regionality information improves the accuracy of SST prediction.

Keywords:

sea surface temperature; SST prediction; regionality; clustering; deep learning

1. Introduction

Sea Surface Temperature (SST), usually referring to the temperature of the water from 1 mm to 20 m below the sea surface, is closely related to Extreme Hydrological Events (EHEs) [1], extreme rainfall [2], Tropical Cyclones (TCs) [3] and many other ecological and climatic changes. Accurate SST prediction can help us monitor the global climate, forecast SST anomalies such as the El Niño phenomenon, and forecast extreme hydrological events such as extreme rainfall and tropical cyclones. SST prediction is therefore an important and fundamental research problem in marine science. Over the past decade, numerous methods have been proposed for SST prediction, and they can be broadly divided into physical models, traditional machine learning methods and deep learning models. Physical models use mathematical formulas to model the variations of SST. For example, Peng et al. [4] predict seasonal SST with a physical model based on CMIP6 [5], the sixth phase of the Coupled Model Intercomparison Project. Traditional machine learning methods analyze the statistical properties of historical SST data to learn a mapping from historical SST to the SST to be predicted. For example, the linear regression model [6] and the Support Vector Machine (SVM) model [7] have been used to predict SST. In recent years, deep spatio–temporal prediction models have also been widely used for SST prediction and achieve higher accuracy than physical models and traditional machine learning methods. For example, Wei et al. [8] used the Multi-Layer Perceptron (MLP) model to predict SST in the South China Sea, and Xiao et al. [9] used the ConvLSTM model, which combines the Convolutional Neural Network (CNN) model [10] and the Long Short-Term Memory (LSTM) model [11], to capture the spatial and temporal dependencies of SST. Qiao et al. [12] proposed the 3DCNN-LSTM-AT model, which combines a 3D CNN, LSTM and an attention mechanism to predict SST in the Bohai Sea and the South China Sea; the attention mechanism reduces the large errors in long-term SST prediction. Zhang et al. designed the Memory Graph Convolutional Network (MGCN) [13], composed of alternating graph convolutional layers and memory layers, to handle land and island regions, where no valid SST observations exist. In addition, Hou et al. designed the D2CL model [14], which uses a dilated convolutional network and an LSTM network to learn the spatial and temporal features of SST at different scales.

Most existing methods for SST prediction focus on one sea area of interest, such as the Indian Ocean [6], the tropical Atlantic [7], the South China Sea [8,12,14] and the East China Sea [9,13,14]. However, SST is affected by geographic location, ocean currents, sea depth, solar radiation, land climate change, wind direction [15,16], etc., so SST follows different changing patterns in different regions. Figure 1 illustrates the global ocean currents (https://beachapedia.org/File:Ocean_currents.gif (accessed on 12 March 2023)), which strongly affect SST. Ocean currents have strong regional characteristics, and SST therefore also shows regional characteristics. Moreover, SST in different ocean regions is substantially correlated, which suggests that a shared prediction model could exploit these correlations to improve prediction accuracy. Such an approach not only benefits from learning inter-region correlations but can also accommodate the distinct SST patterns shaped by geographic features. A wide-area SST prediction method thus holds promise for both accurate SST prediction and capturing region-specific variations, which makes it worthwhile to explore the regionality of SST and use it to better learn SST dynamics and improve SST prediction. Unfortunately, such regionality information is not available for global SST in practice. Existing information about ocean currents is only a qualitative representation of regionality and cannot be directly embedded into data-driven models for SST-related downstream tasks such as SST prediction, SST anomaly detection and SST causal analysis. A quantitative analysis of the regionality of SST is therefore necessary.

Several studies have explored the regionality of SST, ocean climate and other ocean factors. Kumar et al. [17] applied K-means [18] to cluster long-term global SST and analyzed the relationship between the clustering results and meteorological indices. Steinbach et al. [19] used the Shared Nearest Neighbor (SNN) clustering algorithm to discover Ocean Climate Indices (OCIs), which are important tools for analyzing the ocean's impacts on land climate. To automatically identify coastal upwelling from SST data, Nascimento et al. [20] proposed a new clustering algorithm based on the Seed Region Growing (SRG) method. Zahraie et al. [21] used a Genetic Algorithm (GA) [22] and the K-means clustering method [18] to cluster SST data and found the geographical zones most relevant for precipitation prediction. Considering the uncertainty of global SST, Qin et al. [23] used an improved type-2 fuzzy clustering method to cluster SST data. All of these methods cluster the original SST observations. However, because SST data contain complex periodicity, noise and redundant information, clustering the original SST data directly cannot capture the deep spatio–temporal dependencies in SST well enough to extract precise regionality information.

To better uncover the regionality of global SST, we propose the Multi-Stage Spatio–Temporal Clustering (MuSTC) method to quantitatively identify sea areas with similar SST patterns.

2. Study Areas and Datasets

Many studies have addressed SST prediction, using, e.g., convolutional models [24], LSTM [25], B-spline interpolation with a spatio–temporal attention mechanism [26], the Memory Graph Convolutional Network (MGCN) [13] and D2CL [14]. Most of these studies aim to predict the future daily or weekly mean SST over a large spatial scale, and we follow the same setting in this work.

The data used in this work are the Version 2 daily Optimum Interpolation Sea Surface Temperature (OISST V2) analysis (https://www.ncei.noaa.gov/products/optimum-interpolation-sst (accessed on 20 June 2022)) of the National Oceanic and Atmospheric Administration (NOAA). By combining bias-adjusted observations from different platforms such as satellites, ships and buoys, and filling in gaps through interpolation, this dataset provides a complete ocean temperature field on a regular global grid. Among its data sources, satellite data from the Advanced Very High Resolution Radiometer (AVHRR) are the main source, and the high spatio–temporal coverage of AVHRR underpins the completeness of the dataset. Specifically, the dataset provides global daily SST data with a spatial resolution of 0.25° × 0.25° from 1981 to the present. The spatial coverage is 0.125° E–0.125° W, 89.875° S–89.875° N.

As shown in Figure 2, we set three large study areas, i.e., the North Pacific Ocean (NPO, 150° E–130° W, 40° N–66° N), the South Atlantic Ocean (SAO, 50° W–20° E, 0°–45° S) and the North Atlantic Ocean (NAO, 90° W–0°, 0°–75° N). SST data from 2002 to 2021 were selected for analysis.

Because each study area contains a large number of 0.25° × 0.25° grid ocean regions and the computational cost is high, we reduce the spatial resolution to 1° × 1°. This should not greatly affect the regionality analysis, since SST usually changes little between adjacent locations. The SST of each coarse-grained 1° × 1° grid region is obtained by averaging the SST records of the 16 (4 × 4) 0.25° × 0.25° grid regions it covers. After reducing the resolution and removing land regions, the datasets of the North Pacific Ocean, South Atlantic Ocean and North Atlantic Ocean contain 1708, 2711 and 5238 grid regions of size 1° × 1°, respectively.
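The 4 × 4 block averaging described above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' preprocessing code: the function name `coarsen` is hypothetical, land cells are assumed to be marked with `math.nan`, and averaging only the ocean cells of a partially covered block is our assumption, since the text does not say how such blocks are handled.

```python
import math


def coarsen(fine, factor=4):
    """Average non-overlapping factor x factor blocks of a 2-D SST grid.

    `fine` is a list of rows (latitude) of floats, with land cells set to nan.
    Assumption: a coarse cell averages only the ocean cells in its block and is
    nan (land) only if every fine cell in the block is land.
    """
    n_rows, n_cols = len(fine), len(fine[0])
    if n_rows % factor or n_cols % factor:
        raise ValueError("grid size must be a multiple of the block factor")
    coarse = []
    for r in range(0, n_rows, factor):
        row = []
        for c in range(0, n_cols, factor):
            block = [fine[r + i][c + j] for i in range(factor) for j in range(factor)]
            ocean = [v for v in block if not math.isnan(v)]
            row.append(sum(ocean) / len(ocean) if ocean else math.nan)
        coarse.append(row)
    return coarse


# Toy 8 x 8 grid (two 0.25-degree blocks per side) -> 2 x 2 coarse grid.
fine = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
print(coarsen(fine))
```

Coarse cells that come back as `nan` lie entirely on land and would be dropped, in line with the removal of land regions described above.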

3. Methods

MuSTC first learns the representation of long-term SST with a deep temporal encoder. Then, using the learned representation, MuSTC calculates the spatial correlation scores between grid ocean regions with self-attention. Finally, MuSTC clusters grid ocean regions based on the original SST data, the encoded long-term SST representation and the spatial correlation scores, respectively, to obtain sea areas with similar SST patterns from different perspectives. Since there are no explicit targets for training MuSTC, we reconstruct the SST data from the output of the self-attention module and minimize the difference between the reconstructed and original SST data during training.

To evaluate the effectiveness of the proposed method, we applied MuSTC to three ocean areas, i.e., the North Pacific Ocean, the South Atlantic Ocean and the North Atlantic Ocean; the clustering results, especially those based on spatial correlation scores, generally match the distribution of global ocean currents. In addition, we integrate the learned regionality information into two representative spatio–temporal prediction models, and the resulting improvement in SST prediction accuracy further indicates that MuSTC captures the regionality of SST.

3.1. Problem Statement

To discover the regionality of SST, we aim to cluster the N grid ocean regions in each study area into multiple groups such that each group of regions has similar SST patterns. In other words, given the SST data $S_{I} \in \mathbb{R}^{N \times T}$ of N grid regions, where T is the length of time (in days), we wish to find a method f that generates a clustering label for each grid region, i.e.,

$$C = f(S_{I}, C_{n}) \tag{1}$$

where $C_{n}$ is the specified number of clusters and $C \in \mathbb{R}^{N}$ represents the clustering results of the N grid regions.

Table 1 lists the notations that will be frequently used in this work.

3.2. Multi-Stage Spatio–Temporal Clustering Method

Figure 3 shows the structure of the MuSTC method, which consists of a temporal encoder module, a self-attention module, two fully connected (FC) layers and an SST cluster generation module. First, the temporal encoder module encodes the original SST data into the encoded long-term SST representation, and the self-attention module uses this representation to learn the correlations between grid ocean regions. The temporal encoder module and the self-attention module are connected by an FC layer. Then, another FC layer produces the reconstructed SST data, which have the same shape as the original SST data. Finally, the SST cluster generation module performs cluster analysis on the original SST data, the encoded SST representation and the attention matrix to extract the regionality information of SST. In addition, we integrate the learned regionality information into SST prediction models to verify the effectiveness of MuSTC. The technical details of the modules in MuSTC and of the verification methods are given below.

3.3. Temporal Encoder Module

To capture the temporal characteristics of each grid ocean region, we propose a temporal encoder e based on Time2Vec [27] to encode the SST data $S_{I} \in \mathbb{R}^{N \times T}$, i.e.,

$$S_{E} = e(S_{I}) \tag{2}$$

where $S_{E} \in \mathbb{R}^{N \times D_{E}}$ represents the encoded SST representation and $D_{E}$ is the size of the encoded feature dimension.

The temporal encoder is designed to capture the linear trend of SST data and to automatically capture periodicities of different granularities. Figure 4 presents the structure of the temporal encoder, which contains $D_{E}$ units.

Concretely, the ith unit is calculated as follows:

$$S_{E_{i}} = \begin{cases} S_{I} \cdot w_{i} + b_{i}, & i = 1 \\ \sin(S_{I} \cdot w_{i} + b_{i}), & 2 \leq i \leq D_{E} \end{cases} \tag{3}$$

where $S_{E_{i}} \in \mathbb{R}^{N}$ represents the encoded data of the ith unit, $S_{I} \in \mathbb{R}^{N \times T}$ is the input SST data, and $w_{i} \in \mathbb{R}^{T}$ and $b_{i} \in \mathbb{R}^{N}$ are learnable parameters.

Concatenating the outputs of all $D_{E}$ units generates the encoded SST representation $S_{E} \in \mathbb{R}^{N \times D_{E}}$, i.e.,

$$S_{E} = S_{E_{1}} \,\Vert\, S_{E_{2}} \,\Vert\, \dots \,\Vert\, S_{E_{D_{E}}} \tag{4}$$

where $\Vert$ denotes concatenation along the feature dimension, the number of encoded features satisfies $D_{E} \ll T$, and T is the length of the historical SST data.

As these formulas show, the temporal encoder module uses a linear transformation unit (i.e., the 1st unit) to capture the linear trend in SST data, while the other $D_{E} - 1$ sine transformation units (i.e., the 2nd to the $D_{E}$th units), each with its own $w_{i}$ and $b_{i}$, capture periodic changes of SST at different granularities and learn the long-term non-linear dependencies in the SST data.
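Equations (3) and (4) can be illustrated with the following pure-Python sketch. It assumes the shapes stated above ($w_i \in \mathbb{R}^T$, $b_i \in \mathbb{R}^N$) and uses plain lists instead of a deep learning framework, so the parameters are fixed inputs rather than trained weights; the function name `temporal_encode` is hypothetical.

```python
import math


def temporal_encode(s_i, w, b):
    """Time2Vec-style encoder following Eqs. (3)-(4).

    s_i: N x T nested list, the SST series of N grid regions over T days.
    w:   D_E weight vectors, each of length T (w_i in R^T).
    b:   D_E bias vectors, each of length N (b_i in R^N).
    Returns S_E as an N x D_E nested list.
    """
    n, d_e = len(s_i), len(w)
    s_e = [[0.0] * d_e for _ in range(n)]
    for i in range(d_e):
        for r in range(n):
            # Project region r's whole series onto w_i, then add the per-region bias.
            z = sum(x * wt for x, wt in zip(s_i[r], w[i])) + b[i][r]
            # Python index 0 is the paper's 1st (linear) unit; the others apply sin.
            s_e[r][i] = z if i == 0 else math.sin(z)
    return s_e


s_i = [[1.0, 2.0], [3.0, 4.0]]        # N = 2 regions, T = 2 days
w = [[1.0, 0.0], [0.0, 1.0]]          # D_E = 2 units
b = [[0.0, 0.0], [0.0, 0.0]]
print(temporal_encode(s_i, w, b))     # column 0: linear unit, column 1: sine unit
```

Because each unit projects the entire T-day series with a single weight vector, $D_E \ll T$ units compress a long daily record into a short feature vector per region.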

3.4. Self-Attention Module

We designed a self-attention module to learn the underlying correlations between grid ocean regions. Self-attention is commonly used in Natural Language Processing (NLP) models to capture the relationships between words in a sentence and determine the importance of each word [28]. In MuSTC, the self-attention module helps reconstruct SST from the encoded SST representation and determines, through attention scores, how the output of each grid region is affected by the other grid regions. The attention scores can therefore be regarded as the degree of correlation between grid regions and provide rich information for the subsequent regionality analysis of SST.

Figure 5 illustrates the structure of the self-attention module. First, as shown in Figure 3, the encoded SST representation $S_{E} \in \mathbb{R}^{N \times D_{E}}$ (i.e., the output of the temporal encoder module) is processed by an FC layer to generate the input $S_{AI} \in \mathbb{R}^{N \times D_{AI}}$ of the self-attention module, where $D_{AI}$ is the feature dimension. Then, the output $S_{AO} \in \mathbb{R}^{N \times D_{AO}}$ of the self-attention module is produced by the self-attention algorithm r:

$$S_{AO} = r(S_{AI}) \tag{5}$$

where $D_{AO}$ is the feature dimension of the output of the self-attention module.

The self-attention module uses the query matrix Q to represent the features to be matched, the key matrix K to provide the features that can be matched, and the value matrix V to keep the features of the input $S_{AI}$ of the self-attention module. Q and K are multiplied to obtain the attention matrix A, which captures the correlations between grid regions. The calculation of the self-attention module is formally defined as follows:

$$\begin{aligned} Q &= W_{q} \cdot S_{AI}^{T} \\ K &= W_{k} \cdot S_{AI}^{T} \\ V &= W_{v} \cdot S_{AI}^{T} \\ A &= K^{T} \cdot Q \\ A' &= \operatorname{softmax}(A) \\ S_{AO} &= (V \cdot A')^{T} \end{aligned} \tag{6}$$

where $W_{q} \in \mathbb{R}^{D_{k} \times D_{AI}}$, $W_{k} \in \mathbb{R}^{D_{k} \times D_{AI}}$ and $W_{v} \in \mathbb{R}^{D_{AO} \times D_{AI}}$ are learnable parameters; $Q \in \mathbb{R}^{D_{k} \times N}$, $K \in \mathbb{R}^{D_{k} \times N}$ and $V \in \mathbb{R}^{D_{AO} \times N}$ are the query, key and value matrices, respectively; and $D_{k}$ is the feature dimension of the query and key matrices. The attention matrix A is normalized by a softmax operation, and the normalized attention matrix $A' \in \mathbb{R}^{N \times N}$ is used for the subsequent regionality analysis of SST.
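The following pure-Python sketch traces Eq. (6) with explicit matrix shapes. The softmax axis is not stated in the text; because column j of $V \cdot A'$ mixes the value vectors with the weights in column j of $A'$, this sketch normalizes each column of A so those weights sum to 1. That choice, the absence of any $1/\sqrt{D_k}$ scaling (following Eq. (6) as written) and all function names are our assumptions.

```python
import math


def matmul(a, b):
    """Product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_columns(a):
    """Softmax over each column of a, stabilized by subtracting the column max."""
    cols = []
    for col in zip(*a):
        peak = max(col)
        exps = [math.exp(x - peak) for x in col]
        total = sum(exps)
        cols.append([e / total for e in exps])
    return transpose(cols)


def self_attention(s_ai, w_q, w_k, w_v):
    """Eq. (6). s_ai: N x D_AI. Returns (S_AO as N x D_AO, A' as N x N)."""
    s_ai_t = transpose(s_ai)              # D_AI x N
    q = matmul(w_q, s_ai_t)               # D_k x N
    k = matmul(w_k, s_ai_t)               # D_k x N
    v = matmul(w_v, s_ai_t)               # D_AO x N
    a = matmul(transpose(k), q)           # N x N, a[i][j] = k_i . q_j
    a_norm = softmax_columns(a)           # assumed normalization axis (columns)
    s_ao = transpose(matmul(v, a_norm))   # N x D_AO
    return s_ao, a_norm
```

Under this convention, column j of $A'$ holds the weights with which every region contributes to region j's output, so the entries of $A'$ can be read as pairwise correlation scores.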

Finally, as shown in Figure 3, another FC layer transforms the output $S_{AO}$ of the self-attention module into the reconstructed SST data $S_{R} \in \mathbb{R}^{N \times T}$, which has the same shape as the original SST data $S_{I}$. MuSTC is trained by minimizing the difference between the original SST data $S_{I}$ and the reconstructed SST data $S_{R}$, as discussed in Section 3.6.

3.5. SST Cluster Generation Module

To discover the regionality of SST, we aim to cluster the N grid ocean regions into multiple groups such that each group of regions has similar SST patterns. To this end, we designed an SST cluster generation module that clusters grid ocean regions based on the original SST data $S_{I} \in \mathbb{R}^{N \times T}$, the encoded SST representation $S_{E} \in \mathbb{R}^{N \times D_{E}}$ and the normalized attention matrix $A' \in \mathbb{R}^{N \times N}$, respectively, generating the cluster labels $C \in \mathbb{R}^{N}$ for all N grid regions.

Two clustering algorithms, i.e., K-means clustering [18] and Agglomerative clustering [29], are used to perform the clustering. They belong to different families (K-means is partitional, whereas Agglomerative clustering is hierarchical), so using both makes the analysis results more convincing.

The goal of K-means clustering is to divide the input data samples into the specified $C_{n}$ clusters so that samples within the same cluster are as similar as possible, while samples in different clusters are as different as possible. K-means first randomly initializes $C_{n}$ cluster centers. Then, it calculates the distance between each data sample and each cluster center and assigns each sample to the nearest cluster. After that, it recomputes the centers of the $C_{n}$ new clusters and repeats the assignment step until the assignments no longer change. K-means is simple to implement, scales to large datasets and converges in a finite number of iterations, although only to a local optimum that depends on the initialization.
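The assign-and-update loop described above is Lloyd's algorithm; a minimal pure-Python sketch follows. Initializing the centers by sampling $C_n$ data points is our assumption (the text only says the centers are initialized randomly), and the function names are hypothetical; in practice a library implementation would normally be used.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, c_n, max_iter=100, seed=0):
    """Lloyd's algorithm: random initial centers, then alternate assignment and update."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, c_n)]   # assumed init: C_n random samples
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample joins its nearest center.
        new_labels = [min(range(c_n), key=lambda k: sq_dist(p, centers[k])) for p in points]
        if new_labels == labels:          # no sample changed cluster: converged
            break
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for k in range(c_n):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:                   # an empty cluster keeps its old center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
print(kmeans(points, 2))   # the first three and the last three points share a label
```

Running several seeds and keeping the result with the lowest within-cluster sum of squares is a common way to reduce sensitivity to the initialization.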

Agglomerative clustering is a hierarchical clustering algorithm that builds a tree structure to gradually merge data into the specified $C_{n}$ clusters. Concretely, Agglomerative clustering first treats each data sample as a separate cluster. Then, it calculates the distance between each pair of clusters and merges the two closest clusters into a new, larger cluster, where the distance between two clusters is defined as the minimum distance between their samples (i.e., single linkage). The merge operation is repeated until exactly $C_{n}$ clusters are left. Agglomerative clustering is suitable for clusters of different shapes and sizes.
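Because the inter-cluster distance above is the minimum distance between samples, this is single-linkage agglomerative clustering. The naive sketch below follows the merge loop step by step; it is written for clarity rather than speed (roughly cubic in the number of samples), and the function name is hypothetical. Library implementations may use a different default linkage (scikit-learn's `AgglomerativeClustering`, for example, defaults to Ward linkage), so the linkage should be set explicitly when reproducing this setup.

```python
import math


def single_linkage(points, c_n):
    """Agglomerative clustering with single linkage, merged naively until c_n clusters remain."""
    clusters = [[i] for i in range(len(points))]     # each sample starts as its own cluster

    def cluster_dist(a, b):
        # Single linkage: the minimum distance between any two members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > c_n:
        pairs = [(x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        x, y = min(pairs, key=lambda xy: cluster_dist(clusters[xy[0]], clusters[xy[1]]))
        clusters[x].extend(clusters[y])              # merge the closest pair (x < y)
        del clusters[y]
    labels = [0] * len(points)
    for k, members in enumerate(clusters):
        for i in members:
            labels[i] = k
    return labels


points = [[0.0], [1.0], [2.0], [10.0], [11.0]]
print(single_linkage(points, 2))   # [0, 0, 0, 1, 1]
```

Single linkage chains nearby samples together, which is why it can follow elongated, irregular regions, but it can also join two groups through a thin bridge of points.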

3.6. Optimizing MuSTC

Since there are no explicit targets for training MuSTC, we adopt data reconstruction to optimize it. Given the original SST data $S_{I}$ and the reconstructed SST data $S_{R}$, the loss function is

$$\mathrm{Loss} = \left| S_{R} - S_{I} \right| \tag{7}$$

where $\left| S_{R} - S_{I} \right|$ denotes the mean absolute error (MAE) between $S_{I}$ and $S_{R}$, i.e., the element-wise absolute differences averaged over all entries.

The closer the reconstructed SST data are to the original SST data, the more comprehensive the SST information contained in the encoded SST representation and the attention matrix, and therefore the better the cluster analysis on these data reflects the regionality of SST.
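Equation (7) amounts to the computation below. This small sketch assumes $S_R$ and $S_I$ are N × T nested lists and, unlike training code, ignores the batch dimension (over which the mean would also be taken); the function name `mae_loss` is hypothetical.

```python
def mae_loss(s_r, s_i):
    """Mean absolute error between reconstructed (s_r) and original (s_i) SST, Eq. (7)."""
    if len(s_r) != len(s_i) or any(len(a) != len(b) for a, b in zip(s_r, s_i)):
        raise ValueError("S_R and S_I must have the same N x T shape")
    diffs = [abs(r - o) for row_r, row_o in zip(s_r, s_i) for r, o in zip(row_r, row_o)]
    return sum(diffs) / len(diffs)


print(mae_loss([[1.0, 2.0], [3.0, 4.0]], [[1.5, 2.0], [2.0, 4.0]]))   # 0.375
```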

3.7. Application of SST Regionality Information

To further verify the effectiveness of MuSTC, we integrate the learned regionality information into SST prediction models with the aim of improving prediction accuracy. We choose two representative spatio–temporal prediction models, i.e., Spatio–Temporal Graph Convolutional Networks (STGCN) [30] and Adaptive Graph Convolutional Recurrent Network (AGCRN) [31], to conduct SST prediction.

3.7.1. Spatio–Temporal Graph Convolutional Network

The STGCN model was originally proposed for traffic flow prediction and combines graph convolution with gated temporal convolution. It consists of two ST-Conv blocks and one output block. Each ST-Conv block consists of two Gated Temporal Convolution layers sandwiching a Graph Convolution layer. The Graph Convolution layer is implemented with graph convolution and a residual connection; in this work, the graph structure is built from the spatial adjacency between grid ocean regions. The Gated Temporal Convolution layer is implemented with a Gated Linear Unit (GLU) and uses causal convolution. The original output block uses a Gated Temporal Convolution layer and two fully connected layers with an activation function between them to produce single-step predictions.

In this work, we modified the structure of the output block to incorporate the SST regionality information and enable multi-step SST prediction. The output of the Gated Temporal Convolution layer can be represented by a matrix $S_{PE} \in \mathbb{R}^{N \times D_{P}}$, where $D_{P}$ is the number of encoded feature dimensions. The STGCN model applies a linear transformation $fc_{1}$, an activation function (sigmoid in our experiments) and another linear transformation $fc_{2}$ for multi-step prediction, and outputs the prediction results $S_{PO} \in \mathbb{R}^{T_{PO} \times N}$, where $T_{PO}$ is the length of time (in days) to be predicted:

$$S_{PO} = fc_{2}(\operatorname{sigmoid}(fc_{1}(S_{PE}))) \tag{8}$$

We use one-hot encoding to encode the cluster label of each grid region, obtaining $C_{E} \in \mathbb{R}^{N \times C_{n}}$, where $C_{n}$ denotes the number of clusters. Then, we concatenate $S_{PE}$ and the one-hot encoded labels $C_{E}$ along the feature dimension (denoted $\Vert$) to obtain $S_{PE}' \in \mathbb{R}^{N \times (D_{P} + C_{n})}$:

$$S_{PE}' = S_{PE} \,\Vert\, C_{E} \tag{9}$$

Finally, the output of the multi-step prediction is $S_{PO}' \in \mathbb{R}^{T_{PO} \times N}$, i.e.,

$$S_{PO}' = fc_{2}(\operatorname{sigmoid}(fc_{1}(S_{PE}'))) \tag{10}$$
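Equations (9) and (10) can be sketched as follows. The hidden width of $fc_1$, applying $fc_1$ and $fc_2$ row-wise with weights shared across regions, and the final transpose from N × $T_{PO}$ to $T_{PO}$ × N are our assumptions about details the text leaves open; the helper names are hypothetical and the weights are fixed rather than trained.

```python
import math


def one_hot(labels, c_n):
    """C_E: N x C_n one-hot encoding of cluster labels in {0, ..., C_n - 1}."""
    return [[1.0 if k == label else 0.0 for k in range(c_n)] for label in labels]


def linear(x, weight, bias):
    """Row-wise y = W x + b; weight is (out x in) and shared by every region."""
    return [[sum(xi * wi for xi, wi in zip(row, w_row)) + b for w_row, b in zip(weight, bias)]
            for row in x]


def sigmoid(x):
    return [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in x]


def output_block(s_pe, labels, c_n, fc1, fc2):
    """Eqs. (9)-(10): S'_PE = S_PE || C_E, then fc2(sigmoid(fc1(S'_PE))).

    fc1, fc2: (weight, bias) pairs; fc1 takes D_P + C_n inputs and fc2 emits
    T_PO outputs per region. Returns S'_PO as a T_PO x N nested list.
    """
    s_pe_prime = [row + oh for row, oh in zip(s_pe, one_hot(labels, c_n))]  # N x (D_P + C_n)
    out = linear(sigmoid(linear(s_pe_prime, *fc1)), *fc2)                   # N x T_PO
    return [list(col) for col in zip(*out)]                                 # T_PO x N
```

Because $fc_1$ is linear, the appended one-hot columns effectively add a learned, cluster-specific bias to each hidden unit before the sigmoid, so regions in different clusters can be mapped differently even when their encoded features are identical.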

3.7.2. Adaptive Graph Convolutional Recurrent Network

Unlike earlier graph convolution models, in which all nodes share parameters and therefore cannot learn the unique patterns of each node, AGCRN proposes Node Adaptive Parameter Learning. Because giving each node its own set of feature transformation parameters would make the model hard to train and prone to overfitting, the authors of AGCRN use matrix factorization to reduce the number of parameters. In addition, AGCRN introduces Data Adaptive Graph Generation, which constructs the graph adaptively and updates the connections between nodes during training instead of relying on a fixed, predefined graph.

AGCRN uses graph convolutions modified in this way as an encoder to capture the spatio–temporal dependencies in historical sequences, and then applies a convolution to make multi-step predictions. In this work, however, the spatial dependence of the historical sequences is represented by both the learned encoder representation and the SST regionality information. We therefore concatenate the two representations to perform multi-step SST prediction.

The AGCRN model uses a modified graph convolution module to encode the SST data and obtains the original encoded information matrix $S_{POE} \in \mathbb{R}^{T_{PI} \times N \times D_{P}}$, where $T_{PI}$ is the length of the time series input to the prediction model and $D_{P}$ is the number of encoded feature dimensions. Then, AGCRN takes the slice of the encoded information matrix at the last time step, i.e., $S_{PE} \in \mathbb{R}^{N \times D_{P}}$, and applies a convolution for multi-step prediction, producing the prediction results $S_{PO} \in \mathbb{R}^{T_{PO} \times N}$:

$$S_{PO} = \operatorname{conv}(S_{PE}) \tag{11}$$

Similar to the STGCN model, we concatenate $S_{PE}$ and the one-hot encoded SST regionality information $C_{E}$ along the feature dimension (denoted $\Vert$) to obtain $S_{PE}' \in \mathbb{R}^{N \times (D_{P} + C_{n})}$, i.e.,

$$S_{PE}' = S_{PE} \,\Vert\, C_{E} \tag{12}$$

Finally, a convolution operation performs the multi-step SST prediction, and the prediction results are $S_{PO}' \in \mathbb{R}^{T_{PO} \times N}$, i.e.,

$$S_{PO}' = \operatorname{conv}(S_{PE}') \tag{13}$$
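A minimal sketch of Eqs. (11) to (13) follows. It assumes that the convolution's kernel spans the whole feature dimension (shape 1 × D), so that the kernel slides only along the region axis and acts as a linear map from each region's features to its $T_{PO}$ predictions, with weights shared across regions. Under this assumption, appending the $C_n$ one-hot columns only requires widening each kernel from $D_P$ to $D_P + C_n$. The function names are hypothetical.

```python
def last_step(s_poe):
    """S_PE: the slice of S_POE (T_PI x N x D_P) at the final time step."""
    return s_poe[-1]


def append_one_hot(s_pe, labels, c_n):
    """S'_PE = S_PE || C_E, Eq. (12): append one-hot cluster columns to each region."""
    return [row + [1.0 if k == lab else 0.0 for k in range(c_n)] for row, lab in zip(s_pe, labels)]


def conv_full_width(x, kernels, biases):
    """Valid convolution of an N x D map with T_PO kernels of shape 1 x D.

    Each kernel covers the whole feature axis, so it yields one value per region.
    Returns a T_PO x N nested list.
    """
    d = len(x[0])
    out = []
    for kern, bias in zip(kernels, biases):
        if len(kern) != d:
            raise ValueError("kernel width must match the feature dimension")
        out.append([sum(kv * xv for kv, xv in zip(kern, row)) + bias for row in x])
    return out


s_poe = [[[9.0, 9.0], [9.0, 9.0]], [[1.0, 2.0], [3.0, 4.0]]]   # T_PI = 2, N = 2, D_P = 2
s_pe_prime = append_one_hot(last_step(s_poe), [1, 0], 2)       # N = 2, D_P + C_n = 4
print(conv_full_width(s_pe_prime, [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 10.0, 20.0]], [0.0, 0.0]))
```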

4. Experiments and Results

4.1. The Settings of Experiments

For learning long-term SST representations and generating spatial correlation scores, we sample the historical SST data with a sliding window of 365 days and a step size of 28 days, so each data sample contains 365 consecutive daily SST records. The generated samples are then divided into training, validation and test datasets at a ratio of 3:1:1. The training dataset is used to learn the correlations between grid ocean regions, and the validation dataset is used to validate these correlations. The test dataset is used to check that MuSTC generalizes to unseen data samples. The encoded SST representation and spatial correlation scores used for clustering analysis are taken from the validation dataset. In the experiments, the hyperparameters $D_{E}$, $D_{AI}$, $D_{AO}$ and $D_{k}$ are set to 16, 32, 16 and 16, respectively. For each hyperparameter, we used a grid search (i.e., enumeration) over the values 4, 8, 16 and 32 and chose the combination that best balances computational efficiency and accuracy based on the experimental results. The learning rate is set to 0.003, the batch size is set to 4, and MAE is used as the loss function.
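The sampling and splitting just described can be illustrated with the sketch below. It assumes an uninterrupted daily record from 1 January 2002 to 31 December 2021 and a chronological split (the text does not say whether the split is chronological or shuffled); under those assumptions, the arithmetic gives 248 windows. The function names are hypothetical.

```python
from datetime import date


def window_starts(total_days, window=365, step=28):
    """Start indices of every full window of `window` days, advancing `step` days."""
    return list(range(0, total_days - window + 1, step))


def split_3_1_1(samples):
    """Split samples 3:1:1 into training, validation and test sets, preserving order."""
    n = len(samples)
    n_train, n_val = n * 3 // 5, n // 5
    return samples[:n_train], samples[n_train:n_train + n_val], samples[n_train + n_val:]


total_days = (date(2022, 1, 1) - date(2002, 1, 1)).days   # 2002-2021 inclusive
starts = window_starts(total_days)
train, val, test = split_3_1_1(starts)
print(total_days, len(starts), len(train), len(val), len(test))   # 7305 248 148 49 51
```

Because the 28-day step is much shorter than the 365-day window, consecutive samples overlap heavily; with a chronological split, windows near a split boundary share days with the neighboring set.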

For SST cluster generation, the North Pacific Ocean, the South Atlantic Ocean and the North Atlantic Ocean are clustered into 4, 5 and 7 clusters, respectively, using K-means clustering and Agglomerative clustering. The number of clusters is initially determined by the number of ocean currents in the area and then fine-tuned according to the quality of the clustering results.

For SST prediction, we also divide the SST data into training, validation and test sets at a ratio of 3:1:1, and predict the SST of the next 3 days from the historical SST data of the previous 7 days. The STGCN model uses all samples, whereas the AGCRN model uses one sample every 10 days, because AGCRN is more complex than STGCN and more expensive to train. In the experiments, the learning rate of the AGCRN model is set to 0.003, the batch size is set to 4, and MAE is used as the loss function. The learning rate of the STGCN model is set to 0.001, and its batch size is set to 32.
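The 7-day-in, 3-day-out sample construction can be sketched as follows. Interpreting "one sample every 10 days" as a stride of 10 between consecutive start days is our assumption, the function name is hypothetical, and a single list stands in for the time axis of the N × T SST array.

```python
def make_pairs(series, t_in=7, t_out=3, stride=1):
    """(history, target) pairs: t_in days of input followed by t_out days to predict.

    stride=1 uses every possible start day; stride=10 keeps one sample every 10 days.
    """
    last_start = len(series) - t_in - t_out
    return [
        (series[s:s + t_in], series[s + t_in:s + t_in + t_out])
        for s in range(0, last_start + 1, stride)
    ]


days = list(range(20))                    # stand-in for 20 daily SST values
print(len(make_pairs(days)))              # 11
print(make_pairs(days, stride=10)[1])     # ([10, 11, 12, 13, 14, 15, 16], [17, 18, 19])
```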

We use four GeForce RTX 3090 graphics cards for training the MuSTC and SST prediction models.

4.2. Results

4.2.1. Results of Regionality Analysis

Figure 6 and Figure 7 illustrate the clustering results for the North Pacific Ocean using K-means clustering and Agglomerative clustering, respectively. The clustering results are generally consistent with the flow directions of the Oyashio Cold Current, the N. Pacific Current and the Alaska Warm Current. Compared with K-means clustering, Agglomerative clustering produces smoother cluster boundaries.

Figure 8 and Figure 9 illustrate the clustering results for the South Atlantic Ocean using K-means clustering and Agglomerative clustering, respectively. The clustering results match the flow directions of the S. Equatorial Current, the Brazil Warm Current, the Benguela Cold Current and the South Atlantic Current. Compared with K-means clustering, Agglomerative clustering produces a more reasonable division of the area with fewer outliers.

Figure 10 and Figure 11 illustrate the clustering results for the North Atlantic Ocean using K-means clustering and Agglomerative clustering, respectively. The clustering results are generally consistent with the flow directions of the E. Greenland Cold Current, the Norwegian Warm Current, the Labrador Cold Current, the Gulf Stream Warm Current, the N. Atlantic Drift Current, the Canary Cold Current and the N. Equatorial Current.

Across the three sea areas, Agglomerative clustering performs better than K-means clustering, producing smoother cluster boundaries and fewer misclassified grid regions.

In addition, clustering based on the long-term SST representation and on the spatial correlation scores gives better results than clustering based on the original SST data. Clusters derived from the original SST data largely follow latitude bands, whereas clusters derived from the spatial correlation scores capture deeper ocean features such as currents. The clusters based on spatial correlation scores also have smoother boundaries and match ocean currents that clustering on the other two inputs fails to match.

4.2.2. Results of SST Prediction

We integrate the regionality information learned by MuSTC into two spatio–temporal prediction models, i.e., STGCN and AGCRN, to conduct SST prediction. Table 2 and Table 3 present the prediction results in the three ocean areas. After integrating regionality information, the MAE of the STGCN model is reduced by 3.14%, 1.61% and 1.80% in the North Pacific Ocean, the South Atlantic Ocean and the North Atlantic Ocean, respectively, and its RMSE is reduced by 1.95%, 1.39% and 1.28%. For the AGCRN model, the MAE is reduced by 1.63% and 1.07% in the North Pacific Ocean and the North Atlantic Ocean, respectively, and increases slightly, by 0.05%, in the South Atlantic Ocean. The RMSE reductions for AGCRN are larger, i.e., 4.94%, 0.74% and 1.43% in the three oceans, respectively.

According to these results, integrating regionality information improves the SST prediction accuracy of both the STGCN and AGCRN models in all three study areas, with the single exception of a marginal 0.05% MAE increase for AGCRN in the South Atlantic Ocean. This indirectly indicates that the MuSTC method captures the regionality information of SST well. Figure 12 visualizes the prediction results of the AGCRN model.

5. Discussion

The experimental results show that global SST has clear regional characteristics, which are closely related to latitude and ocean currents. We studied three ocean areas, i.e., the North Pacific Ocean, the South Atlantic Ocean and the North Atlantic Ocean. In the North Pacific Ocean, the regionality information learned by the MuSTC method captures geographic features, such as the Aleutian Islands, and ocean currents, such as the Oyashio Cold Current, the N. Pacific Current and the Alaska Warm Current. As shown in Figure 13, in the South Atlantic Ocean, the learned regionality information correctly matches the Benguela Cold Current. As shown in Figure 14, in the North Atlantic Ocean, the learned regionality information captures the deep influence of the Labrador Cold Current on SST.

In the experiments, integrating the regionality characteristics into SST prediction models improved the prediction accuracy. Regionality features may likewise benefit other downstream tasks, such as SST anomaly detection and chlorophyll concentration prediction. In addition, because SST is highly similar within the same cluster region, the biological systems within that region are also likely to be correlated, so the results of regionality analysis can support biological conservation, fishery resource utilization, etc.

6. Conclusions

In this work, we proposed the MuSTC method to achieve quantitative analysis of the regionality of SST. MuSTC consists of a temporal encoder module, a self-attention module and an SST cluster generation module: the temporal encoder module learns the long-term SST representation, the self-attention module learns the spatial correlation scores between grid regions and the SST cluster generation module performs cluster analysis based on the original SST data, the encoded long-term SST representation and the spatial correlation scores, respectively. According to the experimental results, the clustering results of the MuSTC method generally match the distribution of the ocean currents, and the clustering results based on spatial correlation scores achieve the best performance.

We also demonstrate the validity of the MuSTC method by integrating the learned regionality information into two spatio–temporal prediction models, i.e., STGCN and AGCRN. For the STGCN, the integration reduces the RMSE by 1.95%, 1.39% and 1.28% in the North Pacific Ocean, the South Atlantic Ocean and the North Atlantic Ocean, respectively, and for the AGCRN, the reductions are 4.94%, 0.74% and 1.43%, which indicates that our quantitative analysis of SST regionality is effective.

In fact, our method is general and could be used to enhance most data-driven SST prediction models. Since the original SST prediction models already achieve high accuracy, even the modest further improvements brought by our method are valuable. In addition, the MuSTC method is not limited to enhancing SST prediction models; it could also be applied to spatio–temporal prediction models of other oceanographic parameters, e.g., data-driven salinity prediction and chlorophyll-a prediction models.

Overall, the contributions of this work are threefold. First, we proposed the MuSTC method, which sequentially learns the representation of long-term SST with a deep temporal encoder module and the spatial correlation scores between grid regions with a self-attention module, and then clusters grid ocean regions based on the original SST data, the encoded SST representation and the spatial correlation scores, respectively, to quantitatively uncover the regionality of SST. Second, we integrated the regionality information of SST into spatio–temporal SST prediction models to enhance their prediction performance. Third, we conducted extensive experiments over datasets from three ocean areas, and the results indicate that the learned regionality information of SST matches the distribution of ocean currents and can be used to improve the accuracy of existing SST prediction models.

In this work, SST regionality analysis and SST prediction are conducted separately, which may hinder the transmission of information between the two tasks. Therefore, in future work we plan to develop a unified model to combine the two tasks and achieve collaborative optimization. In addition, applying the regionality information of SST to help address other ocean issues, e.g., climate analysis and global warming prevention, is also a promising direction.

Author Contributions

Conceptualization, H.P. and W.L.; methodology, H.P., W.L. and H.Y.; validation, H.P.; formal analysis, H.P.; data curation, H.P.; writing—original draft preparation, H.P.; writing—review and editing, H.P., W.L. and C.J.; visualization, H.P.; supervision, J.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China (No. 62202336, No. U1936205), the National Key R&D Program of China (No. 2021YFC3300300) and the Fundamental Research Funds for the Central Universities (No. ZD-21-202101).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data for this experiment were processed from the Version 2 daily Optimum Interpolation Sea Surface Temperature (OISST V2) analysis of the National Oceanic and Atmospheric Administration (NOAA) at https://www.ncei.noaa.gov/products/optimum-interpolation-sst (accessed on 20 June 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

Wijeratne, V.P.; Li, G.; Mehmood, M.S.; Abbas, A. Assessing the Impact of Long-Term ENSO, SST, and IOD Dynamics on Extreme Hydrological Events (EHEs) in the Kelani River Basin (KRB), Sri Lanka. Atmosphere 2022, 14, 79.
Sougué, M.; Merz, B.; Sogbedji, J.M.; Zougmoré, F. Extreme Rainfall in Southern Burkina Faso, West Africa: Trends and Links to Atlantic Sea Surface Temperature. Atmosphere 2023, 14, 284.
Pérez-Alarcón, A.; Fernández-Alvarez, J.C.; Sorí, R.; Nieto, R.; Gimeno, L. The combined effects of SST and the North Atlantic subtropical high-pressure system on the Atlantic basin tropical cyclone interannual variability. Atmosphere 2021, 12, 329.
Peng, W.; Chen, Q.; Zhou, S.; Huang, P. CMIP6 model-based analog forecasting for the seasonal prediction of sea surface temperature in the offshore area of China. Geosci. Lett. 2021, 8, 8.
Eyring, V.; Bony, S.; Meehl, G.A.; Senior, C.A.; Stevens, B.; Stouffer, R.J.; Taylor, K.E. Overview of the Coupled Model Intercomparison Project Phase 6 (CMIP6) experimental design and organization. Geosci. Model Dev. 2016, 9, 1937–1958.
Kug, J.S.; Kang, I.S.; Lee, J.Y.; Jhun, J.G. A statistical approach to Indian Ocean sea surface temperature prediction using a dynamical ENSO prediction. Geophys. Res. Lett. 2004, 31.
Lins, I.D.; Araujo, M.; das Chagas Moura, M.; Silva, M.A.; Droguett, E.L. Prediction of sea surface temperature in the tropical Atlantic by support vector machines. Comput. Stat. Data Anal. 2013, 61, 187–198.
Wei, L.; Guan, L.; Qu, L. Prediction of sea surface temperature in the South China Sea by artificial neural networks. IEEE Geosci. Remote Sens. Lett. 2019, 17, 558–562.
Xiao, C.; Chen, N.; Hu, C.; Wang, K.; Xu, Z.; Cai, Y.; Xu, L.; Chen, Z.; Gong, J. A spatiotemporal deep learning model for sea surface temperature field prediction using time-series satellite data. Environ. Model. Softw. 2019, 120, 104502.
LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
Qiao, B.; Wu, Z.; Tang, Z.; Wu, G. Sea surface temperature prediction approach based on 3D CNN and LSTM with attention mechanism. In Proceedings of the 2022 24th International Conference on Advanced Communication Technology (ICACT), Pyeongchang, Republic of Korea, 13–16 February 2022; pp. 342–347.
Zhang, X.; Li, Y.; Frery, A.C.; Ren, P. Sea surface temperature prediction with memory graph convolutional networks. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
Hou, S.; Li, W.; Liu, T.; Zhou, S.; Guan, J.; Qin, R.; Wang, Z. D2CL: A dense dilated convolutional LSTM model for sea surface temperature prediction. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 12514–12523.
Moros, M.; Emeis, K.; Risebrobakken, B.; Snowball, I.; Kuijpers, A.; McManus, J.; Jansen, E. Sea surface temperatures and ice rafting in the Holocene North Atlantic: Climate influences on northern Europe and Greenland. Quat. Sci. Rev. 2004, 23, 2113–2126.
Hurwitz, M.M.; Newman, P.; Garfinkel, C. On the influence of North Pacific sea surface temperature on the Arctic winter climate. J. Geophys. Res. Atmos. 2012, 117.
Kumar, V.; Steinbach, M.; Tan, P.N.; Klooster, S.; Potter, C.; Torregrosa, A. Mining scientific data: Discovery of patterns in the global climate system. In Joint Statistical Meeting; American Statistical Association: Alexandria, VA, USA, 2001.
Hartigan, J.A.; Wong, M.A. Algorithm AS 136: A k-means clustering algorithm. Appl. Stat. 1979, 28, 100–108.
Steinbach, M.; Tan, P.N.; Kumar, V.; Potter, C.; Klooster, S.; Torregrosa, A. Data mining for the discovery of ocean climate indices. In Proceedings of the Fifth Workshop on Scientific Data Mining, Arlington, VA, USA, 13 April 2002.
Nascimento, S.; Casca, S.; Mirkin, B. A seed expanding cluster algorithm for deriving upwelling areas on sea surface temperature images. Comput. Geosci. 2015, 85, 74–85.
Zahraie, B.; Roozbahani, A. SST clustering for winter precipitation prediction in southeast of Iran: Comparison between modified K-means and genetic algorithm-based clustering methods. Expert Syst. Appl. 2011, 38, 5919–5929.
Forrest, S. Genetic algorithms. ACM Comput. Surv. 1996, 28, 77–80.
Qin, K.; Kong, L.; Liu, Y.; Xiao, Q. Sea surface temperature clustering based on type-2 fuzzy theory. In Proceedings of the 2010 18th International Conference on Geoinformatics, Beijing, China, 18–20 June 2010; pp. 1–5.
Zheng, G.; Li, X.; Zhang, R.H.; Liu, B. Purely satellite data–driven deep learning forecast of complicated tropical instability waves. Sci. Adv. 2020, 6, eaba1482.
Zhang, Q.; Wang, H.; Dong, J.; Zhong, G.; Sun, X. Prediction of sea surface temperature using long short-term memory. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1745–1749.
Liu, J.; Jin, B.; Yang, J.; Xu, L. Sea surface temperature prediction using a cubic B-spline interpolation and spatiotemporal attention mechanism. Remote Sens. Lett. 2021, 12, 478–487.
Kazemi, S.M.; Goel, R.; Eghbali, S.; Ramanan, J.; Sahota, J.; Thakur, S.; Wu, S.; Smyth, C.; Poupart, P.; Brubaker, M. Time2vec: Learning a vector representation of time. arXiv 2019, arXiv:1907.05321.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Volume 30.
Day, W.H.; Edelsbrunner, H. Efficient algorithms for agglomerative hierarchical clustering methods. J. Classif. 1984, 1, 7–24.
Yu, B.; Yin, H.; Zhu, Z. Spatio-temporal graph convolutional networks: A deep learning framework for traffic forecasting. In Proceedings of the International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018.
Bai, L.; Yao, L.; Li, C.; Wang, X.; Wang, C. Adaptive graph convolutional recurrent network for traffic forecasting. Adv. Neural Inf. Process. Syst. 2020, 33, 17804–17815.

Figure 1. Illustration of global ocean currents, which have strong regional characteristics.

Figure 2. Visualization of the global SST on 31 December 2021 with the range of the three large study areas annotated.

Figure 3. The structure of MuSTC, which consists of a temporal encoder module, a self-attention module, two fully connected layers (FC) and an SST cluster generation module.

Figure 4. The structure of the temporal encoder module, which contains $D_{E}$ units.

Figure 5. The structure of the self-attention module.

Figure 6. The clustering results for the North Pacific Ocean using K-means clustering: (a) clustering results using original SST data; (b) clustering results using encoded SST representation; (c) clustering results using spatial correlation scores (i.e., attention matrix); (d) diagram of ocean currents in the North Pacific Ocean. Different colors correspond to different clustering regions.

Figure 7. The clustering results for the North Pacific Ocean using Agglomerative clustering: (a) clustering results using original SST data; (b) clustering results using encoded SST representation; (c) clustering results using spatial correlation scores (i.e., attention matrix); (d) diagram of ocean currents in the North Pacific Ocean. Different colors correspond to different clustering regions.

Figure 8. The clustering results for the South Atlantic Ocean using K-means clustering: (a) clustering results using original SST data; (b) clustering results using encoded SST representation; (c) clustering results using spatial correlation scores (i.e., attention matrix); (d) diagram of ocean currents in the South Atlantic Ocean. Different colors correspond to different clustering regions.

Figure 9. The clustering results for the South Atlantic Ocean using Agglomerative clustering: (a) clustering results using original SST data; (b) clustering results using encoded SST representation; (c) clustering results using spatial correlation scores (i.e., attention matrix); (d) diagram of ocean currents in the South Atlantic Ocean. Different colors correspond to different clustering regions.

Figure 10. The clustering results for the North Atlantic Ocean using K-means clustering: (a) clustering results using original SST data; (b) clustering results using encoded SST representation; (c) clustering results using spatial correlation scores (i.e., attention matrix); (d) diagram of ocean currents in the North Atlantic Ocean. Different colors correspond to different clustering regions.

Figure 11. The clustering results for the North Atlantic Ocean using Agglomerative clustering: (a) clustering results using original SST data; (b) clustering results using encoded SST representation; (c) clustering results using spatial correlation scores (i.e., attention matrix); (d) diagram of ocean currents in the North Atlantic Ocean. Different colors correspond to different clustering regions.

Figure 12. The SST predicted by the AGCRN model for the next three days, 28–30 December 2021.

Figure 13. The clustering results around the Benguela Cold Current (red box area) using Agglomerative clustering: (a) clustering results using original SST data; (b) clustering results using spatial correlation scores; (c) diagram of the Benguela Cold Current. Different colors correspond to different clustering regions.

Figure 14. The clustering results around the Labrador Cold Current (red box area) using Agglomerative clustering: (a) clustering results using original SST data; (b) clustering results using spatial correlation scores; (c) diagram of the Labrador Cold Current. Different colors correspond to different clustering regions.

Table 1. Frequently used notations and their meanings.

Notations	Meanings
$N$	the number of grid regions
$S_{I}$	the input SST data without land points
$T$	the length of time (in days) for input SST data
$C_{n}$	the specified number of clusters
$C$	the clustering results
$S_{E}$	the encoded SST representation
$D_{E}$	the size of the encoded feature dimension
$S_{AI}$	the input data of the self-attention module
$S_{AO}$	the output data of the self-attention module
$S_{R}$	the reconstructed SST data with the same shape as the input data $S_{I}$
$D_{AI}$	the feature dimension of the input data of the self-attention module
$D_{AO}$	the feature dimension of the output data of the self-attention module
$A$	the attention matrix in the self-attention module
$A'$	the normalized attention matrix
$S_{C}$	the input data of the SST cluster generation module

Table 2. The SST prediction results in the North Pacific Ocean, the South Atlantic Ocean and the North Atlantic Ocean using STGCN model.

Dataset	MAE (STGCN)	RMSE (STGCN)	MAE (STGCN with RI, Ours) ¹	RMSE (STGCN with RI, Ours)
North Pacific	0.2799	0.4094	0.2711 (3.14% ↓)²	0.4014 (1.95% ↓)
South Atlantic	0.1990	0.2874	0.1958 (1.61% ↓)	0.2834 (1.39% ↓)
North Atlantic	0.2334	0.3600	0.2292 (1.80% ↓)	0.3554 (1.28% ↓)

¹ STGCN with RI means STGCN with regionality information. ² The percentage values with ↓ symbol indicate the proportion of error reduction after integrating the regionality information.

Table 3. The SST prediction results in the North Pacific Ocean, the South Atlantic Ocean and the North Atlantic Ocean using AGCRN model.

Dataset	MAE (AGCRN)	RMSE (AGCRN)	MAE (AGCRN with RI, Ours) ¹	RMSE (AGCRN with RI, Ours)
North Pacific	0.3002	0.4475	0.2953 (1.63% ↓)²	0.4254 (4.94% ↓)
South Atlantic	0.1900	0.2827	0.1901 (0.05% ↑) ³	0.2806 (0.74% ↓)
North Atlantic	0.2329	0.3578	0.2304 (1.07% ↓)	0.3527 (1.43% ↓)

¹ AGCRN with RI means AGCRN with regionality information. ² The percentage values with ↓ symbol indicate the proportion of error reduction after integrating the regionality information. ³ The percentage values with ↑ symbol indicate the proportion of error increase after integrating the regionality information.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
