Article

A New Transformer Network for Short-Term Global Sea Surface Temperature Forecasting: Importance of Eddies

by Tao Zhang 1,2, Pengfei Lin 1,2,*, Hailong Liu 3, Pengfei Wang 4, Ya Wang 1,5, Weipeng Zheng 1,2,5, Zipeng Yu 5, Jinrong Jiang 6, Yiwen Li 7 and Hailun He 8
1 State Key Laboratory of Earth System Numerical Modeling and Application, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China
2 College of Earth and Planetary Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
3 Laoshan Laboratory, Qingdao 266237, China
4 State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics (LASG), Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China
5 Earth System Numerical Simulation Science Center, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing 100029, China
6 Computer Network Information Center, Chinese Academy of Sciences, Beijing 100190, China
7 School of Ocean Sciences, China University of Geosciences, Beijing 100083, China
8 State Key Laboratory of Satellite Ocean Environment Dynamics, Second Institute of Oceanography, Ministry of Natural Resources, Hangzhou 310012, China
*
Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(9), 1507; https://doi.org/10.3390/rs17091507
Submission received: 27 February 2025 / Revised: 12 April 2025 / Accepted: 22 April 2025 / Published: 24 April 2025
(This article belongs to the Section Ocean Remote Sensing)

Abstract
Short-term sea surface temperature (SST) forecasts are crucial for operational oceanology. This study introduces a specialized Transformer model (U-Transformer) to forecast global short-term SST variability and compares its performance with Convolutional Long Short-Term Memory (ConvLSTM) and Residual Neural Network (ResNet) models. The U-Transformer model forecast consistently outperformed the ConvLSTM and ResNet models, especially in regions with active mesoscale eddies. Globally, the U-Transformer model achieved SST root mean square errors (RMSEs) ranging from 0.2 °C at a 1-day lead time to 0.54 °C at a 10-day lead time during 2020–2022, with anomaly correlation coefficients (ACCs) decreasing from 0.97 to 0.79, respectively. However, in regions characterized by active mesoscale eddies, RMSEs from the U-Transformer model exceeded the global averages by at least 40%, with values in the Gulf Stream region reaching more than twice the global average. Additionally, ACC values in active mesoscale eddy regions declined more sharply with forecast lead time compared to the global averages, decreasing from approximately 0.96 at a 1-day lead time to 0.73 at a 10-day lead time. Specifically, the ACC value dropped to 0.89 in the Gulf Stream region at a 3-day lead time, while maintaining 0.92 globally. These findings underscore the importance of advanced approaches to enhance SST forecast accuracy in challenging active mesoscale eddy regions.

1. Introduction

Sea surface temperature (SST) plays a crucial role in air–sea interactions and is a key climate factor of global change. Variation in SST substantially impacts regional climate variability, influencing global precipitation patterns and potentially leading to extreme events such as droughts and floods [1,2,3,4]. Sea surface temperature serves as a critical indicator of marine heatwaves, which significantly impact global marine ecosystems, illustrating the vital role of accurate short-term SST forecasting [5,6,7,8,9]. Short-term SST forecasting is influenced by numerous factors, among which oceanic mesoscale eddies are particularly important. Due to their strong dynamical effects, these eddies can induce extreme short-term SST anomalies. Therefore, understanding and addressing the influence of mesoscale eddies is essential for improving short-term SST forecasting.
In recent years, data-driven deep learning (DL) methods have gained widespread application in ocean and atmospheric sciences, such as eddy identification, downscaling, SST reconstruction, and parameterization of physical processes [10,11,12,13,14,15]. Various types of DL models have been widely explored in SST forecasting. Recurrent neural networks (RNNs), like long short-term memory networks (LSTMs) and gated recurrent units (GRUs), primarily focus on the temporal evolution of SST at individual locations and have been applied to SST forecasting in specific regions [16,17,18,19]. However, these models struggle to capture complex spatial correlations across areas. Convolutional neural networks (CNNs) have been adopted to address this limitation due to their strengths in spatial feature extraction [20]. Residual neural networks (ResNets) further improve model depth by introducing a residual learning framework, effectively mitigating the gradient-vanishing problem in deep networks [21]. Subsequently, various types of CNN architectures and their variants have been applied to short-term SST forecasting [22,23,24,25,26]. A typical network is Convolutional Long Short-Term Memory (ConvLSTM), which combines the advantages of CNN and RNN to effectively extract temporal and spatial information and enhance forecast accuracy [27,28]. These DL models have shown significant progress in improving prediction accuracy and reliability.
With the rapid advancement of DL, the Transformer architecture [29,30] (see Appendix A for details) has emerged as a powerful tool across various fields due to its ability to capture long-range dependencies and model complex spatiotemporal relationships. While it has been successfully applied to SST super-resolution tasks [31], its potential for short-term SST forecasting remains relatively unexplored.
One challenge in short-term SST forecasting involves mesoscale eddies, which are widely distributed throughout the ocean and serve as primary drivers of mesoscale SST variability [32]. Dynamical forecast models often exhibit notable errors in eddy-active regions due to the complex dynamics and temperature structures of these eddies [33,34]. These errors are also pronounced over eddy-active areas in the High-Resolution Ocean Model Intercomparison Project simulations [35,36,37]. However, studies quantifying short-term SST forecast errors and their spatial distribution in eddy-active regions remain limited [38,39,40].
This study introduces an innovative Transformer-based variant, the U-Transformer model, to improve short-term global SST forecasting. The U-Transformer model is designed to capture spatial and temporal features simultaneously, enabling more accurate multi-step forecasts for the coming days. In this study, we compare the performance of the U-Transformer model with two classic CNN-based model types—ConvLSTM and ResNet—across global areas and regions with active mesoscale eddies. The paper is organized as follows: Section 1 provides an introduction; Section 2 describes the data and methods, Section 3 presents the results, and Section 4 contains the discussion and conclusions.

2. Data and Methods

2.1. Data

The primary dataset used in this study is the NOAA/NESDIS/NCEI Daily Optimum Interpolation Sea Surface Temperature (OISST), version 2.1 [41,42]. This dataset provides global SST observations at a 0.25° spatial resolution and a daily temporal resolution, with coverage extending from 89.975°S to 89.875°N and from 0.125°E to 359.875°E. Data from January 1982 to December 2022 were utilized for model development and evaluation.
OISST v2.1 integrates SST measurements from satellite observations (e.g., AVHRR) and in situ measurements from ships, drifting buoys, and Argo floats. These data sources are blended using an optimum interpolation algorithm, which ensures consistency and accuracy through bias adjustments based on in situ observations. The interpolation leverages spatial autocorrelation and temporal consistency to generate a high-resolution, gridded SST product.
To evaluate the SST forecasts, we employed data from three prominent oceanographic research programs: the Tropical Atmosphere Ocean (TAO) project, the Research Moored Array for African–Asian–Australian Monsoon Analysis and Prediction (RAMA) project, and the Prediction and Research Moored Array in the Tropical Atlantic (PIRATA) project. These programs utilize moored buoys that provide essential real-time measurements of oceanic and atmospheric conditions. All three arrays provide data daily, measuring key parameters such as SST, air temperature, wind stress, 10 m wind speed, and longwave radiation. For this study, data from 2020–2022 are used to evaluate forecast performance. The spatial resolution of these arrays is approximately 2° latitude by 10° longitude, with the TAO array covering the equatorial Pacific, RAMA covering the tropical Indian Ocean, and PIRATA covering the tropical Atlantic Ocean.
This study utilized the Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) SST data from 2021 as an independent test dataset to assess the generalization capability of the model. Using a multi-sensor optimal interpolation scheme that combines satellite and in situ observations, OSTIA provides daily global SST fields at a 0.05° resolution [43]. The OSTIA data were bilinearly interpolated to a 0.25° grid before being used for model evaluation to ensure consistency with the OISST resolution.
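The regridding step for OSTIA can be sketched with a minimal NumPy bilinear interpolation on a regular latitude–longitude grid. The function name and grid handling below are illustrative (the paper does not publish its regridding code), assuming uniformly spaced source coordinates:

```python
import numpy as np

def bilinear_regrid(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D field from a fine regular grid
    (e.g., OSTIA at 0.05 deg) onto a coarser one (e.g., 0.25 deg)."""
    # Fractional indices of the target coordinates on the source grid.
    fi = (dst_lat - src_lat[0]) / (src_lat[1] - src_lat[0])
    fj = (dst_lon - src_lon[0]) / (src_lon[1] - src_lon[0])
    i0 = np.clip(np.floor(fi).astype(int), 0, len(src_lat) - 2)
    j0 = np.clip(np.floor(fj).astype(int), 0, len(src_lon) - 2)
    wi = (fi - i0)[:, None]   # latitude weights, broadcast over longitude
    wj = (fj - j0)[None, :]   # longitude weights, broadcast over latitude
    I0, J0 = np.ix_(i0, j0)   # index grids for the four surrounding corners
    return ((1 - wi) * (1 - wj) * field[I0, J0]
            + (1 - wi) * wj * field[I0, J0 + 1]
            + wi * (1 - wj) * field[I0 + 1, J0]
            + wi * wj * field[I0 + 1, J0 + 1])
```

Bilinear interpolation reproduces any field that is linear in latitude and longitude exactly, which makes it easy to sanity-check.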

2.2. Model

The proposed U-Transformer architecture eliminates convolutional and recursive operations, replacing them with a self-attention mechanism to extract multivariate relationships in parallel, regardless of spatial and temporal distance. The U-Transformer, as shown in Figure 1a, comprises an encoder, decoder, and skip connections [44] and was built on the Swin Transformer module [45]. The Swin Transformer module employs self-attention within nonoverlapping local windows to reduce network complexity and build hierarchies for multiscale feature extraction.
The encoder starts by dividing the input SST field into 4 × 4 non-overlapping patches. For our global SST data with input dimensions of H = 720, W = 1440, and T = 10 (representing the SST of the past 10 consecutive days), these patches are projected to an embedding dimension C (e.g., C = 96) through a linear embedding layer, reducing both the spatiotemporal dimensions and the memory usage. The resulting matrix has the dimensions (180, 360, 96), where 180 and 360 are the height and width of the input after patching.
These encoded patches pass through a series of Swin Transformer Blocks and a patch merge layer. The patch merge layer reduces the spatial dimensions by half while doubling the feature dimension, enabling hierarchical feature representation. For instance, after the first patch merge layer, the matrix dimensions become (90, 180, 192).
Similarly, the decoder employs Swin Transformer Blocks and a patch expand layer. The patch expand layer upsamples the feature mappings to restore the spatial dimensions progressively. For example, the dimensions change from (90, 180, 192) back to (180, 360, 96). Skip connections from the encoder provide contextual features to the decoder, mitigating spatial information loss. Finally, a linear projection layer converts the output of the decoder into the desired shape (10, 720, 1440) to generate the future SST field forecast.
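The dimension bookkeeping above can be traced with a small helper. This is only a shape sketch of the described architecture (the function name, and the assumption of two merge/expand stages, are illustrative), not the model itself:

```python
def u_transformer_shapes(H=720, W=1440, T=10, C=96, patch=4, depth=2):
    """Trace the (height, width, channels) flow described in the text:
    4x4 patch embedding, `depth` patch-merge steps (halve spatial dims,
    double channels), then the mirrored patch-expand steps."""
    shapes = []
    h, w, c = H // patch, W // patch, C           # linear patch embedding
    shapes.append(("embed", (h, w, c)))
    for _ in range(depth):                        # encoder: patch merge
        h, w, c = h // 2, w // 2, c * 2
        shapes.append(("merge", (h, w, c)))
    for _ in range(depth):                        # decoder: patch expand
        h, w, c = h * 2, w * 2, c // 2
        shapes.append(("expand", (h, w, c)))
    shapes.append(("head", (T, H, W)))            # linear projection to the forecast
    return shapes
```

Running it reproduces the dimensions quoted in the text: (180, 360, 96) after embedding, (90, 180, 192) after the first patch merge, and (10, 720, 1440) at the output.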
A key innovation is the shifted window-based multi-head self-attention (SW-MSA) module in the Swin Transformer, which addresses the lack of cross-window connectivity in a standard window-based MSA. The SW-MSA alternates between two partitioning configurations, with each Swin Transformer Block comprising an SW-MSA, a 2-layer multilayer perceptron (MLP) with Gaussian Error Linear Unit activation, Layer Normalization (LN), and residual connections (Figure 1b). This process can be formulated as follows:
$$\hat{z}^{l} = \text{W-MSA}\big(\text{LN}(z^{l-1})\big) + z^{l-1} \tag{1}$$
$$z^{l} = \text{MLP}\big(\text{LN}(\hat{z}^{l})\big) + \hat{z}^{l} \tag{2}$$
$$\hat{z}^{l+1} = \text{SW-MSA}\big(\text{LN}(z^{l})\big) + z^{l} \tag{3}$$
$$z^{l+1} = \text{MLP}\big(\text{LN}(\hat{z}^{l+1})\big) + \hat{z}^{l+1} \tag{4}$$
where $\hat{z}^{l}$ and $z^{l}$ denote the output features of the (S)W-MSA module and the MLP module for block $l$, respectively. The W-MSA and SW-MSA modules in Equations (1) and (3) both utilize the self-attention mechanism as their core operation. This self-attention is computed as follows:
$$\text{Attention}(Q, K, V) = \text{SoftMax}\!\left(\frac{QK^{T}}{\sqrt{d}} + B\right)V \tag{5}$$
where $Q, K, V \in \mathbb{R}^{M^{2} \times d}$ are the query, key, and value matrices, respectively; $d$ is the query/key dimension; $M^{2}$ is the number of patches in a window; and $B$ is the relative position bias.
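The self-attention computation above can be sketched for a single window in NumPy. The function name is illustrative; in practice this runs per head and per window with learned projections:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def window_attention(Q, K, V, B):
    """SoftMax(Q K^T / sqrt(d) + B) V for one window.
    Q, K, V: (M*M, d) window tokens; B: (M*M, M*M) relative position bias."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d) + B
    return softmax(scores, axis=-1) @ V
```

A useful check: with Q = 0 and B = 0 the attention weights are uniform, so every output token equals the mean of the value vectors.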

2.3. Implementation Details

The SST dataset undergoes a series of preprocessing steps to construct input–output samples and organize them for training, validation, and evaluation. To prepare the data for model training, the SST fields are first normalized. Land grid points with missing values are filled with 0 to ensure numerical stability during training but are excluded from loss computation. Following preprocessing, training samples are generated using a sliding window approach. For each valid time point t, the input consists of SST fields from t–9 to t, and the output consists of fields from t + 1 to t + 10. Samples from 1982–2019 are randomly split into training and validation sets (90% and 10%, respectively), while those from 2020–2022 serve as an independent test set. The validation set is used for hyperparameter tuning and early stopping during training. To support a more detailed evaluation of model performance, a spatial filtering method is applied during testing to separate mesoscale and large-scale SST signals.
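The sliding-window construction can be written as a short index-generating function; a minimal sketch (the function name is illustrative), assuming daily time steps indexed from 0:

```python
def sliding_window_samples(n_times, n_in=10, n_out=10):
    """Build (input, target) time-index pairs as described in the text:
    for each valid t, inputs cover t-9..t and targets cover t+1..t+10."""
    samples = []
    for t in range(n_in - 1, n_times - n_out):
        samples.append((list(range(t - n_in + 1, t + 1)),    # past 10 days
                        list(range(t + 1, t + n_out + 1))))  # next 10 days
    return samples
```

For a record of length `n_times`, this yields `n_times - n_in - n_out + 1` samples, each pairing 10 input days with 10 target days.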
The model was trained using a latitude-weighted L2 loss function, which accounts for the area of grid points across different latitudes.
Our weighted L2 loss function can be expressed as:
$$L_{2} = \frac{1}{N}\sum_{i=1}^{N} \cos\theta_{i}\,(y_{i} - \hat{y}_{i})^{2} \tag{6}$$
where $y_{i}$ is the true value, $\hat{y}_{i}$ is the predicted value, $N$ is the number of grid points included in the loss, $\theta_{i}$ is the latitude of the corresponding point, and $\cos\theta_{i}$ is the latitude-based weight.
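The latitude-weighted loss described above can be sketched in NumPy as follows; the function name and the optional land mask handling are illustrative (the paper excludes land points from the loss but does not publish its implementation):

```python
import numpy as np

def latitude_weighted_l2(y_true, y_pred, lat_deg, land_mask=None):
    """Mean of cos(latitude)-weighted squared errors over a (lat, lon) field.
    Points where land_mask is True are excluded from the average."""
    # Broadcast per-latitude weights across the longitude dimension.
    w = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(y_true)
    sq = w * (y_true - y_pred) ** 2
    if land_mask is not None:
        sq = sq[~land_mask]
    return sq.mean()
```

For example, with latitudes 0° and 60° and a uniform error of 1 °C, the weights are 1 and 0.5, giving a loss of 0.75.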
To improve training efficiency and model stability, inputs and outputs were normalized using zero-mean normalization. The zero-mean normalization function can be expressed as:
$$X' = \frac{X - \mu}{\sigma} \tag{7}$$
where $X$ is the original data, $\mu$ is the mean of the feature, $\sigma$ is the standard deviation of the feature, and $X'$ is the normalized data. The mean and standard deviation values are computed based on the historical dataset from 1982 to 2019.
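This normalization scheme can be sketched as a fit/transform pair; the function names are illustrative, with the key point being that the statistics come from the training period only:

```python
import numpy as np

def fit_norm(train):
    """Statistics from the training period (here, 1982-2019) only,
    so the test years never leak into the normalization."""
    return train.mean(), train.std()

def normalize(x, mu, sigma):
    # X' = (X - mu) / sigma
    return (x - mu) / sigma

def denormalize(xn, mu, sigma):
    # Invert the normalization to recover physical units (deg C).
    return xn * sigma + mu
```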
All models were implemented using the PyTorch framework (version 1.10.0) and trained on a cluster of 16 nodes, each with two accelerator cards (16 GB memory each). The training process ran for 100 epochs with a batch size of 2 per card. We used the AdamW [46] optimizer due to its effectiveness in stabilizing the training of deep learning models. The optimizer parameters β1 = 0.9 and β2 = 0.95 were chosen based on their widespread use in similar tasks, balancing gradient momentum and stability. An initial learning rate of $10^{-3}$ was applied, as it provided a good balance between convergence speed and performance during preliminary tests. Additionally, we used a weight decay of 0.1 to mitigate overfitting and improve generalization. All training parameters were kept consistent across models to maintain fairness and comparability.
We compared the proposed U-Transformer model with ConvLSTM and ResNet models, using the same input–output structures and preprocessing across all models. The ConvLSTM architecture was adapted from [47], while the ResNet model was based on ResNet-18 [21]. The parameter details of ConvLSTM and ResNet can be seen in Appendix B. All forecasts were derived from the test set, ensuring a consistent methodology across models.

2.4. Mesoscale Signal Extraction

This study employed the spatial filtering methods of [48,49] to extract the oceanic mesoscale signal. A filter box spanning 3° in both longitude and latitude was used to calculate the mean value within the box, which constituted the low-pass-filtered value representing the large-scale signal. The difference between the original SST and this low-pass-filtered value was then used to isolate the mesoscale signal.
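The scale separation can be sketched as a moving-average box filter in NumPy. This is a simplified illustration (function name, edge padding, and the separable implementation are assumptions; the exact filter in [48,49] may differ):

```python
import numpy as np

def split_scales(sst, box_deg=3.0, grid_deg=0.25):
    """Separate large-scale and mesoscale SST with a box-mean filter.
    The box mean is the low-pass (large-scale) part; SST minus the
    box mean is the mesoscale part."""
    k = int(round(box_deg / grid_deg))   # 3 deg / 0.25 deg = 12 grid points
    kernel = np.ones(k) / k
    pad = k // 2
    # Pad edges so the moving average stays well defined at the boundaries.
    padded = np.pad(sst, pad, mode="edge")
    large = padded
    for axis in (0, 1):                  # separable 2-D moving average
        large = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), axis, large)
    large = large[pad:pad + sst.shape[0], pad:pad + sst.shape[1]]
    return large, sst - large
```

A quick check: a spatially constant field has no mesoscale component, so the filter should return the field itself plus a zero residual.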

2.5. Evaluation Metrics

We evaluated the forecast performance using the area-weighted RMSE, Bias, and anomaly correlation coefficient (ACC), which were calculated as follows:
$$\text{RMSE}(t) = \frac{1}{|D|}\sum_{t_{0} \in D} \sqrt{\frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} \alpha_{i}\big(\hat{Y}_{i,j} - Y_{i,j}\big)^{2}} \tag{8}$$
$$\text{Bias}(t) = \frac{1}{|D|}\sum_{t_{0} \in D} \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} \alpha_{i}\big(\hat{Y}_{i,j} - Y_{i,j}\big) \tag{9}$$
$$\text{ACC}(t) = \frac{1}{|D|}\sum_{t_{0} \in D} \frac{\frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} \alpha_{i}\,(\hat{Y}_{i,j} - C_{i,j})(Y_{i,j} - C_{i,j})}{\sqrt{\frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} \alpha_{i}\big(\hat{Y}_{i,j} - C_{i,j}\big)^{2}}\,\sqrt{\frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} \alpha_{i}\big(Y_{i,j} - C_{i,j}\big)^{2}}} \tag{10}$$
where $t_{0}$ is the forecast initialization time in testing set $D$ and $t$ is the forecast lead time added to $t_{0}$; $H$ and $W$ are the numbers of grid points in the latitude and longitude directions, respectively; $\alpha_{i}$ represents the weight of latitude $i$; $\hat{Y}_{i,j}$ and $Y_{i,j}$ are the forecast field and the true field at time $t$, respectively; and $C$ represents the climatological mean calculated using data from 2000 to 2010.
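For a single forecast field, the three metrics can be sketched in NumPy as follows (the function name is illustrative; averaging over all initialization times in the test set is omitted for brevity):

```python
import numpy as np

def area_weighted_metrics(y_pred, y_true, clim, lat_deg):
    """RMSE, Bias, and ACC for one (lat, lon) forecast field,
    with cos(latitude) area weights and climatology `clim`."""
    a = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(y_true)
    n = a.size
    rmse = np.sqrt((a * (y_pred - y_true) ** 2).sum() / n)
    bias = (a * (y_pred - y_true)).sum() / n
    fa, oa = y_pred - clim, y_true - clim      # forecast and observed anomalies
    acc = ((a * fa * oa).sum() / n) / np.sqrt(
        ((a * fa ** 2).sum() / n) * ((a * oa ** 2).sum() / n))
    return rmse, bias, acc
```

A perfect forecast gives RMSE = 0, Bias = 0, and ACC = 1; a uniform +1 °C offset shows up directly in the Bias as the mean latitude weight.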

3. Results

3.1. Global Verification of Short-Term Forecasts

Figure 2 illustrates the forecast performance of the three DL models across different lead times. The global average RMSE produced by the U-Transformer model ranges from 0.2 °C on day 1 to 0.54 °C on day 10 during 2020–2022 (Figure 2a), demonstrating consistent forecast accuracy. At the 5-day lead time, the U-Transformer model achieves a global RMSE of 0.42 °C, slightly outperforming the ConvLSTM and ResNet models, whose RMSEs are 0.43 and 0.44 °C, respectively. Although the RMSEs of all three models grow with lead time, the U-Transformer model consistently maintains the smallest RMSE values at all (1- to 10-day) lead times. Additionally, the U-Transformer and ConvLSTM models show smaller spreads in their RMSE distributions than the ResNet model at 4- to 10-day lead times, indicating more consistent forecast performance across different initialization times. Previous studies report that dynamical forecast models produce SST RMSEs ranging from 0.35 to 1.1 °C at the 1-day lead time [50,51,52,53]. Compared with these dynamical-model errors, the U-Transformer model reduces the global SST forecast errors, achieving a 1-day RMSE of 0.2 °C in the current study. Furthermore, when compared to another deep learning model [27] that reports a 1-day RMSE of 0.27 °C, the U-Transformer model built in this study shows a lower RMSE. These comparisons show that the U-Transformer model outperforms other models in short-term global SST forecasts.
The good ability of the U-Transformer model to forecast short-term global SST is also demonstrated by the small biases and large ACC (Figure 2b,c). Generally, all three models exhibit small biases (<0.1 °C) in global averaged SST, although the biases among the three models differ slightly. The U-Transformer model exhibits a cold bias of 0.03–0.06 °C at 1- to 10-day lead times. The ResNet model shows a relatively stable cold bias of approximately 0.02 °C, whereas the ConvLSTM model transitions from a slight warm bias to a cold bias over the same period but with a larger spread of values. The consistency in the global averaged SST bias sign for all lead times implies that systematic bias exists for the U-Transformer and ResNet models, which requires further study. The U-Transformer model consistently achieves the largest ACC values, starting at approximately 0.97 at a 1-day lead time and gradually decreasing to 0.79 at a 10-day lead time. Meanwhile, the spreads are narrower for the U-Transformer and ConvLSTM models, implying consistently larger ACCs and, thus, enhanced forecast skill.
The above analysis highlights the reasonable ability of the U-Transformer model to forecast short-term SST globally, consistently outperforming both the ConvLSTM and ResNet models in terms of forecast accuracy and skill across all global statistical metrics at 1- to 10-day lead times.

3.2. Validation with Buoy Observations

Statistical analysis was also performed for the tropics and for regions with active mesoscale eddies. In the tropical and subtropical oceans, the RMSEs are relatively low and generally remain below 0.2 °C, except in the eastern equatorial Pacific (Figure 3a), which is characterized by the Tropical Instability Wave (TIW). In this area, RMSEs exceed 0.3 °C, consistent with findings from previous studies [23,54], highlighting the challenges of forecasting daily SST changes in the TIW region. In the Atlantic and Indian oceans, higher RMSEs (calculated using observed data from moored stations) are observed in the northeast, while smaller RMSEs (<0.2 °C) are found in the western Atlantic (Figure 3c). The Indian Ocean exhibits a relatively uniform distribution of RMSEs, with values generally above 0.25 °C (Figure 3e). RMSEs are distributed unevenly in the Pacific Ocean, with higher values in the eastern equatorial region and lower values in the western parts (Figure 3a). Overall, the ACC decreases as the forecast lead time increases, with the U-Transformer model demonstrating the highest forecast correlations in the Pacific and Atlantic regions (Figure 3b,d). On the first forecast day, the ACC is approximately 0.9, with the Atlantic region showing the best correlation performance. Interestingly, the ACC does not decrease strictly monotonically as the forecast lead time extends, possibly because of the limited number of observed samples available for evaluation.
Figure 2 and Figure 3 clearly demonstrate that the U-Transformer model significantly reduces forecast errors compared to the ConvLSTM and ResNet models, particularly in the TIW region. The ConvLSTM and ResNet models both struggle to capture these varied regional dynamics possibly due to the use of convolutional networks. In contrast, the self-attention mechanism in the U-Transformer model effectively captures remote dependencies and intricate spatial patterns, making it particularly suited for SST forecasts in such complex regions.

3.3. Spatial Analysis and Forecast Cases Globally

Figure 4 illustrates the spatial RMSE distribution for forecast SST at different lead times (days) in the test set. At the 1-day lead time, the U-Transformer model performs exceptionally well, with a global area-averaged RMSE of 0.2 °C (Figure 4a). In the U-Transformer model results, small RMSEs are found mainly in tropical or subtropical ocean areas, whereas large RMSEs are predominantly observed in regions with active mesoscale eddies. The ConvLSTM and ResNet models can also reproduce the observed SST distribution in the 1-day lead time forecast, with RMSE values (0.22–0.23 °C) slightly larger than those of the U-Transformer model (Figure 4d,g). As the forecast lead time increases, the forecast error also grows. At the 10-day lead time, the U-Transformer achieves a global average RMSE of 0.54 °C, compared to 0.55 °C for ConvLSTM and 0.58 °C for ResNet (Figure 4c,f,i). Notably, forecast errors are more pronounced in regions with active mesoscale eddies than in other areas. The three DL models also reproduce the observed large-scale features of the SST distribution on 1 January 2022 at both 1-day and 5-day lead times (Figure 5a,d,g,j), such as warm SST in the tropics, cold SST at high latitudes (Arctic Ocean and Southern Ocean), the Indo-Pacific warm pool, and the cold tongue in the equatorial eastern Pacific (Figure 5b,e,h,k). It is evident that while the different models successfully capture the general characteristics and spatial morphology of mesoscale signals, their errors remain significant. Mesoscale process forecast errors account for approximately 70% of the total error, highlighting the complexity of mesoscale activity as a primary contributor to inaccuracies in SST forecasts. Among the evaluated models, the RMSE is consistently around 0.25 °C; however, the U-Transformer model demonstrates superior correlation performance, achieving the largest value of 0.92 (Figure 5c,f,i,l).
In regions with active mesoscale eddies, the RMSEs are large, and the behavior of the forecasted local SST requires investigation. The Kuroshio Extension (KE), the Gulf Stream (GS), and the oceans around Southern Africa (OSA) were chosen to characterize these areas with active mesoscale eddies. In these regions, all three DL models exhibit large RMSEs compared with the global average, particularly within the black dashed boxes shown in Figure 4 (detailed in Figure A1), with errors exceeding 0.6 °C for 1-day lead time forecasts. Additionally, the mesoscale pattern correlation coefficients of the forecasted SSTs are notably lower than those for large-scale patterns, highlighting the challenges in forecasting SST changes associated with mesoscale eddies (Figure 5).

3.4. Forecasts Case in Mesoscale Eddy-Active Regions

The forecast performance on a specific day further reflects the models' ability to forecast SST in complex ocean regions. The KE was chosen to display the forecast SST evolution in the eddy-active areas. Figure 6 presents the observed OISST SST and daily SST forecast biases in this region from 14 July to 20 July 2022, based on initial conditions from OISST on 14 July 2022. The results demonstrate that the three DL models effectively capture the overall spatial distribution of SST. The U-Transformer model exhibits smaller biases at the 1-day lead time (14 July 2022) than the ConvLSTM and ResNet models. South of 36°N, the absolute SST biases are less than 0.3 °C, while between 36°N and 39°N, biases exceed 0.5 °C in all models at the 1-day lead time. The U-Transformer achieves an RMSE of 0.3 °C, outperforming the ConvLSTM (0.4 °C) and ResNet (0.5 °C) models. The forecast biases increase substantially as the lead time increases. At a 4-day lead time, biases south of 36°N become comparable to those north of 36°N in the U-Transformer model. The RMSEs of the forecast SST at all lead times are smaller for the U-Transformer model than for the ConvLSTM and ResNet models, with a slight difference relative to the ConvLSTM model and a larger difference relative to the ResNet model. All models exhibit common biases around finer-scale SST features, particularly near extreme local high or low SST values linked to mesoscale and submesoscale eddies. From 1-day to 3-day lead times, the RMSE of the U-Transformer model increases significantly, from 0.3 to 0.6 °C; similar evolution is found in the other two models. This rapid error growth may be attributed to eddy movement and nonlinear eddy behaviors. Similar patterns of large forecast SST biases and their evolution are evident in the GS and the OSA regions, as shown in Figure A2 and Figure A3. Overall, the advanced DL models exhibit promising capabilities for forecasting SST in eddy-active regions.
Further optimization is required to enhance their accuracy and reliability when addressing complex ocean processes, such as those in eddy-rich regions.
Figure 7 and Figure 8 illustrate the spatial distributions of forecast SST biases for the three DL models, selected based on the 10th percentile (lower RMSE) and 90th percentile (higher RMSE) of the RMSE values sorted in ascending order. The error distributions across the three DL models are generally consistent under various forecast initial conditions, reflecting the inherent physical characteristics of SST variations in eddy-active regions. However, notable differences in bias magnitudes are observed among the models. In the KE region, the SST bias from the U-Transformer model is predominantly below 0.2 °C (Figure 7a), 10–30% smaller than the biases produced by the ConvLSTM and ResNet models. In contrast, the GS region exhibits significantly more irregular SST bias structures (Figure 7d–f), resembling features associated with mesoscale eddies. In this region, the RMSEs are 40–50% larger than those in the KE region for the same model. Similarly, eddy-related bias patterns are evident in the OSA (Figure 7g–i) but are much more obvious in the ConvLSTM and ResNet models. In this area, the RMSEs of the ConvLSTM and ResNet models are 22–61% larger than those of the U-Transformer. While biases in this region are larger than those in the KE, they remain smaller than those in the GS for the same model. These cross-regional comparisons indicate the challenges posed by active eddies, whose complex dynamics are difficult to capture and can induce significant forecast SST biases.
Compared with the smaller RMSE cases (Figure 7), the larger RMSE cases (Figure 8) exhibit larger biases for the same region. Even under these cases, the U-Transformer model demonstrates smaller SST biases than the ConvLSTM and ResNet models, but the differences vary by region and model. In the KE region, the U-Transformer achieves RMSE reductions of 17% and 3% compared to ConvLSTM and ResNet, respectively. In the GS region, the RMSEs for the U-Transformer are 11–14% smaller than those of the other two models. In the OSA, the U-Transformer reduces RMSEs by 7% compared to ConvLSTM and by 20% compared to ResNet. While the U-Transformer consistently outperforms the other models, the relative improvements are smaller in the larger RMSE case. The above results may be related to much more apparent eddy structures in Figure 8 than in Figure 7. These intensified mesoscale eddies significantly influence SST forecasts, highlighting the challenges of accurately capturing such complex dynamics.

3.5. Forecast Comparison Between Eddy-Active Regions and Global Average

A detailed comparison of regional RMSE and ACC demonstrates that substantially larger SST forecast errors occur in three eddy-active regions compared to global averages, underscoring the significant challenges presented by mesoscale eddy activity. From 1-day to 10-day lead times, SST forecast RMSEs increase from 0.28 to 1.2 °C in eddy-active areas while increasing from 0.2 to 0.6 °C globally. Eddy-active regions exhibit RMSEs approximately 40% and 130% larger than the global average at 1-day and 10-day lead times, respectively (Figure 9b,d,f). Among these regions, RMSEs in the KE region increased by 42–60% while the GS region exhibited more substantial increases (>100%) relative to global averages. Furthermore, the ACC values in these three eddy-active regions are consistently smaller than global averages at the same lead time, with ACC values decreasing by approximately 0.24 in eddy-active regions (versus 0.18 globally) from the 1- to 10-day lead time forecast (Figure 2c). Notably, ACC values for all models drop below 0.9 at the 3-day lead time in the GS region, whereas the U-Transformer model maintains an ACC value > 0.9 at the 4-day lead time globally. This steeper decline in forecast skill with increasing lead time reflects the dynamic complexity introduced by mesoscale eddies, making SST forecasting particularly challenging in these regions compared to global forecasting.
Regions such as the KE, GS, and OSA exhibit larger SST forecast errors, primarily due to their distinct dynamic characteristics, which make forecasting SST in these regions more challenging than in other oceanic areas. Active mesoscale eddies in these regions play a significant role in SST variability through their movements and nonlinear behaviors. The frequent formation and dissipation of these eddies introduce additional uncertainties, as their small spatial scales (tens to hundreds of kilometers) often approach the resolution limits of the models [55]. These regions also experience intense air–sea interactions, which are not adequately accounted for by the DL models, leading to substantial forecast discrepancies [56,57]. Additionally, the GS is a high-speed western boundary current, posing unique challenges. Compared to the KE, the GS exhibits stronger mass, heat, and salt transport, which can easily lead to flow instabilities [58,59,60]. Moreover, the GS's stronger current is confined within the Atlantic Ocean Basin, which is narrower than the Pacific Ocean Basin. These combined factors make SST forecasting in the GS region even more difficult than in other eddy-active regions.
Despite these challenges, the U-Transformer demonstrates the best forecasting skill of the three models across all areas. Table 1 lists the RMSE reduction percentage, i.e., the relative decrease in forecast SST RMSE achieved by the U-Transformer model compared with the ConvLSTM and ResNet models, with a focus on regions with active mesoscale eddies. At a 1-day lead time, the U-Transformer model achieves RMSE reductions of 20.42% and 21.32% in the KE region, 12.19% and 16.82% in the GS region, and 10.41% and 25.79% in the OSA region relative to the ConvLSTM and ResNet models, respectively. Globally, the U-Transformer model reduces the RMSEs by 10.13% compared with the ConvLSTM model and by 11.66% compared with the ResNet model. As the lead time increases, the RMSE reduction percentages decrease. For example, at the 10-day lead time, compared with the ConvLSTM and ResNet models, the RMSE reductions for the U-Transformer model decreased to 3.73% and 9.08% in the KE region, 4.09% and 8.05% in the GS region, and 0.08% and 6.36% in the OSA region, respectively. This comparison demonstrates that the U-Transformer model consistently outperforms the other two models, both globally and in the regions with active mesoscale eddies.
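The RMSE reduction percentage reported in Table 1 follows directly from the paired RMSE values; a minimal sketch of the computation, with hypothetical RMSE values for illustration, is:

```python
def rmse_reduction_pct(rmse_baseline, rmse_model):
    """Relative decrease (%) of the model's RMSE versus a baseline model."""
    return 100.0 * (rmse_baseline - rmse_model) / rmse_baseline

# Hypothetical example: a baseline RMSE of 0.25 °C reduced to 0.20 °C
print(round(rmse_reduction_pct(0.25, 0.20), 2))  # -> 20.0
```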

3.6. Effect of Different Training Periods on Model Performance

This section investigates the impact of different mesoscale SST variability intensities on model forecast performance. The analysis contrasts two distinct periods characterized by different intensities of variability: a period of relatively weak variability (1993–2006) and a period of strong variability (2007–2018) [61].

3.6.1. Evaluation on Validation Datasets from Different Periods

To evaluate model performance across different time periods, the original 10% validation dataset was divided into two subsets: 529 samples from 1993–2006 and 439 samples from 2007–2018.
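A period-based split of this kind reduces to filtering samples by year; the sketch below illustrates the idea on a toy list of timestamped samples (the `split_by_period` helper and sample layout are assumptions for illustration, not the study's preprocessing code):

```python
from datetime import date

def split_by_period(samples):
    """Split (date, field) samples into the weak-variability (1993-2006)
    and strong-variability (2007-2018) validation subsets."""
    weak = [s for s in samples if 1993 <= s[0].year <= 2006]
    strong = [s for s in samples if 2007 <= s[0].year <= 2018]
    return weak, strong

# Toy sample list: one entry per year; the SST payload is omitted (None)
samples = [(date(y, 6, 15), None) for y in range(1993, 2019)]
weak, strong = split_by_period(samples)
print(len(weak), len(strong))  # -> 14 12
```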
Figure 10 displays the forecast RMSEs of all models across two distinct validation periods: 1993–2006 with weaker mesoscale SST variability and 2007–2018 with stronger variability. In both periods, RMSE increases with lead time for all models, with the U-Transformer consistently demonstrating the lowest errors. Additionally, the performance difference between models becomes more pronounced at longer lead times. These results confirm that the U-Transformer maintains robust SST forecasting performance across periods of varying mesoscale SST variability.

3.6.2. Forecast Skill with Different Training Periods

To investigate how different temporal ranges in the training data affect model performance, we conducted experiments using three distinct dataset periods: (1) the complete dataset spanning 1982–2019, (2) the dataset excluding the 1993–2006 period, and (3) the dataset excluding the 2007–2018 period. All models were evaluated on the same 2020–2022 test set.
Figure 11 presents the RMSE values from the three experiments for the global domain and for the regions with active mesoscale eddies. Firstly, models trained on the full dataset spanning 1982–2019 achieved the smallest RMSEs in both the global and regional evaluations (Figure 11). Additionally, excluding the 2007–2018 period leads to significantly higher RMSEs than excluding the 1993–2006 period for almost all models and regions, strongly indicating that the data from 2007–2018 play a more critical role in model performance. Possible reasons include the stronger mesoscale SST variability during 2007–2018 compared to 1993–2006 [61] and the greater similarity of the mesoscale SST variability during 2007–2018 to that in the forecast years (2020–2022). These results suggest that incorporating both weak (1993–2006) and strong (2007–2018) mesoscale SST variability in the training data can substantially enhance the model's generalization ability.
The effects of different training datasets varied by region and forecast lead time (Figure 11). Globally, the differences among models were relatively small. However, in the regions with active mesoscale eddies—the Gulf Stream, the Kuroshio Extension, and the oceans surrounding Southern Africa—the differences became more pronounced, especially at longer lead times. In these challenging regions and at these lead times, the U-Transformer consistently achieved the lowest forecast errors compared to ConvLSTM and ResNet, regardless of which training dataset was employed. These results highlight that the U-Transformer consistently outperforms the other two models across different regions.

3.7. Cross-Dataset Evaluation Using OSTIA

To assess model generalizability, models trained on OISST data were evaluated using an independent dataset, OSTIA data from 2021. Figure 12 presents the RMSE values for all three models (U-Transformer, ConvLSTM, and ResNet) across 1- to 10-day lead times for both global averages and the three regions characterized by active mesoscale eddies. The U-Transformer consistently demonstrates lower RMSE values than ConvLSTM and ResNet across all regions and lead times, further confirming the enhanced SST forecasting capabilities of the U-Transformer model. Furthermore, all models exhibit significantly increased forecast errors in active-eddy regions compared to global averages, regardless of the evaluation dataset. For instance, in the Gulf Stream region (Figure 12c), RMSE values are approximately double the global average across all models and lead times. The prediction difficulty in these regions likely stems from their complex ocean dynamics, as discussed above.

4. Discussion and Conclusions

This study used the U-Transformer model to forecast global short-term SST and compared its performance with the ConvLSTM and ResNet models. The U-Transformer model consistently outperformed the other two models, achieving the smallest RMSEs globally, ranging from 0.2 to 0.54 °C for 1- to 10-day lead times, and the highest ACC values, decreasing from 0.97 to 0.79. Notably, the RMSEs produced by the U-Transformer model at a 1-day lead time were more than 10% smaller than those of the ConvLSTM and ResNet models. In regions with active mesoscale eddies, such as the KE, GS, and OSA, the U-Transformer model also produced smaller RMSEs and higher ACC values. It reduced the RMSEs by 20.42% and 21.32% in the KE region, 12.19% and 16.82% in the GS region, and 10.41% and 25.79% in the OSA region, compared with those of the ConvLSTM and ResNet models, respectively. The good performance of the U-Transformer model implies that the self-attention mechanism can recognize pattern connections and nonlinear temporal relationships (such as those associated with eddies), enhancing short-term SST forecasting skill in regions with active mesoscale eddies. These findings emphasize the importance of selecting appropriate DL models for accurate SST forecasting, particularly in regions dominated by complex mesoscale dynamics.
Although the U-Transformer shows improved performance compared to ConvLSTM and ResNet, its RMSEs remain 40–130% higher than the global average in regions with active mesoscale eddies, highlighting the forecasting challenges in these dynamically complex areas. For example, in the Gulf Stream region, RMSEs exceed twice the global average, reaching up to 1.2 °C, and forecast skill declines more rapidly. These difficulties are mainly due to the intrinsic dynamical features of these regions, including energetic eddy activity, strong nonlinear processes, and complex air–sea interactions, which make short-term SST evolution inherently harder to predict. Future research should focus on improving model performance in these challenging regions.
As Transformer-based models were not previously well established in short-term SST forecasting, this study focuses on evaluating their applicability to global SST forecasting. Using a single variable, SST, as the input introduces certain limitations. Mesoscale eddy activity is influenced by multiple factors, including wind stress, ocean currents, and air–sea fluxes, which are particularly significant in the eddy-rich regions where our model demonstrated larger forecast errors. Additionally, different eddy polarities (cyclonic versus anticyclonic) generate distinct air–sea interaction conditions that cannot be adequately captured using SST data alone, potentially explaining the more rapid accuracy deterioration in areas with intense mesoscale activity.
Future work should enhance model performance through several approaches. First, incorporating additional physical variables—such as air–sea fluxes, wind stress, and sea surface height—into the input data would provide a more comprehensive representation of ocean dynamics. Extending the forecast model to process multiple variables simultaneously would enable it to learn the complex interactions between SST and other oceanic and atmospheric fields, thereby better characterizing eddy behavior under diverse environmental conditions. Second, physical constraints could be incorporated directly into the model architecture through conservation laws, or indirectly via regularization terms in the loss function. These physically informed constraints would promote consistent predictions and significantly improve model reliability in complex dynamic systems. Furthermore, leveraging multiple satellite products, including AVISO sea-level measurements, CCMP wind data, and MODIS flux observations, would support the robust physical interpretation of eddy-related SST variability. Such multi-source satellite data would facilitate more sophisticated eddy diagnostics, including eddy kinetic energy (EKE) estimations, and substantially advance our understanding of how neural network models represent eddy dynamics and their influence on SST patterns.

Author Contributions

Conceptualization, P.L. and T.Z.; methodology, T.Z.; software, T.Z.; validation, P.L., H.L., P.W. and Y.W.; formal analysis, T.Z.; investigation, T.Z.; resources, P.L.; data curation, T.Z.; writing—original draft preparation, T.Z.; writing—review and editing, P.L., H.L., P.W., Y.W., W.Z., Z.Y., J.J., Y.L. and H.H.; visualization, T.Z.; supervision, P.L.; project administration, H.L.; funding acquisition, H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 92358302), the Key Program for Developing Basic Sciences (Grant No. 2022YFC3104802), and the Strategic Priority Research Program of the Chinese Academy of Sciences (Grant No. XDB0500303).

Data Availability Statement

All datasets used in this study are publicly available. The OISST dataset is available and can be accessed at https://psl.noaa.gov/thredds/catalog/Datasets/noaa.oisst.v2.highres/catalog.html (accessed on 21 April 2025). The buoy observation datasets can be accessed at https://www.pmel.noaa.gov/tao/drupal/disdel/ (accessed on 21 April 2025). The code for this study was developed using PyTorch. The Swin Transformer code can be found at https://github.com/microsoft/Swin-Transformer (accessed on 21 April 2025). The ConvLSTM code is available at https://github.com/jhhuang96/ConvLSTM-PyTorch (accessed on 21 April 2025). The ResNet code can be found at https://github.com/weiaicunzai/pytorch-cifar100/blob/master/models/resnet.py (accessed on 21 April 2025).

Acknowledgments

We would like to thank the reviewers for their helpful comments. We are grateful for the technical support of the National Large Scientific and Technological Infrastructure “Earth System Numerical Simulation Facility” (https://cstr.cn/31134.02.EL) (accessed on 21 April 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Overview of Transformers

The Transformer was initially introduced by Vaswani et al. (2017) [29] for natural language processing (NLP). Its core innovation lies in modeling long-range dependencies through a self-attention mechanism. This mechanism calculates the correlation between each input element and all other elements in the sequence, enabling the model to focus on globally significant features. To retain the sequential information of input data, the Transformer incorporates position encoding that introduces relative or absolute positional information. The typical Transformer architecture consists of multiple self-attention layers stacked together, each followed by a feedforward neural network, layer normalization, and residual connections. These components ensure training stability and enhance the model's expressiveness.
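The self-attention computation described above can be sketched in a few lines; the following single-head NumPy implementation of scaled dot-product attention is a minimal illustration (the projection matrices `wq`, `wk`, `wv` and the toy dimensions are assumptions, and multi-head attention, masking, and position encoding are omitted):

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product self-attention.
    x: (seq_len, d_model); wq/wk/wv: (d_model, d_k) projections."""
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # all pairwise similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over the sequence
    return weights @ v                               # attention-weighted values

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 6, 8, 4
x = rng.normal(size=(seq_len, d_model))
wq, wk, wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out = self_attention(x, wq, wk, wv)
print(out.shape)  # -> (6, 4)
```

Because every output token is a weighted sum over all input tokens, each position can attend to any other position in a single layer, which is the source of the long-range modeling capability discussed above.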
Although initially designed for sequential data, the Transformer's ability to model long-range dependencies has led to widespread adoption in computer vision and remote sensing fields. For image data processing, Dosovitskiy et al. (2020) [30] introduced the Vision Transformer (ViT), which divides images into fixed-size patches (e.g., 16 × 16 pixels) and treats these patches as input tokens analogous to words in NLP. Each patch is embedded into a vector, and position encoding is added to preserve the spatial structure of the image. These embedded vectors are then processed through a series of Transformer layers, enabling the model to capture global dependencies across all patches. While convolutional neural networks (CNNs) excel at extracting local features, Transformers demonstrate significant advantages in learning global long-range dependencies.
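The ViT tokenization step—splitting an image into non-overlapping, flattened patches—can be sketched as follows; the `patchify` helper and the toy image dimensions are illustrative assumptions (a real ViT would additionally apply a learned linear embedding and add position encodings to each patch vector):

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping flattened patches,
    mirroring the ViT tokenization step (H and W divisible by patch)."""
    h, w, c = image.shape
    blocks = image.reshape(h // patch, patch, w // patch, patch, c)
    return blocks.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * c)

# Toy single-channel "image" standing in for a 2D SST field
img = np.arange(32 * 64, dtype=float).reshape(32, 64, 1)
tokens = patchify(img, 16)
print(tokens.shape)  # -> (8, 256): a 2x4 grid of patches, 16*16*1 values each
```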
The introduction of ViT has made Transformers highly adaptable to remote sensing tasks with spatiotemporal attributes, particularly in scenarios requiring capturing large-scale spatiotemporal dependencies. In this study, inspired by ViT, we propose the U-Transformer architecture, which leverages the spatiotemporal characteristics of SST data (time, latitude, and longitude) to model the inherent relationships in SST effectively. This design facilitates more accurate short-term SST forecasts.

Appendix B. Model Architectures

Appendix B.1. ConvLSTM Architecture

The ConvLSTM model uses an encoder–decoder architecture to capture spatial and temporal input data features. The encoder begins with two convolutional layers: the first layer has 16 output channels and uses a kernel size of 3, a stride of 2, and padding of 1. The second convolutional layer also has 16 output channels and applies the same kernel size and padding. These convolutional layers help downsample the input data while preserving key spatial features. Following these layers, two ConvLSTM cells process the data. The first ConvLSTM cell operates on a spatial resolution of 360 × 720, while the second works on a reduced resolution of 180 × 360. Each ConvLSTM cell retains 16 feature maps per location, learning spatial and temporal dependencies across the data.
In the decoder, the model uses deconvolutional layers to restore the input data’s spatial resolution progressively. The first deconvolutional layer has 16 output channels, a kernel size of 4, and a stride of 2, followed by a second deconvolutional layer with 32 output channels. Finally, a 1 × 1 convolutional layer changes the shape of the output. The architecture effectively bridges spatial and temporal dependencies through the ConvLSTM cells, making it suitable for spatiotemporal forecasting tasks.
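The defining feature of a ConvLSTM cell is that the four LSTM gates are computed by convolutions rather than matrix products. The following NumPy sketch of a single cell update is illustrative only (the naive `conv2d_same` helper, the tiny grid, and the weight shapes are assumptions; an actual implementation would use an optimized framework such as PyTorch):

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 'same' 2D convolution: x (C_in, H, W), w (C_out, C_in, k, k)."""
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1], x.shape[2]
    out = np.zeros((c_out, h, wd))
    for o in range(c_out):
        for i in range(h):
            for j in range(wd):
                out[o, i, j] = np.sum(xp[:, i:i + k, j:j + k] * w[o])
    return out

def convlstm_step(x, h, c, wx, wh, b):
    """One ConvLSTM update: the input, forget, and output gates and the
    candidate state are all computed by convolutions over the 2D fields.
    wx: (4F, C_in, k, k), wh: (4F, F, k, k), b: (4F, 1, 1)."""
    gates = conv2d_same(x, wx) + conv2d_same(h, wh) + b
    f = gates.shape[0] // 4
    i_g = 1.0 / (1.0 + np.exp(-gates[:f]))         # input gate
    f_g = 1.0 / (1.0 + np.exp(-gates[f:2 * f]))    # forget gate
    o_g = 1.0 / (1.0 + np.exp(-gates[2 * f:3 * f]))  # output gate
    g = np.tanh(gates[3 * f:])                     # candidate cell state
    c_new = f_g * c + i_g * g                      # updated cell state
    h_new = o_g * np.tanh(c_new)                   # updated hidden state
    return h_new, c_new

rng = np.random.default_rng(0)
feats, k, H, W = 4, 3, 8, 8                        # toy sizes, not the paper's
x = rng.normal(size=(1, H, W))                     # one SST input channel
h = np.zeros((feats, H, W)); c = np.zeros((feats, H, W))
wx = rng.normal(scale=0.1, size=(4 * feats, 1, k, k))
wh = rng.normal(scale=0.1, size=(4 * feats, feats, k, k))
b = np.zeros((4 * feats, 1, 1))
h, c = convlstm_step(x, h, c, wx, wh, b)
print(h.shape)  # -> (4, 8, 8): 4 feature maps preserved at each location
```

Stacking such cells at two resolutions, as described above, lets the encoder learn both fine-scale and coarser spatiotemporal dependencies.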

Appendix B.2. ResNet Architecture

The ResNet model is a deep convolutional neural network adapted for spatiotemporal forecasting. It starts with an initial convolutional layer that processes the input data with 10 channels, applying a convolution operation with 64 output channels, a kernel size of 3, and padding of 1. The output is then passed through a ReLU activation function to introduce non-linearity. The core of ResNet consists of several residual blocks, where each block learns hierarchical features from the input data. The first block extracts features with 64 output channels, followed by blocks that progressively increase the output channels to 128, 256, and 512. These residual blocks utilize skip connections, which allow the model to learn residual mappings, mitigating the vanishing gradient problem and enabling the training of deeper networks.
Following the residual blocks, the network performs a convolution to produce the desired output shape, with additional convolutional layers to refine the predictions. Instead of using global pooling and fully connected layers, the model uses deconvolutional layers to upsample the output back to its original spatial resolution gradually. The first deconvolutional layer has 32 output channels, followed by another layer increasing the output channels to 64. A final deconvolutional layer refines the prediction, bringing the output to the required shape. The ResNet-18 architecture’s use of residual connections helps it effectively capture complex spatial features while maintaining efficient training, making it a suitable model for spatiotemporal forecasting tasks.
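The residual mapping described above, y = ReLU(x + F(x)), can be sketched compactly; this NumPy example uses pointwise (1 × 1) convolutions for F instead of the 3 × 3 convolutions of the actual architecture, purely to keep the sketch short, and the channel counts are illustrative:

```python
import numpy as np

def conv1x1(x, w):
    """Pointwise (1x1) convolution: x (C_in, H, W), w (C_out, C_in)."""
    return np.tensordot(w, x, axes=([1], [0]))

def residual_block(x, w1, w2):
    """Basic residual block with an identity skip connection:
    y = ReLU(x + F(x)), where F is conv -> ReLU -> conv."""
    relu = lambda z: np.maximum(z, 0.0)
    return relu(x + conv1x1(relu(conv1x1(x, w1)), w2))

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 4, 4))            # 64 feature maps on a toy 4x4 grid
w1 = rng.normal(scale=0.05, size=(64, 64))
w2 = rng.normal(scale=0.05, size=(64, 64))
y = residual_block(x, w1, w2)
print(y.shape)  # -> (64, 4, 4)
```

Note that with all-zero weights the block reduces to ReLU(x): the skip connection passes the input through unchanged, which is why residual blocks ease gradient flow in deep networks.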
Figure A1. Spatial distribution of RMSE for forecast 10 days ahead by the U-Transformer model (a,d,g), ConvLSTM model (b,e,h), and ResNet model (c,f,i) during 2020–2022 in three eddy-active regions. The regional average RMSE values are displayed in the upper-right corner of each panel. Panels (ac) correspond to the Kuroshio Extension (30–40°N, 140–170°E), panels (df) to the Gulf Stream (35–55°N, 40–80°W), and panels (gi) to the oceans surrounding Southern Africa (35–45°S, 10–45°E).
Figure A2. Comparison of OISST and SST forecasts by three deep learning models in the Gulf Stream region from 17 September 2020 to 23 September 2022. The first row represents OISST, while the second, third, and fourth rows show forecast biases from the U-Transformer, ConvLSTM, and ResNet models.
Figure A3. Comparison of OISST and SST forecasts by three deep learning models in the oceans around Southern Africa region from 23 May 2020 to 29 May 2022. The first row represents OISST, while the second, third, and fourth rows show forecast biases from the U-Transformer, ConvLSTM, and ResNet models.

References

  1. Behera, S.K.; Luo, J.-J.; Masson, S.; Delecluse, P.; Gualdi, S.; Navarra, A.; Yamagata, T. Paramount Impact of the Indian Ocean Dipole on the East African Short Rains: A CGCM Study. J. Clim. 2005, 18, 4514–4530. [Google Scholar] [CrossRef]
  2. Zhou, L.-T.; Tam, C.-Y.; Zhou, W.; Chan, J.C.L. Influence of South China Sea SST and the ENSO on winter rainfall over South China. Adv. Atmos. Sci. 2010, 27, 832–844. [Google Scholar] [CrossRef]
  3. Rauscher, S.A.; Jiang, X.; Steiner, A.; Williams, A.P.; Cai, D.M.; McDowell, N.G. Sea Surface Temperature Warming Patterns and Future Vegetation Change. J. Clim. 2015, 28, 7943–7961. [Google Scholar] [CrossRef]
  4. Salles, R.; Mattos, P.; Iorgulescu, A.-M.D.; Bezerra, E.; Lima, L.; Ogasawara, E. Evaluating temporal aggregation for predicting the sea surface temperature of the Atlantic Ocean. Ecol. Inform. 2016, 36, 94–105. [Google Scholar] [CrossRef]
  5. Cane, M.A.; Clement, A.C.; Kaplan, A.; Kushnir, Y.; Pozdnyakov, D.; Seager, R.; Zebiak, S.E.; Murtugudde, R. Twentieth-Century Sea Surface Temperature Trends. Science 1997, 275, 957–960. [Google Scholar] [CrossRef]
  6. Friedel, M.J. Data-driven modeling of surface temperature anomaly and solar activity trends. Environ. Model. Softw. 2012, 37, 217–232. [Google Scholar] [CrossRef]
  7. Castro, S.L.; Wick, G.A.; Steele, M. Validation of satellite sea surface temperature analyses in the Beaufort Sea using UpTempO buoys. Remote Sens. Environ. 2016, 187, 458–475. [Google Scholar] [CrossRef]
  8. Bouali, M.; Sato, O.T.; Polito, P.S. Temporal trends in sea surface temperature gradients in the South Atlantic Ocean. Remote Sens. Environ. 2017, 194, 100–114. [Google Scholar] [CrossRef]
  9. Chaidez, V.; Dreano, D.; Agusti, S.; Duarte, C.M.; Hoteit, I. Decadal trends in Red Sea maximum surface temperature. Sci. Rep. 2017, 7, 8144. [Google Scholar] [CrossRef]
  10. Su, H.; Wang, A.; Zhang, T.; Qin, T.; Du, X.; Yan, X.-H. Super-resolution of subsurface temperature field from remote sensing observations based on machine learning. Int. J. Appl. Earth Obs. Geoinf. 2021, 102, 102440. [Google Scholar] [CrossRef]
  11. Xu, G.; Xie, W.; Lin, X.; Liu, Y.; Hang, R.; Sun, W.; Liu, D.; Dong, C. Detection of three-dimensional structures of oceanic eddies using artificial intelligence. Ocean. Model. 2024, 190, 102385. [Google Scholar] [CrossRef]
  12. Zhu, Y.; Zhang, R.-H.; Moum, J.N.; Wang, F.; Li, X.; Li, D. Physics-informed deep-learning parameterization of ocean vertical mixing improves climate simulations. Natl. Sci. Rev. 2022, 9, nwac044. [Google Scholar] [CrossRef] [PubMed]
  13. Qi, J.; Xie, B.; Li, D.; Chi, J.; Yin, B.; Sun, G. Estimating thermohaline structures in the tropical Indian Ocean from surface parameters using an improved CNN model. Front. Mar. Sci. 2023, 10, 1181182. [Google Scholar] [CrossRef]
  14. Putra, D.P.; Hsu, P.-C. Leveraging Transfer Learning and U-Nets Method for Improved Gap Filling in Himawari Sea Surface Temperature Data Adjacent to Taiwan. ISPRS Int. J. Geo-Inf. 2024, 13, 162. [Google Scholar] [CrossRef]
  15. Young, C.-C.; Cheng, Y.-C.; Lee, M.-A.; Wu, J.-H. Accurate reconstruction of satellite-derived SST under cloud and cloud-free areas using a physically-informed machine learning approach. Remote Sens. Environ. 2024, 313, 114339. [Google Scholar] [CrossRef]
  16. Zhang, Q.; Wang, H.; Dong, J.; Zhong, G.; Sun, X. Prediction of Sea Surface Temperature Using Long Short-Term Memory. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1745–1749. [Google Scholar] [CrossRef]
  17. Xiao, C.; Chen, N.; Hu, C.; Wang, K.; Xu, Z.; Cai, Y.; Xu, L.; Chen, Z.; Gong, J. A spatiotemporal deep learning model for sea surface temperature field prediction using time-series satellite data. Environ. Model. Softw. 2019, 120, 104502. [Google Scholar] [CrossRef]
  18. Sarkar, P.P.; Janardhan, P.; Roy, P. Prediction of sea surface temperatures using deep learning neural networks. SN Appl. Sci. 2020, 2, 1458. [Google Scholar] [CrossRef]
  19. Jia, X.; Ji, Q.; Han, L.; Liu, Y.; Han, G.; Lin, X. Prediction of Sea Surface Temperature in the East China Sea Based on LSTM Neural Network. Remote Sens. 2022, 14, 3300. [Google Scholar] [CrossRef]
  20. Xiao, C.; Chen, N.; Hu, C.; Wang, K.; Gong, J.; Chen, Z. Short and mid-term sea surface temperature prediction using time-series satellite data and LSTM-AdaBoost combination approach. Remote Sens. Environ. 2019, 233, 111358. [Google Scholar] [CrossRef]
  21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  22. Shi, B.; Hao, Y.; Feng, L.; Ge, C.; Peng, Y.; He, H. An Attention-Based Context Fusion Network for Spatiotemporal Prediction of Sea Surface Temperature. IEEE Geosci. Remote Sens. Lett. 2024, 21, 1504405. [Google Scholar] [CrossRef]
  23. Zheng, G.; Li, X.; Zhang, R.-H.; Liu, B. Purely satellite data–driven deep learning forecast of complicated tropical instability waves. Sci. Adv. 2020, 6, eaba1482. [Google Scholar] [CrossRef] [PubMed]
  24. Shi, B.; Ge, C.; Lin, H.; Xu, Y.; Tan, Q.; Peng, Y.; He, H. Sea Surface Temperature Prediction Using ConvLSTM-Based Model with Deformable Attention. Remote Sens. 2024, 16, 4126. [Google Scholar] [CrossRef]
  25. He, H.L.; Shi, B.Y.; Hao, Y.J.; Feng, L.; Lyu, X.; Ling, Z. Forecasting sea surface temperature during typhoon events in the Bohai Sea using spatiotemporal neural networks. Atmos. Res. 2024, 309, 107578. [Google Scholar] [CrossRef]
  26. Xu, S.; Dai, D.; Cui, X.; Yin, X.; Jiang, S.; Pan, H.; Wang, G. A deep learning approach to predict sea surface temperature based on multiple modes. Ocean. Model. 2023, 181, 102158. [Google Scholar] [CrossRef]
  27. Xu, T.; Zhou, Z.; Li, Y.; Wang, C.; Liu, Y.; Rong, T. Short-Term Prediction of Global Sea Surface Temperature Using Deep Learning Networks. J. Mar. Sci. Eng. 2023, 11, 1352. [Google Scholar] [CrossRef]
  28. Pan, X.; Jiang, T.; Sun, W.; Xie, J.; Wu, P.; Zhang, Z.; Cui, T. Effective attention model for global sea surface temperature prediction. Expert Syst. Appl. 2024, 254, 124411. [Google Scholar] [CrossRef]
  29. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS′17), Long Beach, CA, USA, 4–9 December 2017; Curran Associates Inc.: Red Hook, NY, USA, 2017; pp. 6000–6010. [Google Scholar]
  30. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. Available online: https://arxiv.org/abs/2010.11929 (accessed on 21 April 2025).
  31. Zou, R.; Wei, L.; Guan, L. Super Resolution of Satellite-Derived Sea Surface Temperature Using a Transformer-Based Model. Remote Sens. 2023, 15, 5376. [Google Scholar] [CrossRef]
  32. Lv, M.; Wang, F.; Li, Y.; Zhang, Z.; Zhu, Y. Structure of Sea Surface Temperature Anomaly Induced by Mesoscale Eddies in the North Pacific Ocean. J. Geophys. Res. Ocean. 2022, 127, e2021JC017581. [Google Scholar] [CrossRef]
  33. Carneiro, D.M.; King, R.; Martin, M.; Aguiar, A. Short-Range Ocean Forecast Error Characteristics in High Resolution Assimilative Systems; Forecasting Research Technical Report 645; Met Office: Edinburgh, Scotland, UK, 2021. Available online: https://digital.nmla.metoffice.gov.uk/IO_e084c2c3-dc73-4cf3-acc1-44091ce6ef32 (accessed on 21 April 2025).
  34. Lea, D.J.; While, J.; Martin, M.J.; Weaver, A.; Storto, A.; Chrust, M. A new global ocean ensemble system at the met Office: Assessing the impact of hybrid data assimilation and inflation settings. Q. J. R. Meteorol. Soc. 2022, 148, 1996–2030. [Google Scholar] [CrossRef]
  35. Chassignet, E.P.; Yeager, S.G.; Fox-Kemper, B.; Bozec, A.; Castruccio, F.; Danabasoglu, G.; Horvat, C.; Kim, W.M.; Koldunov, N.; Li, Y.; et al. Impact of horizontal resolution on global ocean–sea ice model simulations based on the experimental protocols of the Ocean Model Intercomparison Project phase 2 (OMIP-2). Geosci. Model Dev. 2020, 13, 4595–4637. [Google Scholar] [CrossRef]
  36. Li, Y.; Liu, H.; Ding, M.; Lin, P.; Yu, Z.; Yu, Y.; Meng, Y.; Li, Y.; Jian, X.; Jiang, J.; et al. Eddy-resolving Simulation of CAS-LICOM3 for Phase 2 of the Ocean Model Intercomparison Project. Adv. Atmos. Sci. 2020, 37, 1067–1080. [Google Scholar] [CrossRef]
  37. Ding, M.; Liu, H.; Lin, P.; Hu, A.; Meng, Y.; Li, Y.; Liu, K. Overestimated eddy kinetic energy in the eddy-rich regions simulated by eddy-resolving global ocean–sea ice models. Geophys. Res. Lett. 2022, 49, e2022GL098370. [Google Scholar] [CrossRef]
  38. Nian, R.; Cai, Y.; Zhang, Z.; He, H.; Wu, J.; Yuan, Q.; Geng, X.; Qian, Y.; Yang, H.; He, B. The Identification and Prediction of Mesoscale Eddy Variation via Memory in Memory With Scheduled Sampling for Sea Level Anomaly. Front. Mar. Sci. 2021, 8, 753942. [Google Scholar] [CrossRef]
  39. Zhu, R.; Song, B.; Qiu, Z.; Tian, Y. A Metadata-Enhanced Deep Learning Method for Sea Surface Height and Mesoscale Eddy Prediction. Remote Sens. 2024, 16, 1466. [Google Scholar] [CrossRef]
  40. Wang, X.; Li, C.; Wang, X.; Tan, L.; Wu, J. Spatio–Temporal Attention-Based Deep Learning Framework for Mesoscale Eddy Trajectory Prediction. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 3853–3867. [Google Scholar] [CrossRef]
  41. Reynolds, R.W.; Smith, T.M.; Liu, C.; Chelton, D.B.; Casey, K.S.; Schlax, M.G. Daily High-Resolution-Blended Analyses for Sea Surface Temperature. J. Clim. 2007, 20, 5473–5496. [Google Scholar] [CrossRef]
  42. Huang, B.; Liu, C.; Banzon, V.; Freeman, E.; Graham, G.; Hankins, B.; Smith, T.; Zhang, H.-M. Improvements of the Daily Optimum Interpolation Sea Surface Temperature (DOISST) Version 2.1. J. Clim. 2021, 34, 2923–2939. [Google Scholar] [CrossRef]
  43. Donlon, C.J.; Martin, M.; Stark, J.; Roberts-Jones, J.; Fiedler, E.; Wimmer, W. The Operational Sea Surface Temperature and Sea Ice Analysis (OSTIA) system. Remote Sens. Environ. 2012, 116, 140–158. [Google Scholar] [CrossRef]
  44. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. arXiv 2021, arXiv:2105.05537. Available online: http://arxiv.org/abs/2105.05537 (accessed on 21 April 2025).
  45. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002. [Google Scholar] [CrossRef]
Figure 1. (a) Architecture of the U-Transformer model and (b) two successive Swin Transformer Blocks.
Figure 2. Boxplots comparing the performance of three DL models (U-Transformer, ConvLSTM, and ResNet) in terms of (a) RMSE, (b) Bias, and (c) ACC across lead times of 1 to 10 days during 2020–2022. Each boxplot represents the distribution of globally weighted averaged metrics across different forecast initialization times.
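The globally weighted averages behind Figure 2 can be sketched as follows. This is a minimal illustration, not the authors' code: the cos-latitude area weighting is a standard convention assumed here, and the function names (`weighted_metrics`, `acc`) are hypothetical.

```python
import numpy as np

def weighted_metrics(forecast, obs, lats):
    """Area-weighted RMSE and bias over a lat-lon grid, using
    cos(latitude) weights (a common choice; the paper's exact
    weighting scheme is not specified here). NaNs in obs (e.g.
    land points) are excluded from the averages."""
    w = np.cos(np.deg2rad(lats))[:, None] * np.ones_like(obs)
    w = np.where(np.isnan(obs), 0.0, w)
    diff = np.where(np.isnan(obs), 0.0, forecast - obs)
    bias = np.sum(w * diff) / np.sum(w)
    rmse = np.sqrt(np.sum(w * diff ** 2) / np.sum(w))
    return rmse, bias

def acc(forecast_anom, obs_anom):
    """Anomaly correlation coefficient between forecast and observed
    SST anomalies (climatology assumed removed beforehand)."""
    f, o = forecast_anom.ravel(), obs_anom.ravel()
    m = ~(np.isnan(f) | np.isnan(o))
    f, o = f[m], o[m]
    return float(np.sum(f * o) / np.sqrt(np.sum(f ** 2) * np.sum(o ** 2)))
```

Computing these metrics per initialization time, then collecting them across all initializations at a given lead time, yields the distributions shown in each boxplot.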
Figure 3. Panels (a,c,e) show the spatial distribution of buoy locations across various oceans and the RMSEs at those locations for U-Transformer forecasts at a 1-day lead time. Panels (b,d,f) display the RMSE and ACC of the different models at varying lead times; solid lines represent RMSE and dashed lines represent ACC.
Figure 4. Spatial distribution of RMSEs for 1-day (a,d,g), 5-day (b,e,h), and 10-day (c,f,i) lead times during 2020–2022 by the U-Transformer model (a–c), the ConvLSTM model (d–f), and the ResNet model (g–i). Global average RMSE values are displayed in the upper-right corner of each panel. Dashed boxes indicate the locations of selected regions with active mesoscale eddies: the Kuroshio Extension (30–40°N, 140–170°E), the Gulf Stream (35–55°N, 40–80°W), and the oceans around Southern Africa (35–45°S, 10–45°E).
Figure 5. Global SST from observation and from forecasts by the three DL models at a 1-day lead (1 January 2022) and a 5-day lead (5 January 2022), starting from 1 January 2022. (a–c) OISST; (d–f) U-Transformer; (g–i) ConvLSTM; (j–l) ResNet. RMSEs and pattern correlation coefficients (R) between forecast and observed SST are shown in the upper-right corner of each panel. The first and second columns display the raw SST values, with thin black contours at 4 °C intervals and thick black lines denoting the 28 °C isotherm. The third column shows the mesoscale signal obtained by subtracting the low-pass-filtered SST (3° × 3°) from the raw SST.
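The mesoscale panels of Figure 5 come from a simple high-pass operation: subtract a 3° × 3° low-pass smoothed field from the raw SST. A minimal sketch follows; the boxcar (uniform) kernel and the 0.25° grid spacing (the OISST resolution) are assumptions about the exact filter used.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def mesoscale_signal(sst, grid_deg=0.25, window_deg=3.0):
    """Mesoscale SST signal: raw field minus a window_deg x window_deg
    boxcar low-pass. With grid_deg=0.25 the 3-degree window spans
    12 grid points. Boundary handling via nearest-edge padding is an
    illustrative choice."""
    size = int(round(window_deg / grid_deg))
    lowpass = uniform_filter(sst, size=size, mode="nearest")
    return sst - lowpass
```

A spatially uniform field yields a mesoscale signal of zero everywhere, while sharp fronts and eddies survive the subtraction.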
Figure 6. Comparison of OISST and SST forecasts by three deep learning models in the Kuroshio Extension region from 14 July 2022 to 20 July 2022. The first row represents OISST, while the second, third, and fourth rows show forecast biases from the U-Transformer, ConvLSTM, and ResNet models.
Figure 7. SST forecasts at the 1-day lead time from the U-Transformer, ConvLSTM, and ResNet models, all starting from the same initial conditions. The cases are selected at the 10th percentile (smaller RMSE) of the U-Transformer RMSE values, sorted in ascending order, for three eddy-active regions (Kuroshio Extension, Gulf Stream, and the oceans around Southern Africa). The average RMSE values for each area are displayed in the upper-right corner of each panel. Panels (a–c) correspond to forecasts initialized on 19 December 2021; panels (d–f), on 6 October 2022; and panels (g–i), on 1 January 2022.
Figure 8. SST forecasts at the 1-day lead time from the U-Transformer, ConvLSTM, and ResNet models, all starting from the same initial conditions. The cases are selected at the 90th percentile (larger RMSE) of the U-Transformer RMSE values, sorted in ascending order, for three eddy-active regions (Kuroshio Extension, Gulf Stream, and the oceans around Southern Africa). The average RMSE values for each area are displayed in the upper-right corner of each panel. Panels (a–c) correspond to forecasts on 30 August 2020; panels (d–f), on 3 June 2020; and panels (g–i), on 13 February 2021.
Figure 9. Comparison of RMSEs and ACC values across three regions with active mesoscale eddies for different models at various lead times (a,c,e). Solid lines represent RMSEs and dashed lines represent ACC values. Panels (b,d,f) show the percentage increase in RMSEs within the selected regions (denoted RMSEc) compared with the global average RMSEs (denoted RMSEd), calculated as ((RMSEc − RMSEd)/RMSEd) × 100%.
Figure 10. RMSE comparison of three models (U-Transformer, ConvLSTM, and ResNet) on validation samples from different periods. (a) Performance on validation samples from 1993–2006, showing RMSE values as a function of forecast lead time. (b) Performance on validation samples from 2007–2018.
Figure 11. RMSE comparison of different training data combinations across regions evaluated on the 2020–2022 test set. (a) Global performance. (b) Kuroshio Extension region. (c) Gulf Stream region. (d) Oceans around Southern Africa. Blue markers represent U-Transformer, orange markers represent ConvLSTM, and green markers represent ResNet. Circle markers indicate models trained without 2007–2018 data, square markers indicate models trained without 1993–2006 data, and triangle markers indicate models trained using the complete 1982–2019 dataset.
Figure 12. RMSE comparison of three models (U-Transformer, ConvLSTM, and ResNet) validated against OSTIA SST data from 2021. (a) Global performance, (b) the Kuroshio Extension region, (c) the Gulf Stream region, and (d) oceans around Southern Africa.
Table 1. Percentage reduction in RMSE (|RMSE_a − RMSE_b| / RMSE_b × 100%) of the U-Transformer model (denoted RMSE_a) compared with the different models (denoted RMSE_b) across various regions, including the global region and the three regions with active mesoscale eddies: the Kuroshio Extension (KE), Gulf Stream (GS), and oceans around Southern Africa (OSA).
| Model | Region | 1-Day | 2-Day | 3-Day | 4-Day | 5-Day | 6-Day | 7-Day | 8-Day | 9-Day | 10-Day |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ConvLSTM | Global | 10.13 | 5.65 | 4.37 | 3.67 | 3.17 | 2.88 | 2.76 | 2.76 | 2.91 | 3.15 |
| | KE | 20.42 | 10.60 | 7.45 | 5.96 | 4.80 | 4.10 | 3.72 | 3.51 | 3.52 | 3.73 |
| | GS | 12.19 | 5.96 | 4.13 | 3.53 | 3.25 | 3.15 | 3.20 | 3.40 | 3.70 | 4.09 |
| | OSA | 10.41 | 4.19 | 2.51 | 1.81 | 1.29 | 0.87 | 0.50 | 0.25 | 0.12 | 0.08 |
| ResNet | Global | 11.66 | 6.66 | 5.75 | 5.59 | 5.68 | 5.94 | 6.25 | 6.58 | 6.95 | 7.33 |
| | KE | 21.32 | 10.94 | 8.24 | 7.41 | 7.17 | 7.23 | 7.52 | 7.88 | 8.41 | 9.08 |
| | GS | 16.82 | 8.03 | 5.80 | 5.28 | 5.37 | 5.73 | 6.29 | 6.90 | 7.52 | 8.05 |
| | OSA | 25.79 | 11.47 | 7.45 | 6.08 | 5.59 | 5.51 | 5.64 | 5.87 | 6.13 | 6.36 |
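The percentage reduction reported in Table 1 is a one-line computation. The sketch below uses illustrative inputs, not values from the paper; the function name is hypothetical.

```python
def rmse_reduction_pct(rmse_a, rmse_b):
    """Percentage reduction in RMSE of one model (rmse_a) relative to
    a baseline model (rmse_b), per the Table 1 definition:
    |RMSE_a - RMSE_b| / RMSE_b * 100%."""
    return abs(rmse_a - rmse_b) / rmse_b * 100.0

# Illustrative (hypothetical) values: rmse_a = 0.20, rmse_b = 0.25
# gives a 20% reduction relative to the baseline.
```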
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
