Article

Advancing Sea Ice Thickness Hindcast with Deep Learning: A WGAN-LSTM Approach

Bingyan Gao, Yang Liu, Peng Lu, Lei Wang and Hui Liao

1 Department of Physics, College of Sciences, Shihezi University, Shihezi 832000, China
2 School of Mathematical Science, Dalian University of Technology, Dalian 116000, China
3 State Key Laboratory of Coastal and Offshore Engineering, Dalian University of Technology, Dalian 116000, China
4 Department of Mathematics, College of Sciences, Shihezi University, Shihezi 832000, China
* Author to whom correspondence should be addressed.
Water 2025, 17(9), 1263; https://doi.org/10.3390/w17091263
Submission received: 22 February 2025 / Revised: 15 April 2025 / Accepted: 20 April 2025 / Published: 23 April 2025

Abstract

The thickness of Arctic sea ice is a crucial indicator of global climate change, and while deep learning has shown promise in predicting sea ice thickness (SIT), the field continues to grapple with limited data availability. In this study, we introduce a Wasserstein Generative Adversarial Network–Long Short-Term Memory (WGAN-LSTM) model, which leverages the data generation capability of the WGAN and the temporal prediction strength of the LSTM to perform single-step SIT prediction. During model training, the mean square error (MSE) and a novel comprehensive index, the Distance between Indices of Simulation and Observation (DISO), are compared as alternative loss functions. To thoroughly assess the model's performance, we combine the WGAN-LSTM model with the Monte Carlo (MC) dropout uncertainty estimation method, thereby validating the model's enhanced generalization capability. Experimental results demonstrate that the WGAN-LSTM model, using MSE and DISO as loss functions, improves comprehensive performance by 51.9% and 75.2%, respectively, compared to the traditional LSTM model. Furthermore, the MC estimates of the WGAN-LSTM model align with the distribution of actual observations. These findings indicate that the WGAN-LSTM model effectively captures nonlinear changes and surpasses the traditional LSTM model in prediction accuracy. The demonstrated effectiveness and reliability of the WGAN-LSTM model significantly advance short-term SIT prediction research in the Arctic region, particularly under conditions of data scarcity. Additionally, this model offers an innovative approach for identifying other physical features of the sea ice field from sparse data.

1. Introduction

In the context of high levels of greenhouse gas emissions (e.g., CO2 and CH4), Arctic sea ice continues to decrease in area, thickness, and volume, a phenomenon that has become an important indicator of global climate change [1]. This Arctic variability has profound influences on the global climate system [2]. For instance, loss of sea ice coverage reduces the reflection of sunlight at the surface, so more solar radiation is absorbed by the Earth's surface, further intensifying global warming [3]. The melting of glaciers and ice caps and the thermal expansion of seawater cause global sea levels to rise, threatening low-lying islands and coastal cities [4]. Arctic change also triggers extreme weather events in the mid-latitudes [5,6,7], such as extreme snowfall in Europe [8] and smog in China [9]. As Arctic sea ice decreases, Arctic shipping lanes (such as the Northwest Passage and the Northeast Passage) are becoming viable, providing new routing options for the global shipping industry [10]. This shortens some international routes, reduces transport costs, and improves transport efficiency [11,12]. Accurate single-point SIT data in the Arctic region can provide a real-time dynamic reference for route planning, helping ships avoid thick ice or dense floating ice and reducing the risk of ship damage and grounding [13]. Meanwhile, refined navigation and ice condition prediction can optimize speed and fuel consumption, reduce transportation costs, and enhance economic benefits. In addition, ice thickness prediction provides a scientific basis for deploying waterway infrastructure (such as ports and icebreaker scheduling) and helps promote the sustainable development of Arctic shipping.
Currently, there are several approaches to quantifying Arctic SIT. Remote sensing can determine SIT on a large scale [14,15,16]; however, the accuracy and spatial resolution of the resulting data are limited. Acoustic or electromagnetic induction measurements from underwater robots, submarines, aircraft, or ships cannot discern temporal variations in snow depth (SW) and SIT. The evolution of SIT can be obtained from numerical models of thermodynamic and dynamic processes based on known physical principles [17,18,19]; still, the results are subject to substantial spatiotemporal variability arising from the choice of physical parameterization schemes. On-site manual observations and autonomous platform monitoring, such as drillings, ablation stakes, and sea ice mass balance buoys (SIMBAs), are the most accurate methods for quantifying SW and ice thickness [20,21]. Collecting Arctic SIT data is resource-intensive, time-consuming, and in some cases nearly impossible, and the danger of offshore operations during sea ice melt further intensifies the difficulty of obtaining data. In 2019–2020, the Multidisciplinary drifting Observatory for the Study of Arctic Climate (MOSAiC) expedition gathered extensive, year-round data on Arctic sea ice and its connections with the atmosphere and ocean [22,23].
Deep learning (DL) [24] has been increasingly applied to sea ice prediction in recent years because of its inherent nonlinear fitting ability. For example, Chi and Kim [25] and Liu [26] showed that DL-based prediction models can successfully fit long-term Arctic sea ice concentration (SIC) datasets and forecast monthly SIC throughout a year, although prediction quality deteriorates slightly in summer because of accelerated sea ice melting. Although many studies have validated the applicability of deep learning for predicting Arctic SIC [27,28], research on predicting sea ice thickness (SIT) is still extremely limited; only Song et al. [29] have systematically conducted relevant research in the public literature. Their work verifies that DL models based on Convolutional Long Short-Term Memory (ConvLSTM) [30] and fully convolutional U-net algorithms can provide monthly forecasts of SIT without considering complex physical processes. DL models are well suited to multi-scale spatiotemporal sea ice forecasting, among which Long Short-Term Memory (LSTM) [31] has great potential for future application in the Arctic sea ice field owing to its significant advantage in handling complex sequence data. Nevertheless, the principal challenge remains the relationship between performance and the large amounts of data required.
Facing small-scale and imbalanced data, numerous studies have adopted data augmentation techniques to generate higher-quality samples, especially DL-based Generative Adversarial Networks (GANs) and Wasserstein GANs (WGANs) [32,33]. Combining GANs (and their variants) with LSTMs to process small numbers of complex time series has high potential, and the approach has been used in areas such as satellite image prediction [34], intention recognition in cooperative tasks [35], and automatic recognition of human activities in aerial videos captured by unmanned aerial vehicles [36]. These studies have demonstrated that training networks on hybrid data can improve the accuracy and generality of prediction or classification in these domains compared to using only real training data.
In this study, we propose an innovative hybrid approach model for hindcasting the Arctic SIT, which uses as inputs the Arctic SIT and SW obtained by SIMBA analysis during MOSAiC and atmospheric forcing data from the European Centre for Medium-Range Weather Forecasts Reanalysis version 5 (ERA5) [37] for the same period. The core of this method lies in utilizing the powerful generation capability of WGAN to expand the feature set and processing complex time series through LSTM. We adopted a novel loss function for the model and conducted uncertainty estimation on the model. This study not only provides a new method for short-term prediction of the Arctic SIT but also provides innovative ideas for other sparse data-based physical feature identification studies.
The rest of the paper is organized as follows: Section 2 describes the composition of our model and evaluation methods; Section 3 describes the data sources and experimental setup; Section 4 describes the results of the experiments and the various evaluation results; and Section 5 presents the conclusions and perspectives.

2. Data and Methods

In this study, the primary focus is on proposing an innovative hybrid approach model for hindcasting the Arctic SIT and evaluating its accuracy and applicability. The dataset, model, and assessment methodology are described in detail below.

2.1. Dataset Description

In this study, part of the Arctic drifting sea ice observed during the MOSAiC survey was used as the research area to verify the proposed method's performance. Information on the study area and data is given below.

2.1.1. The SIT Dataset

The SIMBA used for on-site observation is a temperature-chain autonomous ice mass balance buoy developed by SRSL, a subsidiary of the Scottish Association for Marine Science (SAMS). As shown in Figure 1, the buoy's temperature chain has a total length of 4.8 m and is equipped with 241 digital-chip temperature sensors at a spacing of 0.02 m. Carrying GPS and Iridium modules, the buoy can accurately monitor the temperature of the ice and snow environment (SIMBA-ET), track the trajectory of sea ice movement, and regularly transmit data to a remote monitoring center. In addition, the SIMBA buoy integrates heating elements (heating resistors) that apply weak-current heating to each sensor and its surroundings so as to determine the temperature change around the thermistor after heating (SIMBA-HT). As shown in Figure 2, many SIMBA buoys were deployed near the observation station during the MOSAiC survey, with an overall observation period from October 2019 to August 2020 [38]. The SIMBA-ET sampling interval for all buoys is 6 h, and the SIMBA-HT sampling interval is 24 h.

2.1.2. Atmospheric Forcing Data

ERA5 is the fifth-generation atmospheric reanalysis dataset developed by the European Centre for Medium-Range Weather Forecasts (ECMWF), covering the period from 1950 to the present with a temporal resolution of 1 h and a spatial resolution of 0.25° × 0.25°. It provides globally consistent meteorological and climate data covering the atmosphere, ocean, and land domains. In this study, the GPS position of each buoy during the MOSAiC survey is matched to the surrounding points of the ERA5 grid, and bilinear interpolation is used to extract the atmospheric forcing variables at the corresponding times.
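To make this matching step concrete, the following is a minimal sketch using xarray, under the assumption of a local NetCDF extract of ERA5; the file name and the ERA5 short variable names (ssrd, t2m, tp, tcc) are illustrative, not the authors' pipeline:

```python
# Hypothetical extraction of ERA5 forcing at one buoy GPS fix.
# The 10 m wind speed would additionally be derived from u10 and v10.
import xarray as xr

era5 = xr.open_dataset("era5_arctic_2019_2020.nc")  # assumed local extract

def forcing_at_buoy(time, lat, lon, variables=("ssrd", "t2m", "tp", "tcc")):
    """Bilinearly interpolate ERA5 fields to a buoy position and time."""
    point = era5[list(variables)].interp(
        time=time, latitude=lat, longitude=lon, method="linear"
    )
    return {v: float(point[v]) for v in variables}
```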
As shown in Table 1, the following five atmospheric forcings obtained from ERA5 form part of the network's input features.

2.1.3. Data Preprocessing

We upsampled the 6 h resolution SIT and SW data extracted by the SIMBA buoys to 1 h resolution through linear interpolation, aligning them with the ERA5 atmospheric forcing data (1 h resolution). Subsequently, a sliding window is used to extract features from the interpolated data, with a window size of 24 h and a step size of 1 h. We then selected the data from buoy 2020T74 (5 April to 26 July 2020) as a typical sample; during this period, the Arctic sea ice showed the seasonal transitional characteristic of slow growth followed by melting. These data are divided into training, validation, and testing sets in an 8:1:1 ratio, while the remaining SIMBA data are used entirely to validate the model's generalization ability.
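A minimal sketch of this preprocessing, assuming the SIMBA series sits in a pandas DataFrame with a 6 h DatetimeIndex (the column names are hypothetical):

```python
import numpy as np
import pandas as pd

def upsample_to_hourly(simba: pd.DataFrame) -> pd.DataFrame:
    """Linearly interpolate a 6 h SIMBA series (e.g., 'sit', 'sw') to 1 h."""
    return simba.resample("1H").interpolate(method="linear")

def sliding_windows(df: pd.DataFrame, window: int = 24, step: int = 1) -> np.ndarray:
    """Cut the series into windows of shape (n_windows, window, n_features)."""
    values = df.to_numpy()
    starts = range(0, len(values) - window + 1, step)
    return np.stack([values[i:i + window] for i in starts])
```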

2.2. Model

2.2.1. Wasserstein Generative Adversarial Network

DL models have shown promising improvements in sea ice prediction. However, they require large amounts of high-quality, long time series historical data. DL models typically have a large number of parameters, making them susceptible to overfitting if the amount of training data is insufficient or the data quality is poor [39]. Overfitted models perform well on the training set but make poor predictions on unseen data and generalize insufficiently [40].
GAN is a DL model proposed by Ian Goodfellow et al. that consists of two competing neural networks: a generator and a discriminator [32]. The generator produces synthetic data from random noise, while the discriminator compares synthetic data with real data to judge whether the synthetic data are authentic. The loss function of the GAN can be expressed in terms of the Jensen–Shannon (JS) divergence. The JS divergence does not decrease as the two distributions approach each other unless they overlap completely; instead, it remains constant at log 2 with high probability. This behavior poses a significant challenge for GAN training: when the generated distribution gets closer to the real distribution but does not fully overlap it, the JS divergence fails to provide meaningful gradient signals. As a result, the original GAN algorithm often suffers from vanishing gradients, convergence issues, and instability during training. To address these limitations, the WGAN introduces the Wasserstein distance as an alternative metric for measuring the distance between distributions:
$$W(p_{data}, p_g) = \inf_{\gamma \in S(p_{data}, p_g)} \mathbb{E}_{(x,y) \sim \gamma}\left[\lVert x - y \rVert\right]$$
where inf denotes the infimum (greatest lower bound), and $S(p_{data}, p_g)$ denotes the set of all possible joint distributions whose marginals are $p_{data}$ and $p_g$. When the difference between two probability distributions is slight, or the distributions do not overlap at all, the Wasserstein distance, unlike the JS divergence, still provides meaningful gradient information and smoother parameter updates for gradient descent. The WGAN algorithm has proven highly effective at generating realistic data; it improves the stability of learning and avoids problems such as mode collapse.
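For illustration, a condensed PyTorch sketch of one WGAN update with the Wasserstein objective and the weight clipping of Arjovsky et al. [33]; netG, netD, and the optimizers are placeholders, not the authors' released code:

```python
import torch

def wgan_step(netG, netD, real, optG, optD, z_dim=100, clip=0.01):
    batch = real.size(0)
    # Critic update: minimize E[D(fake)] - E[D(real)].
    optD.zero_grad()
    fake = netG(torch.randn(batch, z_dim)).detach()
    loss_d = netD(fake).mean() - netD(real).mean()
    loss_d.backward()
    optD.step()
    for p in netD.parameters():  # weight clipping enforces the Lipschitz constraint
        p.data.clamp_(-clip, clip)
    # Generator update: minimize -E[D(fake)].
    optG.zero_grad()
    loss_g = -netD(netG(torch.randn(batch, z_dim))).mean()
    loss_g.backward()
    optG.step()
    return loss_d.item(), loss_g.item()
```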

2.2.2. Long Short-Term Memory Network

LSTM is a kind of Recurrent Neural Network (RNN) [41] that has proven very powerful at processing sequence data. It solves the vanishing- and exploding-gradient problems of standard RNNs.
The three basic "gates" that make up the LSTM unit are the Input Gate, the Forget Gate, and the Output Gate. As shown in Figure 3, each gate of the LSTM cell is a neural network layer with a sigmoid activation function that controls the flow of information. The black arrows indicate the forward computation, and the red arrows indicate the backpropagation of the error. First, the network structure and loss function of the LSTM are determined. Second, each parameter is initialized, and the model's accuracy is computed through the loss function. The weights and bias terms are updated so as to minimize the loss of the specified objective function on the training samples; if the network fails to reach the required accuracy, the parameters are updated with the chosen optimization algorithm. Once training reaches the required accuracy, the model's parameters are fixed, the LSTM model is complete, and it can be applied to prediction or classification.
The computational expression within LSTM is as follows:
$$F_t = \sigma(W_{XF} X_t + W_{HF} H_{t-1} + b_F)$$
$$I_t = \sigma(W_{XI} X_t + W_{HI} H_{t-1} + b_I)$$
$$O_t = \sigma(W_{XO} X_t + W_{HO} H_{t-1} + b_O)$$
$$C_t = F_t \odot C_{t-1} + I_t \odot \tanh(W_{XC} X_t + W_{HC} H_{t-1} + b_C)$$
$$H_t = O_t \odot \tanh(C_t)$$
where $F_t$, $I_t$, $O_t$, $C_t$, and $H_t$ denote the outputs of the forget gate, input gate, output gate, cell state, and hidden layer, respectively; $W$ and $b$ denote the weight matrices and bias vectors, respectively; $\odot$ denotes elementwise multiplication; and $\tanh$ and $\sigma$ denote the Tanh and Sigmoid activation functions, respectively.
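To make the gate equations concrete, here is a from-scratch PyTorch cell that mirrors the update above; in practice one would use torch.nn.LSTM, which implements the same computation (the gate ordering in the stacked weight matrices is an assumption of this sketch):

```python
import torch

def lstm_cell(x_t, h_prev, c_prev, W_x, W_h, b):
    """One step of the LSTM update.
    W_x: (4*hidden, input), W_h: (4*hidden, hidden), b: (4*hidden,)."""
    gates = W_x @ x_t + W_h @ h_prev + b
    f, i, o, g = gates.chunk(4)                      # forget, input, output, candidate
    f, i, o = torch.sigmoid(f), torch.sigmoid(i), torch.sigmoid(o)
    c_t = f * c_prev + i * torch.tanh(g)             # cell state update
    h_t = o * torch.tanh(c_t)                        # hidden state output
    return h_t, c_t
```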

2.2.3. A New Loss Function: DISO

The mean square error (MSE) loss function is the default loss function for regression problems, defined as:
$$Loss_{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Mathematically, if the distribution of the target variable is Gaussian, MSE is the preferred loss function under the framework of maximum likelihood inference. Since the MSE squares the error, large outliers can have a significant impact on the overall error, resulting in large MSE values. If outliers are present in the data, it may be necessary to preprocess them or to use a more robust error metric such as the mean absolute error (MAE). It is therefore important to choose a suitable statistical indicator as the loss function.
In contrast to single statistical indicators, Hu et al. [42] developed a new comprehensive index, the Distance between Indices of Simulation and Observation (DISO), which quantitatively describes the overall performance of different models against the observed field. DISO combines several statistical measures, including the correlation coefficient (R), MAE, and root mean square error (RMSE). In our case, the DISO loss function is defined as:
$$Loss_{DISO} = \sqrt{(R - 1)^2 + MAE^2 + RMSE^2}$$
Overall, because DISO has the clear physical interpretation of a distance in three-dimensional space and rests on a strict mathematical foundation, its multidimensional evaluation capability can help the model optimize multiple objectives simultaneously during training and effectively resolves the problem of conflicting evaluations from different statistical indicators. DISO therefore has great potential as a new comprehensive loss function for neural networks.
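A sketch of the DISO loss above as a differentiable PyTorch function, suitable for backpropagation; the epsilon guard against a zero-variance batch is our addition:

```python
import torch

def diso_loss(pred, obs, eps=1e-8):
    """DISO = sqrt((R-1)^2 + MAE^2 + RMSE^2), computed per batch."""
    mae = (pred - obs).abs().mean()
    rmse = torch.sqrt(((pred - obs) ** 2).mean())
    pc, oc = pred - pred.mean(), obs - obs.mean()
    r = (pc * oc).sum() / (torch.sqrt((pc ** 2).sum() * (oc ** 2).sum()) + eps)
    return torch.sqrt((r - 1) ** 2 + mae ** 2 + rmse ** 2)
```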

2.2.4. WGAN-LSTM

The WGAN-LSTM model combines the high-quality generative ability of the WGAN with the ability of the LSTM to process time series and learn long-term dependencies. As shown in Figure 4, the training process starts with real datasets that contain multiple features, such as the atmospheric forcing variables, and key variables, such as SIT. After preprocessing, the real dataset is divided into training, validation, and test sets. Based on the training set, the WGAN generates fake data whose distribution is consistent with the real data and splices them with the real training data to jointly train the LSTM. This hybrid dataset is the input to the LSTM network, which learns to recognize long-term dependencies in the time series. After two LSTM layers, the model further processes and integrates features through a fully connected hidden layer to produce the final output.
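A minimal sketch of the hybrid-data step, assuming the generator emits flattened 24 × 8 windows (consistent with the 192-neuron output layer in Table 2); the names and shapes are illustrative:

```python
import torch

def build_hybrid_trainset(netG, real_windows, n_fake=5000, z_dim=100):
    """real_windows: tensor of shape (n_real, 24, 8); returns the spliced set."""
    with torch.no_grad():
        fake_flat = netG(torch.randn(n_fake, z_dim))   # (n_fake, 192)
        fake_windows = fake_flat.reshape(n_fake, 24, 8)
    return torch.cat([real_windows, fake_windows], dim=0)
```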
We employed two loss functions in turn during training to optimize the LSTM network: first, the commonly used mean square error (MSE), which measures the difference between the model's hindcast values and the observations, and second, the new comprehensive index DISO. Despite careful attention to design and selection issues (e.g., hyperparameter tuning), the black-box nature of neural network architectures remains challenging: the internal working mechanism of the model is opaque and susceptible to small perturbations of the network weights, which can lead to inconsistent test results.
In this study, we compared and qualitatively analyzed three models: LSTM, WGAN-LSTM (using MSE as the loss function for the LSTM component), and WGAN-LSTM-DISO (using DISO as the loss function for the LSTM component).

2.3. Performance Evaluation

The purpose of this study is to improve the accuracy and reliability of hindcast SIT. Thus, the performance of the proposed model needs to be evaluated considering two aspects. On the one hand, the accuracy of all models is evaluated based on the observed SIT at the same time. On the other hand, uncertainty analysis of the model will be performed to evaluate its reliability.

2.3.1. Accuracy Evaluation of SIT Hindcast

Statistical metrics quantitatively describe the accuracy of the models from different perspectives: the correlation coefficient (R) measures the strength and direction of the linear association between the simulated and observed time series, the mean absolute error (MAE) measures the average magnitude of the deviations from the observed time series, and the RMSE quantifies the average magnitude of the deviation while weighting large errors more heavily [43].
To comprehensively assess the performance of the model, we use several statistical metrics to measure hindcast ability, including the centered root mean square difference (RMSD), MAE, R, and the comprehensive index DISO. These metrics reflect the hindcast accuracy and stability of the model from different perspectives. Their expressions are as follows:
$$RMSD = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left[ (T_{O,i} - \bar{T}_O) - (T_{P,i} - \bar{T}_P) \right]^2}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| T_{O,i} - T_{P,i} \right|$$
$$R = \frac{\sum_{i=1}^{n} (T_{O,i} - \bar{T}_O)(T_{P,i} - \bar{T}_P)}{\sqrt{\sum_{i=1}^{n} (T_{O,i} - \bar{T}_O)^2} \sqrt{\sum_{i=1}^{n} (T_{P,i} - \bar{T}_P)^2}}$$
$$DISO = \sqrt{(R - 1)^2 + MAE^2 + \frac{1}{n} \sum_{i=1}^{n} (T_{O,i} - T_{P,i})^2}$$
where $T_{O,i}$ and $T_{P,i}$ denote the observed and hindcast SIT, $\bar{T}_O$ and $\bar{T}_P$ denote their mean values, and $n$ denotes the length of the hindcast SIT sequence.
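For reference, the four metrics above in NumPy form, assuming t_obs and t_pred are one-dimensional arrays of observed and hindcast SIT:

```python
import numpy as np

def metrics(t_obs, t_pred):
    mae = np.mean(np.abs(t_obs - t_pred))
    rmsd = np.sqrt(np.mean(((t_obs - t_obs.mean()) - (t_pred - t_pred.mean())) ** 2))
    r = np.corrcoef(t_obs, t_pred)[0, 1]
    diso = np.sqrt((r - 1) ** 2 + mae ** 2 + np.mean((t_obs - t_pred) ** 2))
    return {"MAE": mae, "RMSD": rmsd, "R": r, "DISO": diso}
```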

2.3.2. Uncertainty Quantification

Model uncertainty is caused by uncertainty in the model structure and parameters, which reflects the model’s fit to the data and generalization ability. Bayesian methods provide a robust framework for uncertainty analysis, especially in DL, and the Bayesian neural network (BNN) [44] is one of the main tools used to implement this framework. In Bayesian neural networks, the weights and biases of the model are treated as random variables rather than fixed values. This approach allows the model to learn the probability distribution of the parameters during the training process, thus being able to quantify the uncertainty of the model.
Monte Carlo dropout (MC dropout) is an uncertainty estimation technique for DL that provides an effective means of visualizing and quantifying model uncertainty, which is crucial for improving model reliability and interpretability [45]. It is based on the dropout regularization method, which randomly discards a portion of the neurons during training, reducing the model's over-reliance on specific training samples and thus the risk of overfitting [46]. Dropout (and its variants) remains active during the testing phase to simulate a Bayesian distribution over the model parameters, which can be interpreted as a Bayesian approximation of a well-known probabilistic model: the Gaussian process [47].
In our problem, we apply dropout during multiple forward passes of the model and collect the outputs as Monte Carlo samples. These samples are then used to quantify uncertainty via the mean and variance of the test results.
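A minimal MC dropout sketch in PyTorch: the network is put in evaluation mode, only its nn.Dropout modules are switched back to training mode, and repeated forward passes give the sample mean and standard deviation (model and input are placeholders):

```python
import torch

def mc_dropout_predict(model, x, n_samples=100):
    model.eval()
    for m in model.modules():                 # re-enable only the dropout layers
        if isinstance(m, torch.nn.Dropout):
            m.train()
    with torch.no_grad():
        draws = torch.stack([model(x) for _ in range(n_samples)])
    return draws.mean(dim=0), draws.std(dim=0)
```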

3. Experiment Settings

Our proposed model is developed in Python 3.7 using PyCharm 2023.3.4, with the torch 1.13.1 package on the backend. When using the WGAN for data augmentation, the input features are the atmospheric forcing data, SW, historical SIT, and the SIT at the future prediction time point, with shape 24 × 8. In the experiment, we generated an additional 5000 augmented samples and concatenated them with the real data to serve as the training data for the LSTM. The remaining data from 2020T74 are used as the validation and test sets. The generator and discriminator of the WGAN model and the hyperparameters used in the LSTM model are shown in Table 2.
For model uncertainty estimation, we applied dropout to the LSTM layers with a dropout rate of 0.3; the dropout rate for the fully connected layer after the two LSTM layers was 0.5. In this experiment, we generated 100 MC samples for model uncertainty estimation by passing each test set through the well-trained network multiple times.

4. Results and Qualitative Analysis

4.1. Generated Features

We selected three moments during WGAN training, namely the 200th, 500th, and 1000th iterations, to evaluate the evolution of WGAN performance. We randomly select a common time point from the 5000 samples generated by the WGAN and pick out the generated features and the corresponding generated ice thickness at that point. Figure 5 compares the scatter density maps of the generated features and ice thickness with the actual observations and shows the loss curves of the WGAN network, where the color scale from blue to red represents increasing data point density.
As the number of iterations increases, the performance of the WGAN improves. As shown in Figure 5a, at 200 iterations the WGAN roughly captures the primary distribution of the observed features, although significant differences from the actual distribution remain: the bluish area is relatively extensive, indicating that mostly low-density data have been generated, in contrast to the high-density area of the observed data. As shown in Figure 5b, after 500 iterations the color distribution of the scatter plot is more concentrated in the darker area, and the location of the high-density area is closer to the observations. As the number of iterations increases further, as shown in Figure 5c, the generated feature distribution is generally closer to the observed features and better represents some details. These changes indicate that the WGAN gradually learned to simulate the observed distribution during training. Notably, although the overall trend is close to the actual distribution, some data points still deviate from the observed distribution; this reflects the WGAN's ability to generalize, generating values that did not appear in the observed samples. Figure 5d shows the loss curves of the WGAN's generator and discriminator, which begin to stabilize and approach zero after about 800 iterations, indicating that the model has converged to a stable point and has been trained appropriately. Combined with the visualizations in Figure 5a–c, we conclude that the WGAN can effectively generate high-quality samples.

4.2. Model Hindcast Results

After network training, we use the test set for the SIT hindcast. Figure 6 shows the observed SIT and the SIT output by each model from 11:00 on 11 July 2020 to 11:00 on 26 July 2020; the model outputs are the averages over the MC samples. The three models' hindcast trends are consistent with the observed data, but the deviation gradually increases over time. The main reason is that most of our training data fall within the ice growth period; as melting progresses, the models' overestimation of ice thickness becomes more apparent. Notably, the outputs of LSTM and WGAN-LSTM on 23 July deviated upward from the observed data.
As shown in Figure 7, the tcc values of the training and validation sets (mostly during the ice growth period) are lower than those of the test set (during the ice melt period). Therefore, although the WGAN generated some training data, the network model still tended to learn the non-physical phenomenon of increased ice thickness when cloud cover decreased. The sudden decrease in the tcc value on 22 July 2020 caused the model to mistakenly believe that the ice thickness had increased, resulting in abnormal results. In contrast, the hindcast results of the WGAN-LSTM-DISO model were closest to the observed data, with a smoother test curve, effectively correcting the abnormal hindcast.

4.3. Model Evaluation

As shown in Table 3, the correlation coefficients R between the hindcast and observed values are high for all three models, while the RMSD and MAE of LSTM, WGAN-LSTM, and WGAN-LSTM-DISO decrease in turn. WGAN-LSTM reduces the RMSD of LSTM from 2.212 to 1.109, and WGAN-LSTM-DISO further reduces it to 0.686. It is worth noting that all three models tend to overestimate SIT, mainly because the training and test sets cover different phases of the seasonal cycle. Calculating the DISO difference between each comparison model and the LSTM model and dividing it by the LSTM's DISO value gives a performance improvement ratio that quantitatively evaluates the improvement in overall performance: compared with the LSTM model, the performance of WGAN-LSTM and WGAN-LSTM-DISO improves by 51.9% and 75.2%, respectively. This significant improvement shows that our data augmentation and loss function refinements are essential to improving model accuracy. Finally, we note that the standard deviation of each statistical metric is smallest for WGAN-LSTM-DISO, meaning that this model exhibits the greatest stability across all evaluation metrics.
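As a worked check on these ratios using the rounded DISO values in Table 3 (our own arithmetic): (0.044 − 0.021)/0.044 ≈ 52% for WGAN-LSTM and (0.044 − 0.011)/0.044 ≈ 75% for WGAN-LSTM-DISO, consistent with the reported 51.9% and 75.2% up to rounding.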
After this preliminary assessment of the three models, we further visualize the thickness samples predicted by the three models to analyze how well the MC samples capture the uncertainty of the thickness hindcast. Figure 8 shows the mean and standard deviation of each model's hindcast at each time point over the complete dropout sample pool (MC samples are obtained every ten independent random trials). The error bars span two standard deviations around the mean, which covers approximately 95% of the samples under the Gaussian assumption. The mean curve of the WGAN-LSTM-DISO model best matches the actual observations, and at every test time point the sample distribution generated by this model accurately surrounds the observations. In contrast, the error bars of the LSTM and WGAN-LSTM models cover the ground truth at the beginning of the test period, but as time goes on, the actual observations begin to deviate from the sample distributions predicted by these two models and exceed the range of the error bars. In addition, the variance of the samples from all three models increases over time, making it more challenging for the models to predict ice thickness at later test time points.
Ideally, an effective uncertainty estimate for a model should ensure that the sample distribution it generates matches the actual observed value distribution at the test points. In other words, for the kth percentile of the sample generated by the ideal model, we expect k% of the actual observations to fall within that percentile range. To quantify this concept, we first fit a Gaussian distribution over the complete sample pool generated by the model and then estimate the two-tailed percentile of the observed underlying truth. Figure 9 shows the cumulative percentage of observations (y-axis) falling within a particular percentile of the model-predicted sample (x-axis). Ideal model performance is represented by the diagonal line y = x, where the proportion of observations falling within a percentile equals the percentile value. If the model curve lies below the diagonal line y = x, this indicates that the proportion of actual observations falling within a given percentile is lower than expected. This suggests that the model is overconfident, which could lead to over-reliance on the accuracy of the model predictions in practical applications.
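A sketch of this calibration check, assuming mc_samples holds the MC draws at each test time and using SciPy's normal CDF for the two-tailed percentile:

```python
import numpy as np
from scipy.stats import norm

def calibration_curve(mc_samples, obs, grid=np.linspace(0.01, 1.0, 100)):
    """mc_samples: (n_draws, n_times); obs: (n_times,). Ideal curve is y = x."""
    mu, sigma = mc_samples.mean(axis=0), mc_samples.std(axis=0)
    two_tailed = 2 * np.abs(norm.cdf(obs, mu, sigma) - 0.5)  # percentile of obs
    coverage = np.array([(two_tailed <= q).mean() for q in grid])
    return grid, coverage
```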
The WGAN-LSTM and WGAN-LSTM-DISO curves lie partly below the diagonal, producing slightly overconfident uncertainty estimates. The curve of the LSTM model lies almost entirely above the diagonal, so its uncertainty estimate is much larger than the ideal. When observational information about the test date is lacking, it is generally preferable for a model to be slightly underconfident. Overall, the WGAN-LSTM family (both WGAN-LSTM-MSE and WGAN-LSTM-DISO) produces good uncertainty estimates and effectively captures the distribution of the actual observations. However, the WGAN-LSTM without DISO performs best in terms of uncertainty. This is mainly due to the RMSE and R terms in DISO's design: these indicators assign higher weights to larger prediction errors. For example, RMSE amplifies the impact of large errors, pushing the model to minimize them through extreme predictions; similarly, when the weight of R is high, the model enhances the linear correlation with the actual labels through extreme predicted values.

4.4. Model Generalization Ability Evaluation and Physical Process

Based on the feature data extracted from 12 SIMBAs obtained by the MOSAiC program, this study systematically evaluated the generalization ability of the proposed model in predicting Arctic SIT. These buoys are spatially distributed across the central Arctic Ocean, Fram Strait, and North Pole region, with a period from October 2019 to August 2020 that includes key phenological stages such as the Arctic sea ice growth period (November to April of the following year) and part of the melt period (May to August). Figure 10 shows the time series of predicted and observed ice thickness for the models on the different buoys. As shown in Figure 10a–i, the predicted ice thickness trends of the three models are consistent with the measured changes, but there is a significant deviation between the LSTM model's predictions and the measured SIT from October 2019 to January 2020, whereas it shows good agreement in other periods. This time-dependent error is mainly due to the limitations of the training data: the model is trained only on data from April to June 2020, so the LSTM generalizes poorly to samples far from the training period. In contrast, the testing performance of the WGAN-LSTM and WGAN-LSTM-DISO models on all buoys is significantly better than that of LSTM. However, the visual difference between the predictions of WGAN-LSTM and WGAN-LSTM-DISO in Figure 10 is not significant, indicating that the WGAN framework contributes most of the improvement, while the contribution of the DISO loss is relatively limited.
To quantitatively demonstrate the generalization of the model, Table 4 lists the average error metrics over all buoys. Compared to the traditional LSTM model, the WGAN-LSTM model significantly improves prediction accuracy, with MAE reduced to 0.242, RMSD reduced to 0.887, and the DISO index optimized to 0.005. Although the MAE of WGAN-LSTM-DISO is slightly higher than that of WGAN-LSTM, its comprehensive DISO indicator improves by 20% relative to the latter. The R values of all three models remain at the high level of 0.999, indicating that they all accurately reflect the trend of SIT changes. In summary, WGAN-LSTM-DISO exhibits optimal performance on buoy data from different regions and times, fully verifying its generalization ability.
Because air temperature is a key factor affecting Arctic SIT, this study analyzed the relationship between the ice thickness predicted by the WGAN-LSTM-DISO model and the measured air temperature during the freezing period (25 November 2019 to 1 January 2020) and the melt period (1 June to 29 June 2020), based on observations from buoy 2019T67. As shown in Figure 11, predicted ice thickness is negatively correlated with air temperature under both ice conditions. During the frozen period, the insulating effect of snow cover suppresses heat exchange between the atmosphere and the sea ice, so ice thickness responds slowly to air temperature changes. During the melt period, as snow melts and solar radiation increases, the heat absorbed by the sea ice increases significantly, making ice thickness more sensitive to rising air temperature. This indicates that the model predictions accurately reflect the physical relationship between air temperature and ice thickness.

5. Conclusions

The acquisition of Arctic sea ice data is limited by factors such as observation equipment and the geographical environment, so the amount of data is relatively limited. To alleviate the difficulty posed by small data volumes and improve the accuracy of DL-based SIT prediction, a novel hybrid DL model for hindcasting Arctic SIT was proposed: WGAN-LSTM. First, high-quality atmospheric forcing data and the corresponding SW and SIT were generated with the WGAN and spliced with the original data. Second, the blended data were used as inputs to the LSTM, with different loss functions used during training to obtain the SIT output at the next time point. Finally, 100 MC samples generated by each model were used to evaluate the test results with model uncertainty analysis. The WGAN-LSTM model in this paper takes about 1 min to train once on an Intel Iris Xe GPU. Through qualitative analysis, we found that the combined performance of the WGAN-LSTM and WGAN-LSTM-DISO models improved by 51.9% and 75.2%, respectively, compared to the LSTM model. Based on the test results on the other buoys, the two proposed models demonstrate good generalization performance.
The main contributions of this study are summarized as follows:
  • We propose a novel DL-based WGAN-LSTM method for small-scale Arctic SIT hindcasting, which alleviates the challenge of DL to address the insufficient amount of data for Arctic SIT prediction.
  • Compared with LSTM, the robustness and generalization ability of the WGAN-LSTM model are improved. The synthetic data generated by the WGAN contain various possible sea ice change scenarios, which enables the LSTM to better adapt to sea ice changes in different environments during training and to capture the time-series features of SIT changes more accurately, thus improving prediction accuracy and reducing overfitting.
  • Using the new comprehensive index DISO as the loss function effectively corrects the anomalies appearing during testing, although it can make the model slightly overconfident.
  • We demonstrate that the model prediction results can accurately reflect the physical relationship between air temperature and ice thickness.
However, the black-box nature of neural networks makes the model uncertain and its predictions less interpretable. We have experimented only on a small dataset, and the uncertainty and quality of the data strongly affect the model's performance, so its behavior when applied to a broader range of SIT predictions remains to be explored. Sea ice changes show substantial variability and spatiotemporal heterogeneity owing to the combined influence of complex physical mechanisms such as oceanic factors, atmospheric factors, and regional distribution. In the future, deep learning models could be combined with expertise from the meteorological and oceanographic fields to improve applicability and accuracy through the selection and screening of input data, constraints from physical mechanisms, and the processing of long time series.

Author Contributions

All authors contributed significantly to this manuscript. B.G.: writing—original draft preparation, software, drawings, and investigation; Y.L.: writing—review and editing, data curation, and resource acquisition; L.W.: writing—review and editing, resources, and funding acquisition; P.L.: funding acquisition; H.L.: supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported by (1) the National Natural Science Foundation of China (No. 42320104004) and (2) the Fundamental Research Funds for the Central Universities (No. DUT24LK006).

Data Availability Statement

The data used in this study are cited in the references. The simulation results generated during this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
GAN	Generative Adversarial Network
WGAN	Wasserstein Generative Adversarial Network
LSTM	Long Short-Term Memory
DL	Deep learning
SIT	Sea ice thickness
SW	Snow depth
SIMBA	Sea ice mass balance buoy
ERA5	European Centre for Medium-Range Weather Forecasts Reanalysis version 5
MOSAiC	Multidisciplinary drifting Observatory for the Study of Arctic Climate
wind	10 m wind field
fsw	Surface solar radiation downwards
t2m	2 m air temperature
tp	Total precipitation
tcc	Proportion of total cloud cover
MAE	Mean absolute error
RMSE	Root mean square error
RMSD	Centered root mean square difference
DISO	Distance between Indices of Simulation and Observation
MC	Monte Carlo
LeakyReLU	Leaky Rectified Linear Unit
RMSprop	Root Mean Square Propagation
AdamW	Adam with Weight Decay

References

  1. Francis, J.A.; Chan, W.; Leathers, D.J.; Miller, J.R.; Veron, D.E. Winter Northern Hemisphere weather patterns remember summer Arctic sea-ice extent. Geophys. Res. Lett. 2009, 36, L07503.
  2. Serreze, M.C.; Barry, R.G. Processes and impacts of Arctic amplification: A research synthesis. Glob. Planet. Change 2011, 77, 85–96.
  3. Curry, J.A.; Schramm, J.L.; Ebert, E.E. Sea Ice-Albedo Climate Feedback Mechanism. J. Clim. 1995, 8, 240–247.
  4. Storlazzi, C.D.; Gingerich, S.B.; Van Dongeren, A.; Cheriton, O.M.; Swarzenski, P.W.; Quataert, E.; Voss, C.I.; Field, D.W.; Annamalai, H.; Piniak, G.A. Most atolls will be uninhabitable by the mid-21st century because of sea-level rise exacerbating wave-driven flooding. Sci. Adv. 2018, 4, eaap9741.
  5. Cohen, J.; Screen, J.A.; Furtado, J.C.; Barlow, M.; Whittleston, D.; Coumou, D.; Francis, J.; Dethloff, K.; Entekhabi, D.; Overland, J.; et al. Recent Arctic amplification and extreme mid-latitude weather. Nat. Geosci. 2014, 7, 627–637.
  6. Huang, J.; Hitchcock, P.; Maycock, A.C.; McKenna, C.M.; Tian, W. Northern hemisphere cold air outbreaks are more likely to be severe during weak polar vortex conditions. Commun. Earth Environ. 2021, 2, 147.
  7. Hou, Y.; Cai, W.; Holland, D.M.; Cheng, X.; Zhang, J.; Wang, L.; Johnson, N.C.; Xie, F.; Sun, W.; Yao, Y.; et al. A surface temperature dipole pattern between Eurasia and North America triggered by the Barents–Kara sea-ice retreat in boreal winter. Environ. Res. Lett. 2022, 17, 114047.
  8. Bailey, H.; Hubbard, A.; Klein, E.S.; Mustonen, K.R.; Welker, J.M. Arctic sea-ice loss fuels extreme European snowfall. Nat. Geosci. 2021, 14, 283–288.
  9. Zou, Y.; Wang, Y.; Zhang, Y.; Koo, J.H. Arctic sea ice, Eurasia snow, and extreme winter haze in China. Sci. Adv. 2017, 3, e1602751.
  10. Smith, L.C.; Stephenson, S.R. New Trans-Arctic shipping routes navigable by midcentury. Proc. Natl. Acad. Sci. USA 2013, 110, E1191–E1195.
  11. Liu, M.; Kronbak, J. The potential economic viability of using the Northern Sea Route (NSR) as an alternative route between Asia and Europe. J. Transp. Geogr. 2010, 18, 434–444.
  12. Stephenson, S.R.; Smith, L.C.; Agnew, J.A. Divergent long-term trajectories of human access to the Arctic. Nat. Clim. Change 2011, 1, 156–160.
  13. Rigot-Müller, P.; Cheaitou, A.; Etienne, L.; Faury, O.; Fedi, L. The role of polar seaworthiness in shipping planning for infrastructure projects in the Arctic: The case of Yamal LNG plant. Transp. Res. Part A Policy Pract. 2022, 155, 330–353.
  14. Sallila, H.; Farrell, S.L.; McCurry, J.; Rinne, E. Assessment of contemporary satellite sea ice thickness products for Arctic sea ice. Cryosphere 2019, 13, 1187–1213.
  15. Gu, F.; Zhang, R.; Tian-Kunze, X.; Han, B.; Zhu, L.; Cui, T.; Yang, Q. Sea Ice Thickness Retrieval Based on GOCI Remote Sensing Data: A Case Study. Remote Sens. 2021, 13, 936.
  16. Landy, J.C.; Dawson, G.J.; Tsamados, M.; Bushuk, M.; Stroeve, J.C.; Howell, S.E.L.; Krumpen, T.; Babb, D.G.; Komarov, A.S.; Heorton, H.D.B.S. A year-round satellite sea-ice thickness record from CryoSat-2. Nature 2022, 609, 517–522.
  17. Wang, Y.; Yuan, X.; Bi, H.; Ren, Y.; Liang, Y.; Li, G.; Li, X. Understanding Arctic Sea Ice Thickness Predictability by a Markov Model. J. Clim. 2023, 36, 4879–4897.
  18. Hunke, E.C.; Hebert, D.A.; Lecomte, O. Level-ice melt ponds in the Los Alamos sea ice model, CICE. Ocean Model. 2013, 71, 26–42.
  19. Girard, L.; Amitrano, D.; Weiss, J. Failure as a critical phenomenon in a progressive damage model. J. Stat. Mech. Theory Exp. 2010, 2010, 577–611.
  20. Rösel, A.; Itkin, P.; King, J.; Divine, D.; Wang, C.; Granskog, M.A.; Krumpen, T.; Gerland, S. Thin sea ice, thick snow, and widespread negative freeboard observed during N-ICE2015 north of Svalbard. J. Geophys. Res. Oceans 2018, 123, 1156–1176.
  21. Jackson, K.; Wilkinson, J.; Maksym, T.; Meldrum, D.; Beckers, J.; Haas, C.; Mackenzie, D. A Novel and Low-Cost Sea Ice Mass Balance Buoy. J. Atmos. Ocean. Technol. 2013, 30, 2676–2688.
  22. Nicolaus, M.; Perovich, D.K.; Spreen, G.; Granskog, M.A.; Albedyll, L.V.; Angelopoulos, M.; Anhaus, P.; Arndt, S.; Belter, H.J.; Bessonov, V.; et al. Overview of the MOSAiC expedition: Snow and sea ice. Elem. Sci. Anthr. 2022, 10, 000046.
  23. Rabe, B.; Heuzé, C.; Regnery, J.; Aksenov, Y.; Allerholt, J.; Athanase, M.; Bai, Y.; Basque, C.; Bauch, D.; Baumann, T.M.; et al. Overview of the MOSAiC expedition: Physical oceanography. Elem. Sci. Anthr. 2022, 10, 00062.
  24. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  25. Chi, J.; Kim, H.C. Prediction of Arctic Sea Ice Concentration Using a Fully Data Driven Deep Neural Network. Remote Sens. 2017, 9, 1305.
  26. Liu, L. A review of deep learning for cryospheric studies. In Deep Learning for the Earth Sciences: A Comprehensive Approach to Remote Sensing, Climate Science, and Geosciences; John Wiley & Sons: Hoboken, NJ, USA, 2021; pp. 258–268.
  27. Andersson, T.R.; Hosking, J.S.; Pérez-Ortiz, M.; Paige, B.; Elliott, A.; Russell, C.; Law, S.; Jones, D.C.; Wilkinson, J.; Phillips, T. Seasonal Arctic sea ice forecasting with probabilistic deep learning. Nat. Commun. 2021, 12, 5124.
  28. Ren, Y.; Zhang, W. A data-driven deep learning model for weekly sea ice concentration prediction of the pan-Arctic during the melting season. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4304819.
  29. Song, C.T.; Zhu, J.; Li, X.C. Assessments of Data-Driven Deep Learning Models on One-Month Predictions of Pan-Arctic Sea Ice Thickness. Adv. Atmos. Sci. 2024, 41, 1379–1390.
  30. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. In Proceedings of the 29th Annual Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 7–12 December 2015.
  31. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  32. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the 28th Conference on Neural Information Processing Systems (NIPS), Montreal, QC, Canada, 8–13 December 2014.
  33. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein Generative Adversarial Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
  34. Xu, Z.; Du, J.; Wang, J.J.; Jiang, C.X.; Ren, Y. Satellite Image Prediction Relying on GAN and LSTM Neural Networks. In Proceedings of the IEEE International Conference on Communications (IEEE ICC), Shanghai, China, 20–24 May 2019.
  35. Mavsar, M.; Morimoto, J.; Ude, A. GAN-Based Semi-Supervised Training of LSTM Nets for Intention Recognition in Cooperative Tasks. IEEE Robot. Autom. Lett. 2024, 9, 263–270.
  36. Bousmina, A.; Selmi, M.; Ben Rhaiem, M.A.; Farah, I.R. A Hybrid Approach Based on GAN and CNN-LSTM for Aerial Activity Recognition. Remote Sens. 2023, 15, 3626.
  37. Hersbach, H.; Bell, B.; Berrisford, P.; Hirahara, S.; Horányi, A.; Muñoz-Sabater, J.; Nicolas, J.; Peubey, C.; Radu, R.; Schepers, D.; et al. The ERA5 global reanalysis. Q. J. R. Meteorol. Soc. 2020, 146, 1999–2049.
  38. Lei, R.B.; Cheng, B.; Hoppmann, M.; Zhang, F.Y.; Zuo, G.Y.; Hutchings, J.K.; Lin, L.; Lan, M.S.; Wang, H.Z.; Regnery, J.; et al. Seasonality and timing of sea ice mass balance and heat fluxes in the Arctic transpolar drift during 2019–2020. Elem. Sci. Anthr. 2022, 10, 000089.
  39. Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60.
  40. Schmidhuber, J. Deep learning in neural networks: An overview. Neural Netw. 2015, 61, 85–117.
  41. Elman, J.L. Finding Structure in Time. Cogn. Sci. 1990, 14, 179–211.
  42. Hu, Z.; Chen, X.; Zhou, Q.; Chen, D.; Li, J. DISO: A rethink of Taylor diagram. Int. J. Climatol. 2019, 39, 2825–2832.
  43. Hu, Z.; Hu, Q.; Zhang, C.; Chen, X.; Li, Q. Evaluation of reanalysis, spatially interpolated and satellite remotely sensed precipitation data sets in central Asia. J. Geophys. Res. Atmos. 2016, 121, 5648–5663.
  44. Kononenko, I. Bayesian neural networks. Biol. Cybern. 1989, 61, 361–370.
  45. Gal, Y.; Ghahramani, Z. Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016.
  46. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958.
  47. Rasmussen, C.E.; Williams, C.K.I. Gaussian Processes for Machine Learning. Int. J. Neural Syst. 2004, 14, 69–106.
Figure 1. Schematic diagram of the snow and ice mass balance array (SIMBA). A sensor array is inserted into a 2-inch ice hole, linked to a Pelican Case with electronics, batteries, and an Iridium modem.
Figure 2. The operation periods of the buoys.
Figure 3. The structure of Long Short-Term Memory (LSTM). “×” represents matrix multiplication, “+” represents matrix addition, black arrows represent forward propagation, and red arrows represent backward propagation.
Figure 4. The proposed model for Arctic sea ice thickness (SIT) hindcast: Wasserstein GAN (WGAN)-LSTM.
Figure 5. Scatter density plot between features and the corresponding SIT: (a) 200 iterations; (b) 500 iterations; (c) 1000 iterations; (d) loss function curve of the WGAN generator and discriminator.
Figure 6. The time series of SIT exported by LSTM, WGAN-LSTM, and WGAN-LSTM-DISO, together with the measured SIT. Obs denotes the measurements (black diamonds); one point is plotted for every ten observations.
Figure 7. The proportion of total cloud cover (tcc) from 6 April to 26 July 2020. The red triangle marks the date when the tcc decreased in the test data.
Figure 8. The means and standard deviations of the MC samples generated by LSTM, WGAN-LSTM, and WGAN-LSTM-DISO in July 2020 (100 samples). From left to right: (a) the hindcast of LSTM; (b) the hindcast of WGAN-LSTM; (c) the hindcast of WGAN-LSTM-DISO.
Figure 9. Cumulative percentage of observations within a certain percentile of samples of three models.
Figure 10. Results of the trained models on the other buoys. (a–l) represent buoys 2019T47, 2019T58, 2019T62, 2019T63, 2019T66, 2019T67, 2019T68, 2019T69, 2019T70, 2020T73, 2020T77, and 2020T79, respectively.
Figure 11. Scatter plot of actual air temperature observations and predicted ice thickness by the WGAN-LSTM-DISO model on buoy 2019T67. (a) represents part of the ice-covered period spanning 25 November 2019 to 1 January 2020; (b) represents part of the melt season spanning 1 June to 29 June 2020.
Table 1. Selected atmospheric forcing data for the WGAN-LSTM model.

| Variable | Source | Abbreviation or Calculation | Units |
|---|---|---|---|
| 10 m wind field | ERA5 | wind | m/s |
| Surface solar radiation downwards | ERA5 | fsw | W/m² |
| 2 m air temperature | ERA5 | t2m | °C |
| Total precipitation | ERA5 | tp | mm/h |
| Proportion of total cloud cover | ERA5 | tcc | % |
Table 2. Hyperparameters of all modules.

| Module | Parameter | Values |
|---|---|---|
| WGAN generator | Noise vector dimension z | 100 |
| | Number of layers | 5 |
| | Activation function | LeakyReLU; output layer: tanh |
| | Number of neurons in each layer | (128, 256, 512, 1024, 192) |
| WGAN discriminator | Number of layers | 3 |
| | Activation function | LeakyReLU; output layer: tanh |
| | Number of neurons in each layer | (512, 256, 1) |
| WGAN model | Optimizer | RMSprop (lr = 0.00005) |
| | Batch size | 64 |
| | Number of epochs | 1000 |
| | Loss function | Wasserstein loss |
| LSTM model | Input frame resizing | 24 × 7 |
| | Batch size | 256 |
| | Number of LSTM layers | 2 |
| | Number of epochs | 50 |
| | Dropout rate | 0.3 |
| | Optimizer | AdamW |
Table 3. The corresponding statistical metrics and their standard deviations between MC sample values and observed values.

| Dataset | R | RMSD | MAE | DISO |
|---|---|---|---|---|
| Obs | 1 | 0 | 0 | 0 |
| LSTM | 0.978 ± 0.010 | 2.212 ± 0.274 | 3.242 ± 2.119 | 0.044 ± 0.009 |
| WGAN-LSTM | 0.987 ± 0.005 | 1.109 ± 0.178 | 1.322 ± 1.020 | 0.021 ± 0.005 |
| WGAN-LSTM-DISO | 0.994 ± 0.002 | 0.686 ± 0.112 | 0.574 ± 0.572 | 0.011 ± 0.002 |
Table 4. The mean error metrics of the models on all buoys.

| Model | MAE | RMSD | R | DISO |
|---|---|---|---|---|
| LSTM | 2.295 | 2.825 | 0.999 | 0.020 |
| WGAN-LSTM | 0.242 | 0.887 | 0.999 | 0.005 |
| WGAN-LSTM-DISO | 0.260 | 0.798 | 0.999 | 0.004 |
