Article

Successful Precipitation Downscaling Through an Innovative Transformer-Based Model

1 School of Computer Science, Nanjing University of Information Science and Technology, Nanjing 210044, China
2 College of Information Science and Technology, Nanjing Forestry University, Nanjing 210037, China
3 Nanjing NARI Information & Communication Technology Co., Ltd., Nanjing 211815, China
4 Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology, Nanjing 210044, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(22), 4292; https://doi.org/10.3390/rs16224292
Submission received: 16 October 2024 / Revised: 12 November 2024 / Accepted: 14 November 2024 / Published: 18 November 2024

Abstract

In this research, we introduce a novel method leveraging the Transformer architecture to generate high-fidelity precipitation model outputs. This technique emulates the statistical characteristics of high-resolution datasets while substantially lowering computational expenses. The core concept involves utilizing a blend of coarse and fine-grained simulated precipitation data, encompassing diverse spatial resolutions and geospatial distributions, to instruct the Transformer in the transformation process. We have crafted an innovative ST-Transformer encoder component that dynamically concentrates on various regions, allocating heightened focus to critical spatial zones or sectors. The module is capable of learning dependencies between different locations in the input sequence and modeling them at different scales, which allows it to fully capture the spatiotemporal correlations in meteorological element data, a capability not available in other downscaling methods. This tailored module is instrumental in enhancing the model’s ability to generate outcomes that are not only more realistic but also more consistent with physical laws. It faithfully mirrors the temporal and spatial distribution of precipitation data and adeptly represents extreme weather events, such as heavy and enduring storms. The efficacy and superiority of our proposed approach are substantiated through a comparative analysis with several cutting-edge forecasting techniques. This evaluation is conducted on two distinct datasets, each derived from simulations run by regional climate models over a period of 4 months. The datasets vary in their spatial resolutions, with one featuring a 50 km resolution and the other a 12 km resolution, both sourced from the Weather Research and Forecasting (WRF) Model.

1. Introduction

The objective of enhancing precipitation resolution is to refine the granularity of satellite-derived rainfall information, thereby rendering it more appropriate for localized or regional-scale investigations. This is important for climate, ecological, and hydrological studies, as precipitation data with high spatio-temporal resolution can provide more accurate precipitation information, leading to a better understanding of global climate change and the hydrological cycle [1]. Global warming adversely affects the hydrological cycle, changing the water vapor capacity and evaporation rate of the atmosphere, leading to changes in the intensity and frequency of precipitation. This could lead to more intense and more frequent extreme wet or dry events, and even natural disasters such as forest fires [2,3]. Remote sensing technology and machine learning methods provide a way to detect these changes [4,5]. Nevertheless, the resolution of satellite-based precipitation datasets is often too broad for utilization in detailed hydrological and meteorological research, particularly when focusing on localized or regional areas [6,7]. Therefore, downscaling is required before using satellite precipitation products for research. The aim extends beyond enhancing the precision of satellite datasets for applicability at the local level; it also seeks to refine hydrological predictions. Over recent years, numerous scholarly works have put forth proposals aimed at elevating the spatial resolution of satellite rainfall assessments.
Downscaling is a method for transforming large-scale and low-resolution meteorological information into small-scale and high-resolution regional meteorological information. For the downscaling methods of satellite-derived precipitation, researchers mainly divide them into two categories: the statistical downscaling method and the dynamic downscaling method. Statistical downscaling techniques are predicated on the correlation between key and supplementary variables, a strategy that has gained popularity for refining the resolution of satellite precipitation data. Despite its efficiency, this approach is contingent upon the persistence of the established statistical patterns under existing climate conditions into future scenarios [8]. In addition, observational data may contain errors, which poses a challenge to developing robust models for future climate prediction. Therefore, in an ideal spatio-temporal scale, precipitation processes are usually simulated using multiple stochastic cascade models to capture the multi-scale characteristics of precipitation. This model can consider the multi-scale structure of precipitation [9].
The dynamical downscaling technique is grounded in the mathematical modeling of intricate atmospheric, oceanic, and terrestrial processes [10,11,12]. It is contingent upon the use of regional climate models, which demand considerable computational resources and extensive datasets, limiting the applicability of dynamic downscaling methods. Nevertheless, dynamic downscaling methods are still valuable in some areas of research, especially in cases where physical processes and complex interactions need to be considered. Compared with the statistical downscaling method, the dynamic downscaling method has lower requirements for data acquisition. The dynamical downscaling technique leverages the results from the Earth System Model (ESM) to set the initial and boundary conditions for simulations conducted by the Regional Climate Model (RCM), thereby generating detailed high-resolution climate projections. However, at the same time, this method needs higher computing resources and data support. The ability of RCMs to resolve atmospheric properties typically ranges from a resolution of 10 to 50 km, depending on a number of factors, including the size of the area under study. These models employ parameterizations of atmospheric physical processes, often similar to the mechanisms employed in ESMs. Due to the high computational cost, RCMs are not suitable for large-scale ESM applications [13], especially where high resolution is required to clearly capture meteorological phenomena such as precipitation storms.
According to Gutmann’s [14] research, statistical downscaling techniques rely mainly on existing estimates of the spatial distribution of key meteorological elements (including temperature and precipitation), and assume that the localized distribution of these elements will not change significantly due to climate change. In contrast, dynamic downscaling techniques can reveal the dynamic mechanism of precipitation processes [15], and provide hydrometeorological parameters that are consistent with actual precipitation physical processes, which is crucial for many hydrological models. Therefore, from the perspective of improving the information richness and practicability of satellite-based precipitation estimates, dynamic downscaling and statistical downscaling methods have the same goal.
In the downscaling of meteorological elements, the global dependence relationship is very important, because the meteorological system has global spatial and temporal correlations. Therefore, how to enhance the ability to model the global dependence relationship is a key problem in the downscaling of meteorological elements. In addition, the downscaling task has the following difficulties: (1) Physical process modeling: the physical processes of meteorological elements may differ across scales, and they need to be properly modeled and parameterized. (2) Boundary conditions: downscaling processes usually require the use of high-resolution data as boundary conditions to provide finer information; however, high-resolution data may be unavailable or costly over a large area. (3) Uncertainty: the propagation of high-resolution data and the simulated physical processes carry uncertainties that can lead to bias and errors in downscaling results.
Transformer’s latest advances provide new ideas on how to solve these three problems [16,17,18], where the spatio-temporal Transformer model seeks a breakthrough in physical process modeling. Motivated by these considerations, this paper introduces a new spatial-temporal downscaling precipitation prediction model based on Transformer, namely, STTA. This model overcomes the problem that existing deep learning precipitation prediction models cannot accurately capture the spatio-temporal dynamic features, and studies the dependence between different positions in the input sequence at different scales, which can fully model the spatio-temporal correlation in meteorological element data. Our main contributions to precipitation downscaling are as follows:
  • We introduced a spatio-temporal precipitation downscaling model based on a Transformer attention mechanism (STTA). STTA extracts features of meteorological elements from low-resolution data through convolution; these features are then fed into a Transformer encoder to enhance resolution, realizing the downscaling of meteorological element data and generating high-resolution layers of meteorological elements and related information.
  • We built the ST-Transformer attention module so that the model can be effectively guided to accentuate or diminish specific aspects and enhance intermediate characteristics, which mitigates the spatial sparsity of precipitation data.
  • We conducted validation of our model using two datasets derived from RCM simulations of the Weather Research and Forecasting Model (WRF) version 3.3.1, and juxtaposed our model’s performance with that of prevailing sophisticated precipitation downscaling models. The empirical outcomes affirm the model’s superiority.

2. Related Work

In recent decades, efforts have been directed towards crafting statistical downscaling techniques that capitalize on the linkage between rainfall and detailed environmental factors. With the evolution of artificial intelligence, a variety of machine learning methodologies, including but not limited to Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), Random Forests (RFs), and Deep Learning (DL), have emerged to address complex non-linear relationships. Jing et al. [19] introduced machine learning into the downscaling process and tested the downscaling performance of four machine learning algorithms, namely, Classification and Regression Trees, K-Nearest Neighbors, SVM, and RF, on the Tropical Rainfall Measuring Mission (TRMM) 3B43 (Version 7) precipitation dataset. Chen et al. [20] used the Geographically Weighted Regression (GWR) model to show that there is a strong relationship between precipitation and relevant small-scale environmental variables. These machine learning methods also have many applications in remote sensing image recognition [5,21].
Compared with the statistical downscaling method, the dynamic downscaling method focuses on simulating the physical mechanism and dynamic change of the precipitation process. The meteorological prediction framework incorporates a range of data assimilation techniques, including the Kalman Filter, as well as three-dimensional (3D-Var) and four-dimensional variational (4D-Var) algorithms, alongside the Polynomial Interpolation approach, among others. Among the aforementioned algorithms, the most commonly used is the 4D-Var assimilation algorithm due to its capability to integrate both conventional and non-conventional precipitation data in real time, offering enhanced dynamic consistency [22]. Wang et al. [23] applied the identical dynamical downscaling technique, the 4D-Var algorithm, to evaluate the efficacy of the WRF model in predicting heavy precipitation events in the Yangtze River Delta region. Their findings indicated that the incorporation of Integrated Multi-satellitE Retrievals for GPM (IMERG) precipitation data through assimilation led to improved accuracy in heavy rainfall forecasts and a reduction in the rate of erroneous predictions, particularly for precipitation intensities exceeding 5 mm/h.
With the increasing demand for high-resolution climate data in emerging climate research, deep learning has become increasingly popular in remote sensing [24,25]. The early deep learning meteorological downscaling methods mainly refer to the field of image super-resolution (SR) and mainly solve the problem of spatial downscaling. Vandal et al. [26] proposed the DeepSD framework in 2017, adding terrain elements to the lightweight CNN network as an influence to improve downscaling performance. Rodrigues et al. [27] used CNNs to combine spatial information in 2018 and obtained better results on the precipitation problem. Wang et al. [28] proposed a new learned multi-resolution dynamic downscaling method that seeks to combine the advantages of dynamic and Deep Neural Network (DNN)-based downscaling methods, employing a pair of datasets in place of a solitary one; a novel CNN-driven downscaling methodology is crafted with the anticipation that it will yield intricate details akin to those present in the authentic high-resolution datasets. However, this method is relatively weak for global dependency modeling. Jing et al. [29] proposed an attention-mechanism-based Convolutional Network (AMCN) in 2022, which is combined with in situ measurements to further improve downscaling results and achieve high-quality and fine-scale precipitation estimation. In 2022, Harris et al. [30] applied Generative Adversarial Networks (GANs) using high-resolution radar measurements as Ground Truth to improve the accuracy and resolution of relatively low-resolution inputs from weather prediction models. Wang et al. [31] proposed a custom DL model to address the challenge of bias correction and downscaling of hourly precipitation data. The model integrates specific loss functions, multi-task learning, and physical covariates to improve the representation of small-scale and extreme precipitation features, tested with MERRA2 reanalysis and Stage IV radar data. Zhuang et al. [32] enhanced the Global Precipitation Measurement (GPM) IMERG data for urban areas by integrating them with gauge data and applying a spatial downscaling-calibration technique. Nishant et al. [33] compared machine learning (ML) and dynamical downscaling for refining precipitation data from climate models; an ML approach using a multi-layer perceptron (MLP) was more accurate than dynamical downscaling in capturing rainfall patterns and reducing extreme precipitation biases. Yoshikane et al. [34] presented a downscaling method for climate model simulations to better predict hourly precipitation at a local scale. The method enhances the estimation of precipitation patterns, including extremes, and captures climate change impacts on precipitation, providing a more detailed representation of local factors like topography. The LOCA statistical method [35] has been utilized to downscale CMIP6 climate data at a 6 km resolution for regions from central Mexico to southern Canada, finding that extreme precipitation events are projected to increase in frequency and highlighting the need for end users to consider these enhanced CMIP6 data for climate impact assessments.

3. Methods

This research zeroes in on precipitation—a meteorological element characterized by its pronounced variability across temporal and spatial dimensions and notorious for being challenging to model within ESMs [36]. The data on rainfall produced by an RCM at successive time intervals can be conceptualized as two-dimensional arrays or visual representations. However, these depictions of rainfall differ fundamentally from standard photographic imagery. For instance, the rendition of rainfall through dynamic downscaling processes might exhibit discrepancies between lower and higher spatial resolutions, even when the same RCM is used and only the spatial granularity differs. Such discrepancies are commonplace in the rainfall data rendered by ESMs operating at disparate spatial resolutions and subject to identical initial and boundary conditions. This presents formidable hurdles for the advancement of DNN-based downscaling methods. In the subsequent sections, we will delve into the specifics of the dataset utilized for our investigation and elaborate on our deep learning strategy.

3.1. Problem Description

The objective we aim to achieve is the prediction of precipitation from 50 km resolution simulation data in the CONUS region. The precipitation data in this region can be expressed as $X_t = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{X_W \times X_H \times N}$, where $x_i$ is the precipitation data at time $i$, $X_W$ represents the width of the data in the 50 km dataset, and $X_H$ represents the height of the data in the 50 km dataset. $Y_t$ is the 12 km resolution simulation data, $Y_t = [y_1, y_2, \ldots, y_N] \in \mathbb{R}^{Y_W \times Y_H \times N}$, where $Y_W$ represents the width of the data in the 12 km dataset, $Y_H$ represents the height of the data in the 12 km dataset, and $N$ represents the length of the dataset. The corresponding predicted value is expressed as $Z_t$. Consequently, the downscaling endeavor can be encapsulated as the process of learning a mapping function $Z_{T_p} = f(X_{T_i}, Y_{T_i})$, where $T_i$ denotes the duration of the input temporal segment and $T_p$ signifies the extent of the projected temporal segment. Subsequently, we formulate the resulting optimization problem as follows:
$$\min \sum_{t=1}^{T_p} \sum_{n=1}^{N} \left[ L\left( Y_t^n, Z_t^n \right) \right],$$
where $L$ stands for the $\ell_1$ loss function.
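To make the objective concrete, the following is a minimal PyTorch-style sketch of the $\ell_1$ training loss, assuming a placeholder `model` that maps the 64 × 128 coarse field to the 256 × 512 fine grid; the tensor shapes and function name are illustrative, not taken from the authors' code.

```python
import torch.nn.functional as F

def downscaling_l1_loss(model, x_coarse, y_fine):
    """l1 objective between predicted and simulated 12 km precipitation fields.

    model    : any module mapping (N, 1, 64, 128) -> (N, 1, 256, 512)
    x_coarse : 50 km precipitation tensor, shape (N, 1, 64, 128)
    y_fine   : 12 km precipitation tensor (Ground Truth), shape (N, 1, 256, 512)
    """
    z_pred = model(x_coarse)            # Z_t: predicted high-resolution field
    return F.l1_loss(z_pred, y_fine)    # mean absolute error over all grid cells and times
```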

3.2. Network Structure

We first introduce the basic operations of the network, which includes a CNN feature encoder, an initial (inception-style) module, and an ST-Transformer feature encoder. The CNN feature encoder extracts the features of the meteorological elements and connects them. Then, the initial module is constructed, and the ST-Transformer feature encoder is used for feature extraction of the tensor twice. Finally, the features are connected with the high-resolution precipitation feature vector after convolutional feature extraction and extracted again to obtain high-resolution precipitation data. A network flow diagram of STTA is shown in Figure 1.

3.2.1. CNN Feature Encoder

To harness input variables for CNN training that encapsulate the correlation between precipitation from low-resolution and high-resolution model outputs, a selection of variables—including precipitation (Prec), temperature (Tem), total water vapor (Vap), and sea level pressure (LP)—is performed. These variables are then integrated into a three-dimensional tensor, serving as the input for the CNN model. This approach of stacking climatic variables into distinct input channels has been tried and tested in other downscaling methods [26]. However, diverging from their methodology, we employ the convolutional module as the foundational component due to its capacity to offer varying receptive fields across different layers [36]. We integrate kernel sizes of 1 × 1, 3 × 3, and 5 × 5 (as shown in Figure 2) within our convolutional module assembly, aiming to mitigate the complexity of capturing the interplay between low- and high-resolution simulation outputs, particularly when precipitation manifests at disparate locales across the two datasets.
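As an illustration of such a multi-kernel convolutional module, the sketch below runs parallel 1 × 1, 3 × 3, and 5 × 5 convolutions and concatenates their outputs along the channel axis; the channel split and activation are assumptions made for readability, not the exact configuration used in the paper.

```python
import torch
import torch.nn as nn

class MultiKernelBlock(nn.Module):
    """Parallel 1x1, 3x3, and 5x5 convolutions whose outputs are concatenated,
    giving the layer several receptive fields at once (channel sizes are illustrative)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch_ch = out_ch // 3
        self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1)
        self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, padding=1)
        self.b5 = nn.Conv2d(in_ch, out_ch - 2 * branch_ch, kernel_size=5, padding=2)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # Each branch preserves the spatial size; concatenation mixes receptive fields.
        return self.act(torch.cat([self.b1(x), self.b3(x), self.b5(x)], dim=1))
```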
To some extent, the convolutional module is necessary because the precipitation in an area is affected not only by the condition variables in that area but also by the condition variables in neighboring areas, contingent upon the nature of the meteorological system in question. For instance, rainfall associated with tropical cyclones, characterized by low sea level pressure (LP) centers, in the southeastern U.S. is commonly experienced to the east or northeast of the cyclone’s core. This pattern arises as moist air is transported northward from the Atlantic or the Gulf of Mexico, reaching the mainland. Furthermore, the amalgamation of variables acknowledges the interplay and collective influence of all variables on the precipitation event, albeit with varying degrees of significance. Conversely, precipitation is spatially scarce with numerous voids, posing challenges to the model training regimen. Given the sparsity of precipitation, we integrate a spatial attention mechanism into our model, which empowers it to discerningly accentuate or mute specific features during the learning process. This mechanism enhances the model’s capability to spotlight salient areas that are characterized by substantial precipitation levels. This approach is logically sound as it prioritizes the identification of regions with significant rainfall, which is pivotal for assessing climatic impacts, over periods when rainfall is minimal or scattered.
Additionally, due to the difficulty of combining variables with stark differences in size, distribution shape, density, and unit, we used a method called Encoded-Simple [28]. The encoding variable CNN assigns a dedicated convolution layer to each variable, enabling the extraction of features prior to their combination. This method ensures that the feature maps derived from each variable exhibit a degree of uniformity when they are subsequently stacked.
In particular, as illustrated in Figure 3, we designed convolutional layers for each of the quartet of variables, subsequently layering their resultant feature maps. We follow a parallel procedure for the terrain data, yet we amalgamate the feature maps after the second up-sampling phase, that is, once the feature maps have expanded to dimensions of 512 × 256. Within the scope of this research, we initially concentrate on the spatial attention within the feature maps for each variable prior to their aggregation, subsequently shifting our focus to channel attention. This is primarily due to the sporadic and uneven spatial attributes of precipitation, which are critical in evaluating the model’s efficacy.
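A minimal sketch of this per-variable encoding, in the spirit of Encoded-Simple [28], is shown below; the number of feature channels per variable is an assumption made only for illustration.

```python
import torch
import torch.nn as nn

class PerVariableEncoder(nn.Module):
    """One dedicated convolution per input variable (Prec, Tem, Vap, LP);
    the resulting feature maps are stacked along the channel axis."""

    def __init__(self, n_vars=4, feat_ch=8):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Sequential(nn.Conv2d(1, feat_ch, kernel_size=3, padding=1), nn.ReLU())
             for _ in range(n_vars)]
        )

    def forward(self, variables):
        # variables: list of tensors, each (N, 1, H, W), one per meteorological element
        feats = [enc(v) for enc, v in zip(self.encoders, variables)]
        return torch.cat(feats, dim=1)  # (N, n_vars * feat_ch, H, W)
```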

3.2.2. ST-Transformer

The structure of the ST-Transformer is shown in Figure 4, which contains a variable attention layer, a spatial attention layer, and a unimodal feature encoder. $F_{in}$ represents the feature vector after convolutional feature extraction, and $F_{out}$ represents the feature vector after encoder resolution enhancement.
In order to enable the model to adaptively focus on different parts according to the relevance of the input feature vectors and to be able to pay more attention to and highlight important spatial locations or regions when processing spatial information, we employ a variable attention layer and a spatial attention layer. The variable attention layer outputs the feature $F_{VA}$ based on the input data $F_{in}$, and then the feature $F_{SA}$ is output through the spatial attention layer.
The unimodal feature encoder consists of a multi-head self-attention mechanism, a multi-layer perceptron, and two normalization layers. Residual connections and normalization layers are added after the multi-head attention mechanism to address the issue of vanishing gradients and to expedite the training process. The unimodal feature encoder receives $F_{SA} \in \mathbb{R}^{T_a \times D_f}$ as its input, where $T_a$ signifies the overall duration of the input data and $D_f$ denotes the dimensionality of the features. $F_{SA}$ is composed of the feature vectors derived from the modal data at each instant by the feature extraction process. The output of the unimodal feature encoder is the coded feature $F_{out} \in \mathbb{R}^{T_a \times D_f}$ of the precipitation data. Within the multi-attention sub-layer, we determine the correlation between the feature vector $x_t$ at each specific moment and the feature vectors $(x_0, \ldots, x_k)$ at the other moments. We apply three matrices, $W_Q$, $W_K$, and $W_V$, to the feature vector $x_t$ to derive the query $Q_t$, the key $K_t$, and the value $V_t$. The dot product of $Q_t$ and $K_j$ is utilized to ascertain the affinity metric between the feature vector at that moment and the feature vector at moment $j$. We normalize the score using $\sqrt{d_k}$, then apply a softmax function to weigh it against $V_j$, and aggregate the weighted feature values across all moments, denoted as $\alpha_j V_j$, to construct the updated feature vector. This transformation of $x_t$ results in a feature vector of identical dimensions to $x_t$ after attention over $(x_0, \ldots, x_k)$. The corresponding formulas are presented below:
$$a_j = \frac{Q_t K_j^T}{\sqrt{d_k}},$$
$$\alpha_j = \frac{\exp(a_j)}{\sum_k \exp(a_k)},$$
$$Z_t = \sum_j \alpha_j V_j,$$
$$MA = [Z_0, \ldots, Z_t].$$
Multi-attention can be conceptualized as the deployment of multiple self-attention mechanisms that do not interfere with each other, splicing the output of each self-attention mechanism into multiple outputs $MA_1, \ldots, MA_k$. The encoded output from the unimodal feature encoder is subjected to a multiplication with a trainable weight matrix $W_{MA}$. The corresponding equation is presented below:
$$F_{out} = [MA_1, \ldots, MA_k] W_{MA},$$
where $W_{MA} \in \mathbb{R}^{h \times d_k \times d_w}$; $h$ signifies the number of attention heads in the multi-attention mechanism and $d_k$ represents the dimensionality of the key vectors $K_t$, respectively.
Within the unimodal feature encoder framework, we implement a multi-head self-attention mechanism. This setup enables the precipitation data’s feature vectors to be mapped into various representational subspaces by a series of distinct weight matrices. Such an arrangement bolsters the model’s capacity to concentrate on disparate features at varying time points, while keeping the input and output sizes in proportion.
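For reference, a minimal PyTorch sketch of the scaled dot-product multi-head self-attention described above follows; the weight shapes and helper name are illustrative assumptions, not the authors' implementation.

```python
import torch

def multi_head_self_attention(x, w_q, w_k, w_v, w_ma, n_heads):
    """Minimal multi-head self-attention over a sequence x of shape (T_a, D_f).

    w_q, w_k, w_v : (D_f, D_f) projection matrices producing Q, K, V
    w_ma          : (D_f, D_f) output projection corresponding to W_MA
    """
    T_a, D_f = x.shape
    d_k = D_f // n_heads
    Q = (x @ w_q).view(T_a, n_heads, d_k).transpose(0, 1)  # (h, T_a, d_k)
    K = (x @ w_k).view(T_a, n_heads, d_k).transpose(0, 1)
    V = (x @ w_v).view(T_a, n_heads, d_k).transpose(0, 1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5           # a_j = Q_t K_j^T / sqrt(d_k)
    alpha = torch.softmax(scores, dim=-1)                    # alpha_j via softmax
    Z = alpha @ V                                            # Z_t = sum_j alpha_j V_j
    MA = Z.transpose(0, 1).reshape(T_a, n_heads * d_k)       # concatenate the heads
    return MA @ w_ma                                         # F_out = [MA_1, ..., MA_k] W_MA
```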

3.3. Implementation

We verified the STTA model with the 50 km rainfall simulation data. The training sample consisted of 90 × 8 × 64 × 128 data points at a 3 h time step, in which 90 represents the 90 days from January to March 2006, 8 represents the eight 3 h time steps in one day, and 64 × 128 represents the size of a single precipitation data image.
First, a CNN encoder is used to design a dedicated convolution layer for each meteorological factor data variable (Vap, LP, Tem, Prec) and stack it. Similar features are extracted from each variable and connected on the channel axis to form the low-resolution input data $LR$. Then, the low-resolution input data $LR$ is stacked to form a three-dimensional tensor, and kernels with sizes of 1 × 1, 3 × 3, and 5 × 5 are used to build initial modules for feature extraction of the tensor, yielding the feature vector $F_{LR}$. At the same time, the spatial attention mechanism is used to take the low-resolution tensor $F_{LR}$, after convolutional feature extraction, as the input of the ST-Transformer structure and to output the feature vector $Z$. Finally, the feature extraction and input/output operations of the initial module are repeated, the obtained attention feature $Z$ is connected with the high-resolution precipitation feature map $F_{HR}$, and feature extraction is carried out again with the initial module to obtain the high-resolution precipitation data $HR_{Prec}$. The algorithmic flow of STTA is shown in Algorithm 1.
Algorithm 1 Algorithmic flow of STTA
Require: {$Vap$, $LP$, $Tem$, $Prec$}, {$F_{HR}$}
Ensure: {$HR_{Prec}$}
1: for $m = 1$ to $M$ do // $M$ is the total number of input meteorological element data variable types, namely, $Vap$, $LP$, $Tem$, $Prec$.
2:     Design a dedicated convolution layer for the current meteorological element data variable and stack it.
3:     Extract the similar features from the current meteorological element data variable ($F_{Vap}$, $F_{LP}$, $F_{Tem}$, $F_{Prec}$).
4: end for
5: Connect the similar feature vectors of the $M$ meteorological element data variables $F_{Vap}$, $F_{LP}$, $F_{Tem}$, $F_{Prec}$ on the channel axis to form $LR$, which is stacked into a three-dimensional tensor.
6: Use the initial module to extract the features of $LR$ to obtain $F_{LR}$.
7: Take $F_{LR}$, after convolutional feature extraction, as the input of the ST-Transformer structure and output $Z$.
8: Repeat the above two steps once each, connect $Z$ with $F_{HR}$, input the result into the initial module for feature extraction, and output $HR_{Prec}$.
9: return {$HR_{Prec}$}
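Purely to illustrate how these steps compose, the sketch below chains the modules from Section 3.2 in PyTorch; the module names, channel sizes, identity stand-in for the ST-Transformer, and bilinear upsampling step are assumptions made for readability and do not represent the actual STTA implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class STTASketch(nn.Module):
    """Illustrative composition following Algorithm 1 (not the authors' code);
    reuses the PerVariableEncoder and MultiKernelBlock sketches given earlier."""

    def __init__(self):
        super().__init__()
        self.var_encoder = PerVariableEncoder(n_vars=4, feat_ch=8)       # steps 1-5
        self.inception_a = MultiKernelBlock(32, 64)                      # step 6
        self.st_transformer = nn.Identity()                              # stand-in for the ST-Transformer (step 7)
        self.inception_b = MultiKernelBlock(64, 64)                      # step 8, repeated extraction
        self.head = nn.Conv2d(64 + 1, 1, kernel_size=3, padding=1)       # step 8, fusion with F_HR

    def forward(self, variables, f_hr):
        lr = self.var_encoder(variables)        # per-variable features stacked on the channel axis
        f_lr = self.inception_a(lr)             # F_LR
        z = self.st_transformer(f_lr)           # Z
        z = self.inception_b(z)
        z = F.interpolate(z, size=f_hr.shape[-2:], mode="bilinear", align_corners=False)
        return self.head(torch.cat([z, f_hr], dim=1))   # HR_Prec
```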

4. Data and Experimental Configuration

To evaluate the effectiveness of the STTA network for precipitation downscaling, we compared the quantitative predictions of STTA with the current mainstream precipitation downscaling models on the CONUS simulation dataset. In this section, we begin with an overview of the source and format of the data, as well as the separation of the training and test sets. In addition, we also introduce the evaluation criteria of the experiment.

4.1. Data Description and Pretreatment

The data used for this research are the 4-month climate simulation results from two RCM runs, executed with the Weather Research and Forecasting (WRF) Model version 3.3.1. One simulation operates at a 50 km resolution, and the other at a 12 km resolution. Both simulations are initialized using data from the National Centers for Environmental Prediction’s Reanalysis-2 (NCEP-R2), provided by the U.S. Department of Energy for the year 2006. The two simulations were performed independently, rather than in a nested domain, with each dataset output every 3 h, accumulating a total of 730 temporal intervals. The research area focuses on the Continental United States (CONUS) region, with the 12 km resolution simulation encompassing a 512 × 256 grid configuration and the 50 km resolution simulation utilizing a 128 × 64 grid setup. We evaluated the 12 km precipitation field produced by the CNN model against two distinct WRF precipitation data sets: one derived from the WRF model operating at a 12 km grid interval, referred to as the Ground Truth, and the other from the data that were upscaled from 50 km to 12 km using interpolation techniques, termed the Interpolator. The 12 km precipitation data from the WRF model were selected as the Ground Truth benchmark as the CNN model is intended to replicate the quality of these data by modeling the relationship between precipitation at coarse and fine resolutions. We used the 3 h data from January to March 2006 for model training and validation, and the remaining April data for testing model performance.
The two WRF simulations are configured identically and employ the same physical parameterizations; the sole variation lies in their spatial resolutions. This divergence leads to two principal impacts on the precipitation patterns. First, the higher-resolution simulations offer a more refined interpretation of physical processes compared with their lower-resolution counterparts. Second, higher-resolution models provide a more detailed representation of the terrain, enabling a more accurate depiction of rainfall influenced by topography, the transition between land and sea, and coastal precipitation phenomena [37]. Variations in spatial resolution indirectly influence precipitation models as well; these variances result in the two simulations adopting distinct computational time steps (120 s versus 40 s). Such disparities could potentially contribute to variations in the simulated precipitation outcomes owing to the differential operator splitting between the dynamical and physical processes within the WRF model [38,39]. These elements contribute to discrepancies in the precipitation data between the 50 km and 12 km outputs from the WRF model. The differences extend beyond the distinct fine-scale features of the datasets; they also encompass variations in the spatial distribution of precipitation events.
Considering these datasets, the task at hand is to determine the appropriate variables for input into our DNN downscaling framework. A multitude of factors influence the scale and variability of precipitation, which constitute the central theme of this research. Drawing from the physical underpinnings of precipitation, our selection of input variables is informed by the approach of Vandal et al. [26], who utilized low-resolution precipitation data along with high-resolution topographic information, among other factors. In our SR model, we incorporate additional inputs such as Vap, also known as precipitable water; LP; and Tem. These variables were selected due to their significant correlation patterns with precipitation over time. Each of these variables exhibits intricate spatial interdependencies, akin to visual imagery, albeit with the added complexity of climate data characterized by sparsity, dynamics, and chaotic behavior. For each grid cell within our model, we implement a consistent minimum precipitation threshold of 0.05 mm over a 3 h period to prevent the transmission of negligible precipitation values to the neural network. Additionally, we establish the 99.5th percentile as the upper limit for precipitation values per grid cell to mitigate the impact of outlier precipitation values that could distort the neural network training by skewing the loss function. Our proposed downscaling method, which leverages a Transformer-based approach, diverges from conventional statistical downscaling techniques, especially those reliant on regression models that transform spatial data into a vectorized form, thereby eliminating the inherent spatial structure.
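The per-grid-cell thresholding described above can be expressed as the following NumPy sketch; the array layout (time, height, width) is an assumption.

```python
import numpy as np

def preprocess_precipitation(precip):
    """Apply the thresholds described above to a (time, H, W) array of 3 h precipitation (mm).

    Values below 0.05 mm are set to zero; each grid cell is capped at its
    own 99.5th percentile over time to limit the influence of outliers."""
    precip = np.where(precip < 0.05, 0.0, precip)
    cap = np.percentile(precip, 99.5, axis=0, keepdims=True)  # one cap per grid cell
    return np.minimum(precip, cap)
```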

4.2. Comparing Models

To analyze the performance of our model, we compared it with a number of models that are often used for downscaling predictions, including the following:
ESPCN: A deep learning method for image Super-Resolution (SR). Its core idea is to use a subpixel convolution layer to improve the resolution of the image, without the need for traditional interpolation methods. It directly extracts features from low-resolution images, reduces computational complexity and memory cost, and improves efficiency.
SRCNN: SRCNN uses convolutional neural networks to learn the mapping relationship between low-resolution and high-resolution images. SRCNN maximizes the predictability of statistical downscaling by adding multi-scale input channels, and can use the climate variable information of different scales to improve the accuracy and reliability of precipitation prediction.
Encoded-CNN: Encoded-CNN trains neural networks to map from the former to the latter by using a combination of low-resolution and high-resolution simulations that differ not only in spatial resolution but also in geospatial patterns. Encoded-CNN can capture the temporal and spatial distribution of precipitation data to improve the accuracy of precipitation predictions.
Directed-CNN: Directed-CNN is a convolutional neural network architecture designed for a specific task. It enhances the model’s ability to understand data characteristics by introducing directional information, especially when dealing with data with obvious directional or sequential characteristics.

4.3. Evaluation Criteria

We assess the statistical distribution of precipitation through the application of mean square error (MSE) and the probability density function (PDF), achieved by consolidating data across all grid cells, over both the extensive CONUS and a more localized area.
MSE is a statistic that measures the difference between the predicted value and the actual observed value, and it is the average of the squared error of the prediction. In the context of precipitation downscaling, MSE can be used to evaluate the consistency between the downscaling model output and the actual observed precipitation data. A lower MSE value generally indicates that the model’s predictions are more accurate because it means that there is less difference between the predicted and observed values. MSE can intuitively reflect the size of the prediction error, and is easy to calculate and understand. Calculate the MSE of each time step in the test cycle, as follows:
$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( Y_i^H - G_\theta\left( Y_i^L \right) \right)^2,$$
where $N$ signifies the aggregate count of grid cells within the research locale. The variables $Y_i^H$ and $Y_i^L$ correspond to the precipitation levels within grid cell $i$ as rendered by the WRF model at high and low resolutions. In this scenario, $Y_i^H$ serves as the Ground Truth. The notation $G_\theta$ denotes the deep neural network, characterized by parameters $\theta$, which is employed to replicate the interplay between simulations of varying resolutions. According to the NCA, CONUS can be divided into seven subregions: Northeast (NE), Southeast (SE), Midwest (MW), Southwest (SW), Northwest (NW), Northern Great Plains (NGP), and Southern Great Plains (SGP). The specific division of regions is shown in Figure 5. As shown in Figure 6 (bottom right), we computed MSEs for the entire CONUS and the seven distinct subregions as delineated by the National Climate Assessment (NCA) [40].
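A compact NumPy sketch of this per-time-step MSE, optionally restricted to one of the NCA subregions via a boolean mask, is given below; the array shapes are assumptions.

```python
import numpy as np

def mse_per_timestep(y_high, y_pred, region_mask=None):
    """MSE over grid cells at each time step for (time, H, W) precipitation arrays.

    region_mask : optional boolean (H, W) mask selecting an NCA subregion."""
    if region_mask is not None:
        diff = y_high[:, region_mask] - y_pred[:, region_mask]   # (time, n_cells_in_region)
    else:
        diff = (y_high - y_pred).reshape(y_high.shape[0], -1)    # (time, H * W)
    return np.mean(diff ** 2, axis=1)
```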
PDF is a description of the probability distribution of a random variable taking a particular value. In the context of precipitation downscaling, PDF can be used to assess whether the precipitation distribution output by the model matches the actual observed precipitation distribution. A reasonable downscaling method should be able to reproduce the statistical properties of precipitation, including the frequency, intensity, and distribution of extreme events. PDF provides a comprehensive view of precipitation distribution and can capture the extreme events and heavy tail characteristics of precipitation, which helps to evaluate the ability of models to predict precipitation extremes. To compare the similarity between Ground Truth and alternative modeling approaches, we use the Jensen–Shannon (J–S) distance [41,42] method to measure the similarity between the two probability distributions. The smaller the value of the J–S distance is, the more similar the two distributions are. The larger the value is, the less similar the two distributions are. In downscaling studies of precipitation, the J–S distance serves as a tool for assessing the congruence between PDFs that are produced by downscaling models and those that are actually observed. The computation of the J–S distance is expressed as follows:
$$JSD(P, Q) = \frac{D(P \,\|\, M) + D(Q \,\|\, M)}{2},$$
where P and Q represent the pair of probability distributions under scrutiny (namely, the 12 km precipitation distribution as simulated by the RCM and the 12 km precipitation distribution downscaled by the predicted model), the distribution M is the average of P and Q, and D signifies the Kullback–Leibler divergence [43], which is determined using the following formula:
$$D(P \,\|\, M) = \sum_{x \in X} P(x) \log \frac{P(x)}{M(x)},$$
where $x$ denotes the segmentation applied to the PDF; specifically, $x$ ranges from 0 to 30. We have computed the J–S distance between the five predicted PDFs and the Ground Truth. A minimal (maximal) distance suggests that the forecast aligns closely with (diverges from) the actual distribution of the Ground Truth. The J–S distance accounts for both the distributional similarity (or lack thereof) to the Ground Truth and the shifts in variability, particularly for extremes defined by thresholds, where variability is more prone to fluctuations. For these threshold-defined extremes, the frequency of occurrence is more susceptible to alterations [44].
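For reference, the following sketch computes the J–S distance between two precipitation samples from their histogram PDFs; the 0–30 mm range follows the segmentation mentioned above, while the bin width is an assumption.

```python
import numpy as np

def js_distance(sample_p, sample_q, bins=np.arange(0.0, 30.5, 0.5)):
    """Jensen-Shannon distance between two precipitation samples via histogram PDFs."""
    P, _ = np.histogram(sample_p, bins=bins)
    Q, _ = np.histogram(sample_q, bins=bins)
    P = P / P.sum()
    Q = Q / Q.sum()
    M = 0.5 * (P + Q)

    def kl(a, b):
        mask = a > 0                       # terms with a = 0 contribute nothing
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))

    return 0.5 * (kl(P, M) + kl(Q, M))
```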
Our research focuses on analyzing the spatial pattern and geographical distribution of J–S distance, coupled with an examination of precipitation’s temporal evolution in terms of its mean, variability, and extremities. In order to judge the performance of a model in simulating the spatial variability of the Ground Truth data, we performed a mode correlation analysis for each pair of data sets, including the comparison of Ground truth with interpolators, and the comparison of Ground Truth with different models. The remarkably high correlation results reveal that both the interpolator and the Transformer model can well reflect spatial variations in Ground Truth data, which are influenced by local factors such as topographic features and macro climate circulation patterns.
While these indicators evaluate the performance of individual models on a grid cell or regional basis by consolidating all information into one composite measure, they fail to precisely indicate the model’s ability to capture storm attributes such as the frequency, span, severity, and dimensions of discrete events. Such nuances can be obscured in localized or spatially consolidated time series assessments. To address the constraint, we utilize a storm object-oriented [45] feature tracking algorithm that pinpoints particular precipitation occurrences using only the precipitation field information. This algorithm reduces cascading effects via a quartet of stages in labeling contiguous components, facilitating the categorization of events that adhere to realistic storm dynamics. Additionally, the algorithm acknowledges the potential for an event to split and does not presuppose that precipitation must occur without interruption. Essentially, this approach can decompose precipitation deviations, highlighting variances in event duration, intensity, magnitude, and frequency. The length of each occurrence is measured in time steps, denoted as follows:
$$D = T_e - T_b + 1,$$
where $T_b$ and $T_e$ represent the initiating and concluding time steps, respectively. The average extent of a single precipitation event’s life cycle, denoted as $S_{life}$, is determined by aggregating the comprehensive area (square kilometers; based on the grid cells involved multiplied by 144 throughout the event’s duration) and then partitioning by $D$. The average precipitation intensity over the life cycle is articulated by the following:
$$I_{life} = \frac{V_{tot}}{S_{life}},$$
where $V_{tot}$ signifies the cumulative precipitation volume (measured in cubic meters), ascertained from the precipitation over the life of the event (multiplied by 144,000).
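Given the per-time-step footprint of one tracked event, the duration, mean life-cycle size, and intensity defined above reduce to the following sketch; input units are as stated in the text.

```python
def event_metrics(areas_km2, volumes_m3):
    """Duration D, mean life-cycle size S_life, and intensity I_life of one tracked event.

    areas_km2  : area covered by the event at each 3 h time step (km^2)
    volumes_m3 : precipitation volume at each time step (m^3)"""
    D = len(areas_km2)                 # D = T_e - T_b + 1 time steps
    S_life = sum(areas_km2) / D        # total area over the life cycle divided by D
    I_life = sum(volumes_m3) / S_life  # V_tot / S_life
    return D, S_life, I_life
```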

5. Experimental Results and Analysis

We studied the Ground Truth and the output results of the six prediction models from two aspects, qualitative visual analysis and quantitative index analysis, in order to evaluate the effectiveness of the models.

5.1. Qualitative Visual Analysis

Initially, we assessed the credibility of the quintet of methodologies by contrasting their prognostications against the pristine WRF simulation outcomes—serving as the Ground Truth—at a 12 km grid interval. Additionally, we compared these with the 12 km bilinear Interpolator’s renditions, which were upscaled from the 50 km dataset (as shown in Figure 6).
Figure 6 shows the qualitative visual analysis comparison between Ground Truth and six methods at certain moments. Next, we will analyze the output in detail from each indicator.

5.2. Quantitative Index Analysis

5.2.1. MSE

Table 1 summarizes the 50th percentile MSE against the Ground Truth for the six precipitation prediction methods during the test period. The STTA model generally shows an MSE that is closest to the Ground Truth, except in the Northwest region, while in the Midwest, Southern Great Plains, and Northwest regions, the Encoded-CNN model displays a smaller MSE against the Ground Truth. In numerous instances across various time steps and regions, the Direct-CNN and Encoded-CNN models exhibit a higher MSE when juxtaposed with the Interpolator and ESPCN models. This discrepancy arises because the ESPCN model undergoes specialized training to refine a content-based loss function, which leads to a more conservative prediction domain, characterized by an excessively smooth representation of high-resolution precipitation. Conversely, the Direct-CNN and Encoded-CNN models alter the landscape of the loss function and introduce additional small-scale features, thereby enhancing their alignment with the physical attributes of the training data and offering a more realistic depiction of the actual precipitation field. Nonetheless, the introduction of these features may result in deviations from the Ground Truth in terms of MSE, as they cannot be deduced from the low-resolution inputs alone.

5.2.2. Probability Density Function

To affirm that STTA has adeptly captured the precipitation data’s distribution, we conducted an assessment of the Ground Truth across the full expanse of CONUS and its seven subregions, as well as the PDFs of precipitation for the other five forecasting models at each time step (shown in Figure 7). For a specific region, we consider all grid cells and time steps of the density function. In the distribution of light precipitation density, the disparity between the Ground Truth and the forecast models’ PDF values is not considerable. The Ground Truth generally has a longer tail than the Interpolator and SRCNN, while Direct-CNN and Encoded-CNN generally have longer tails than the Ground Truth. The distribution of SRCNN is basically consistent with that of the Interpolator, and the density of heavy precipitation is smaller than the Ground Truth. Except for the Northern Great Plains, the distribution of STTA in the other regions is the closest to the distribution of the Ground Truth.

5.2.3. Jensen–Shannon Distance

Compared with PDF, J–S distance provides a more comprehensive and quantitative way to evaluate model performance in precipitation downscaling missions, especially when considering global distribution characteristics and multi-model comparisons. From the J–S distance (Table 2), compared with other precipitation prediction models, Interpolator, SRCNN, and ESPCN have worse distribution similarity with Ground Truth in general. The distribution of Direct-CNN in the Southeast, Northeast, and Northern Great Plains is closer to the Ground Truth than the rest of the forecast models, but the distribution, in certain areas, mirrors or falls short of the ESPCN’s performance, particularly in under-representing intense rainfall in the Southern Great Plains, Northwest, and Southwest regions. Compared with Interpolator, SRCNN, ESPCN, and Direct-CNN models, STTA and Encoded CNN models have better precipitation distribution in four subregions, including the Southwest, Midwest, Southern Great Plains, and Northwest, and have smaller J–S distances from the Ground Truth. This finding suggests that although Direct-CNNs obtain lower grid cell errors than STTA and Encoded-CNN, they fail to seize the intricate small-scale elements (i.e., local extremes of time and space) that more accurately reflect the characteristics of precipitation patterns. Indeed, we examined the variances in the lower tail of the PDF and discovered that, across numerous subregions, STTA aligns more closely with the Ground Truth compared with Interpolator, SRCNN, ESPCN, Direct-CNN, and Encoded-CNN models.

5.2.4. Event-Based Precipitation Characteristics

This study evaluated the effectiveness of different downscaling models in identifying and tracking precipitation. To focus on those precipitation scenarios that had a real impact, we excluded grid cells that received less than 10 mm of precipitation every 3 h. Following this exclusion, we proceeded to tally the quantity of precipitation occurrences that each model detected throughout the evaluation timeframe. The statistics indicate that there were 148 incidents in the Ground Truth, while 42 were recorded by the Interpolator, 61 identified by SRCNN, 72 flagged by ESPCN, 65 discovered by Direct-CNN, 87 recorded by Encoded-CNN, and 99 by STTA. Although STTA performs best among all models, all models substantially underestimate the number of events.
To appraise the efficacy of many models in replicating the intrinsic features of actual storms, we tracked and analyzed the life cycle of each model’s predicted storm to assess its average size, intensity, duration, and total precipitation. We categorize these features and count their frequency, the results of which are shown in Figure 8. From a precipitation intensity perspective, ground truth data show a significant precipitation event of more than 11 mm every 3 h. Although the Interpolator and SRCNN models failed to capture this phenomenon, the ESPCN, Encoded-CNN, and STTA models were able to successfully simulate it. This finding reaffirms the advantages of Transformer-based training methods for generating large amounts of precipitation, compared with Interpolator and current SRCNN models that more often generate weaker precipitation events and are underperforming at capturing these large amounts of precipitation events.
When analyzing the duration of precipitation events, we found that all models were able to accurately reflect those more common short-term events (0–4 h); however, SRCNN models performed poorly when dealing with longer-term events. As for the average event scale, compared with the observation results in the Ground Truth, the newly developed Transformer model is more inclined to generate larger events while generating fewer small events. In terms of total precipitation, compared with the Interpolator and SRCNN models, the other models are more likely to produce precipitation events with high intensity, wide coverage, and long duration, so they show larger precipitation more frequently, which makes them more effective in capturing large-scale precipitation events recorded in the Ground Truth data. Nonetheless, the ESPCN, Direct-CNN, and Encoded-CNN models do not mimic the frequency of small-scale precipitation events as well as those in the Ground Truth data, whereas the Interpolator, SRCNN, and STTA models do better.

5.3. Ablation Experiment

In previous experiments, STTA’s performance in downscaling precipitation prediction was evaluated. To fully demonstrate the effectiveness of this strategy, we evaluate the influence of feature fusion techniques by dissecting the influence of diverse convolution kernel ensembles within the inception module.
As shown in Table 3, in the downsampling process of meteorological element data, the inception module contains a multitude of layers consisting of 3 × 3 convolutions. The amalgamation of the initial layer with three subsequent layers of 3 × 3 convolutions emerges as the optimal configuration for feature extraction of meteorological element data. With an increment in the number of initial layers, the intricacy of the model increases and training becomes more and more challenging.
Through the above ablation experiments, our Transformer model can effectively extract data features through the initial module, indirectly improving the prediction accuracy.

6. Conclusions and Discussion

As can be seen from the experimental results, STTA outperforms newer networks on our dataset, such as Direct-CNN and Encoded-CNN. This may be because Encoded-CNN focuses only on feature extraction, while Direct-CNN focuses only on fast mapping, and both lack the ability to model global dependencies. In order to solve this problem, the ST-Transformer structure is specially designed in this paper so that the model can focus on different parts adaptively according to the correlation of input feature vectors and can pay more attention to and highlight important spatial locations or regions when processing spatial information.
The results show that this model’s MSE is lower compared with other downscaling models on the two simulation datasets. As depicted in the qualitative visual analysis presented in Figure 6, our model demonstrates greater precision in forecasting the regions of precipitation. The empirical outcomes suggest that our model achieves excellent feature extraction and feature alignment.
The ability of the technology developed in this study to generate high-resolution precipitation data opens up many interesting application prospects. For example, we can combine radar echo data with ground observation station data to conduct downscaling studies. Radar data usually have high spatial coverage but relatively low resolution, whereas ground-based station data, though available at fewer points, provide a high degree of accuracy. Through downscaling technology, the wide coverage of radar can be combined with the high accuracy of ground observation stations to generate high-resolution precipitation products. Future research could explore how to fuse these data more efficiently, as well as how to integrate other precipitation-related elements such as temperature, humidity, and wind speed.
While our Transformer model has shown encouraging performance, there are some limitations that need to be further addressed. In this study, we used a 3 h time step to process the data, which may not be sufficient to reflect the temporal dynamics of precipitation in detail. Some short precipitation events may be missed in the time interval, and even longer-lasting events may not be accurately captured because of the long time interval. Moreover, this study used 4 months of data, which may not be fully representative of long-term climate variability. Future studies can be extended to 1 year or multiple years of data to enhance the robustness of the model and its adaptability to different climatic conditions. In addition, when we take time into account, the dimensions of the data will change from two to three, which will undoubtedly increase the demand for computing resources. To overcome these challenges, we plan to utilize higher-performance GPU devices to investigate dynamic downscaling simulations with higher temporal resolution.

Author Contributions

Conceptualization, F.Y. and Q.Y.; methodology, F.Y.; software, F.Y.; validation, F.Y., K.W. and L.S.; formal analysis, F.Y.; investigation, F.Y. and L.S.; resources, L.S. and Q.Y.; data curation, F.Y.; writing—original draft preparation, F.Y.; writing—review and editing, F.Y. and L.S.; visualization, F.Y.; supervision, K.W. and L.S.; project administration, Q.Y.; funding acquisition, L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China under Grant 62076137 and Grant 61931004, in part by the Natural Science Foundation of Jiangsu Province under Grant BK 20211539.

Data Availability Statement

The downscaling dataset acquired by the Weather Research and Forecasting (WRF) Model is openly available at https://zenodo.org/records/13932436, accessed on 15 October 2024.

Conflicts of Interest

Author Kai Wang was employed by the company Nanjing NARI Information & Communication Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Ghajarnia, N.; Liaghat, A.; Daneshkar Arasteh, P. Comparison and evaluation of high resolution precipitation estimation products in Urmia Basin-Iran. Atmos. Res. 2015, 158, 50–65.
2. Held, I.M.; Soden, B.J. Robust responses of the hydrological cycle to global warming. J. Clim. 2006, 19, 5686–5699.
3. Giorgi, F.; Im, E.S.; Coppola, E.; Diffenbaugh, N.; Gao, X.; Mariotti, L.; Shi, Y. Higher hydroclimatic intensity with global warming. J. Clim. 2011, 24, 5309–5324.
4. Yu, Y.; Fu, L.; Cheng, Y.; Ye, Q. Multi-view distance metric learning via independent and shared feature subspace with applications to face and forest fire recognition, and remote sensing classification. Knowl. Based Syst. 2022, 243, 108350.
5. Cheng, Y.; Fu, L.; Luo, P.; Ye, Q.; Liu, F.; Zhu, W. Multi-view generalized support vector machine via mining the inherent relationship between views with applications to face and fire smoke recognition. Knowl. Based Syst. 2020, 210, 106488.
6. Michaelides, S.; Levizzani, V.; Anagnostou, E.; Bauer, P.; Kasparis, T.; Lane, J. Precipitation: Measurement, remote sensing, climatology and modeling. Atmos. Res. 2009, 94, 512–533.
7. Duan, Z.; Bastiaanssen, W. First results from Version 7 TRMM 3B43 precipitation product in combination with a new downscaling–calibration procedure. Remote Sens. Environ. 2013, 131, 1–13.
8. Fowler, H.J.; Blenkinsop, S.; Tebaldi, C. Linking climate change modelling to impacts studies: Recent advances in downscaling techniques for hydrological modelling. Int. J. Climatol. 2007, 27, 1547–1578.
9. Groppelli, B.; Bocchiola, D.; Rosso, R. Spatial downscaling of precipitation from GCMs for climate change projections using random cascades: A case study in Italy. Water Resour. Res. 2011, 47, W03519.
10. Wilby, R.L.; Wigley, T. Precipitation predictors for downscaling: Observed and general circulation model relationships. Int. J. Climatol. 2000, 20, 641–661.
11. Sylla, M.; Gaye, A.; Pal, J.S.; Jenkins, G.; Bi, X. High-resolution simulations of West African climate using regional climate model (RegCM3) with different lateral boundary conditions. Theor. Appl. Climatol. 2009, 98, 293–314.
12. Sachindra, D.; Perera, B. Statistical downscaling of general circulation model outputs to precipitation accounting for non-stationarities in predictor-predictand relationships. PLoS ONE 2016, 11, e0168701.
13. Kirchmeier-Young, M.C.; Gillett, N.P.; Zwiers, F.W.; Cannon, A.J.; Anslow, F. Attribution of the influence of human-induced climate change on an extreme fire season. Earth’s Future 2019, 7, 2–10.
14. Gutmann, E.D.; Rasmussen, R.M.; Liu, C.; Ikeda, K.; Gochis, D.J.; Clark, M.P.; Dudhia, J.; Thompson, G. A comparison of statistical and dynamical downscaling of winter precipitation over complex terrain. J. Clim. 2012, 25, 262–281.
15. Schmidli, J.; Goodess, C.; Frei, C.; Haylock, M.; Hundecha, Y.; Ribalaygua, J.; Schmith, T. Statistical and dynamical downscaling of precipitation: An evaluation and comparison of scenarios for the European Alps. J. Geophys. Res. Atmos. 2007, 112, D04105.
16. Sun, L.; Fang, Y.; Chen, Y.; Huang, W.; Wu, Z.; Jeon, B. Multi-structure KELM with attention fusion strategy for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5539217.
17. Sun, L.; Zhang, H.; Zheng, Y.; Wu, Z.; Ye, Z.; Zhao, H. MASSFormer: Memory-Augmented Spectral-Spatial Transformer for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5516415.
18. Sun, L.; Wang, X.; Zheng, Y.; Wu, Z.; Fu, L. Multiscale 3-D–2-D Mixed CNN and Lightweight Attention-Free Transformer for Hyperspectral and LiDAR Classification. IEEE Trans. Geosci. Remote Sens. 2024, 62, 2100116.
19. Jing, W.; Yang, Y.; Yue, X.; Zhao, X. A comparison of different regression algorithms for downscaling monthly satellite-based precipitation over North China. Remote Sens. 2016, 8, 835.
20. Chen, F.; Liu, Y.; Liu, Q.; Li, X. Spatial downscaling of TRMM 3B43 precipitation considering spatial heterogeneity. Int. J. Remote Sens. 2014, 35, 3074–3093.
21. Ye, Q.; Huang, P.; Zhang, Z.; Zheng, Y.; Fu, L.; Yang, W. Multiview learning with robust double-sided twin SVM. IEEE Trans. Cybern. 2021, 52, 12745–12758.
22. Bannister, R.N. A review of operational methods of variational and ensemble-variational data assimilation. Q. J. R. Meteorol. Soc. 2017, 143, 607–633.
23. Wang, J.; Xu, Y.; Yang, L.; Wang, Q.; Yuan, J.; Wang, Y. Data assimilation of high-resolution satellite rainfall product improves rainfall simulation associated with landfalling tropical cyclones in the Yangtze river Delta. Remote Sens. 2020, 12, 276.
24. Prein, A.F.; Langhans, W.; Fosser, G.; Ferrone, A.; Ban, N.; Goergen, K.; Keller, M.; Tölle, M.; Gutjahr, O.; Feser, F.; et al. A review on regional convection-permitting climate modeling: Demonstrations, prospects, and challenges. Rev. Geophys. 2015, 53, 323–361.
25. Fu, L.; Zhang, D.; Ye, Q. Recurrent thrifty attention network for remote sensing scene recognition. IEEE Trans. Geosci. Remote Sens. 2020, 59, 8257–8268.
26. Vandal, T.; Kodra, E.; Ganguly, S.; Michaelis, A.; Nemani, R.; Ganguly, A.R. DeepSD: Generating high resolution climate change projections through single image super-resolution. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017.
27. Rodrigues, E.R.; Oliveira, I.; Cunha, R.; Netto, M. DeepDownscale: A deep learning strategy for high-resolution weather forecast. In Proceedings of the 2018 IEEE 14th International Conference on e-Science (e-Science), Amsterdam, The Netherlands, 29 October–1 November 2018; pp. 415–422.
28. Wang, J.; Liu, Z.; Foster, I.; Chang, W.; Kettimuthu, R.; Kotamarthi, V.R. Fast and accurate learned multiresolution dynamical downscaling for precipitation. Geosci. Model Dev. 2021, 14, 6355–6372.
29. Jing, Y.; Lin, L.; Li, X.; Li, T.; Shen, H. An attention mechanism based convolutional network for satellite precipitation downscaling over China. J. Hydrol. 2022, 613, 128388.
30. Harris, L.; McRae, A.T.; Chantry, M.; Dueben, P.D.; Palmer, T.N. A generative deep learning approach to stochastic downscaling of precipitation forecasts. J. Adv. Model. Earth Syst. 2022, 14, e2022MS003120.
31. Wang, F.; Tian, D.; Carroll, M. Customized deep learning for precipitation bias correction and downscaling. Geosci. Model Dev. 2023, 16, 535–556.
32. Zhuang, Q.; Zhou, Z.; Liu, S.; Wright, D.B.; Gao, L. The evaluation and downscaling-calibration of IMERG precipitation products at sub-daily scales over a metropolitan region. J. Flood Risk Manag. 2023, 16, e12902.
33. Nishant, N.; Hobeichi, S.; Sherwood, S.; Abramowitz, G.; Shao, Y.; Bishop, C.; Pitman, A. Comparison of a novel machine learning approach with dynamical downscaling for Australian precipitation. Environ. Res. Lett. 2023, 18, 094006.
34. Yoshikane, T.; Yoshimura, K. A downscaling and bias correction method for climate model ensemble simulations of local-scale hourly precipitation. Sci. Rep. 2023, 13, 9412.
35. Pierce, D.W.; Cayan, D.R.; Feldman, D.R.; Risser, M.D. Future increases in North American Extreme Precipitation in CMIP6 downscaled with LOCA. J. Hydrometeorol. 2023, 24, 951–975.
36. Legates, D.R. Climate models and their simulation of precipitation. Energy Environ. 2014, 25, 1163–1175.
37. Komurcu, M.; Emanuel, K.; Huber, M.; Acosta, R. High-resolution climate projections for the Northeastern United States using dynamical downscaling at convection-permitting scales. Earth Space Sci. 2018, 5, 801–826.
38. Skamarock, W.C.; Klemp, J.B. A time-split nonhydrostatic atmospheric model for weather research and forecasting applications. J. Comput. Phys. 2008, 227, 3465–3485.
39. Barrett, A.I.; Wellmann, C.; Seifert, A.; Hoose, C.; Vogel, B.; Kunz, M. One step at a time: How model time step significantly affects convection-permitting simulations. J. Adv. Model. Earth Syst. 2019, 11, 641–658.
40. Melillo, J.M.; Richmond, T.; Yohe, G. Climate change impacts in the United States. Third Natl. Clim. Assess. 2014, 52, 150–174.
41. Osterreicher, F.; Vajda, I. A new class of metric divergences on probability spaces and its applicability in statistics. Ann. Inst. Stat. Math. 2003, 55, 639–653.
42. Endres, D.M.; Schindelin, J.E. A new metric for probability distributions. IEEE Trans. Inf. Theory 2003, 49, 1858–1860.
43. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
44. Vitart, F.; Robertson, A.W.; Anderson, D.L. Subseasonal to Seasonal Prediction Project: Bridging the gap between weather and climate. Bull. World Meteorol. Organ. 2012, 61, 23.
45. Chang, W.; Stein, M.L.; Wang, J.; Kotamarthi, V.R.; Moyer, E.J. Changes in spatiotemporal precipitation patterns in changing climate conditions. J. Clim. 2016, 29, 8355–8376.
Figure 1. Schematic diagram of the suggested STTA framework. In this diagram, the box marked 0 represents a feature map (either a raw feature or a processed one), and the * represents element-wise multiplication of that feature map with another. This operation is used in neural networks to weight features: different weights emphasize or suppress different parts of the input. In the attention mechanism, it applies attention weights to a feature map, highlighting important features and suppressing unimportant ones.
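For readers less familiar with this weighting operation, the following minimal sketch shows attention weights applied to a feature map by element-wise multiplication (PyTorch tensors with illustrative shapes; this is not the actual STTA implementation):

    import torch

    # A feature map of shape (batch, channels, height, width)
    features = torch.randn(2, 32, 16, 32)

    # Attention weights in [0, 1], one weight per spatial position,
    # broadcast across the channel dimension
    attention = torch.sigmoid(torch.randn(2, 1, 16, 32))

    # The "*" in Figure 1: element-wise (Hadamard) multiplication that
    # emphasizes positions with large weights and suppresses the rest
    weighted = features * attention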
Figure 2. Details of the inception module.
Figure 3. Encoder module utilized to preprocess the low-resolution input data, ensuring it is appropriately formatted for subsequent transmission to the network.
Figure 4. Details of the ST-Transformer module depicted in Figure 1. The * in the box marked 0 stands for element-wise multiplication (the Hadamard product), i.e., the feature map is multiplied element by element with another feature map.
Figure 5. Seven subregions of CONUS.
Figure 6. Ground Truth and precipitation forecast output of Interpolator, ESPCN, SRCNN, Encoded-CNN, Direct-CNN, and STTA.
Figure 7. PDFs derived from Ground Truth, Interpolator, ESPCN, SRCNN, Encoded-CNN, Direct-CNN, and STTA precipitation computed based on an analysis of grid cells and temporal intervals across CONUS and its seven distinct subregions.
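As a minimal sketch of how such empirical PDFs can be assembled by pooling all grid cells and time steps (the array shape and the gamma-distributed dummy values are assumptions for illustration only, not our evaluation code):

    import numpy as np

    # Hypothetical precipitation array of shape (time, lat, lon) in mm/3 h
    precip = np.random.gamma(shape=0.5, scale=2.0, size=(240, 64, 128))

    # Pool all grid cells and time steps, then estimate the PDF on log-spaced bins
    values = precip.ravel()
    bins = np.logspace(-2, 2, 50)
    pdf, edges = np.histogram(values, bins=bins, density=True)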
Figure 8. Relative frequency (%) of precipitation event characteristics over the lifetime of each event: (a) lifetime-average size, km2 (x-axis in logarithmic scale); (b) lifetime-average intensity, mm/3 h; (c) duration, in 3 h increments; and (d) total volume, m3 (x-axis in logarithmic scale).
Table 1. MSEs (mm/3 h), computed across the entire Continental United States (CONUS) and its seven subregions (Equation (7)), at the median (50th) percentile selected from all available time steps. The best (or second-best) results are highlighted in bold (or underlined).
Method | CONUS | Southwest | Northeast | Midwest | S Great Pl. | Northwest | N Great Pl. | Southeast
Interpolator | 0.248 | 0.067 | 0.040 | 0.046 | 0.004 | 0.175 | 0.062 | 0.033
ESPCN | 0.233 | 0.061 | 0.037 | 0.042 | 0.003 | 0.169 | 0.058 | 0.030
SRCNN | 0.223 | 0.062 | 0.037 | 0.044 | 0.004 | 0.171 | 0.059 | 0.030
Encoded-CNN | 0.218 | 0.061 | 0.037 | 0.040 | 0.003 | 0.154 | 0.055 | 0.029
Direct-CNN | 0.211 | 0.059 | 0.036 | 0.043 | 0.003 | 0.166 | 0.054 | 0.029
STTA | 0.205 | 0.057 | 0.035 | 0.040 | 0.003 | 0.157 | 0.052 | 0.027
Table 2. J–S distance (Equation (8)) utilized to gauge the resemblance of the PDFs across CONUS and the seven subregions, comparing the Ground Truth with the forecasts from six predictive models. The best (or second-best) results are highlighted in bold (or underlined).
Method | CONUS | Southwest | Northeast | Midwest | S Great Pl. | Northwest | N Great Pl. | Southeast
Interpolator | 0.169 | 0.318 | 0.240 | 0.239 | 0.208 | 0.375 | 0.299 | 0.153
ESPCN | 0.163 | 0.216 | 0.185 | 0.145 | 0.137 | 0.163 | 0.277 | 0.152
SRCNN | 0.163 | 0.327 | 0.176 | 0.211 | 0.178 | 0.318 | 0.259 | 0.149
Encoded-CNN | 0.139 | 0.107 | 0.188 | 0.071 | 0.118 | 0.067 | 0.268 | 0.121
Direct-CNN | 0.083 | 0.209 | 0.141 | 0.124 | 0.149 | 0.184 | 0.238 | 0.109
STTA | 0.041 | 0.102 | 0.185 | 0.073 | 0.107 | 0.056 | 0.242 | 0.117
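For reference, the Jensen–Shannon (J–S) distance is conventionally defined as the square root of the Jensen–Shannon divergence [41,42,43]; assuming Equation (8) follows this standard form, it can be written in LaTeX notation as

    D_{\mathrm{JS}}(P, Q) = \sqrt{\tfrac{1}{2}\, D_{\mathrm{KL}}(P \parallel M) + \tfrac{1}{2}\, D_{\mathrm{KL}}(Q \parallel M)}, \qquad M = \tfrac{1}{2}(P + Q),

    D_{\mathrm{KL}}(P \parallel M) = \sum_i P(i) \log \frac{P(i)}{M(i)},

so smaller values indicate that the predicted precipitation PDF is closer to the Ground Truth PDF.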
Table 3. Ablation experiment on the inception module. The best (or second-best) results are highlighted in bold (or underlined).
Type | Patch Size/Stride | Input Size | Output Size | MSE | J–S Distance
Conv2d | 3 × 3 / 2 | B × S × 1 × 64 × 128 | B × S × 32 × 32 × 64 | 0.258 | 0.089
Conv2d | 3 × 3 / 2 | B × S × 32 × 32 × 64 | B × S × 64 × 16 × 32 | |
Conv2d | 3 × 3 / 1 | B × S × 64 × 16 × 32 | B × S × 32 × 16 × 32 | |
Conv2d | 3 × 3 / 2 | B × S × 32 × 16 × 32 | B × S × 32 × 8 × 16 | |
Inception ×1 MaxPool | As in Figure 2 | B × S × 1 × 64 × 128 | B × S × 32 × 32 × 64 | 0.205 | 0.041
Conv2d | 3 × 3 / 2 | B × S × 32 × 32 × 64 | B × S × 64 × 16 × 32 | |
Conv2d | 3 × 3 / 1 | B × S × 64 × 16 × 32 | B × S × 32 × 16 × 32 | |
Conv2d | 3 × 3 / 2 | B × S × 32 × 16 × 32 | B × S × 32 × 8 × 16 | |
Inception ×1 MaxPool | As in Figure 2 | B × S × 1 × 64 × 128 | B × S × 32 × 32 × 64 | 0.298 | 0.114
Inception ×1 MaxPool | As in Figure 2 | B × S × 32 × 32 × 64 | B × S × 64 × 16 × 32 | |
Conv2d | 3 × 3 / 1 | B × S × 64 × 16 × 32 | B × S × 32 × 8 × 16 | |
Conv2d | 3 × 3 / 2 | B × S × 32 × 8 × 16 | B × S × 32 × 8 × 16 | |
Inception ×1 MaxPool | As in Figure 2 | B × S × 1 × 64 × 128 | B × S × 32 × 32 × 64 | 0.363 | 0.153
Inception ×1 MaxPool | As in Figure 2 | B × S × 32 × 32 × 64 | B × S × 64 × 16 × 32 | |
Inception ×1 MaxPool | As in Figure 2 | B × S × 64 × 16 × 32 | B × S × 32 × 8 × 16 | |
Conv2d | 3 × 3 / 2 | B × S × 32 × 8 × 16 | B × S × 32 × 8 × 16 | |
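To make the "Inception ×1 MaxPool" rows above more concrete, the following is a minimal sketch of an inception-style block in the spirit of Figure 2 (the branch widths, kernel sizes, and pooling arrangement are illustrative assumptions, not the exact module used here); it maps a B × S × 1 × 64 × 128 input to the B × S × 32 × 32 × 64 output listed in the table by folding the sequence axis into the batch axis:

    import torch
    import torch.nn as nn

    class InceptionStem(nn.Module):
        """Illustrative inception-style block: parallel convolutions plus a
        max-pooling branch, concatenated along the channel axis."""
        def __init__(self, in_ch=1, out_ch=32):
            super().__init__()
            branch_ch = out_ch // 4  # channels per branch (assumed split)
            self.b1 = nn.Conv2d(in_ch, branch_ch, kernel_size=1, stride=2)
            self.b3 = nn.Conv2d(in_ch, branch_ch, kernel_size=3, stride=2, padding=1)
            self.b5 = nn.Conv2d(in_ch, branch_ch, kernel_size=5, stride=2, padding=2)
            self.bp = nn.Sequential(
                nn.MaxPool2d(kernel_size=3, stride=2, padding=1),
                nn.Conv2d(in_ch, branch_ch, kernel_size=1),
            )

        def forward(self, x):
            # x: (B, S, C, H, W); fold the sequence axis into the batch axis
            B, S, C, H, W = x.shape
            x = x.reshape(B * S, C, H, W)
            y = torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)
            return y.reshape(B, S, -1, H // 2, W // 2)

    x = torch.randn(2, 4, 1, 64, 128)   # B × S × 1 × 64 × 128
    print(InceptionStem()(x).shape)     # torch.Size([2, 4, 32, 32, 64])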
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
