1. Introduction
Tropical cyclones (TCs) are severe weather phenomena that occur over tropical oceans and are characterized by strong winds, heavy rain, and storm surges [1]. TCs are one of the most destructive natural disasters, causing significant damage to infrastructure and property and loss of life [2]. With continued global warming and socioeconomic development, TCs are becoming increasingly harmful, and accurately determining TC intensity and wind speed structure is crucial for timely preparation and reducing the impact of these disasters [3].
Synthetic Aperture Radar (SAR) is an active microwave sensor that estimates the sea surface wind field by measuring the intensity of radar echoes, known as the normalized radar cross-section (NRCS) [4]. SAR sensors such as Sentinel-1A/B and Radarsat-2 have high-resolution imaging capabilities and can acquire multi-polarization data, which can improve the accuracy of TC forecasting [5,6]. With the advent of high-quality SAR images, various algorithms have emerged to retrieve wind fields. Geophysical model functions such as CMOD have been effective in retrieving wind speed from the NRCS of vertical–vertical (VV) and horizontal–horizontal (HH) polarized images. The Sentinel-1 wind speed product is based on the CMOD-IFR2 algorithm developed by the Institut Français de Recherche pour l'Exploitation de la Mer, with model inputs of NRCS values, incidence angles, track angles, a priori wind, and ice information [7]. However, for extreme cases such as TCs, a saturation phenomenon occurs, reducing the accuracy of wind speed retrieval [8]. The use of cross-polarization SAR images (VH/HV) has been shown to mitigate signal saturation under strong winds [9,10]. In recent years, several algorithms have utilized SAR images to retrieve TC wind speeds, including the C-2PO model [11,12] and the advanced C-3PO model [13]. SAR images have also been used extensively to study the structures of TCs, such as symmetric double-eye structures, the radii of maximum wind speed, and the orientation of wind or waves within the storm [14,15,16]. However, accurately inverting the structure of TCs remains challenging due to C-band attenuation caused by rainfall and signal saturation at high wind speeds [15,17,18]. To the best of our knowledge, progress on SAR wind speed reconstruction in TCs is still relatively limited.
Deep learning algorithms have shown great potential in processing SAR images of TCs to retrieve wind speed and analyze storm structure. Boussioux et al. [19] used a multi-modal deep learning (DL) framework to combine multi-source data for hurricane forecasting. A DL method based on topological patterns was used with high-resolution Sentinel-1 data to improve the accuracy of TC intensity detection [20]. Progress has been made in exploring SAR wind speeds using DL, including deploying neural networks to invert sea surface wind speed from SAR and designing CNNs to extract SAR features to estimate TC intensity [21,22,23,24]. DL-based forecast models built on Sentinel-1 and Sentinel-2 images have also shown good capability for offshore wind speed estimation [25]. These studies illustrate the great potential of DL in SAR wind speed processing. However, rain bands can cause C-band signal attenuation and extreme wind speeds can result in signal saturation, producing low-quality data and making it difficult for SAR to invert the TC structure accurately. Therefore, it is urgent to use DL to generate high-accuracy wind speed data that improve low-quality SAR wind speeds and reconstruct the complete TC structure. However, because it is difficult to leverage the benefits of piece-wise linear units in the generative context, adopting only a CNN can cause problems for generative tasks such as SAR wind speed reconstruction. This situation changed with the emergence of generative adversarial networks (GANs) [26], which correspond to a minimax two-player game in which two networks, the generator and the discriminator, are trained simultaneously. Furthermore, the ingenious approximation of otherwise intractable loss functions through adversarial learning promotes the application of DL to generative tasks.
In order to fully exploit the ability of SAR to observe wind speed, especially during TCs, we propose a DL model for improving the low-quality data in SAR to reconstruct the TC structure directly. The reconstruction only targets the data in the low-quality region, retaining the original SAR high-precision wind speeds. In this way, the reconstructed results are based on real observations with a high resolution. This paper proposes a dual-level contextual attention GAN (DeCA-GAN) for reconstructing SAR wind speeds. A dual-level encoder is designed to help the model learn local and global TC features simultaneously. The neck part is designed to adaptively process the features extracted by the encoder and feed them to the decoder. The final result is generated under the guidance of the discriminator. The main contributions of this paper are as follows:
We propose a GAN-based deep learning model for directly enhancing the quality of SAR wind speeds and reconstructing the structure of TCs. The results are based on SAR observations and are close to reality, and thus can be used for TC intensity estimation, TC structure analysis, and forecast accuracy improvement.
The proposed model performs better than state-of-the-art models, especially for a large range of low-quality data and in high wind speed reconstruction. We also conduct ablation experiments to verify the components’ effectiveness in the proposed model.
The model is validated on ECMWF and SMAP, and the reconstructed results can be obtained in a few seconds using a single GPU. Compared to ECMWF, the reconstructed results achieve a relative reduction of 50% in the root mean square error (RMSE).
The rest of this paper is structured as follows: Section 2 presents our framework for SAR wind speed reconstruction in detail. In Section 3, we introduce the Sentinel-1 SAR TC images and present the experimental parameters and the results of reconstructed SAR wind speeds. We also conduct ablation experiments and visualize the feature maps inside the DeCA-GAN model to demonstrate its effectiveness. The discussions and conclusions are presented in Section 4 and Section 5, respectively.
3. Experiments
This paper proposes the DeCA-GAN for directly enhancing the quality of SAR wind speeds and reconstructing the structure of TCs. We use ECMWF data for model training and validation so that the model can learn TC features and reconstruct them from remote sensing images. Specifically, the model is trained and validated on ECMWF and tested on Sentinel-1 images. In this section, we introduce the Sentinel-1 images and describe the experimental platform and environment configuration. The hyperparameter choices for the model are also listed. We compare our DeCA-GAN with other state-of-the-art models to demonstrate its superiority. Most importantly, we feed Sentinel-1 SAR wind speeds into the model pre-trained on ECMWF and obtain a well-reconstructed TC structure, demonstrating the practical application value of our model. The reconstructed results are validated against Soil Moisture Active Passive (SMAP) data to further verify the proposed method. Finally, we implement ablation experiments to verify the effectiveness of the components of the proposed model.
3.1. Data
To obtain TC wind speeds, we collected data from Sentinel-1A/B and selected 270 images that captured TCs. The images are archived on the Copernicus Open Access Hub website (scihub.copernicus.eu, accessed on 3 April 2023) and can be freely accessed. The Sentinel-1 mission consists of two satellites, Sentinel-1A and Sentinel-1B, equipped with C-band SAR sensors, which are active microwave remote sensors. There are four acquisition modes: Stripmap (SM), Interferometric Wide swath (IW), Extra-Wide swath (EW), and Wave (WV). The WV mode supports only a single polarization, HH or VV. The other modes are available in single-polarization (HH or VV) and dual-polarization (HH + HV or VV + VH) configurations. SMAP [57] operates in a sun-synchronous orbit and is equipped with an L-band passive radiometer (1.4 GHz) with a resolution of about 40 km. Due to the low resolution of SMAP, we selected the L3 0.25-degree wind speed product from RSS for verification (www.remss.com, accessed on 3 April 2023).
In this study, we collected 270 level-2 ocean (L2 OCN) product SAR wind speeds with a resolution of 1 km as the dataset for SAR wind speed reconstruction. The Sentinel-1 ocean wind field retrieval algorithm is described in [7]. The L2 OCN product also provides an inversion quality flag that classifies the inversion quality of each wind cell in the grid as 'good', 'medium', 'low', or 'poor'. The geographic locations of the OCN Sentinel-1A/B SAR wind speeds are shown in Figure 7. ECMWF data have a spatial resolution of 0.25 degrees (ECMWF provides wind data on a 0.125-degree horizontal grid after the end of 2009) and a temporal resolution of 3 h. The ECMWF winds closest in time to the SAR data acquisition are selected and interpolated to the same resolution as Sentinel-1 [7]. We use ECMWF as labels to train and validate the model and to test Sentinel-1A/B SAR wind speeds. It is worth noting that during validation and testing, we kept the 'good' and 'medium' data according to the inversion quality flag and masked the 'low' and 'poor' data as low-quality data.
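The quality-flag masking step described above can be sketched as follows. The numeric flag encoding (0–3) and the function name here are illustrative assumptions, not the actual values stored in the L2 OCN product, which should be taken from the product documentation:

```python
import numpy as np

# Hypothetical flag encoding: 0 = 'good', 1 = 'medium', 2 = 'low', 3 = 'poor'.
def mask_low_quality(wind, quality_flag, keep=(0, 1)):
    """Keep 'good'/'medium' cells; mark 'low'/'poor' cells for reconstruction."""
    trusted = np.isin(quality_flag, keep)            # True where data are kept
    masked_wind = np.where(trusted, wind, np.nan)    # NaN marks low-quality cells
    return masked_wind, trusted

wind = np.array([[12.0, 30.5],
                 [8.2, 45.0]])
flag = np.array([[0, 2],
                 [1, 3]])
masked, trusted = mask_low_quality(wind, flag)
```

In the actual pipeline, the NaN cells would then be the region the generator is asked to reconstruct, while the trusted cells are passed through unchanged.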
3.2. Experiment Configuration and Implementation Details
Our experiments were carried out on a server with an Intel Xeon E5-2680 v4 CPU and two Tesla M60 graphics cards (two GPU cores each), with a total video memory of 32 GB. The operating system was Ubuntu 18.04.6, with Python 3.9 and CUDA 11.2.
For model training, the 270 TC images were split into training and validation sets at a ratio of 8:2. We used the Adam optimizer (β1 = 0; β2 = 0.9) and set a batch size of 4 × 10 (multi-GPU training), which is the maximum load of our server. We fed 40 images into the model for each iteration and updated the parameters. We randomly cropped 160 × 160-pixel sub-images from ECMWF during training. We also used random smearing to create a mask of the same size as the input image to reproduce the problem of low-quality SAR data. The minimum width of the smeared lines is 1 pixel, which helps the model better adapt to small low-quality regions, and the maximum width is 40 pixels. Moreover, the minimum smearing area is 5% and the maximum is 75%. Creating the mask in this way makes the model generalizable and reduces the possibility of overfitting compared to a fixed mask. In addition, randomly cropping images from each original TC image reduces the amount of computation, which avoids using a too-small batch size on limited computing resources and improves the training speed.
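A minimal sketch of the random-smearing mask generator is shown below. The paper only specifies the line-width range (1–40 pixels) and the masked-area range (5–75%); the exact drawing procedure (here, axis-aligned stripes accumulated until a randomly drawn target fraction is reached) is an assumption for illustration:

```python
import numpy as np

def random_smear_mask(h=160, w=160, min_frac=0.05, max_frac=0.75,
                      min_width=1, max_width=40, rng=None):
    """Random boolean mask (True = simulated low-quality pixel)."""
    rng = np.random.default_rng(rng)
    mask = np.zeros((h, w), dtype=bool)
    target = rng.uniform(min_frac, max_frac)   # masked-area fraction to reach
    while mask.mean() < target:
        width = int(rng.integers(min_width, max_width + 1))
        if rng.random() < 0.5:                 # horizontal smear
            r = int(rng.integers(0, h))
            mask[r:r + width, :] = True
        else:                                  # vertical smear
            c = int(rng.integers(0, w))
            mask[:, c:c + width] = True
    return mask

m = random_smear_mask(rng=0)
```

Applied to a 160 × 160 ECMWF crop, the masked pixels are removed from the input (e.g., zeroed) before being fed to the generator, while the unmasked pixels serve as conditioning context.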
The learning rates for the generator and the discriminator were set to . After convergence, we reduced the learning rate to and performed several iterations to achieve better results.
3.3. Evaluation Metric
Considering the difference between RGB data and ours, we carefully chose four evaluation indicators to analyze the reconstruction of TCs: the structural similarity index metric (SSIM), the peak signal-to-noise ratio (PSNR), the root mean square error (RMSE), and the correlation coefficient (R).
Assuming there are images x and y, SSIM can be used to calculate their similarity, whose range is (0–1). The closer the value is to 1, the more similar the images are. SSIM is defined as follows:

SSIM(x, y) = [(2 μ_x μ_y + c_1)(2 σ_xy + c_2)] / [(μ_x^2 + μ_y^2 + c_1)(σ_x^2 + σ_y^2 + c_2)]

where μ_x and μ_y are the means of x and y, σ_x^2 and σ_y^2 are their variances, σ_xy is their covariance, and c_1 and c_2 are small constants that stabilize the division.
PSNR is used to evaluate the image quality in units of dB, where larger values indicate less distortion and a better image quality. Letting the original image I and the reconstructed image K have dimensions m × n, PSNR is denoted as:

PSNR = 10 · log10(MAX_I^2 / MSE), with MSE = (1 / (m n)) Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} [I(i, j) − K(i, j)]^2

where MAX_I is the maximum value in image I. Before SSIM and PSNR are calculated, the image is first normalized. Both SSIM and PSNR are commonly used evaluation metrics in image inpainting and reconstruction.
Moreover, for the TC wind speed data, we use the RMSE, an evaluation metric more common in wind speed inversion tasks:

RMSE = sqrt( (1 / N_G) Σ_{i ∈ G} (x_i − y_i)^2 )

where G represents the low-quality regions and N_G denotes the number of pixels in this area. In this paper, we calculate the RMSE only for the area that needs to be reconstructed, to verify the effect of DeCA-GAN on TC reconstruction.
We also choose R as a key evaluation metric, which measures the degree of linear correlation between the variables under study:

R = Σ_i (x_i − x̄)(y_i − ȳ) / sqrt( Σ_i (x_i − x̄)^2 · Σ_i (y_i − ȳ)^2 )
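As a sketch, three of these metrics can be computed directly with NumPy (SSIM is omitted here since library implementations are typically used; images are assumed normalized so that MAX_I = 1):

```python
import numpy as np

def psnr(ref, rec, max_val=1.0):
    """PSNR in dB; images assumed normalized so that max_val = 1."""
    mse = np.mean((ref - rec) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def masked_rmse(ref, rec, low_quality):
    """RMSE computed only over the low-quality region G (boolean mask)."""
    diff = ref[low_quality] - rec[low_quality]
    return float(np.sqrt(np.mean(diff ** 2)))

def pearson_r(x, y):
    """Linear correlation coefficient R between two wind fields."""
    return float(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])
```

Restricting the RMSE to the masked region, as in `masked_rmse`, matches the evaluation protocol above: only the reconstructed pixels are scored, since the remaining pixels are copied from the original SAR observation.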
3.4. Experimental Results
In this section, we compare the proposed method with three other state-of-the-art baselines in Section 3.4.1 and directly test our model on Sentinel-1A/B wind speeds in Section 3.4.2. Furthermore, the reconstruction results are further verified with SMAP. To ensure fairness, we use the same hyperparameters for training and validation on ECMWF.
PConv [51] proposes to replace vanilla convolution with partial convolution to reduce the color discrepancy in the missing area.
GatedConv [50] is a two-stage network: the first stage outputs coarse results, and the second stage refines them. This structure progressively increases the smoothness of the reconstructed image. The work also proposes an SN-PatchGAN for training.
AOT-GAN [30] aggregates the feature maps of different receptive fields to improve the reconstruction of high-resolution images.
3.4.1. Comparison of DeCA-GAN to Baselines
In this section, we compare the proposed model with three baselines and show the advantages of DeCA-GAN for high wind speeds and large-range low-quality data reconstruction. The quantitative comparison results can be found in Table 1, which shows that our model outperforms the state-of-the-art methods on all metrics, especially the RMSE. In particular, our model outperforms AOT-GAN [30], the most competitive baseline.
We conducted two additional quantitative experiments to further verify that the proposed model performs better on large ranges of low-quality data and on high wind speed reconstruction. First, we conducted quantitative comparisons under different low-quality data ranges. As shown in Table 2, DeCA-GAN has clear advantages in reconstructing SAR wind speeds that contain more than 20% low-quality data. Notably, the reconstruction results of the three models (GatedConv, AOT-GAN, and DeCA-GAN) exhibit RMSE values of 1.21 m/s, 1.10 m/s, and 0.95 m/s, respectively, for 60–80% low-quality data. We find that GatedConv is slightly better than the proposed model in terms of PSNR in the 0–20% low-quality range, but its performance degrades more severely as the proportion of low-quality data increases. It is worth mentioning that the proposed model outperforms GatedConv on the remaining metrics. This shows that the proposed model is more stable when reconstructing large ranges of low-quality data.
Second, we conducted quantitative comparisons for different wind speed segments, especially the low and high wind speed parts. As shown in Table 1, the improvements of the proposed model are not very obvious in SSIM and R. This is mainly due to data imbalance in the TC reconstruction task. To explain this, we draw Figure 8 to highlight the advantages of our DeCA-GAN in reconstructing low and high wind speeds. We group the data in the validation set into bins with a wind speed range of 2 m/s each. There are 2,029,266 collocations in the low-quality data of our validation set. However, the main structures of a TC (the wind eye and the wind wall, with wind speeds lower than 2 m/s or greater than 30 m/s, respectively) account for only a small part, which makes it difficult for the model to learn the TC structure effectively from the training set. Reconstructing the wind speeds outside the TC, for which there are many samples, is not complicated for DL models; the TC structure is the primary challenge. As a result, the model's improvement in reconstructing the TC structure is averaged out, and DeCA-GAN shows only a slight improvement in SSIM and R. Notably, SSIM evaluates the similarity of the whole image, so non-low-quality data are also involved in the calculation. As shown in Figure 8, we calculated the RMSE for different wind speed segments, showing that the slight improvement in SSIM and R comes from the TC structure. The model reconstructs wind speeds with a large sample size well, while for small sample sizes, the RMSE is relatively higher. Compared with the other models, when the wind speed is lower than 2 m/s or greater than 30 m/s, DeCA-GAN can overcome the data imbalance problem, obtaining an improvement of 1.70 m/s in the RMSE at 34–36 m/s wind speeds.
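The per-segment evaluation described above can be sketched as follows: collocated reference/reconstructed pairs are grouped into 2 m/s bins by their reference wind speed, and the RMSE is computed inside each bin. The function name and the 40 m/s upper bound are illustrative choices:

```python
import numpy as np

def binned_rmse(ref, rec, bin_width=2.0, max_speed=40.0):
    """RMSE per wind speed segment, grouped by the reference wind speed."""
    edges = np.arange(0.0, max_speed + bin_width, bin_width)
    idx = np.digitize(ref, edges) - 1            # bin index of each collocation
    result = {}
    for b in range(len(edges) - 1):
        sel = idx == b
        if sel.any():                            # skip empty bins
            lo, hi = float(edges[b]), float(edges[b + 1])
            result[(lo, hi)] = float(np.sqrt(np.mean((ref[sel] - rec[sel]) ** 2)))
    return result

out = binned_rmse(np.array([1.0, 1.0, 35.0]), np.array([1.0, 2.0, 33.0]))
```

Plotting the bin counts next to the per-bin RMSE makes the imbalance visible: the populous mid-range bins dominate global metrics such as SSIM, while the sparse extreme-speed bins are where the models actually differ.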
3.4.2. Reconstruction on Sentinel-1A/B
In this section, we directly input Sentinel-1 SAR wind speeds and validate the reconstruction results against ECMWF and SMAP. The results of the proposed method are shown in Figure 9. SAR is affected by rain bands and extreme wind speeds, which generate a large amount of low-quality data and thus make it difficult to observe TCs accurately. We masked 'low' and 'poor' data as low-quality data based on the quality flags provided in the L2 OCN product. The proposed model is able to enhance the low-quality data and reconstruct the entire TC structure from the Sentinel-1 SAR wind speeds, with results similar to ECMWF. In addition, the proposed model can still predict an accurate structure based on the TC features learned from the training set when the low-quality region is large, even when the whole TC structure is masked. In particular, low-quality regions including very high wind speeds at the wind wall and low wind speeds in the wind eye can be reconstructed at the correct locations.
As shown in Table 3, our model can improve the low-quality data in SAR wind speeds and achieve a relative reduction of 50% in the RMSE. Due to the slight time difference between Sentinel-1 SAR imaging and the ECMWF winds, the position and structure of the TC differ between the two, so the evaluation metrics are reduced accordingly. Specifically, the RMSE is 2.60 m/s, the R is 0.777, the SSIM is 0.907, and the PSNR is 28.02.
To further verify the effectiveness of the proposed model, we also compared the reconstruction results with SMAP data. We searched 25 SMAP datasets that match Sentinel-1 and obtained a total of 5515 matchups. The imaging time difference between the two datasets is within 60 min. The results show that the RMSE of the reconstructed Sentinel-1 images is 3.78 m/s and the correlation coefficient (R) is 0.79, as shown in Figure 10. These results indicate that the proposed model performs well in reconstructing TCs from SAR wind speeds. However, as shown in the scatter plot in Figure 10, reconstructed high wind speeds (greater than 20 m/s) are significantly underestimated compared to SMAP. We attribute this phenomenon to the underestimation of strong TC winds in ECMWF [58,59]. Although the model can reconstruct TC images through learned features, it cannot avoid the limitations of the training set; thus, it does not tend to reconstruct extreme winds, such as those in the SMAP data, well. Additionally, it is worth noting that the RMSE and R values in this comparison are slightly lower than those obtained on the test set, which may be due to imaging time differences in the SMAP data.
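The collocation step can be illustrated with a minimal sketch: each SAR scene is paired with the nearest SMAP pass, and the pair is kept only if the imaging times differ by at most 60 min. Timestamps in minutes and nearest-pass matching are simplifying assumptions; the actual matchup also requires spatial overlap:

```python
import numpy as np

def collocate(sar_times, smap_times, max_dt=60.0):
    """Return (sar_index, smap_index) pairs whose times differ by <= max_dt minutes."""
    sar_times = np.asarray(sar_times, dtype=float)
    smap_times = np.asarray(smap_times, dtype=float)
    pairs = []
    for i, t in enumerate(sar_times):
        j = int(np.argmin(np.abs(smap_times - t)))   # nearest SMAP pass in time
        if abs(smap_times[j] - t) <= max_dt:
            pairs.append((i, j))
    return pairs

pairs = collocate([0.0, 200.0], [30.0, 500.0])
```

With a time tolerance this wide, some TC displacement between the two observations is unavoidable, which is consistent with the slightly degraded RMSE and R reported for the SMAP comparison.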
3.5. Ablation Studies
We conducted ablation experiments to verify the effectiveness of the three components of DeCA-GAN (i.e., the dual-level encoder, the number of AOT blocks in the neck, and the GAN loss function). To ensure the validity and credibility of the ablation experiments, we use the same training strategy and hyper-parameters as those of the original DeCA-GAN, as well as a fixed random seed.
3.5.1. Dual-Level Encoder
In this paper, we use the CoT self-attention module to build a global branch to enhance the learning ability of the model. As shown in Section 3.6, this global branch mainly learns the structure of the entire TC image. After the global feature map is combined with the local branch, it helps the model achieve strong performance on both the overall structure and the detailed texture. We removed the global branch and used the result as the baseline. This baseline uses the same neck and decoder as DeCA-GAN, so any improvement can only be due to changes in the encoder. As shown in Table 4, the model using the global branch performs better, with an RMSE improvement of about 0.16 m/s over the baseline.
3.5.2. Number of AOT Blocks
To study the influence of the number of AOT blocks on the model's learning ability, we changed the neck of the DeCA-GAN model, stacking four, five, six, seven, or eight AOT blocks. Table 5 shows that the number of blocks greatly impacts the performance of the model. Using five AOT blocks in the neck achieves the best results, with an RMSE improvement of about 0.05 m/s over the next best design using six blocks. In addition, in Section 3.6, we visualize the feature maps of the network with eight AOT blocks and find that the outputs of the 4th, 5th, and 6th blocks are the best, which was also part of our inspiration when designing the DeCA-GAN.
3.5.3. Benefits of Adversarial Loss
Based on a GAN, this paper adopts an adversarial loss to address the reconstruction blur caused by the L1 (or L2) loss and enhance the reconstruction effect. The baseline does not use the generator or discriminator loss functions. Compared with the proposed DeCA-GAN, the reconstructed results generated by the baseline are significantly worse on all indicators (see Table 6).
3.6. Visualization of DeCA-GAN Internal Feature Maps
In this section, we analyze some intermediate stages of the DeCA-GAN in extracting TC features. The design idea of the encoder is to let the model learn global and local features simultaneously. Through the dual-level fusion and the attention selection of the ECA module, the encoder can learn the key channels and features for reconstructing TCs more comprehensively. In the neck of the DeCA-GAN, the AOT blocks gradually generate a reconstructed sketch and send it to the decoder for up-sampling. We wrote a simple visualization tool to exhibit some feature maps learned by various parts of the model.
3.6.1. Features Learned by the Dual-Level Encoder
In this part, we show the features extracted by the dual-level encoder and the corresponding masked image in Figure 11a. Figure 11b shows the outputs of the third convolutional layer of the local branch. It can be seen that after three convolutional layers, the network obtains more blurred feature maps. The model pays more attention to extracting local features, especially the distribution of texture features at the edge of the low-quality data, and finds correlations inside and outside the area that needs to be reconstructed. Nevertheless, the feature maps of this branch do not reveal the TC structure. The role of the convolutional layer is to find relationships between pixels in a local area, so it has a high sensitivity to texture features. However, learning the overall TC structure from the image is also necessary for tasks such as TC reconstruction.
As can be seen from the feature maps output by the last CoT block of the global branch in Figure 11c, this branch pays more attention to the content of the entire image than the local branch due to the effect of the self-attention module. The pixels that are significant for the reconstruction results are given greater weights. Moreover, this branch appears to have learned the ring structure of TCs. These results demonstrate that the global branch can quickly learn the overall TC structure, which helps the subsequent reconstruction in the neck and the up-sampling. Specifically, when the high wind speed parts and the wind eye of the TC are masked, the model must judge the exact wind speed values of the high wind speed area and locate the possible position of the wind eye based on the unmasked data in the whole image.
With such an encoder design, the two branches of DeCA-GAN can indeed attend to global features while preserving local features. This enables the model to perform better on the task of TC reconstruction (see Section 3.5.1).
3.6.2. Feature Integration by AOT Blocks
After passing through the designed dual-level encoder, the model has learned the local and global features of the image. These learned features are fed into the AOT neck for further processing. The original input size in [30] is larger, each AOT block incorporates four receptive fields, and the maximum dilation rate is eight (six is used in this paper), which works well for high-resolution images. However, our selected input size is 160 × 160, which can contain the entire TC structure; this size is suitable for the dataset studied in this paper and does not require much video memory. In Figure 12, we stack eight AOT blocks in the neck of the DeCA-GAN to observe the features extracted by each block separately. The colors in Figure 12a–h represent the relative relationships of the different positions extracted by the model. We believe that the number of stacked AOT blocks affects the performance of the network, and we found an interesting phenomenon by visualizing the output of each AOT block.
The features of the first four blocks are relatively elementary, and the TC structure is distorted and incomplete. The output features of the fifth and sixth AOT blocks are better, with the cyclone center and wind wall close to the ground truth. From the seventh block onward, a problem appears in the learned features: the high wind speed area begins to spread, which is inconsistent with the actual situation, and the prediction accuracy of the central low-wind area also tends to deteriorate. Therefore, in this paper, the number of AOT blocks is treated as a hyper-parameter in the ablation experiments, and the best effect is found when using five blocks (see Section 3.5.2).
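The aggregation idea behind an AOT block can be illustrated with a minimal NumPy sketch: the same feature map is filtered by several dilated 3 × 3 convolutions with different rates, and their responses are combined, so that each block mixes small and large receptive fields. The fixed averaging kernel, the rate set, and the mean-combination used here are illustrative only; the actual blocks are learned layers with per-branch weights and gating:

```python
import numpy as np

def dilated_conv3x3(x, kernel, rate):
    """3x3 convolution with dilation `rate` (zero padding, stride 1)."""
    h, w = x.shape
    pad = rate
    xp = np.pad(x, pad)
    out = np.zeros((h, w), dtype=float)
    for i in range(3):
        for j in range(3):
            di, dj = (i - 1) * rate, (j - 1) * rate
            out += kernel[i, j] * xp[pad + di:pad + di + h, pad + dj:pad + dj + w]
    return out

def aot_aggregate(x, rates=(1, 2, 4, 6)):
    """Combine the responses of several dilated receptive fields."""
    k = np.full((3, 3), 1.0 / 9.0)   # fixed averaging kernel for illustration
    return np.mean([dilated_conv3x3(x, k, r) for r in rates], axis=0)

agg = aot_aggregate(np.ones((8, 8)))
```

The larger rates (4 and 6 here) see far-away context, which is what lets a deep stack of such blocks propagate structure across a wide masked region; the visualization above suggests that, for 160 × 160 inputs, roughly five blocks of context mixing suffice before the features start to over-smooth.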
5. Conclusions
In conclusion, this paper presents a DeCA-GAN model for improving low-quality SAR wind speed data and reconstructing TC structures. The reconstructed results retain the original SAR high-precision wind speed, and only the data in the low-quality region are improved to obtain results that are close to reality. In particular, the model works well for reconstructing high wind speeds and large ranges of low-quality data. The reconstructed results can be used for TC intensity estimation, data assimilation, TC forecast accuracy improvement, TC structure analysis, etc. Furthermore, the DL algorithm is very low-cost and fast.
The proposed method is based on an encoder–neck–decoder architecture, with two parallel branches that combine CNNs and the self-attention mechanism to extract local and global features. The neck part of the model consists of AOT blocks that extract contextual features and send them to the decoder to generate reconstruction results. We also introduce an ECA module to calculate channel attention in the model, which enables cross-channel interactions. In addition, we use a joint loss to improve the model’s performance. Through ablation experiments, we find that the global branch we designed and the selection of the number of AOT modules have positive impacts. We also find that using GAN loss significantly improves the reconstruction ability of the model. When applied to reconstruct SAR wind speeds, the model achieves an RMSE of 2.60 m/s, an R value of 0.777, an SSIM of 0.907, and a PSNR of 28.02, achieving a relative reduction of 50% in the RMSE. Additionally, comparing the reconstructed SAR wind speeds with SMAP data yields an RMSE of 3.78 m/s and an R value of 0.79. Overall, our results suggest that the proposed DeCA-GAN model is a promising approach for reconstructing SAR wind speeds in TCs.