**2. Materials and Methods**

#### *2.1. Materials*

This study uses Himawari-8 satellite data as the model input to reconstruct the radar composite reflectivity (CREF, in dBZ), which is the maximum reflectivity over all elevation angles scanned by the weather radar. Severe convective weather (SCW) is usually considered to occur when CREF > 35 dBZ.

#### 2.1.1. Himawari-8 Satellite Data

The Himawari-8 satellite data can be downloaded from http://www.eorc.jaxa.jp/ptree/index.html (accessed on 1 June 2020). The imager has 16 bands in total: visible bands (central wavelengths from 0.47 to 0.64 μm), near-infrared bands (central wavelengths from 0.86 to 2.3 μm), and infrared bands (central wavelengths from 3.9 to 13.3 μm). These bands provide information on the distribution of clouds, air temperature, wind, precipitation, and aerosols. To produce a generalized model that can be used during both daytime and nighttime, only the infrared bands were chosen in this study. Band 12 is excluded because it characterizes O3 content.

In addition, the brightness temperature differences (BTDs) between bands can also characterize cloud property information and facilitate the capture of severe convective regions [37]. Therefore, according to previous studies [37–39], 17 bands in total are chosen or calculated as the model input, including 9 single infrared bands, and 8 BTD bands, as shown in Table 1.
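As a minimal sketch of how one BTD channel could be derived, the snippet below subtracts one brightness temperature band from another pixel by pixel. The function name `btd`, the nested-list layout, and the sample values are illustrative assumptions; only the tbb08 minus tbb10 pair is taken from the paper, and the complete list of pairs is the one given in Table 1.

```python
def btd(band_a, band_b):
    """Pixel-wise brightness temperature difference (band_a minus band_b), in kelvin."""
    if len(band_a) != len(band_b):
        raise ValueError("bands must have the same number of rows")
    return [
        [a - b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


# Hypothetical 2 x 2 brightness temperatures (K); the values are made up.
tbb08 = [[230.0, 235.5], [228.0, 240.0]]
tbb10 = [[240.0, 243.5], [236.0, 251.0]]

tbb08_minus_tbb10 = btd(tbb08, tbb10)
print(tbb08_minus_tbb10)
```

Stacking the 9 single bands with 8 such difference channels would give the 17 input channels described above.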

**Table 1.** The 17 satellite bands selected or calculated in this study, along with the physical meaning of each band. '-' indicates minus; e.g., tbb08-tbb10 (Band 08 minus Band 10) indicates the BTD between Band 08 and Band 10.




The temporal resolution is 10 min and the spatial resolution is 2 km. The study area covers 20°N–40°N and 110°E–130°E.

#### 2.1.2. Composite Reflectivity (CREF)

The output variable of the reconstruction model is the composite reflectivity (CREF), which is obtained from the China Meteorological Administration. The CREF data have a 10 min time interval before June 2016 and a 6 min time interval from July 2016 onward, with a spatial resolution of 1 km. The latitude and longitude ranges of the study area are consistent with those selected for the Himawari-8 satellite data, namely 20°N–40°N and 110°E–130°E.

The data (both the satellite data and the CREF) from May to October for the period 2016–2018 are used in this study.

#### 2.1.3. GPM Precipitation Data

The Global Precipitation Measurement (GPM) mission is a next-generation global satellite precipitation measurement program carried out jointly by NASA and JAXA. Precipitation and radar CREF are correlated to some degree. Although GPM data can hardly be used to fully quantify the effectiveness of the CREF reconstruction, they can qualitatively verify the models in areas without radar coverage and serve as supplementary information indicating the areas of severe radar echoes [26].

#### 2.1.4. Data Preprocessing

#### Spatial and Temporal Matching

The Himawari-8 satellite data and the CREF data are matched in space and time to serve as the features and labels of the model, respectively. Temporally, a satellite scan and a radar scan are paired only if their time difference is less than 5 min; data that cannot be matched are discarded. Spatially, the CREF data are resampled onto a grid with a spatial resolution of 2 km, the same as that of the Himawari-8 satellite data.
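The matching step can be illustrated with a short sketch. It pairs each satellite time with the nearest radar time inside the 5 min tolerance and coarsens a 1 km grid to 2 km. The function names and sample times are hypothetical, and the 2 × 2 block average is an assumption: the paper does not state which resampling method was used.

```python
from datetime import datetime, timedelta

MAX_GAP = timedelta(minutes=5)  # matching tolerance stated in the text


def match_times(sat_times, radar_times, max_gap=MAX_GAP):
    """Pair each satellite time with the nearest radar time closer than max_gap.

    Satellite scans without a close enough radar scan are discarded.
    """
    pairs = []
    for t_sat in sat_times:
        nearest = min(radar_times, key=lambda t: abs(t - t_sat))
        if abs(nearest - t_sat) < max_gap:
            pairs.append((t_sat, nearest))
    return pairs


def block_mean_2x2(grid):
    """Coarsen a 1 km grid to 2 km by averaging non-overlapping 2 x 2 blocks.

    Assumption: block averaging stands in for the unspecified resampling method.
    """
    rows, cols = len(grid) // 2 * 2, len(grid[0]) // 2 * 2
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4.0
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


# Hypothetical scan times: satellite every 10 min, radar every 6 min.
sat = [datetime(2018, 8, 11, 0, 0), datetime(2018, 8, 11, 0, 10)]
radar = [datetime(2018, 8, 11, 0, 0), datetime(2018, 8, 11, 0, 6),
         datetime(2018, 8, 11, 0, 12)]
matched = match_times(sat, radar)
coarse = block_mean_2x2([[10, 20, 30, 40], [10, 20, 30, 40]])
```

For convective cores a block maximum might preserve peak reflectivity better than a mean; which choice matches the original processing is not known from the text.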

#### Normalization

Normalizing the data can improve the learning ability of the model, speed up convergence, and avoid training difficulties caused by inputs of different magnitudes.

In this study, both the satellite data and the CREF data are normalized by z-score normalization. The formula is as follows:

$$x^{*} = \frac{x - \mu}{\sigma} \tag{1}$$

where *x* denotes the original data, *x\** denotes the result after z-score normalization, and *μ* and *σ* are the mean and standard deviation of the original data, respectively.
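A minimal sketch of Equation (1) follows, with σ taken as the standard deviation, which is what the z-score uses. The helper names and sample values are hypothetical. The inverse transform is included because a model trained on normalized CREF would need it to return predictions in dBZ; whether the statistics were computed per band or globally is not stated in the paper.

```python
import math


def zscore(values):
    """Return z-scores, the mean, and the (population) standard deviation."""
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    if sigma == 0.0:
        raise ValueError("constant input: standard deviation is zero")
    return [(v - mu) / sigma for v in values], mu, sigma


def inverse_zscore(z_values, mu, sigma):
    """Map normalized values back to physical units (e.g. dBZ)."""
    return [z * sigma + mu for z in z_values]


cref = [0.0, 10.0, 20.0, 30.0, 40.0]  # made-up CREF samples in dBZ
z, mu, sigma = zscore(cref)
restored = inverse_zscore(z, mu, sigma)
```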

#### *2.2. Method*

#### 2.2.1. Satellite to Radar U-Net

U-Net has shown good performance in reconstructing radar data in previous studies [23–27]. We use the U-Net architecture to build a CREF reconstruction model, namely the satellite-to-radar U-Net (STR-UNet, Figure 1).

**Figure 1.** The STR-UNet architecture.

Overall, the STR-UNet designed in this study has an encoder–decoder structure [27]. The left side of the network is often referred to as the contracting path and the right side as the expanding path. The shortcut in the middle is the skip connection, also known as the feature splicing (concatenation) layer.

The left half of the model (contracting path) performs feature extraction. It consists of repeated convolution blocks and 2 × 2 pooling layers, where each convolution block contains a 3 × 3 convolution layer, a batch normalization layer, and a ReLU activation function. The model input is a 64 × 64 × 17 satellite image, where 64 × 64 is the length and width of the satellite image after padding and 17 is the number of input channels (i.e., bands). After each convolution block and pooling layer, the number of feature maps is doubled and the length and width are halved.

The right half of the model (expanding path) performs up-sampling. It consists of repeated transposed convolution layers, feature splicing layers, and convolution blocks, where each convolution block again contains a 3 × 3 convolution layer, a batch normalization layer, and a ReLU activation function. In the expanding path, a transposed convolution is first applied to the feature map coming from the contracting path; the result is then spliced along the channel dimension with the feature map at the corresponding position on the contracting path; finally, a convolution block is applied to the spliced feature map. This sequence is repeated at every level. After each transposed convolution and convolution block, the number of feature maps is halved and the length and width are doubled. In the last layer of the model, a 1 × 1 convolution layer maps the 32-channel tensor to 1 channel, yielding a target image of size 64 × 64 × 1, which is the reconstructed CREF.

STR-UNet combines the low-resolution information from the down-sampling process with the high-resolution information from the up-sampling process. Its long-range skip connections bring in feature details from the shallow convolution layers, which effectively compensates for the spatial information of the satellite images lost during down-sampling and helps the network achieve more accurate localization. This is very important for reconstructing accurate radar data and boundary information.
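To make the doubling and halving rules concrete, the sketch below traces the feature-map shapes through an encoder–decoder of this kind without building the network itself. The 64 × 64 × 17 input and the 32 channels before the final 1 × 1 convolution come from the text; the depth of four pooling stages and the function name are assumptions, since the exact number of levels is shown only in Figure 1.

```python
def trace_unet_shapes(size=64, in_channels=17, base_channels=32, depth=4):
    """List (stage, height, width, channels) through a U-Net-style network.

    size, in_channels, and base_channels follow the text; depth is an assumption.
    """
    shapes = [("input", size, size, in_channels)]
    channels = base_channels
    shapes.append(("enc0", size, size, channels))  # first convolution block

    # Contracting path: 2 x 2 pooling halves H and W, the block doubles channels.
    for level in range(1, depth + 1):
        size //= 2
        channels *= 2
        shapes.append((f"enc{level}", size, size, channels))

    # Expanding path: transposed convolution doubles H and W; after the skip
    # concatenation, the convolution block halves the number of channels.
    for level in range(depth - 1, -1, -1):
        size *= 2
        channels //= 2
        shapes.append((f"dec{level}", size, size, channels))

    shapes.append(("output", size, size, 1))  # 1 x 1 convolution: 32 -> 1 channel
    return shapes


shapes = trace_unet_shapes()
for stage in shapes:
    print(stage)
```

Under these assumptions the bottleneck is 4 × 4 × 512, and the last decoder stage returns to 64 × 64 × 32 before the 1 × 1 convolution.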

#### 2.2.2. Research Scheme of the CREF Reconstruction

This paper aims to build a model that reconstructs radar CREF from satellite data and is suitable for monitoring severe convective weather over the ocean, where no radar is deployed. To achieve this objective, we designed the research scheme shown in Figure 2.

**Figure 2.** Research scheme of the CREF reconstruction.

Step 1. Preprocess the dataset, as described in Section 2.1.4.

Step 2. Build four STR-UNet models with different underlying surfaces. As shown in Figure 3 (Left), Region A includes four different underlying surfaces: land, coast, offshore, and sea. Four STR-UNet models, namely Land-Model, Coast-Model, Offshore-Model, and Sea-Model, are constructed.

Step 3. Train and test the STR-UNet models. The first 24 days of each month (May to October) in 2016 and 2017 are used as the training set, and the remaining days of these months are used as the validation set. The data in 2018 are used as the test set. It is worth noting that each model is trained and tested on its own underlying surface data. For example, the Coast-Model is trained and tested by using the data from the coastal area.

Then, the performances of the four STR-UNet models over oceanic areas are assessed. The orange boxes in Figure 3 (Right) mark Region B, which is defined as oceanic areas that do not overlap with the "offshore" areas shown in Figure 3 (Left). Radar CREF is difficult to obtain over the open ocean. The offshore radar CREF, however, has relatively high accuracy and is observed over the sea, so it carries the data features of an oceanic underlying surface. Based on this, we assume that Region B can represent the "ocean" underlying surface. The performance of the four models is evaluated in this area using the test set (2018).

Step 4. Perform interpretability study of STR-UNet. See Section 2.2.4 for more details.
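The date-based split in Step 3 can be sketched as a small function. The constants restate the rule in the text (days 1 to 24 of May to October in 2016 and 2017 for training, the remaining days of those months for validation, all of 2018 for testing); the function name is hypothetical.

```python
from datetime import date

TRAIN_YEARS = (2016, 2017)
TEST_YEAR = 2018
MONTHS = range(5, 11)   # May to October
LAST_TRAIN_DAY = 24     # days 1-24 of each month go to training


def assign_split(day):
    """Return 'train', 'val', 'test', or None for a date outside the study period."""
    if day.month not in MONTHS:
        return None
    if day.year in TRAIN_YEARS:
        return "train" if day.day <= LAST_TRAIN_DAY else "val"
    if day.year == TEST_YEAR:
        return "test"
    return None


print(assign_split(date(2016, 5, 24)))   # train
print(assign_split(date(2017, 10, 25)))  # val
print(assign_split(date(2018, 8, 11)))   # test
```

Splitting by calendar day rather than by random sample keeps consecutive scans of the same weather system in the same subset, which limits leakage between training and validation data.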

**Figure 3.** Schematic diagram of Region A (**left**) and Region B (**right**). Region A includes four different underlying surfaces: land (yellow box), coast (green box), offshore (cyan box), and sea (blue box). Region B only includes one underlying surface: offshore, and it does not coincide with the four underlying surfaces of Region A.

#### 2.2.3. Evaluation Metrics

In this study, root mean square error (RMSE) and mean absolute error (MAE) are used to quantitatively verify the performance of the four STR-UNet models built in this paper. RMSE and MAE can measure the deviation between the reconstructed CREF and the radar CREF. The equations are as follows:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( y_i' - y_i \right)^2} \tag{2}$$

$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i' - y_i \right| \tag{3}$$

where *n* represents the number of samples, *y′<sub>i</sub>* represents the reconstructed CREF value, and *y<sub>i</sub>* represents the radar CREF value.
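As a worked illustration, the two error measures can be computed as follows, using the standard definition of RMSE in which the mean is taken inside the square root. The function names and the sample values are made up for the example.

```python
import math


def rmse(pred, obs):
    """Root mean square error: square root of the mean squared difference."""
    n = len(pred)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / n)


def mae(pred, obs):
    """Mean absolute error."""
    n = len(pred)
    return sum(abs(p - o) for p, o in zip(pred, obs)) / n


reconstructed = [10.0, 20.0, 30.0, 40.0]  # made-up reconstructed CREF (dBZ)
observed = [12.0, 20.0, 26.0, 40.0]       # made-up radar CREF (dBZ)
print(rmse(reconstructed, observed), mae(reconstructed, observed))
```

Here the differences are −2, 0, 4, and 0 dBZ, so the RMSE is √5 ≈ 2.24 dBZ and the MAE is 1.5 dBZ. RMSE is larger because it weights the single 4 dBZ error more heavily.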

The classification metrics used in this study include the probability of detection (POD), false alarm ratio (FAR), critical success index (CSI), and bias (BIAS), which are computed from the contingency table shown in Table 2. These metrics evaluate the models' ability to reconstruct CREF above 35 dBZ, which is critical for many industries, including aviation and ship navigation.

$$\text{POD} = \frac{\text{Hits}}{\text{Hits} + \text{Misses}} \tag{4}$$

$$\text{FAR} = \frac{\text{False alarms}}{\text{Hits} + \text{False alarms}} \tag{5}$$

$$\text{CSI} = \frac{\text{Hits}}{\text{Hits} + \text{Misses} + \text{False alarms}} \tag{6}$$

$$\text{BIAS} = \frac{\text{Hits} + \text{False alarms}}{\text{Hits} + \text{Misses}} \tag{7}$$
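The classification scores can be sketched in a few lines, using the standard contingency-table definitions of POD, FAR, CSI, and BIAS. The 35 dBZ threshold is taken from the text. Treating a value strictly above the threshold as an event, returning NaN for an undefined ratio, and all function names and sample values are assumptions of this example.

```python
THRESHOLD_DBZ = 35.0  # severe-convection threshold used in the text


def contingency(pred, obs, threshold=THRESHOLD_DBZ):
    """Count hits, misses, false alarms, and correct negatives.

    Assumption: an event is a value strictly above the threshold.
    """
    hits = misses = false_alarms = correct_negatives = 0
    for p, o in zip(pred, obs):
        p_event, o_event = p > threshold, o > threshold
        if p_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif p_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def _ratio(numerator, denominator):
    """Return numerator / denominator, or NaN when the score is undefined."""
    return numerator / denominator if denominator else float("nan")


def scores(hits, misses, false_alarms):
    """POD, FAR, CSI, and BIAS from the contingency counts."""
    return {
        "POD": _ratio(hits, hits + misses),
        "FAR": _ratio(false_alarms, hits + false_alarms),
        "CSI": _ratio(hits, hits + misses + false_alarms),
        "BIAS": _ratio(hits + false_alarms, hits + misses),
    }


pred = [40.0, 36.0, 20.0, 38.0, 10.0, 37.0]  # made-up reconstructed CREF (dBZ)
obs = [42.0, 30.0, 39.0, 36.0, 5.0, 41.0]    # made-up radar CREF (dBZ)
h, m, f, cn = contingency(pred, obs)
result = scores(h, m, f)
```

With these six pixels there are 3 hits, 1 miss, and 1 false alarm, giving POD = 0.75, FAR = 0.25, CSI = 0.6, and BIAS = 1.0. A BIAS above 1 would mean the model predicts severe echoes more often than they are observed.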


**Table 2.** Contingency table of the classification score parameters.

#### 2.2.4. Interpretability

In recent years, with the rapid development of deep learning, the interpretability of models has received increasing attention from researchers. Most deep learning models are "black box" models [40], which often makes it difficult to understand their decision logic and therefore to fully trust them. To improve the interpretability and transparency of the deep learning models, this study investigates their interpretability.

In this study, 17 satellite features are used as the model input. After obtaining a model for each underlying surface, we are interested not only in how well each model performs but also in which features play an important role in the reconstruction.

In this paper, the DeepLIFT algorithm [36] is used to study the interpretability of the above models. DeepLIFT attributes the prediction of a neural network to each dimension of the input. It compares the activation of each neuron with its "reference" activation and back-propagates an importance signal, assigning contribution scores according to the difference. In essence, the method traces the internal feature selection of the model: it explains the difference between the output and a "reference" output in terms of the differences between the inputs and their "reference" inputs.

In this study, the features of each band were first normalized on Region B, and an all-zero vector was used as the "reference" input to calculate the attribution of each feature, i.e., the contribution of each input feature to the result. The absolute value of each attribution was then taken, and the importance of a feature was expressed as the ratio of the absolute value of its attribution to the sum of the absolute values of all feature attributions.
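The normalization of attributions into importance shares can be sketched as follows. Running DeepLIFT on STR-UNet itself requires a dedicated implementation, so this example substitutes a toy linear model, for which the DeepLIFT contribution of each input reduces to its weight times the difference from the reference. The weights, inputs, and function names are hypothetical.

```python
def linear_attributions(weights, x, reference):
    """Contribution of each input for a linear model f(x) = sum(w_i * x_i) + b.

    For a linear model, DeepLIFT reduces to w_i * (x_i - reference_i).
    """
    return [w * (xi - ri) for w, xi, ri in zip(weights, x, reference)]


def importance(attributions):
    """Share of each feature in the total absolute attribution."""
    total = sum(abs(a) for a in attributions)
    if total == 0.0:
        raise ValueError("all attributions are zero")
    return [abs(a) / total for a in attributions]


# Hypothetical three-feature example; inputs are assumed to be z-scored already.
weights = [2.0, -1.0, 0.5]
x = [1.0, 3.0, -2.0]
reference = [0.0, 0.0, 0.0]  # all-zero reference, as in the text

attr = linear_attributions(weights, x, reference)
shares = importance(attr)
print(shares)
```

The attributions here are 2, −3, and −1, so the importance shares are 1/3, 1/2, and 1/6. Taking absolute values means a feature that strongly lowers the output counts as much as one that strongly raises it.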

Based on this method, the more important features on each underlying surface can be identified, and whether surface information affects the band selection of the model can be analyzed. Finally, we explored the relationship between band importance and underlying surface, as well as the reasons why band importance differs between underlying surfaces.

#### **3. Results**

#### *3.1. Performances of the Four STR-UNet Models*

The performances of the four STR-UNet models are shown in Table 3.

First, moving from land to coast, to offshore, and then to sea, the RMSE and MAE on the test set in Region A become smaller. However, this does not indicate that the model's performance is getting better. The main reason lies in how CREF is observed: it is derived from the radar waves reflected by clouds at different heights within a certain range of the meteorological radar. As the distance from the coastline toward the ocean increases, the height of the radar beam also increases. This means that, far from the radar, the CREF is calculated from only a small number of basic reflectivity factors at higher elevations. As a result, the proportion of CREF larger than 35 dBZ decreases significantly from land to coast, offshore, and finally sea, as depicted in Figure 4. In the sea areas of Region A in particular, the proportion of CREF larger than 35 dBZ is only a few tenths of that in the other three areas.


**Table 3.** Performances of the four STR-UNet models.

Next, we examine the performance of the four models on the test set in Region B. For all metrics, the Land-Model, Coast-Model, and Offshore-Model perform significantly better than the Sea-Model in Region B. Four metrics of the Sea-Model (POD, CSI, BIAS, and FAR) are 0, and its RMSE and MAE are the largest among all models. This indicates that the Sea-Model is unable to reconstruct the CREF.

Since the Offshore-Model's data resemble those of Region B (without overlapping), it can serve as a proxy for the highest precision that the reconstruction model can achieve. Compared with the Offshore-Model, the Land-Model performs slightly worse on the test set in Region B for all evaluation metrics.

Comparing the Coast-Model with the Offshore-Model is less straightforward. The RMSE, MAE, and FAR of the Coast-Model are slightly larger (worse) than those of the Offshore-Model, whereas its POD, CSI, and BIAS are better. This is because, as shown in Figure 4, the coastal area has the highest fraction of CREF larger than 35 dBZ among the four areas, which reflects a complicated meteorological situation affected by the complex underlying surface of the coast [41,42]. Thus, compared with the Offshore-Model, the Coast-Model makes bolder predictions.

**Figure 4.** Probability statistics of CREF data in different regions. The horizontal axis represents the range of CREF values, while the vertical axis represents the corresponding proportion. The different colors on the figure correspond to different underlying surfaces.

In conclusion, a mix of the land, coast, and offshore datasets can be considered when using satellite data to reconstruct radar data. The Offshore-Model gives moderate results, while the Coast-Model gives bolder results. Owing to the abundance of data over land, the Land-Model provides a good baseline for reconstructing the radar data, despite being slightly inferior to the Offshore-Model on the test set in Region B.

#### *3.2. Case Study*

To demonstrate the practical performance of the models, we selected several severe convective weather cases from the test set (UTC) and visually compared the reconstructions.

Typhoons are among the most important disastrous weather systems affecting the safety of people's lives and property, often bringing rainstorms, strong winds, and secondary disasters [43]. Typhoon "Yagi" formed on 7 August 2018 (Beijing time, the same below) with the intensity of a tropical depression. It intensified to a tropical storm on 8 August while moving north by east, turned to the northwest on the night of 9 August, and entered the eastern part of the East China Sea on the night of 11 August [44]. "Yagi" brought extreme precipitation and large-scale rainstorms to the cities along its path, resulting in heavy economic losses. Figure 5 shows the GPM precipitation and radar echo distributions of severe convective events in the test set that occurred in the study area. The models reconstruct the shape, location, intensity, and extent of the convective centers fairly accurately, both for lower-intensity convective events and for severe convective events such as a typhoon. In addition, radar echoes can also be reconstructed for areas beyond the radar coverage, where the distribution of the reconstructed CREF is quite consistent with the pattern of the GPM precipitation.

**Figure 5.** Radar echo map: comparative study of observation and reconstruction. The first column shows GPM precipitation distribution at different times; the second column shows the radar echo observed; the gray areas on the map represent areas outside the radar deployment range. The third, fourth, and fifth columns show the reconstructions of the Land-Model, Coast-Model, and Offshore-Model, respectively, and the last column represents the average of the three reconstructed models mentioned above.

#### *3.3. Results of Interpretability*

For the three models selected above (Land-Model, Coast-Model, and Offshore-Model), the DeepLIFT method is used to analyze the differences in feature importance across underlying surfaces (land, coast, and offshore). The results are shown in Figure 6.

**Figure 6.** The importance of each type of band (cloud, water, and temperature) under different underlying surfaces (land, coast, offshore).

Based on the previous description and their physical meaning (Table 1), the 17 input features can be classified into satellite cloud-related features, satellite water-related features, and satellite temperature-related features. Some bands have more than one physical meaning. For example, Band 11 (center wavelength 8.6 μm) measures both water vapor and cloud phase state, so it is classified as both a water-related and a cloud-related feature. That is, after its importance is calculated with the DeepLIFT method, the importance of Band 11 is included, together with the other bands that measure water vapor, in the average importance of the water-related features, and, together with the bands that measure cloud phase state, in the average importance of the cloud-related features. The average importance of the bands under each type was calculated in the same way. For the land underlying surface, it can be seen directly that cloud-related features are the most important to the reconstruction, far outweighing the water-related and temperature-related features.
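The group averaging described above, including a band that counts toward two groups, can be sketched as follows. The band-to-group mapping and the importance values are hypothetical placeholders; the actual assignment follows the physical meanings listed in Table 1.

```python
# Hypothetical band-to-group mapping; the real assignment follows Table 1.
GROUPS = {
    "cloud": ["tbb11", "tbb13"],
    "water": ["tbb08", "tbb11"],  # Band 11 counts in both groups
    "temperature": ["tbb14"],
}


def group_importance(band_importance, groups=GROUPS):
    """Average the per-band importance within each feature group."""
    return {
        name: sum(band_importance[b] for b in bands) / len(bands)
        for name, bands in groups.items()
    }


# Made-up per-band importance values that sum to 1.
band_importance = {"tbb08": 0.2, "tbb11": 0.4, "tbb13": 0.3, "tbb14": 0.1}
averages = group_importance(band_importance)
print(averages)
```

Because a shared band contributes to more than one average, the group averages do not need to sum to 1; they are comparable with each other but are not shares of a whole.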

Overall, satellite cloud-related features are the most important, followed by satellite water-related features, and satellite temperature-related features are the least important. As the underlying surface changes to the coast and then to the offshore, the importance of cloud-related features gradually decreases, although they still play an important role in the reconstruction, while the importance of water-related features gradually increases. When the underlying surface is located on the ocean, it is clear that satellite water-related features are more important than satellite cloud-related features. The importance of temperature-related features gradually decreases as the model moves toward the ocean; compared with the other two types, they are relatively unimportant in the reconstruction.

In summary, for all underlying surfaces from land to ocean, clouds have a great impact on the amount of solar radiation reaching the Earth and play a crucial role in the water cycle of the climate system [45,46]; the cloud phase state also reflects, to a certain extent, the temperature, humidity, and dynamic characteristics of the atmosphere [47]. In addition, water vapor is strongly correlated with severe convective weather: an increase in water vapor content favors the development of convective weather and can easily cause it to grow rapidly. Therefore, the satellite features characterizing cloud amount, cloud phase state, and water vapor play an important role in the reconstruction. Moreover, the ocean has a high heat capacity and high thermal inertia, so more energy is needed to change its temperature substantially. Therefore, compared with the land underlying surface, satellite temperature-related features are less significant over the ocean in the reconstruction of severe convective weather.

#### **4. Conclusions**

In this study, we separately sampled land, coast, offshore, and sea areas in the eastern area (20°N–40°N, 110°E–130°E), built four deep learning models based on U-Net, and compared their accuracy. The results show that a mix of the land, coast, and offshore datasets can be considered when using satellite data to reconstruct radar data. This allows for more accurate reconstruction and monitoring of severe convective weather over the ocean without radar deployment.

In addition, previous studies lacked research on the interpretability of such models. In this paper, the DeepLIFT method was used to obtain the feature importance ranking and its differences across underlying surfaces. Overall, satellite cloud-related features are the most important, followed by satellite water-related features, and satellite temperature-related features are the least important. The importance of satellite water-related features gradually increases, and the importance of satellite cloud-related features and satellite temperature-related features gradually decreases, as the model changes from land to ocean. The reasons for this phenomenon are briefly analyzed in combination with the physical meaning of the bands. These findings benefit research in oceanic areas, which is of great significance for aviation, navigation, and the safety of people's lives and property.

In addition to the research tasks outlined in this study, future research will be conducted from the following aspects.

Firstly, the data used in this paper are infrared band data, whereas previous studies also used lightning and other data. In subsequent studies, we will add more data types to further reduce the error and improve the reconstruction. Secondly, we used the DeepLIFT method to preliminarily analyze how feature importance differs with the underlying surface of the model. However, results obtained with only one interpretability method lack credibility [48]. In the future, we will apply and optimize more interpretability methods to obtain more convincing conclusions. We hope the reconstruction method we have proposed will spur new developments in the deep learning and meteorological fields.

**Author Contributions:** Conceptualization, X.Y., J.X. and D.Z.; methodology, X.L., X.Y., Y.Y., J.X. and Z.W.; software, X.L., W.C., X.Y. and Y.Y.; validation, X.Y. and X.L.; formal analysis, X.Y.; resources, J.X. and Z.Y.; data curation, X.L. and X.Y.; writing—original draft preparation, X.Y. and J.X.; writing review and editing, Z.Y., Y.Y., W.C., Z.W. and D.Z.; visualization, X.Y. and X.L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This study was funded by the National Natural Science Foundation of China (Grant No. 42275158).

**Data Availability Statement:** Not applicable.

**Acknowledgments:** We would like to express our gratitude to the Japan Meteorological Agency (JMA) for freely providing the Himawari-8 satellite data used in this research.

**Conflicts of Interest:** The authors declare no conflict of interest.
