#### **1. Introduction**

Coastal areas with high population densities, rapid growth, and urbanization are relatively vulnerable to coastal flooding hazards, such as sea-level rise and storm surge driven by climatic extremes [1,2]. The losses caused by these disasters have continued to increase in recent years. The western North Pacific (WNP) is one of the oceanic regions most prone to typhoons [3–9]. Since China is located on the west coast of the WNP, it is greatly affected by typhoons, particularly along the east coast [10]. The strong winds, heavy precipitation, and storm surge of typhoons pose serious threats to China's economy and to public safety. For example, super typhoon "Mangkhut" affected many provinces and regions in South China in September 2018. Close to 3 million people were affected, ~1200 houses were damaged, and ~174.4 thousand hectares of crops were affected. The direct economic loss exceeded CNY 5.2 billion (USD 77.5 million) [11].

Incorrect rainfall forecasts during the typhoon season can cause water resources to be mismanaged and lead to serious flooding or water shortage, regardless of how well forecasting and water management were carried out before the typhoon [10,12]. In recent years, with the development of satellite observations and mathematical modeling, along with integration and data assimilation techniques using various observational datasets, typhoon track and intensity predictions have continuously improved [13–20]. Nevertheless, typhoon-induced rainfall prediction remains very difficult and less accurate than typhoon track prediction [21–29]. For example, Li et al. [26] established a non-parametric statistical method using numerical models and typhoon intensity predictions to estimate the maximum daily rainfall and three-day cumulative rainfall amounts. Previously, Ebert et al. [27] noted that satellite-based tracking of tropical rain could improve the short-term prediction of typhoon-induced heavy rainfall. More recently, Kim et al. [28] hypothesized that typhoons with similar tracks have similar rainfall patterns, and used the tracks, intensities, and precipitation data for 91 typhoons affecting the Korean Peninsula over several decades to establish a statistical model for forecasting typhoon-induced rainfall over that region.

Although typhoon-induced rainfall prediction models are constantly being improved, the rainfall conditions related to typhoons differ from region to region, and most of the aforementioned methods were developed for specific regions [26–30]. The establishment of a typhoon-induced rainfall prediction model requires accurate track and intensity forecasts; however, complex physical processes such as the interaction between the typhoon and land also need to be considered. These factors may cause rapid changes in precipitation during the passage of typhoons [21,22]. Therefore, typhoon-induced rainfall prediction is particularly challenging.

The purpose of the present study is to establish a new statistical prediction model based on the principle of track similarity, using fuzzy C-means clustering, intensity correction, and other methods to optimize typhoon-induced accumulated rainfall (TAR) forecasts over China. The following section introduces the data used to develop the prediction model and describes how the TAR of each typhoon in the western North Pacific in recent decades is determined. Then, in Section 3, typhoons with tracks similar to that of the target typhoon are selected. In addition, TAR correction is conducted based on typhoon intensity, and the optimal number of similar-track typhoons is selected for ensemble averaging. After substituting the previous typhoon data, the results of the prediction model are given. Finally, Section 4 provides a summary and conclusions, including a discussion of the advantages of this method as well as the limitations that can be improved in future work.

#### **2. Data and Methods**

#### *2.1. Data*

To establish the TAR prediction model, gap-free daily rainfall data from 537 meteorological stations in China for the period 1961–2017 (Figure 1a; http://data.cma.cn) were used, along with best-track data for a total of 1536 typhoons in the WNP during the same period (Figure 1b). Typhoon intensity correction was performed, and the effects of ensemble averaging and typhoon similarity levels were analyzed, using primarily the datasets for the 55 tropical cyclones (TCs) affecting 75 meteorological stations in the southeast coastal area of China listed in Table 1. The 6-hourly location and intensity data for the typhoons, including the specific date, time, longitude, latitude, maximum wind speed, and typhoon number, were obtained from the Regional Specialized Meteorological Center (RSMC) Tokyo.

As typhoons approach mid-latitude regions, they transition into tropical storms under the impacts of landfall, cold air mixing, and other factors, leading to a rapid weakening of their intensities. Nevertheless, the associated rainfall still affects large areas and generates disasters such as debris flows and floods that may cause losses of life and property. Therefore, in order to better estimate the rainfall that a typhoon can cause, the present study includes the period after each typhoon turns into a tropical storm.

**Figure 1.** Weather stations and typhoon track data used in the present study: (**a**) the locations of the meteorological stations (*n* = 537); and (**b**) the long-term average of the tropical cyclone (TC) track density in the western North Pacific (WNP) region during the period 1961–2017. The solid line in (**b**) indicates the location of the WNP subtropical high represented by 5880 gpm during the study period.


**Table 1.** The 55 typhoons used in the present study.

#### *2.2. Calculation of Typhoon-Induced Accumulated Rainfall (TAR)*

The first step in establishing the TAR prediction model is to calculate the TAR for each local station. The specific calculation process is as follows:


It was noted that a substantial error would arise if coexisting typhoons were used to establish a TAR prediction model, which would result in an inaccurate model forecast. To prevent this problem, typhoons of this type were discarded during the prediction model establishment process.
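The screening of coexisting typhoons described above could be sketched as follows. This is only an illustrative sketch: it assumes that "coexisting" means overlapping active periods, ignores any regional criterion the authors may have applied, and uses a hypothetical data layout (`periods`) and function name (`coexisting_ids`) that do not come from the study.

```python
from datetime import datetime

def coexisting_ids(periods):
    """Return the IDs of typhoons whose active period overlaps another's.

    periods: dict mapping typhoon ID -> (start, end) datetimes.
    This layout is an assumption made for illustration only.
    """
    flagged = set()
    # Sort by start time so that each storm is compared only with later starters.
    items = sorted(periods.items(), key=lambda kv: kv[1][0])
    for i, (id_a, (_, end_a)) in enumerate(items):
        for id_b, (start_b, _) in items[i + 1:]:
            if start_b > end_a:
                break  # later storms start even later, so none can overlap id_a
            flagged.update((id_a, id_b))
    return flagged

# Made-up periods, purely to show the call; these are not real typhoon dates.
example = {
    "A": (datetime(2000, 7, 1), datetime(2000, 7, 5)),
    "B": (datetime(2000, 7, 4), datetime(2000, 7, 8)),
    "C": (datetime(2000, 7, 10), datetime(2000, 7, 12)),
}
usable = [tid for tid in example if tid not in coexisting_ids(example)]
```

In this made-up example only storm "C" would be retained, because "A" and "B" overlap in time.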

#### *2.3. Selection of Typhoons Using the Fuzzy C-Means Clustering Algorithm*

In the present study, the fuzzy C-means clustering (FCM) algorithm was used to select typhoons with similar tracks. This is a partitioning algorithm in which objects with the greatest similarities are grouped into the same cluster and objects with few similarities are placed in separate clusters. The FCM was proposed by Bezdek [34] as an improvement on the hard C-means clustering method and enables an estimate of the degree to which each data point belongs to a certain cluster, i.e., the degree of membership. In detail, the FCM divides *n* vectors *X<sub>j</sub>* (*j* = 1, 2, ..., *n*) into a number (*c*) of fuzzy groups and identifies the clustering center of each group so that the value function of the dissimilarity index is minimized. A fuzzy division is then used to assign a degree of membership between 0 and 1 that describes how well each data point belongs to each group. According to the FCM, the elements of the membership matrix *U* take values between 0 and 1, while the normalization constraint dictates that the memberships of each data point, summed over all groups, must always equal unity, as indicated by Equation (1):

$$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j = 1, \cdots, n \tag{1}$$

Then, the value function (or objective function) of the FCM is given by Equation (2):

$$f(U, c_1, \dots, c_c) = \sum_{i=1}^{c} f_i = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^{m} d_{ij}^{2} \tag{2}$$

where *u<sub>ij</sub>* is between 0 and 1, *m* is the fuzziness (weighting) exponent, *c<sub>i</sub>* is the clustering center of fuzzy group *i*, and *d<sub>ij</sub>* = ‖*c<sub>i</sub>* − *x<sub>j</sub>*‖ is the Euclidean distance between the *i*-th clustering center and the *j*-th data point.
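To make Equations (1) and (2) concrete, the following minimal NumPy sketch evaluates the FCM objective for a given membership matrix and set of cluster centers. The function name `fcm_objective`, the array shapes, and the default exponent `m=2.0` are assumptions chosen for illustration; the study does not report its implementation or exponent value.

```python
import numpy as np

def fcm_objective(U, centers, X, m=2.0):
    """Evaluate Equation (2): sum_i sum_j u_ij^m * d_ij^2.

    U:       (c, n) membership matrix
    centers: (c, d) cluster centers
    X:       (n, d) data points
    m:       fuzziness exponent (assumed default, not taken from the study)
    """
    U = np.asarray(U, dtype=float)
    centers = np.asarray(centers, dtype=float)
    X = np.asarray(X, dtype=float)
    # Equation (1): the memberships of each data point sum to one over clusters.
    if not np.allclose(U.sum(axis=0), 1.0):
        raise ValueError("each column of U must sum to 1")
    # Squared Euclidean distances between every center and every point, shape (c, n).
    diff = centers[:, None, :] - X[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    return float((U ** m * d2).sum())

# Tiny synthetic example: two centers, three points, the last one equidistant.
centers = [[0.0, 0.0], [2.0, 0.0]]
X = [[0.0, 0.0], [2.0, 0.0], [1.0, 0.0]]
U = [[1.0, 0.0, 0.5],
     [0.0, 1.0, 0.5]]
value = fcm_objective(U, centers, X)
```

Only the equidistant third point contributes to the objective here, since the first two points coincide with their cluster centers.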

In the process of clustering typhoons using the FCM method, the membership coefficient *W<sub>ik</sub>* is calculated. This indicates the probability that each typhoon *X<sub>i</sub>* belongs to the target typhoon group *C<sub>k</sub>* [28,35]. The value of *W<sub>ik</sub>* is determined from the partial derivative of the sum of squared errors (SSE) with respect to *W<sub>ik</sub>*, according to Equations (3) and (4):

$$SSE = \sum_{k=1}^{K} \sum_{i=1}^{n} W_{ik}^{p} \, d(\mathbf{x}_{i}, \mathbf{c}_{k})^{2} \tag{3}$$

$$W_{ik} = \frac{\left\{ \frac{1}{d(\mathbf{x}_i, \mathbf{c}_k)^{2}} \right\}^{\frac{1}{p-1}}}{\sum_{q=1}^{K} \left\{ \frac{1}{d(\mathbf{x}_i, \mathbf{c}_q)^{2}} \right\}^{\frac{1}{p-1}}} \tag{4}$$

where *d*(*x<sub>i</sub>*, *c<sub>k</sub>*)<sup>2</sup> is the squared distance between each typhoon track and the target typhoon track, and *p* (> 1) is the fuzziness exponent.
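As a rough illustration of Equation (4), the sketch below converts squared distances into membership coefficients. It assumes the squared-distance form that follows from minimizing the SSE in Equation (3) under the constraint that memberships sum to one; the function name `fcm_membership` and the default `p=2.0` are illustrative assumptions rather than details reported in the study.

```python
import numpy as np

def fcm_membership(d2, p=2.0):
    """Membership coefficients W_ik from squared distances d(x_i, c_k)^2.

    d2: (n, K) array of strictly positive squared distances
    p:  fuzziness exponent, must exceed 1 (assumed default)
    Returns an (n, K) array whose rows sum to one.
    """
    d2 = np.asarray(d2, dtype=float)
    if p <= 1.0:
        raise ValueError("p must be greater than 1")
    if np.any(d2 <= 0.0):
        raise ValueError("squared distances must be strictly positive")
    inv = (1.0 / d2) ** (1.0 / (p - 1.0))
    # Normalize over clusters so that each track's memberships sum to one.
    return inv / inv.sum(axis=1, keepdims=True)

# One track that is twice as far from cluster 2 as from cluster 1
# (squared distances 1 and 4).
W = fcm_membership([[1.0, 4.0]])
```

With `p = 2`, the inverse squared distances 1 and 0.25 normalize to memberships of 0.8 and 0.2, so the closer cluster receives the larger coefficient.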

When using the FCM method to cluster all the typhoon tracks, these must first be divided into lines with the same number of location points. In the present study, all typhoons were uniformly interpolated according to the typhoon with the largest number of location points in its track data. In addition, the FCM membership coefficient was used as a criterion for screening typhoons that were similar to the target typhoon: the larger the coefficient value, the higher the typhoon similarity. For example, the eight typhoons with the greatest similarity to typhoons Usagi (#1319) and Nesat (#1117) according to the FCM method are indicated in Figure 2. Typhoon Usagi (#1319) made landfall on the coast of Fujian Province in southern China in 2013 and affected the surrounding areas of Taiwan and southern provinces of China (Figure 2a), whereas typhoon Nesat (#1117) passed through Hainan Province and Qiongzhou Strait in 2011 and then caused serious damage to the surrounding areas, including Hainan, Guangdong, and other provinces (Figure 2b). The results in Figure 2 indicate that the tracks of typhoons Nuri (#0812) in 2008 and Sharon (#9404) in 1994 are the most similar to those of Usagi (#1319) and Nesat (#1117), respectively.
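The track preprocessing described above can be sketched as follows. This is a simplified illustration under stated assumptions: tracks are resampled linearly by sample index rather than by time or along-track distance, distances are computed directly in longitude-latitude degrees, and candidates are ranked by mean squared point-wise distance to the target as a stand-in for the FCM membership coefficient. The function names `resample_track` and `rank_by_distance` are hypothetical.

```python
import numpy as np

def resample_track(lon, lat, n_points):
    """Linearly interpolate a track to n_points evenly spaced positions."""
    lon = np.asarray(lon, dtype=float)
    lat = np.asarray(lat, dtype=float)
    s_old = np.linspace(0.0, 1.0, len(lon))   # normalized position of original fixes
    s_new = np.linspace(0.0, 1.0, n_points)   # normalized position of resampled fixes
    return np.column_stack([np.interp(s_new, s_old, lon),
                            np.interp(s_new, s_old, lat)])

def rank_by_distance(target, candidates):
    """Order candidate IDs from most to least similar to the target track.

    target:     (n_points, 2) resampled target track
    candidates: dict of ID -> (n_points, 2) resampled track
    Similarity here is the mean squared point-wise distance (an assumption).
    """
    scores = {tid: float(((trk - target) ** 2).sum(axis=1).mean())
              for tid, trk in candidates.items()}
    return sorted(scores, key=scores.get)

# Synthetic tracks, not real best-track data.
target = resample_track([0.0, 2.0], [10.0, 12.0], 3)
candidates = {"far": target + 5.0, "near": target + 0.1}
order = rank_by_distance(target, candidates)
```

Resampling every track to a common length is what allows each track to be treated as a fixed-length vector in the clustering step.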

**Figure 2.** Top eight typhoon tracks most similar to those of: (**a**) typhoon Usagi (#1319) and (**b**) typhoon Nesat (#1117). The identification number and similarity level of the selected typhoons are indicated in the key.

#### **3. Results**

#### *3.1. Correcting the TAR Using Typhoon Intensity Information*

Since no two typhoons have exactly the same intensity and structure, every typhoon is unique. Therefore, it is not theoretically possible to accurately predict the amount of rainfall caused by a typhoon based only on the track of a single similar typhoon. In other words, even when two typhoons have exactly the same tracks, differences in their intensities will result in different rainfall amounts, with higher intensity typhoons usually resulting in more rainfall [36]. Therefore, a typhoon wind intensity correction (TWIC) was used in the present study to further reduce the error in the TAR prediction model. The effects of the TWIC and ensemble averaging were first assessed using the training datasets of 55 TCs, and the model performance was then verified in Section 3.3.

The eastern and southern coastal areas of China were selected as target areas for prediction during the training of this model because these are the areas that are most frequently affected by typhoons, whereas the inland areas of China are rarely affected. In the process of TAR correction based on TC wind speed, data from typhoons affecting 75 weather stations along the southeastern coast of China (Pearl River Basin and Southeast River Basin) were used. Typhoons that occurred simultaneously in the same region were not used for this process, as it is difficult to obtain their individual TAR periods and rainfall amounts accurately.

After processing the data, the 55 most representative typhoons with high data accuracy and their corresponding similar-track typhoons were finally selected. The TC wind speed and average rainfall values during the passage of these typhoons were then calculated from the data obtained from 75 stations in the southeast coastal area of China. Using these data, the linear regression equation relating the TC wind speed of the 75 weather stations and the average TAR during typhoon passage was obtained (Figure 3) and the best fit was given by Equation (5):

$$\overline{P}_{\rm TAR} = 0.654V + 10.891 \tag{5}$$

**Figure 3.** Linear relationship between the TC wind speed (*V*, m/s) and the average typhoon-induced accumulated rainfall (TAR) (*P*<sub>TAR</sub>, mm). The average TAR and TC wind speed were obtained using the most similar typhoons from 55 storms and 75 stations.

Equation (5) shows a positive correlation between the TC wind speed (*V*, m/s) and the average TAR (*P*<sub>TAR</sub>, mm), and the relationship is significant (*p* < 0.05; *R*<sup>2</sup> = 0.654 ± 0.291). During the training process for the TAR prediction model, this linear equation was adopted to apply an intensity correction to all typhoons with similar tracks.
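A short sketch of how Equation (5) might enter the intensity correction is given below. The regression coefficients are those of Equation (5); however, the text does not spell out the exact form of the correction, so the ratio scaling in `intensity_corrected_tar` is purely an assumption made for illustration, as are both function names.

```python
def mean_tar_from_wind(v_ms):
    """Equation (5): regression estimate of the average TAR (mm) from wind speed (m/s)."""
    return 0.654 * v_ms + 10.891

def intensity_corrected_tar(tar_analog_mm, v_analog_ms, v_target_ms):
    """Scale an analog typhoon's TAR toward the target typhoon's intensity.

    ASSUMPTION: the analog TAR is multiplied by the ratio of the regression
    estimates at the target and analog wind speeds. The source does not state
    the exact correction formula, so this is only one plausible reading.
    """
    ratio = mean_tar_from_wind(v_target_ms) / mean_tar_from_wind(v_analog_ms)
    return tar_analog_mm * ratio

# Illustrative numbers only: an analog storm at 30 m/s, a target storm at 40 m/s.
corrected = intensity_corrected_tar(100.0, 30.0, 40.0)
```

Under this assumed form, an analog that was weaker than the target has its rainfall scaled up, and an analog with the same wind speed is left unchanged.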

#### *3.2. Effects of Track Similarity, Ensemble Averaging, and Intensity Correction on the TAR Predictions*

The similarity level of the typhoon track, the number of ensemble averages, and whether the typhoon intensity is corrected may have an impact on the TAR prediction. To examine the influence of the typhoon track similarity level, the accuracy of the prediction result is judged by the root mean square error (RMSE), where a smaller error indicates a more accurate result. The results presented in Figure 4 (black line) show that the use of a single typhoon with the most similar track to predict the TAR values of the target typhoon in the target areas from 1961 to 2017 gives an average RMSE of 62.2 mm. However, if the typhoon with the second-best track similarity is used alone for the prediction, the RMSE is slightly decreased to 60.8 mm, while using only the typhoon rainfall data with the third-best track similarity decreases the average RMSE to 58.7 mm. Thereafter, the average RMSE continues to decrease as the similarity of the selected typhoon increases. In general, the prediction error decreases with the use of individual typhoons with increasing track similarity levels, but the use of only a single typhoon in the TAR prediction process may nevertheless result in an unsatisfactory error reduction even if its track is very similar to that of the target typhoon.
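For reference, the RMSE used as the accuracy measure can be written in its standard form. The symbols below are introduced here for illustration and are not defined in the text: *N* is the number of stations, and the superscripts denote the predicted and observed TAR at station *s*. The averaging over typhoons that yields the reported mean RMSE is presumably applied on top of this, although the text does not state the aggregation explicitly.

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{s=1}^{N}\left(P_{s}^{\mathrm{pred}} - P_{s}^{\mathrm{obs}}\right)^{2}}$$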

**Figure 4.** A comparison of the change in the RMSE of the prediction obtained with the increasing number of ensemble typhoons used before (black line) and after (red line) TC wind intensity correction. "X" represents the optimal ensemble number after TC wind intensity correction.

To further reduce the prediction errors, ensemble averaging (EA) was then considered [21,37]. To detect the influence of EA on the TAR prediction result, the number of high track-similarity typhoons used in the prediction at each station was increased step-by-step to form an ensemble, then their average TAR values were calculated and compared with the observed values. The results in Figure 4 (black line) indicate that as the number of typhoons in the ensemble increases, the RMSE initially decreases to a minimum of 51.5 mm with an ensemble of the 27 most similar typhoons, and gradually increases thereafter.

Then, to study the influence of typhoon wind intensity correction upon TAR prediction, the TAR obtained after TWIC was calculated using the EA method, and the results were compared with those obtained without TWIC in Figure 4. Here, the red line indicates a decrease of 0.5–0.9 mm in the average RMSE after the TWIC. In other words, the TWIC helps reduce the error in TAR predictions. In addition, the above results indicate an optimal ensemble number of 26 when using the EA method to predict the TAR.
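The search for the optimal ensemble size can be sketched as a sweep over the number of analog typhoons. The sketch below assumes a single target typhoon and an array of analog TAR values already ordered from most to least similar (and already intensity-corrected, if desired); the study instead averages the RMSE over many target typhoons. The function names `rmse` and `best_ensemble_size` are hypothetical.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean square error between predicted and observed station values."""
    pred = np.asarray(pred, dtype=float)
    obs = np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def best_ensemble_size(analog_tar, observed_tar):
    """Find the ensemble size whose mean TAR has the lowest RMSE.

    analog_tar:   (n_analogs, n_stations), rows ordered most similar first
    observed_tar: (n_stations,) observed TAR of the target typhoon
    Returns (best_k, errors), where errors[k - 1] is the RMSE for the top k analogs.
    """
    analog_tar = np.asarray(analog_tar, dtype=float)
    counts = np.arange(1, analog_tar.shape[0] + 1)
    # Row k - 1 holds the ensemble mean of the k most similar analogs.
    running_mean = np.cumsum(analog_tar, axis=0) / counts[:, None]
    errors = [rmse(row, observed_tar) for row in running_mean]
    return int(np.argmin(errors)) + 1, errors

# Synthetic values for three analogs at two stations (not data from the study).
analogs = [[10.0, 30.0], [30.0, 10.0], [100.0, 100.0]]
best_k, errors = best_ensemble_size(analogs, [20.0, 20.0])
```

In this synthetic case, averaging the two most similar analogs cancels their opposite errors, while adding the dissimilar third analog degrades the estimate, which mirrors the decrease-then-increase behavior described for Figure 4.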

Based on the results of the above analysis, the operational process of the statistical TAR prediction model used in the present study for the southeast coastal area of China is as follows:


The spatial distributions of the RMSE (mm) and correlation coefficients for the 55 typhoons at 75 stations in the eastern and southern coastal areas of China from 1961 to 2017, estimated using this TAR prediction model during the training period, are presented in Figure 5. Here, the RMSE of the 55 typhoons at the majority of stations is seen to be below 70 mm. The particularly large error and low correlation at the southern and eastern coastlines of China may be due to the impact of the co-existing rainy front in southeastern China and the relatively strong TCs passing through the coastal areas, since a strong rainfall intensity with considerable regional variations can reduce the accuracy of TAR predictions.

**Figure 5.** Graphs showing the average RMSE (**a**) and correlation coefficient (**b**) calculated using the prediction model for 55 typhoons at 75 stations during the model training period.

#### *3.3. Model Performance*

Typhoon Sarika (#1621), which affected the coastal area of southern China in 2016, typhoon Nesat (#1709), which affected the coastal area of southeast China in 2017, and typhoon Utor (#0104), which passed between Hainan Island and Taiwan, were then used to evaluate the actual performance of the TAR prediction model. The three typhoons had different tracks as they approached and made landfall in China, with Sarika (#1621) crossing Hainan Island and moving northwestward to land along the southern coastline of Guangxi Province in China, while Nesat (#1709) landed in Fujian Province through the Taiwan Strait after passing through northern Taiwan and then moving southwest. The FCM approach was used to select typhoons with the most similar tracks; their TAR values were then corrected for intensity according to Equation (5) and averaged. The most similar tracks obtained from the FCM analysis are presented in Figure 6. By averaging the TC wind intensity-corrected historical TAR records of these typhoons, the TAR values of Sarika (#1621), Nesat (#1709), and Utor (#0104) at the 75 stations in the southeastern and southern coastal areas of China were predicted and compared with the observed values. The results indicate RMSE values of 35.7, 55.5, and 47.2 mm for typhoons Nesat (#1709), Utor (#0104), and Sarika (#1621), respectively. Thus, the error in the TAR prediction for two of the three typhoons using the proposed statistical model is lower than the average error (51.2 mm) obtained using 55 typhoons during the model training period.

**Figure 6.** Verification of the prediction model established in the present study: (**a**) the TC tracks used; (**b**–**d**) the selected typhoon trajectories most similar to those of (**b**) typhoon Nesat (#1709); (**c**) typhoon Utor (#0104); and (**d**) typhoon Sarika (#1621). The typhoon numbers and similarity levels are indicated in the key.

The observed TAR values for typhoons Nesat (#1709), Utor (#0104), and Sarika (#1621) are presented, along with the differences between observed and predicted values, in Figure 7. Here, the predicted TAR spatial pattern for typhoon Nesat (#1709) is seen to be very similar to the observed pattern, except that the TAR for part of the area farther away from the coast is overestimated. For typhoon Utor (#0104), the predicted results show significant differences in the Southeastern and Pearl river basins, being slightly overestimated in the former and underestimated in the latter compared with the observations. For typhoon Sarika (#1621), the distribution of predicted TAR values in the southern and southeastern coastal areas of China is very similar to the actual observations, although the TAR is overestimated in Fujian Province and underestimated in Guangdong Province. These results are further illustrated by the violin plots (boxplot-density trace synergism) in Figure 8. In conclusion, the results of the TAR prediction model presented in this study are broadly similar to the actual observations and indicate the overall good performance of the model for predicting the spatial distribution of TAR values.

**Figure 7.** TAR estimation at 75 stations along the southern and southeastern coasts of China for typhoons Nesat (#1709), Utor (#0104), and Sarika (#1621): (**a**) the observed values; and (**b**) the difference between observed and predicted values.

**Figure 8.** Violin plots (boxplot-density trace synergism) of the TAR difference between the observed and predicted values for 75 stations along the southern and southeastern coasts of China for typhoons Nesat (#1709), Utor (#0104), and Sarika (#1621). The white circles indicate the median value of the TAR difference for 75 stations.

#### **4. Summary and Conclusions**

A statistical approach for predicting typhoon rainfall was developed herein based on the historical storm track, intensity, and rainfall data for 55 typhoons affecting the southeastern coastal areas of China from 1961 to 2017. Specifically, the statistical model was based on the principle of track similarity. Since tropical cyclones (TCs) with similar tracks tend to produce relatively similar rainfall patterns, historical rainfall data from TCs with similar tracks were used to predict the accumulated rainfall caused by the target TC. In addition, TC intensity correction and ensemble averaging for multiple similar TC tracks were used to reduce prediction errors. The fuzzy C-means clustering (FCM) algorithm was used to select the typhoons with the most similar tracks to that of the target typhoon. The typhoon-induced accumulated rainfall (TAR) values of the selected typhoons observed at each of the 75 stations were corrected according to typhoon intensity, and then averaged to provide an estimate of the target typhoon's TAR value at each station.

The results indicated an average error of 51.2 mm across the 75 stations in the coastal area of southern China. In addition, three typhoons that were excluded from the model training process (i.e., Nesat (#1709), Utor (#0104), and Sarika (#1621)) were subsequently used to generate a forecast according to their best-track data and, thus, verify the predictive performance of the model. The resulting RMSE for the predicted TAR of Utor (#0104) was slightly high (55.5 mm), while those of Nesat (#1709) and Sarika (#1621) were 35.7 and 47.2 mm, respectively. The latter two errors were lower than the average error (51.2 mm) obtained during the model training period, thus demonstrating the feasibility of the model for use in actual predictions. Subsequently, the spatial distributions of the TAR values predicted by this model for the three typhoons at 75 stations were analyzed and found to be similar to the actual observations. This further demonstrated the overall good performance of the model in predicting the spatial distribution of the TAR values.

Nevertheless, the TAR prediction model presented in this study is limited to predicting only the accumulated rainfall caused by typhoons; it cannot predict the change in rainfall over time at all locations. Although numerical weather prediction (NWP) models are more advanced in this respect, the results predicted by the proposed statistical model have greater significance in certain contexts, especially for regulating reservoir discharge and flood control. The roles of the proposed model are to provide a more accurate forecast of the TAR at the target site, to complement the predictions of traditional numerical models, and to help ensure that the region is well prepared to respond to typhoon-related rainfall.

Predicting rainfall caused by typhoons is challenging because, in addition to the track and intensity of the typhoon, many factors such as the regional terrain, the interaction of the typhoon with the land, and the speed of the storm translation can have certain effects upon the TAR. Notably, the TAR prediction model established in the present study did not consider these factors. Additionally, the number of typhoon samples used to build the TAR prediction model in the southern and southeastern coastal areas of China was not large. If additional factors are considered in future research, such as a correction for storm translation speed and size, and if the effective sample size is increased by using more typhoon data, the predicted results might become more accurate. In addition, confirmation is required via a comparison with NWP-based ensemble prediction models. All these approaches can help improve the performance of the TAR prediction model over China.

**Author Contributions:** Conceptualization, Resources, Formal analysis, Writing (original draft), J.-S.K. and A.C.; Conceptualization, Methodology, I.-J.M., J.L., and J.-S.K.; Writing (review and editing), I.-J.M., Y.-I.M., and J.-S.K. All authors have read and agreed to the published version of the manuscript.

**Funding:** We appreciate the support of the State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
