3.1. Selection of Target Areas and Data Collection
This study targeted the Midwest region of the Republic of Korea, comprising Daejeon Metropolitan City, Sejong Special Self-Governing City, Chungnam Province, and Chungbuk Province. The area consists of 28 cities and gun (districts or counties in Korea), including two municipalities, 11 cities, and 15 gun (see Figure 3). The advantage of the area is that it offers a variety of geographical environments in which to assess flood risk, because it encompasses large cities, small and medium cities, coastal cities, mountainous regions, and rural areas.
The elements needed to construct an indicator database fall into two categories: statistical data and geographic information system (GIS)-based data. Demographic, social, and economic data were collected from the Korean Statistical Information Service (KOSIS), the statistical yearbooks of local governments, and the Statistical Yearbook of Natural Disasters. Meteorological data were collected from the Korea Meteorological Administration. All data were compiled in a GIS for spatial analysis. The base year of the data is 2016, and statistical data for the ten years up to and including the base year (2007–2016) were used. As shown in Table 1, data were collected for all 28 indicators of the four components: hydro-geology (H), socio-economics (S), flood protection (P), and climate (C). All indicators were normalized to values between 0 and 1 using the average values estimated for each region and the standard deviations. Thus, the larger the value of an indicator for a region, the closer its normalized value is to 1. The indicators of the H, S, and C components are positively correlated with the flood risk index (FRI), whereas the indicators of P are negatively correlated.
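The normalization step can be sketched as follows. The text states only that regional averages and standard deviations were used, so this sketch assumes a z-score passed through the standard normal cumulative distribution function, which is one common way to map values into the interval (0, 1). The function name and the sample values are hypothetical.

```python
import math
from statistics import mean, pstdev


def normalize_indicator(values):
    """Map raw regional values to (0, 1) via z-score and standard normal CDF.

    Assumption: the exact mapping is not given in the text; the CDF form
    used here is one plausible reading, not the confirmed procedure.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All regions identical: every region sits at the midpoint.
        return [0.5 for _ in values]
    return [
        0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))
        for v in values
    ]


# Made-up raw values for one indicator across four regions.
raw = [120.0, 300.0, 80.0, 510.0]
scores = normalize_indicator(raw)
```

Under this assumption, a region at the average receives 0.5 and larger raw values move toward 1, which matches the behavior described above.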
3.2. Selection of Indicators Using Factor Analysis and Principal Component Analysis
Factor analysis and PCA were performed on each of the four components (hydro-geology, socio-economics, flood protection, and climate), each of which consisted of several indicators. First, the indicators were grouped by factor using factor analysis, and then the indicator with the highest component score in each group was selected using PCA. This procedure prevents indicators with duplicated meaning from being counted twice and reduces the dimension of the indicators in each group.
Table 2 shows the results of the factor analysis. The hydro-geology, socio-economic, and protection components were classified into three groups, whereas climate was classified into two groups. The Kaiser–Meyer–Olkin (KMO) measure [34] and Bartlett's test of sphericity [35] were used to determine the appropriateness of the analysis. The Kaiser–Harris criterion [34] was used to select principal components that have an eigenvalue of 1 or higher (see Table 2). The analysis of each component was judged to be appropriate because KMO was 0.5 or higher and the probability value (p) of Bartlett's test was below 0.05.
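As an illustration of the adequacy check, the sketch below computes the overall KMO measure from a correlation matrix using the standard definition (squared correlations relative to squared correlations plus squared partial correlations). It uses only the standard library. The correlation matrix is a made-up example, not data from Table 2, and Bartlett's test is not covered here.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def kmo(corr):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    inv = invert(corr)
    n = len(corr)
    r2 = 0.0  # sum of squared off-diagonal correlations
    a2 = 0.0  # sum of squared off-diagonal partial correlations
    for i in range(n):
        for j in range(n):
            if i != j:
                r2 += corr[i][j] ** 2
                partial = -inv[i][j] / (inv[i][i] * inv[j][j]) ** 0.5
                a2 += partial ** 2
    return r2 / (r2 + a2)


# Hypothetical correlation matrix for three indicators.
corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
value = kmo(corr)
adequate = value >= 0.5  # threshold used in the text
```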
PCA was used to select the indicator that makes the largest contribution within each group. Among the 28 indicators, 17 were eliminated and 11 were chosen: (1) three indicators for hydro-geology (damage cost, urban rate, and lowland area rate); (2) three for socio-economics (total number of houses, financial independence rate, and dependent population); (3) three for flood protection (number of pump stations, drainage capacity, and number of public servants per resident); and (4) two for climate (frequency of intensive rainfall and probability rainfall), as shown in Table 2.
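The selection rule, one representative indicator per factor group, can be sketched as below. The group names, indicator names, and loading values are placeholders for illustration only and are not taken from Table 2. The use of the absolute value is also an assumption, made so that strongly negative loadings are treated as strong contributions.

```python
def select_representatives(loadings):
    """For each factor group, keep the indicator with the largest |loading|."""
    return {
        group: max(items, key=lambda name: abs(items[name]))
        for group, items in loadings.items()
    }


# Hypothetical groups and loadings, for illustration only.
loadings = {
    "group_1": {"indicator_a": 0.91, "indicator_b": 0.78, "indicator_c": -0.40},
    "group_2": {"indicator_d": -0.88, "indicator_e": 0.61},
}
chosen = select_representatives(loadings)
```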
3.3. Weight Assignment by Method and Calculation of Integrated Weights
3.3.1. Weight Assignment by Method
The selected indicators are expressed as normalized values between 0 and 1, and each indicator is multiplied by its assigned weight so that the flood risk index can be quantified. Weights can be assigned by various methods; here, three weight assignment methods were applied, as described in Section 2.3. For the AHP, a survey was conducted with 30 respondents from academia and research. The survey was constructed in such a way that the importance of each indicator was compared in pairs. The terms of each indicator were defined and presented in the questionnaire to make it easier for the respondents to understand. The first-level hierarchy consists of four upper-level assessment components (hydro-geology, socio-economics, flood protection, and climate), and the second-level hierarchy consists of 11 lower-level assessment indicators.
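To make the pairwise-comparison step concrete, the sketch below derives priority weights from a single reciprocal comparison matrix using the row geometric-mean approximation, a commonly used alternative to the principal-eigenvector calculation. The judgments shown are hypothetical, and the way the 30 individual responses were aggregated is not described here, so it is not modeled.

```python
import math


def ahp_weights(pairwise):
    """Approximate AHP priority weights by normalized row geometric means."""
    n = len(pairwise)
    geo_means = [math.prod(row) ** (1.0 / n) for row in pairwise]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgments for the four components, in the order
# hydro-geology, socio-economics, flood protection, climate.
# pairwise[i][j] states how much more important item i is than item j.
pairwise = [
    [1.0, 3.0, 1.0, 2.0],
    [1 / 3, 1.0, 1 / 3, 1 / 2],
    [1.0, 3.0, 1.0, 2.0],
    [1 / 2, 2.0, 1 / 2, 1.0],
]
weights = ahp_weights(pairwise)
```

For a perfectly consistent matrix, this approximation recovers the underlying weights exactly; for inconsistent judgments it can differ slightly from the eigenvector solution.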
For the CSS, a survey was conducted with 21 experts who have experience in work related to flood or wind damage and who did not participate in the AHP survey. The questionnaire was structured so that the scores assigned to the four components summed to 10 and the scores assigned to the indicators within each component also summed to 10. A sufficient explanation of the survey method was provided with the questionnaire so that the respondents would not be confused.
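A minimal sketch of how such constant-sum responses could be turned into weights is given below. It assumes that the weights are the average allocations rescaled to sum to 1; the aggregation rule actually used is defined in Section 2.3 and may differ. The responses are made-up examples.

```python
def css_weights(responses, total=10.0):
    """Average constant-sum allocations over respondents, scaled to sum to 1."""
    for response in responses:
        if abs(sum(response) - total) > 1e-9:
            raise ValueError("each response must sum to the constant total")
    n_respondents = len(responses)
    n_items = len(responses[0])
    return [
        sum(response[j] for response in responses) / (n_respondents * total)
        for j in range(n_items)
    ]


# Hypothetical allocations of 10 points over the four components
# by three respondents.
responses = [
    [3, 2, 3, 2],
    [4, 1, 3, 2],
    [2, 2, 4, 2],
]
weights = css_weights(responses)
```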
For entropy weighting, Equations (1)–(6) from Section 2.3 were applied to the data collected for each component. The weights are shown in Table 3 and Figure 4. In summary, the weights for the socio-economic component were low in the survey methods but high in the entropy weight method. The weights of the remaining components were distributed evenly, which indicates that these components were regarded as roughly equal in importance.
Within each component, the survey and entropy methods weighted the indicators differently: the entropy method gave higher weights to specific indicators, while the surveys gave relatively similar weights to all indicators. This was due to the characteristic behavior of entropy, which increases when deviations between alternatives are low. Moreover, the deviations between the normalized values of the indicators were small. However, the number of public servants per resident indicator of the flood protection component and the annual precipitation indicator of the climate component were low, and divisions in indicators between regions were high, which resulted in low entropy weights.
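The behavior of entropy described above can be illustrated with the textbook form of the entropy weight method. Equations (1)–(6) of Section 2.3 are not reproduced in this section, so the sketch below is an assumption about their general shape rather than a transcription, and the data are made up. An indicator that takes the same value in every region reaches the maximum entropy and therefore receives a weight of zero.

```python
import math


def entropy_weights(matrix):
    """Textbook entropy weights.

    matrix[i][j] is the non-negative value of indicator j in region i.
    Each indicator column must have a positive sum, and at least one
    indicator must vary between regions.
    """
    n_regions = len(matrix)
    n_indicators = len(matrix[0])
    k = 1.0 / math.log(n_regions)
    diversities = []
    for j in range(n_indicators):
        column = [row[j] for row in matrix]
        column_sum = sum(column)
        entropy = 0.0
        for value in column:
            share = value / column_sum
            if share > 0:
                entropy -= k * share * math.log(share)
        # Clamp tiny negative values caused by floating-point rounding.
        diversities.append(max(0.0, 1.0 - entropy))
    total = sum(diversities)
    return [d / total for d in diversities]


# Hypothetical normalized values: indicator 0 is identical in all
# regions, indicator 1 varies strongly between regions.
data = [
    [0.5, 0.1],
    [0.5, 0.9],
    [0.5, 0.2],
    [0.5, 0.8],
]
weights = entropy_weights(data)
```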
3.3.2. Integrated Weight Assignment Using Bayesian Networks (BNs)
The estimated weights affect the outcome of the estimated flood risk. It is not easy to determine the weights for each component and indicator, particularly when the entropy weight is higher than the other weights, such as for the total number of houses in Table 4. Thus, the BN method was used to estimate the combined weights while considering causal relationships between the weights obtained from the AHP, CSS, and entropy techniques. First, a BN with 20 nodes and 19 links was constructed with AgenaRisk 10, as shown in Figure 5. The BN was constructed in consideration of the relationships between the components and indicators. The prior probability (pre-probability) assigned to each higher node can be inferred directly from the conditional probability, and the deviation of the probability determines the posterior probability (post-probability) of the lower nodes. That is, the posterior probability (the integrated weights) can be derived from the prior weights (the current weights) and the conditional probability of each component (hydro-geology, socio-economics, flood protection, and climate) and its indicators. Because each component was weighted separately, it does not directly affect its indicators; these relationships are therefore drawn as dotted-line links that represent indirect influence.
Table 4 shows all of the probabilities (weights) of each component and indicator obtained from the configuration in Figure 5 and the posterior probabilities (integrated weights).
The estimated weight of each component was relatively uniform in the range of 0.20–0.28. The weight of the socio-economic component was low in the survey method but increased significantly in the entropy method. This indicates that the entropy weight contributed to conditional probabilities as a prior probability. Similarly, the other indicators within each component were adjusted adequately by prior and conditional probabilities. For example, the drainage capacity indicator of the flood protection component was weighted as 0.50 and 0.45 in two surveys, respectively, while it was weighted as 0.35 in the entropy method, and its integrated weight became 0.36.
Moreover, the annual precipitation of the climate component was weighted with a small value of 0.01 in the entropy method, but the integrated weight was 0.06 because it was weighted with 0.23 and 0.28 in the two surveys. The BN model is reported to be effective for decision-making that integrates different kinds of knowledge and data [36,37]. Thus, BNs are expected to be a new alternative for assigning weights between indicators.
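The role of prior and conditional probabilities can be illustrated with Bayes' rule on a single discrete variable. This is only a generic sketch of a prior-to-posterior update: it is not the 20-node AgenaRisk network, it does not use the conditional probability tables of that model, and it is not expected to reproduce the values in Table 4. All numbers are hypothetical.

```python
def bayes_update(prior, likelihood):
    """Posterior over discrete states: proportional to prior times likelihood."""
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)
    if evidence == 0:
        raise ValueError("prior and likelihood have no overlapping support")
    return [u / evidence for u in unnormalized]


# Hypothetical example with three indicators of one component:
# one set of weights acts as the prior, another as the likelihood.
prior = [0.5, 0.3, 0.2]
likelihood = [0.2, 0.4, 0.4]
posterior = bayes_update(prior, likelihood)
```

The posterior lies between the two inputs in relative terms: a state that is favored by the prior but not by the likelihood is pulled down, and vice versa.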
3.4. Results of Calculation with InFRA
The final InFRA was estimated for the hydro-geology, socio-economics, flood protection, and climate components in the 28 cities and gun of the Chungcheong region using several formulas (see Figure 4). InFRA values did not differ greatly between regions, with a few exceptions, and indicated a flood risk of 0.3–0.5 in most places. The resulting values for Seosan (17), Dangjin (20), and Taean (27) were close to 0.7, despite their low risk from the socio-economic component. This occurred because the risk from the other components was high. Some village areas, including Jeungpyeong (8) and Jincheon (9), showed a low InFRA level because their values for the flood protection and other components were low. The risk related to the hydro-geology component was high in the countryside because these areas are more affected by flood damage and contain more lowland than urban areas do. In the socio-economic component, the indicators of the total number of houses and financial independence rate showed a high risk in large cities, followed by some villages that have a high dependent population.
In the flood protection component, large and medium cities showed a high level of protection, whereas villages showed a low level of protection because they lack flood protection systems. The flood protection component shown in Figure 6 is expressed as "1 − flood protection," so higher values correspond to weaker protection and should be interpreted accordingly. In the climate component, the basins were clustered in a continuous pattern and showed a constant flood risk, particularly in coastal areas, which reflects the consistent readings of the measurement stations in the Thiessen network along the coastal cities. This is attributed to the high frequency of intense rainfall in coastal areas, and thus, the frequency of intensive rainfall indicator is weighted higher than the annual precipitation indicator.
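The aggregation into a single index can be sketched as a weighted sum in which the flood protection score enters as "1 − flood protection". The exact formulas are not given in this section, so the linear form below is an assumption, and the function names, scores, and weights are hypothetical.

```python
def component_score(values, weights):
    """Weighted sum of normalized indicator values within one component."""
    return sum(v * w for v, w in zip(values, weights))


def flood_risk_index(scores, component_weights):
    """Combine component scores keyed by 'H', 'S', 'P', 'C' into one index.

    Assumption: protection reduces risk, so P enters as 1 - P.
    """
    risk = 0.0
    for key, weight in component_weights.items():
        score = scores[key]
        if key == "P":
            score = 1.0 - score
        risk += weight * score
    return risk


# Hypothetical component scores and equal component weights for one region.
scores = {"H": 0.6, "S": 0.4, "P": 0.7, "C": 0.5}
component_weights = {"H": 0.25, "S": 0.25, "P": 0.25, "C": 0.25}
fri = flood_risk_index(scores, component_weights)
```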
3.5. Comparison with Other Methods and Discussion
The proposed method was compared with three other methods used to assess flood risk: PFD, FDI, and RSA [15,17,21]. These three methods are the most popular in practice because their input data are easy to collect and the methods are simple to apply. They are briefly explained in Table 5. However, a comparison between InFRA and these methods requires common assessment criteria. The other assessment methods report their results as grades (1–5) or groups (A–D), so their outputs cannot be compared directly with InFRA. Therefore, the methods were compared using risk values between 0 and 1 instead of grades or groups, which allows a consistent comparison. In all the assessment methods, the risk of flood increases as the risk value approaches 1, indicating that appropriate measures need to be taken for flood mitigation.
The other assessment methods generally showed high flood risk in cities and low risk in villages. In particular, flood risks were high in Daejeon, Cheongju, Chungju, and Cheonan but low in Yeongdong, Jincheon, Goesan, and Geumsan. The results of the other assessment methods were polarized between urban and rural areas and showed large regional variations compared to InFRA (see Figure 7). It seems that the duplicated meaning in the construction of indicators and the insufficient level of flood protection in cities are major reasons for such results. Nevertheless, indicators such as population, financial independence rate, and infrastructure are typically high in urban areas. Thus, the other risk assessment methods are considered to have produced somewhat overestimated values because they use a system that would inevitably estimate large flood risk in large cities.
This comparison was qualitative, and a quantitative comparison is necessary. Therefore, the methods were validated by analyzing the area under the curve (AUC) of the receiver operating characteristic (ROC) curve. The ROC curve is a graph whose x-axis shows 1 − specificity (the false positive rate), which is the probability that a region without high damage is wrongly assessed as high risk. The y-axis shows the sensitivity (the true positive rate), which is the probability that a region with high damage is correctly assessed as high risk. That is, an evaluation method is better if its risk assessment is more likely to be correct and has a lower false positive rate. A higher AUC indicates higher accuracy of the prediction, and the accuracy of the results increases as the AUC approaches 1.
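The AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below uses this pairwise form; the labels and risk values are made up and do not correspond to any region in the study.

```python
def roc_auc(labels, scores):
    """AUC as the share of positive/negative pairs ranked correctly.

    labels: 1 for a high-damage region, 0 otherwise.
    scores: risk values between 0 and 1 from an assessment method.
    """
    positives = [s for label, s in zip(labels, scores) if label]
    negatives = [s for label, s in zip(labels, scores) if not label]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and risk values for five regions.
labels = [1, 0, 1, 0, 0]
scores = [0.8, 0.3, 0.4, 0.5, 0.2]
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to random ranking, and values below 0.5 mean that the method tends to rank low-damage regions above high-damage ones.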
A limitation of this validation is that an independent reference is needed to evaluate the accuracy of each method, but the relevant assessment factors are already included as indicators, so there is no fully suitable criterion to apply. As the best available alternative, data from [38] were used to derive total flood damage costs, which include injuries, flooding of farmland and cities, and the flood damage cost for public facilities (see Table 6). Then, the integrated sums were normalized. If the normalized value is 0.5 or higher, the corresponding region is considered to have a high damage cost. The ROC analysis was then conducted.
According to the ROC analysis, the AUC of InFRA was 0.67, while the AUCs of PFD, FDI, and RSA were below 0.5 (0.296, 0.417, and 0.174, respectively). Because an AUC below 0.5 is worse than random classification, these three methods were excluded from the assessment of flood damage cost (see Figure 8). In other words, the other risk assessment methods proved to be inappropriate for assessing flood damage costs. The evaluation showed that InFRA is better than the classic methods for assessing flood risk and could thus be applicable in the field.