*3.2. Deep Learning Model*

### 3.2.1. Hyper Parameter

A deep learning model is a multilayer stack of simple modules, many of which compute non-linear input-to-output mappings. With 3 to 18 hidden layers, a deep learning model can implement extremely complex functions of the original variables that are sensitive to minute details [23]. Given the small data set in this study, we chose a four-layer back propagation network. The number of neurons in the hidden layers was optimized to obtain accurate output [21]. To select the numbers of neurons in the two hidden layers, training began with ten and three neurons, respectively, and was repeated with more neurons. The numbers of neurons in the hidden layers were determined to be 40 and 5, respectively, to reduce the complexity and running time of the model.

There are different methods to avoid overfitting during training and to obtain models that generalize well, for instance, early stopping, regularization, dropout, and data set expansion. Early stopping and data set expansion are not applicable to this model. The dropout method randomly deletes half of the hidden layer nodes, so it is also not applicable. Hence, the regularization method was chosen.

Regularization in deep learning includes the L1 and L2 forms. We chose the most commonly used L2 regularization, which adds an additional regularization term to the loss function, presented in Equation (4).


the training set and 38 data as the test set; the remaining ten were used as the validation set. Time was recorded in minutes, with hours converted into minutes at 1440 min per day. A time of x:y was processed as (60x + y)/1440; for example, a time of 04:32 becomes 0.19. The other parameters needed no special processing.
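The time-of-day normalization described above can be sketched as follows (a minimal illustration; the function name and rounding to two decimals are assumptions matching the 04:32 → 0.19 example):

```python
def normalize_time(hh: int, mm: int) -> float:
    """Encode a clock time as a fraction of the day (1440 minutes)."""
    return round((60 * hh + mm) / 1440, 2)

# 04:32 -> (60*4 + 32) / 1440 = 272 / 1440 -> 0.19
print(normalize_time(4, 32))  # 0.19
```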

$$\mathbf{c} = \mathbf{c}\_0 + \frac{\lambda}{2n} \sum\_w w^2 \tag{4}$$

where c and c<sub>0</sub> are the new and old loss functions, respectively, λ is the L2 regularization rate, n is the number of samples, and w is a weight. Through a large number of tests, the optimal parameters shown in Table 5 were obtained. Figure 5 demonstrates the model with two hidden layers, consisting of forty and five neurons respectively, a five-neuron input layer, and a one-neuron output layer.
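Equation (4) can be illustrated with a small NumPy sketch (the base loss value and weight arrays below are placeholders, not the paper's actual parameters):

```python
import numpy as np

def l2_regularized_loss(base_loss: float, weights: list, lam: float, n: int) -> float:
    """Add the L2 penalty (lambda / 2n) * sum(w^2) to the base loss, as in Equation (4)."""
    penalty = sum(float(np.sum(w ** 2)) for w in weights)
    return base_loss + lam / (2 * n) * penalty

# Toy example: two weight arrays, lambda = 0.1, n = 100 samples.
w = [np.ones((2, 2)), np.full((3,), 2.0)]
# sum of squares = 4 + 12 = 16; penalty = 0.1 / 200 * 16 = 0.008
print(l2_regularized_loss(1.0, w, lam=0.1, n=100))  # ≈ 1.008
```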


**Figure 5.** Scheme of deep learning network.

### 3.2.2. Optimization Algorithm

The optimization algorithms used in the model include adaptive moment estimation (Adam) [28], mini-batch gradient descent [29], and a moving average model [30]. We used the Adam algorithm to update the learning rates independently per parameter by calculating the first and second moment estimates of the gradients. Mini-batching reduces randomness: the data in a pool jointly determine the direction of the gradient and reduce the possibility of deviation during the descent process [31]. The model ran on a high-performance workstation and the data volume was less than 2000; the size of the training pool is usually chosen as a power of 2, so we chose 16 data as a training pool through experiments. The moving average model prevents sudden changes when updating parameters [32].
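The three pieces above can be sketched in plain NumPy as follows. This is a minimal illustration under assumed constants (standard Adam defaults, a decay of 0.99 for the moving average, and a toy one-parameter objective); only the batch size of 16 comes from the text:

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: biased first/second moment estimates with bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

def moving_average(shadow, w, decay=0.99):
    """Shadow copy of a parameter that damps sudden changes between updates."""
    return decay * shadow + (1 - decay) * w

BATCH = 16                                  # training pool of 16 = 2^4, as in the text
rng = np.random.default_rng(0)
w, m, v, shadow = 1.0, 0.0, 0.0, 1.0
for t in range(1, 11):
    batch = rng.normal(size=BATCH)          # one mini-batch of 16 samples
    grad = 2 * np.mean(w - batch)           # gradient of a toy squared-error objective
    w, m, v = adam_step(w, grad, m, v, t)
    shadow = moving_average(shadow, w)      # smoothed parameter used for evaluation
```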

#### 3.2.3. Process of the Model

The structure definition and forward propagation used the TensorFlow framework [30]. For the parameters of the deep learning network, only the weights w and biases b needed to be defined, and a normal distribution was chosen as the weight-generating function. The ReLU activation function [23] was used in the first hidden layer; the second hidden layer and the output layer are essentially linear regressions. We selected the Adam algorithm as the back propagation algorithm, with the learning rate optimized. The mean squared error loss function was selected because the data set was small (Equation (5)).

$$\mathbf{c} = \frac{1}{n} \sum (y - t)^2 \tag{5}$$

where c is the loss function, n is the number of samples, y is the predicted value, and t is the true value.
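The forward pass described above (5 inputs → 40 ReLU units → 5 linear units → 1 output) and the loss of Equation (5) can be sketched as follows. This is an assumed NumPy re-expression, not the paper's TensorFlow code; the initialization scale and zero biases are placeholders, while the layer sizes follow the text:

```python
import numpy as np

rng = np.random.default_rng(42)

# Weights drawn from a normal distribution; layer sizes 5 -> 40 -> 5 -> 1.
W1, b1 = rng.normal(scale=0.1, size=(5, 40)), np.zeros(40)
W2, b2 = rng.normal(scale=0.1, size=(40, 5)), np.zeros(5)
W3, b3 = rng.normal(scale=0.1, size=(5, 1)), np.zeros(1)

def forward(x: np.ndarray) -> np.ndarray:
    h1 = np.maximum(x @ W1 + b1, 0.0)   # ReLU on the first hidden layer
    h2 = h1 @ W2 + b2                   # second hidden layer is linear
    return h2 @ W3 + b3                 # linear output layer

def mse(y: np.ndarray, t: np.ndarray) -> float:
    """Mean squared error loss, Equation (5)."""
    return float(np.mean((y - t) ** 2))

x = rng.normal(size=(16, 5))            # one mini-batch of 16 samples
y = forward(x)
print(y.shape)                          # (16, 1)
```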

#### **4. Results**

We chose ten seismic cases with different parameters as the test data, of which the eighth, ninth, and tenth cases were the Ludian earthquake in 2014, the Yushu earthquake in 2010, and the Wenchuan earthquake in 2008. The selection of the specific test sets is shown in Table 6. We compared our model with the intensity-based mortality estimation method in *Assessment of Earthquake Disaster Situation in Emergency Period* [33] (the China-National Standard), presented in Equations (6) and (7). The Standard is an important basis for the government's decision-making on earthquake relief. After an earthquake, the primary tasks of the emergency work are to conduct disaster investigation and assessment in a short period of time and to provide the government with timely and necessary information on the disaster situation. Therefore, the relevant departments formulated the Standard as a comprehensive summary of the research methods and field practices of earthquake disaster assessment in China, focusing on rapid and dynamic disaster recovery and assessment.

$$N\_D = \sum\_{j=6}^{I\_{\text{max}}} A\_j \rho R\_j \tag{6}$$

$$\ln R\_j = -44.365 + 7.516 I\_j - 0.329 I\_j^2 \tag{7}$$

where *N<sub>D</sub>* is the number of deaths; *I<sub>max</sub>* is the intensity of the meizoseismal area; *A<sub>j</sub>* is the distribution area of intensity value *j*; ρ is the population density; *R<sub>j</sub>* is the death rate corresponding to intensity value *j*; and *I<sub>j</sub>* is the intensity value *j*.
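Equations (6) and (7) can be evaluated with a short numeric sketch. The intensity areas and population density below are illustrative values only, not data from the paper:

```python
import math

def death_rate(I: float) -> float:
    """Death rate R_j for intensity value j, Equation (7)."""
    return math.exp(-44.365 + 7.516 * I - 0.329 * I ** 2)

def estimated_deaths(areas_by_intensity: dict, rho: float) -> float:
    """N_D = sum over intensities j >= 6 of A_j * rho * R_j, Equation (6)."""
    return sum(A * rho * death_rate(j)
               for j, A in areas_by_intensity.items() if j >= 6)

# Illustrative areas (km^2) per intensity, population density 200 persons/km^2.
areas = {6: 5000.0, 7: 1200.0, 8: 300.0}
print(round(estimated_deaths(areas, rho=200.0)))
```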



Table 7 and Figure 6 show the comparison results of the deep learning model, the China-National Standard, and the true values. Table 8 presents the accuracy of the two methods for cases 1 to 7 and cases 8 to 10, separately. The results are as follows:


(4) The China-National Standard model selected fewer parameters. The model was based on the experience of many experts and had been tested for many years. In cases 1–7, the predicted values and the true values were all on the same order of magnitude, but the performance on cases 8–10 was far worse than that of the deep learning model: the estimates were far from the true values and provide no useful reference for actual earthquakes.


**Table 7.** Comparison results.

**Figure 6.** The number of deaths from the deep learning model and the national standard on the test data. The errors of the two models relative to the actual death toll are also presented. (The results of earthquake cases 8–10 are not shown because their large differences would obscure the results of the other earthquake cases.)

**Table 8.** The accuracy of two models.


#### **5. Discussion**

Three issues were considered in this study: the importance of nine features to the human losses in mainland China, the importance of 43 structure types to the fatalities, and the assessment of human losses. The first issue was addressed by adopting three traditional machine learning methods to investigate the influence of different features on seismic fatalities; the results can provide a basis for further seismic assessment. The second issue was addressed with the RF algorithm, which assessed the importance of 43 structure types. The third issue was addressed by establishing a deep learning model to predict seismic deaths. A detailed investigation of these issues is presented as follows.

#### *5.1. Importance of Different Features*

We propose an automatic classifier based on RF, CART, and AdaBoost, together with a set of attributes describing the importance of seismic features. Random forest is the most stable of the three methods. Figure 7 exhibits the results of three tests on the three methods. RF provides probability estimates on the classification that are useful to accept or reject a new classification [8]. Thus, we chose the RF algorithm as the main method. It fully proved the importance of population density, magnitude, focal depth, and epicentral intensity: the sum of these four features is 74.68%, far more important than the other factors.

Time is vital in an earthquake. For example, the Tangshan earthquake occurred at 3:24 on July 28, 1976, when people were asleep in their houses, and the time aggravated the disaster. Losses are relatively mitigated in the daytime because most people are awake and can escape quickly. The reason that time ranks only seventh in this study is that time was divided into just two classes: daytime and sleeping time. The more classes an attribute discriminates, the more important the attribute is [8]. In the same way, the ratios of abnormal intensity and secondary disaster are only 2.63% and 1.18%, respectively, because of their fewer classes. Moreover, time was modeled as discrete levels rather than the actual time, which interferes with the results. The secondary disasters and the abnormal intensity have few classes because they occur infrequently. In summary, the features selected by the RF algorithm experiments are consistent with those used by other scholars to study earthquake casualties (population density [2], magnitude [2], focal depth [34], epicentral intensity [16], and time [13]).

**Figure 7.** Results of three tests by Random Forest (**a**), AdaBoost (**b**), and CART (**c**) algorithms. The horizontal axis is labeled by the first letters of each feature: PD (population density), M (magnitude), FD (focal depth), EI (epicentral intensity), ES (economic status), D (date), T (time), AI (abnormal intensity) and SD (secondary disaster).

#### *5.2. Importance of Different Structure Types*

We chose all 43 structure types in mainland China to assess, far more than the number of types in previous studies [35]. Although the structure types of the WHE project suit most regions of the world, the project lacks some characteristically Chinese structures, such as the national civil structure, the old Tibetan house, and the national brick-wood structure. Hence, the data from the Earthquake Disasters and Losses Assessment Report in Chinese Mainland for every earthquake in mainland China are more suitable for this study than the data of the WHE project. The HAZUS system (HAZards United States Multi-Hazard) has only twelve structure types and estimates casualties for collapsed buildings but not for heavily damaged buildings [18]. In reference [18], casualties are estimated at the level of single buildings, not for an entire zone, in an adapted HAZUS system. The population density varies greatly in different parts of China, so neither the HAZUS system nor the adapted system is suitable for estimating casualties in China.

The importance of different structural types was obtained with the RF method. During data collection, the structural forms were not integrated, and some may be slightly repeated in order not to omit any structure type, for instance, the brick-column civil structure, the brick-concrete structure (building of two or more floors), and the brick-concrete structure (building of only one floor). However, even when combined, the importance of the repeated structures was close to zero.

Figure 4 displays that the contribution to casualties of the reinforced concrete structure is the largest, followed by the civil structure, stone-concrete structure, brick-concrete structure, brick-wood structure, brick house, and frame house. The other structures are not shown in Figure 4 and can be neglected. We conclude that a structure type contributing greatly to casualties does not necessarily have poor seismic behavior; rather, the result shows that once such a building is destroyed, the damage is greater.

#### *5.3. Human Losses Assessment Model*

We proposed a rapid prognostic human losses assessment model based on the above feature engineering. The results indicate that the predictions for large earthquakes with magnitude 6.5 or more are lower than the others. For example, the actual death toll of the Wenchuan earthquake was 69,227, but the estimate of the deep learning model was 37,406, the PAGER model gave 50,000, and another empirical model [20] gave 30,000. By the traditional accuracy calculation (Equation (8)), the accuracy of casualty assessment for a large destructive earthquake is therefore not high. The reason may be that more factors affect large earthquakes than small ones [2], and the uncertainty is greater than for small earthquakes [34]. For example, the mountainous area of Ludian County is as high as 87.9%, and the high incidence of secondary disasters (e.g., debris flow and landslide) caused a large number of casualties. The accuracy of general prediction models is reduced when geological conditions are neglected [12].

$$\text{accuracy} = \left(1 - \frac{|\text{true value} - \text{estimated value}|}{\text{true value}}\right) \times 100\% \tag{8}$$
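Equation (8) in code, using the Wenchuan figures quoted above:

```python
def accuracy(true_value: float, estimate: float) -> float:
    """Relative accuracy of Equation (8), in percent."""
    return (1 - abs(true_value - estimate) / true_value) * 100

# Wenchuan earthquake: actual deaths 69,227 vs. deep learning estimate 37,406.
print(round(accuracy(69227, 37406), 1))  # ≈ 54.0
```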

Geological conditions differ among different parts of China. The areas with the most earthquakes are Sichuan Province, Yunnan Province, and Xinjiang Province. According to the earthquake cases and the results of the deep learning model, we can draw the following conclusions: (1) basin regions with soft rock and soil can aggravate earthquake disasters in Yunnan Province and Sichuan Province, as in the Ludian earthquake in 2014 and the Wenchuan earthquake in 2008; (2) in the event of an earthquake, casualties in areas along fault zones are more serious in Xinjiang Province and Yunnan Province, as in the Jinghe earthquake in 2017, the Hutubi earthquake in 2016, and the Ninger earthquake in 2007; and (3) Qinghai Province and Sichuan Province are close to each other and have similar geological conditions, and areas with crumbly strata have higher seismic vulnerability, as in the Yushu earthquake in 2010. Above all, the error of the deep learning model mostly comes from geological problems.

Besides the geological conditions, there are other important reasons for the difference in accuracy between cases 1–7 and cases 8–10. Earthquake casualties come from both direct and secondary disasters, and assessing the latter is more difficult. However, there is currently no professional method to distinguish human losses from direct versus secondary disasters after an earthquake. Therefore, although the secondary disaster was evaluated in the importance assessment, it was not selected as an input variable. For destructive earthquakes such as cases 8–10, the types of secondary disasters are more abundant, so their impact is greater. Moreover, the traffic network is very important for post-earthquake rescue. Small earthquakes with magnitudes less than 6.5, such as cases 1–7, usually cause limited damage to the traffic network and rarely block traffic completely. However, in a large earthquake with magnitude more than 6.5, such as cases 8–10, the traffic network is usually interrupted, which seriously hinders rescue and greatly increases the number of deaths. That the study did not consider damage to the traffic network is also a main reason for the lower accuracy on cases 8–10.

The purpose of this study is to rapidly assess human losses; hence, the input features should be obtainable within a short time after an earthquake. In reference [21], the training set came only from the Bam earthquake in 2003, so the results were not representative. Compared with empirical methods [36], our data set is larger [4], covering almost all events from 1990 to 2017 without missing values, and the accuracy of the results is higher; the data set is also larger, and the factors considered more numerous, than in the China-National Standard [33]. This shows that the deep learning technique, as compared with statistical methods, can estimate casualties without any assumptions, processing the data directly through its internal functions.

### **6. Conclusions and Future Works**

### *6.1. Conclusions*

We proposed a method to assess the importance of nine factors affecting casualties in earthquakes and ranked the importance of each feature based on the random forest algorithm. At the same time, 43 structural types were evaluated by this method, and the contributions of different structural forms to the death toll were obtained, which provides a basis for the future construction of structural forms. Based on the above evaluation of importance, we reached the following conclusions:


A deep learning model for estimating human losses based on the results of the random forest algorithm was established. We selected five important features and compared the results with the China-National Standard because the Standard is suitable for rapid assessment work. The results demonstrate that the accuracy is higher than that of the other methods and the running time is suitable for emergency rescue work. Therefore, this method can be used to evaluate the fatalities of future earthquakes in mainland China. This model can serve the China Earthquake Administration and the Chinese government.
