**Hanxi Jia \*, Junqi Lin and Jinlong Liu**

Key Laboratory of Earthquake Engineering and Engineering Vibration, Institute of Engineering Mechanics, China Earthquake Administration, Harbin 150080, China; linjunqi@iem.net.cn (J.L.); liujinlong@iem.ac.cn (J.L.)

**\*** Correspondence: jiahanxi@iem.ac.cn; Tel.: +86-0451-866-73509

Received: 27 March 2019; Accepted: 8 May 2019; Published: 14 May 2019

**Abstract:** This study aims to analyze and compare the importance of features affecting earthquake fatalities in mainland China and to establish a deep learning model to assess potential fatalities based on the selected factors. The random forest (RF) model, the classification and regression tree (CART) model, and the AdaBoost model were used to assess the importance of nine features, and the analysis showed that the RF model performed better than the other models. Furthermore, we compared the contributions of 43 different structure types to casualties based on the RF model. Finally, we proposed a model for estimating earthquake fatalities based on seismic data from 1992 to 2017 in mainland China. The results indicate that the deep learning model produced in this study performs well in predicting seismic fatalities. The method could help reduce casualties during emergencies and inform future building construction.

**Keywords:** earthquake fatalities; deep learning; random forest; feature importance; structure type

### **1. Introduction**

Earthquakes pose a serious threat to people in China (Table 1). A rapid and reliable estimate of the number of casualties in an earthquake can reduce the impact and losses of the disaster [1]. Predicting the death toll allows the human and material resources of emergency management to be allocated [2]. We use the surface-wave magnitude (Ms) in this study. According to the current emergency response regulations of the relevant Chinese departments, emergency personnel and materials fall into the following categories: (1) when the magnitude is less than 6 and the predicted number of deaths is 0–10, the government will need 10–50 emergency personnel and 200–300 tents; (2) when the magnitude is greater than or equal to 6 and less than 6.5, and the predicted number of deaths is 0–10, 50–100 emergency personnel and 1000–3000 tents will be needed; (3) when the magnitude is greater than or equal to 6.5 and less than 7, and the predicted death toll is 0–10, 200–500 emergency personnel and 3000–5000 tents will be needed, and if the predicted death toll is more than 10, 500–1000 emergency personnel and 5000–10,000 tents will be needed; (4) when the magnitude is greater than 7 and the death toll is less than 10, 500–1000 emergency personnel and 5000–10,000 tents will be required, when the death toll is 10–100, 1000–5000 emergency personnel and 10,000–20,000 tents will be required, and when the death toll is 100–1000, 5000–10,000 emergency personnel and more than 20,000 tents will be needed; and (5) when the number of deaths is greater than 1000, the necessary emergency personnel and material distribution must be determined according to the specific economic and political conditions of the local area.
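The categories above can be read as a lookup from magnitude and predicted deaths to resource ranges. The following is a minimal sketch of that reading, not an official implementation: the function name `emergency_resources` and the returned dictionary layout are hypothetical. Where the text gives no rule (for example, a magnitude below 6.5 with more than 10 predicted deaths), the sketch returns `None`. Treating a magnitude of exactly 7 as category (4), and assigning shared band edges such as 10 and 100 deaths to the 10–100 band, are assumptions made only for illustration.

```python
from typing import Optional, Tuple, Dict

# (low, high) range; high=None stands for an open-ended "more than low".
Range = Tuple[int, Optional[int]]


def emergency_resources(ms: float, deaths: int) -> Optional[Dict[str, Range]]:
    """Map surface-wave magnitude and predicted deaths to resource ranges.

    Returns None when the categories in the text do not specify a rule.
    """
    if deaths > 1000:
        # Category (5): decided case by case from local conditions.
        return None
    if ms < 6.0:  # category (1)
        if deaths <= 10:
            return {"personnel": (10, 50), "tents": (200, 300)}
        return None  # not specified in the text
    if ms < 6.5:  # category (2)
        if deaths <= 10:
            return {"personnel": (50, 100), "tents": (1000, 3000)}
        return None  # not specified in the text
    if ms < 7.0:  # category (3)
        if deaths <= 10:
            return {"personnel": (200, 500), "tents": (3000, 5000)}
        return {"personnel": (500, 1000), "tents": (5000, 10000)}
    # Category (4). Assumption: Ms exactly 7 is handled here as well.
    if deaths < 10:
        return {"personnel": (500, 1000), "tents": (5000, 10000)}
    if deaths <= 100:  # assumption: 10 and 100 both fall in the 10-100 band
        return {"personnel": (1000, 5000), "tents": (10000, 20000)}
    return {"personnel": (5000, 10000), "tents": (20000, None)}
```

For example, `emergency_resources(6.7, 50)` falls in category (3) with more than 10 predicted deaths, so it yields the 500–1000 personnel and 5000–10,000 tents band.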


**Table 1.** Earthquake disaster losses in mainland China from 1992 to 2017, including the numbers of deaths and injuries and the economic losses. Economic losses are given in Chinese Renminbi Yuan (CNY).

However, there are many factors which may affect fatalities, and not every factor has a decisive impact on earthquake casualties. Therefore, it is also necessary to select a suitable method to evaluate the importance of each factor.

Linear models are the most commonly used methods for assessing feature correlation [3]. Reference [4] gives the relationship between human losses and factors such as population density and the intensity and magnitude of earthquakes based on linear models. Nevertheless, because of the uncertainty and fuzziness in the factor data [5], ensemble models were proposed and applied to feature importance assessment [6–9] to improve on the accuracy and generalization ability of traditional linear models [10]. Existing studies have shown that ensemble algorithms outperform linear models in prediction ability and generalization capacity [3]. However, so far, no research has used machine learning methods to evaluate the importance of influencing factors and different structure types for earthquake casualties. Previous studies of earthquake casualties either specified the influencing factors [2,11] and structural types [12] directly from experience or derived them with statistical methods [13].

Different methods have been developed to estimate earthquake casualties. Most studies used empirical analysis methods [12,14,15] and some software systems to assess casualties, for instance, geographic information systems (GIS) [11,16], the U.S. Geological Survey's Prompt Assessment of Global Earthquakes for Response (PAGER) system [17], and the Disaster Management Tool (DMT) software [18]. In reference [18], the authors present the casualty estimation model that is part of the DMT software. The model is based on the evaluation of laser scanning data collected by airborne sensors, and it can also be used to detect collapsed buildings, to assess their damage type, and to compute the number of trapped victims for each collapsed building. The PAGER system, which draws on the EERI World Housing Encyclopedia (WHE) project (including non-engineered buildings) [19], can estimate fatalities for large earthquakes within two hours [17]. However, these systems cannot assess losses within a few minutes. Empirical methods usually establish linear models that are evaluated by fitting one or more functions [20]. These models have many disadvantages: the workload tends to be large and the amount of data small; outliers are usually deleted rather than fitted together with the other data; and they are strongly subjective. These shortcomings can be compensated for by neural networks in the field of machine learning [21].

With the rise of machine learning algorithms, some studies that estimate fatalities with the back propagation neural network (BPNN) method have begun to emerge [21]. Because earthquakes differ in intensity, population density, and structure types, it is extremely difficult to define a definite relationship for evaluating the fatalities caused by an earthquake. Hence, deep learning, with its ability to approximate complex relationships, could be an outstanding method for evaluating fatalities. However, the BPNN is not a perfect network and has many shortcomings: (1) convergence is slow, taking hundreds of learning iterations or more [22]; (2) it cannot guarantee convergence to a global minimum [23,24]; (3) the numbers of hidden layers and neurons are not theoretically guided but determined empirically, so the network tends to be large [22], and this redundancy increases the learning time [25]; and (4) the learning and memory of the network are unstable. Deep learning optimization algorithms can mitigate these shortcomings of the BPNN method.
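Shortcoming (2) can be illustrated without a neural network at all. The sketch below is a toy example under stated assumptions, not the BPNN discussed in the text: the one-dimensional function `f`, the learning rate, and the starting points are all chosen only for illustration. Plain gradient descent, the update rule underlying back propagation, settles in whichever minimum is nearest to its starting point.

```python
def f(x: float) -> float:
    """Toy non-convex loss with a local and a global minimum (illustrative)."""
    return x**4 - 3 * x**2 + x


def grad_f(x: float) -> float:
    """Analytic derivative of f."""
    return 4 * x**3 - 6 * x + 1


def descend(x0: float, lr: float = 0.01, steps: int = 500) -> float:
    """Run plain gradient descent from x0 and return the final point."""
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x


# Two runs that differ only in their starting point.
x_right = descend(2.0)   # settles in the shallower (local) minimum
x_left = descend(-2.0)   # settles in the deeper (global) minimum
print(f"start  2.0 -> x = {x_right:.3f}, f(x) = {f(x_right):.3f}")
print(f"start -2.0 -> x = {x_left:.3f}, f(x) = {f(x_left):.3f}")
```

The run started at 2.0 stops at a point with a higher loss than the run started at -2.0, even though both have converged. The result therefore depends on initialization rather than on the data alone.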

Therefore, we assessed the importance of the factors based on three machine learning methods and selected the random forest algorithm as the optimal classifier. At the same time, we evaluated the contribution degree of 43 different structure types based on the random forest algorithm. Finally, the deep learning assessment model was established with the factors of population density, magnitude, focal depth, epicentral intensity, and time. Figure 1 shows the flowchart of the entire assessment process.

**Figure 1.** Flowchart for assessing human losses with machine learning methods. CART: classification and regression tree.
