**4. Prediction Model**

#### *4.1. Algorithm*

Support vector machine (SVM) is a supervised machine learning method based on statistical learning theory [44]. SVM implements the structural risk minimization principle rather than the empirical risk minimization principle [45], which gives it unique advantages in solving small-sample, nonlinear and high-dimensional pattern recognition problems. Although SVM was initially applied to classification problems, it has gradually been used to solve regression problems due to its good performance in function fitting [46]. Support vector regression (SVR) is the extension of SVM to regression problems. Compared with other machine learning algorithms, SVR can obtain the optimal solution from a small number of samples while largely avoiding problems such as overfitting and convergence to local extrema [28], and its generalization ability and performance have been well demonstrated.

The SVR algorithm is explained as follows. Consider a given training sample set $D = \{(x\_1, y\_1), (x\_2, y\_2), \dots, (x\_m, y\_m)\}$, where $x\_i = (x\_{i1}, x\_{i2}, \dots, x\_{id})^T \in \mathbb{R}^d$, $y\_i \in \mathbb{R}$, $i = 1, 2, \dots, m$; $x\_i$ is the $i$th sample with feature dimensionality $d$, $x\_{ij}$ is the value of its $j$th feature, $y\_i$ is the corresponding target value of the $i$th sample, and $m$ is the number of samples. The goal of SVR is to find a regression model $f(x) = \omega^T x + b$ such that $f(x)$ is close to the corresponding target value $y$, where $\omega$ and $b$ are parameters to be calculated. In a traditional regression model, the loss is calculated directly from the difference between $f(x)$ and $y$, which is too strict and will eventually lead to overfitting [47]. To overcome this disadvantage, SVR tolerates a maximum deviation $\varepsilon$ between $f(x)$ and $y$, and loss is counted only when the difference between $f(x\_i)$ and $y\_i$ is greater than $\varepsilon$ (Figure 10). This is equivalent to constructing a spacer band of width $2\varepsilon$ centered on $f(x)$; when a training sample falls within the spacer band, its prediction is regarded as correct [48]. Therefore, the SVR problem can be formulated as

$$\min\_{\omega, b} \frac{1}{2} \|\omega\|^2 + C \sum\_{i=1}^{m} \ell\_{\varepsilon}(f(x\_i) - y\_i) \tag{1}$$

where $C > 0$ is a regularization constant and $\ell\_{\varepsilon}$ is the $\varepsilon$-insensitive loss function (Figure 11), which is expressed as

$$\ell\_{\varepsilon}(z) = \begin{cases} 0, & \text{if } |z| \le \varepsilon;\\ |z| - \varepsilon, & \text{otherwise.} \end{cases} \tag{2}$$
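Equation (2) is straightforward to implement; the following is a minimal NumPy sketch (the function name and sample residuals are illustrative, not from the paper):

```python
import numpy as np

def eps_insensitive_loss(residual, eps=0.1):
    """Equation (2): zero loss inside the 2*eps tube, linear growth outside it."""
    return np.maximum(np.abs(residual) - eps, 0.0)

# Residuals of 0.05 and -0.08 fall inside the eps = 0.1 tube and incur no loss;
# the two larger residuals are penalized only for the part exceeding eps.
print(eps_insensitive_loss(np.array([0.05, -0.08, 0.3, -1.1])))  # losses: 0, 0, 0.2, 1.0
```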

**Figure 10.** Sketch diagram for SVR.

**Figure 11.** Sketch diagram for the $\varepsilon$-insensitive loss function.

The first term of Equation (1) represents the flatness of the function and is also called the structural risk; the second term, namely, $\sum\_{i=1}^{m} \ell\_{\varepsilon}(f(x\_i) - y\_i)$, represents the fitness between $f(x)$ and the corresponding target values and is also called the empirical risk [48]. The regularization constant $C > 0$ mediates the compromise between the structural risk and the empirical risk: it determines the trade-off between the flatness of $f(x)$ and the amount up to which deviations larger than $\varepsilon$ are tolerated [49]. To describe the real deviation on each side of the band, two slack variables, $\xi\_i$ and $\hat{\xi}\_i$, are introduced, and Equation (1) can be reformulated as

$$\min\_{\omega, b, \xi\_i, \hat{\xi}\_i} \frac{1}{2} \|\omega\|^2 + C \sum\_{i=1}^{m} (\xi\_i + \hat{\xi}\_i);\\ \text{s.t.} \begin{cases} f(x\_i) - y\_i \le \varepsilon + \xi\_i;\\ y\_i - f(x\_i) \le \varepsilon + \hat{\xi}\_i;\\ \xi\_i \ge 0,\ \hat{\xi}\_i \ge 0,\ i = 1, 2, \dots, m. \end{cases} \tag{3}$$

To efficiently solve the above optimization problem with inequality constraints, Lagrange multipliers $\mu\_i \ge 0$, $\hat{\mu}\_i \ge 0$, $\alpha\_i \ge 0$ and $\hat{\alpha}\_i \ge 0$ are introduced. Based on the Lagrange multiplier method, the following function can be deduced from Equation (3):

$$\begin{split} L\big(\omega, b, \alpha, \hat{\alpha}, \xi, \hat{\xi}, \mu, \hat{\mu}\big) &= \frac{1}{2} \|\omega\|^2 + C \sum\_{i=1}^{m} \big(\xi\_i + \hat{\xi}\_i\big) - \sum\_{i=1}^{m} \mu\_i \xi\_i - \sum\_{i=1}^{m} \hat{\mu}\_i \hat{\xi}\_i \\ &+ \sum\_{i=1}^{m} \alpha\_i \big(f(x\_i) - y\_i - \varepsilon - \xi\_i\big) + \sum\_{i=1}^{m} \hat{\alpha}\_i \big(y\_i - f(x\_i) - \varepsilon - \hat{\xi}\_i\big). \end{split} \tag{4}$$

Substituting $f(x) = \omega^T x + b$ into Equation (4), calculating the partial derivatives of $L\big(\omega, b, \alpha, \hat{\alpha}, \xi, \hat{\xi}, \mu, \hat{\mu}\big)$ with respect to $\omega$, $b$, $\xi\_i$ and $\hat{\xi}\_i$, and setting these partial derivatives equal to 0 yields the following system of equations:

$$\omega = \sum\_{i=1}^{m} (\hat{\alpha}\_i - \alpha\_i) x\_i, \tag{5}$$

$$0 = \sum\_{i=1}^{m} (\hat{\alpha}\_i - \alpha\_i), \tag{6}$$

$$C = \alpha\_i + \mu\_i, \tag{7}$$

$$C = \hat{\alpha}\_i + \hat{\mu}\_i. \tag{8}$$

After solving the above system of equations, the dual problem of SVR can be formulated as

$$\begin{aligned} \max\_{\alpha, \hat{\alpha}} \sum\_{i=1}^{m} \big( y\_i (\hat{\alpha}\_i - \alpha\_i) - \varepsilon (\hat{\alpha}\_i + \alpha\_i) \big) - \frac{1}{2} \sum\_{i=1}^{m} \sum\_{j=1}^{m} (\hat{\alpha}\_i - \alpha\_i)(\hat{\alpha}\_j - \alpha\_j) x\_i^T x\_j; \\ \text{s.t. } \sum\_{i=1}^{m} (\hat{\alpha}\_i - \alpha\_i) = 0,\ 0 \le \alpha\_i, \hat{\alpha}\_i \le C. \end{aligned} \tag{9}$$

To solve the above quadratic programming problem, the Karush–Kuhn–Tucker (KKT) conditions [50] are used:

$$\begin{cases} \alpha\_i \big(f(x\_i) - y\_i - \varepsilon - \xi\_i\big) = 0,\\ \hat{\alpha}\_i \big(y\_i - f(x\_i) - \varepsilon - \hat{\xi}\_i\big) = 0,\\ \alpha\_i \hat{\alpha}\_i = 0,\ \xi\_i \hat{\xi}\_i = 0,\\ (C - \alpha\_i)\,\xi\_i = 0,\ (C - \hat{\alpha}\_i)\hat{\xi}\_i = 0. \end{cases} \tag{10}$$

Substituting Equation (5) into $f(x) = \omega^T x + b$ yields the following solution of the SVR:

$$f(x) = \sum\_{i=1}^{m} (\hat{\alpha}\_i - \alpha\_i) x\_i^T x + b. \tag{11}$$


If the term $(\hat{\alpha}\_i - \alpha\_i)$ in Equation (11) is not equal to 0, the corresponding sample is a support vector of the SVR and lies outside the spacer band. Based on the KKT conditions in Equation (10), every sample $(x\_i, y\_i)$ satisfies $(C - \alpha\_i)\xi\_i = 0$ and $\alpha\_i\big(f(x\_i) - y\_i - \varepsilon - \xi\_i\big) = 0$; therefore, $\xi\_i = 0$ when $0 < \alpha\_i < C$, so that $f(x\_i) - y\_i - \varepsilon = 0$ for such a sample. The value of $b$ can then be deduced from Equation (11) as

$$b = y\_i + \varepsilon - \sum\_{j=1}^{m} (\hat{\alpha}\_j - \alpha\_j) x\_j^T x\_i. \tag{12}$$

However, Equation (11) is merely the solution for linear SVR. For real-world problems with high feature dimensionality and nonlinear relationships, it is often impossible to find a hyperplane in the original space that satisfies both fitness and flatness simultaneously [47]. An efficient approach is to map the samples from the original space to a higher-dimensional feature space in which they are linearly separable [48]; Equation (5) can then be reformulated as

$$\omega = \sum\_{i=1}^{m} (\hat{\alpha}\_i - \alpha\_i) \phi(x\_i) \tag{13}$$

where $\phi(x\_i)$ is the feature vector of $x\_i$ after mapping to the higher-dimensional feature space.

With the utilization of the kernel function method, the following solution for nonlinear SVR is obtained:

$$f(\mathbf{x}) = \sum\_{i=1}^{m} (\hat{\alpha}\_i - \alpha\_i) \kappa(\mathbf{x}, \mathbf{x}\_i) + b \tag{14}$$

where $\kappa(x, x\_i) = \phi(x)^T \phi(x\_i)$ is the kernel function. Table 4 presents various widely used kernel functions.

**Table 4.** Specification of kernel functions.


<sup>1</sup> *u* and *v* are multivariate vectors, and *d* ≥ 1 is the degree of the polynomial.
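Although Table 4's entries are not reproduced here, Section 4.2 names the linear, Gaussian, sigmoid and polynomial kernels; the following NumPy sketch shows their usual definitions (the parameter values and sample vectors are illustrative):

```python
import numpy as np

# Usual definitions of the kernels named in Section 4.2; gamma, r (the offset)
# and d (the polynomial degree) are the tunable kernel parameters.
def linear_kernel(u, v):
    return u @ v

def polynomial_kernel(u, v, gamma=1.0, r=0.0, d=3):
    return (gamma * (u @ v) + r) ** d

def gaussian_kernel(u, v, gamma=1.0):  # also known as the RBF kernel
    return np.exp(-gamma * np.sum((u - v) ** 2))

def sigmoid_kernel(u, v, gamma=1.0, r=0.0):
    return np.tanh(gamma * (u @ v) + r)

u, v = np.array([1.0, 2.0]), np.array([2.0, 0.5])
print(linear_kernel(u, v))         # 3.0
print(gaussian_kernel(u, v, 0.5))  # exp(-0.5 * 3.25)
```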

#### *4.2. Model Construction*

Based on the results of the importance assessment and feature selection, we selected the magnitude, population density, geological fault density and GDP as the input variables and the number of earthquake casualties as the output variable. Because the prediction indicators have different units of measurement, it is necessary to normalize the sample dataset to speed convergence toward the optimal solution and to improve the accuracy of the Z-SVR model. The normalization method used in this study was z-score normalization, which can be formulated as

$$z\_i = \frac{\mathbf{x}\_i - \overline{\mathbf{x}}}{\sqrt{\frac{1}{n} \sum\_{i=1}^{n} (\mathbf{x}\_i - \overline{\mathbf{x}})^2}} \tag{15}$$

where $n$ is the number of samples in the dataset, $x\_i$ is the initial value of the $i$th sample, $z\_i$ is its corresponding normalized value, and $\overline{x} = \frac{1}{n} \sum\_{i=1}^{n} x\_i$ is the average initial value of all samples.
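A minimal NumPy implementation of Equation (15) follows (scikit-learn's `StandardScaler` applies the same per-feature transformation; the sample magnitudes below are made up):

```python
import numpy as np

def z_score(x):
    """Equation (15): subtract the mean, divide by the population standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

# Hypothetical magnitude column; after normalization it has mean 0 and std 1.
z = z_score([5.8, 6.5, 7.1, 6.0, 8.0])
print(z.mean(), z.std())
```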

Previous studies [51,52] have shown that the type of kernel function and the corresponding parameters have substantial impacts on the prediction performance of an SVR model. To construct a fine-tuned Z-SVR model, parameter *C* for the linear kernel, parameters (*C*, gamma) for the Gaussian and sigmoid kernels, and parameters (*C*, gamma, degree) for the polynomial kernel should be selected [47]. *C* is the regularization parameter; gamma and degree correspond to *γ* and *d* in Table 4, respectively. Grid search is a general and effective method for parameter optimization and is usually combined with cross-validation [17]. To find the best SVR model for each risk zone, this study invoked the GridSearchCV module in the scikit-learn package to search for the optimal kernel functions and their corresponding model parameters within a specified range. The selected parameters of the Z-SVR model are presented in Table 5.
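The grid search just described might look like the following sketch; the search ranges and the synthetic data are illustrative stand-ins, not the values actually used to produce Table 5:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the normalized 4-feature training data (magnitude,
# population density, fault density, GDP); the real dataset is not reproduced here.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=60)

# Illustrative search space: C for every kernel, gamma for rbf/sigmoid/poly,
# degree for poly, matching the parameter list in the text.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf", "sigmoid"], "C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1.0]},
    {"kernel": ["poly"], "C": [0.1, 1, 10], "gamma": ["scale"], "degree": [2, 3]},
]

search = GridSearchCV(SVR(), param_grid, cv=10, scoring="neg_root_mean_squared_error")
search.fit(X, y)
print(search.best_params_)  # kernel and parameters of the best-scoring model
```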

**Table 5.** Model parameters of Z-SVR.


This study implemented the Z-SVR model in the Python programming language with the machine learning package scikit-learn. The procedure of model establishment is summarized as follows: (1) Select suitable features as input parameters. (2) Preprocess the sample dataset by normalizing it and dividing the samples into training data and testing data. (3) Establish a scoring rule for comparing the predicted results with the actual numbers of death casualties; if the two values are of the same order of magnitude, the prediction is considered correct. (4) Invoke the SVR module in the scikit-learn package to build a model for each risk zone. (5) Invoke the GridSearchCV module in the scikit-learn package, specify the parameters and their search ranges, and use the 10-fold cross-validation method to test the robustness of the model. (6) Input the training dataset into the SVR model for each risk zone to obtain the optimal kernel functions and their corresponding model parameters for the Z-SVR model. (7) Input the testing dataset into the Z-SVR model and predict the earthquake death tolls. (8) Since the number of earthquake casualties cannot be negative, revise negative prediction results by setting them to 0. (9) Assess the performance of the Z-SVR model on the testing dataset.
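Steps (3) and (8) of the procedure above — the order-of-magnitude scoring rule and the clamping of negative predictions — can be sketched as follows (the function name is ours; the paper does not publish its code, and how it bins values below 1 is not stated):

```python
import numpy as np

def prediction_correct(pred, true):
    """Step (3): a prediction counts as correct when it has the same order of
    magnitude as the true death toll (both in 1-9, both in 10-99, and so on)."""
    pred = max(float(pred), 0.0)  # step (8): negative predictions become 0
    # Values below 1 (including the clamped 0) are binned with 1-9 here;
    # this edge-case handling is an assumption, not from the paper.
    order = lambda v: int(np.floor(np.log10(max(v, 1.0))))
    return order(pred) == order(float(true))

print(prediction_correct(57, 83))  # True: both tolls are in the tens
print(prediction_correct(5, 120))  # False: ones versus hundreds
```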

### **5. Results**

#### *5.1. Spatial Division of the Study Area*

Considering the vast area and diverse environments of China's mainland, to build an earthquake casualty prediction model with better applicability, it is helpful to propose a machine learning approach with submodels that are applied to different regions. Using the strata fault dataset and population dataset, we divided the study area into risk zones using the raster calculator tool in ArcGIS software according to the proposed partition standard. We plotted the spatial division results and overlaid historical earthquakes with various magnitudes and numbers of casualties onto it, as shown in Figure 12.

**Figure 12.** Distribution of risk zones and historical earthquakes in China's mainland.

As shown in Figure 12, low risk zones were the most extensive, accounting for 51.94% of China's mainland, followed by high risk zones (25.59%); moderate risk zones were the smallest, accounting for 22.47% of the study area. According to the distribution of historical earthquakes, the majority of destructive earthquakes occurred in high risk areas, which supports the validity of the proposed spatial division method. Fewer destructive earthquakes occurred in some provinces of Northern China (Heilongjiang, Jilin, Beijing and Shanxi), Southern China (Hubei, Hunan and Guizhou) and Eastern China (Zhejiang and Fujian), yet these regions were divided into high or moderate risk zones. This can be explained by the dense strata fault lines or high population density in these provinces. Considering that regions with fewer earthquakes usually suffer more casualties because necessary disaster precautions are not taken, it is important for people in high and moderate risk zones to be trained in anti-seismic knowledge and to engage in evacuation practice. Interestingly, although earthquakes occurred in Xizang, Qinghai and Xinjiang Provinces of Western China, most parts of these regions were divided into low risk zones. This apparent inconsistency is due to the low population densities of these provinces, which contain vast depopulated zones; it is supported by the observation that most earthquakes with high seismicity caused minor casualties in low risk zones.

#### *5.2. Prediction Result of Z-SVR Model*

This study improved the SVR model and proposed the Z-SVR model with optimal parameters for different risk areas. We randomly selected 10 samples in low risk zones (L1~L10), 3 in moderate risk zones (M1~M3) and 17 in high risk zones (H1~H17) to predict the numbers of casualties and compare them with the corresponding true values, which are presented in Figure 13 and Table 6. Although the number of casualties varied over a large range in the risk zones, the differences between the majority of the values predicted by Z-SVR and the true values were acceptable. However, there were three samples with noticeable error. Among these three earthquake cases, two occurred in Puer (H7 and H14) and one occurred in Lijiang (H17); both cities are located in Yunnan Province. Considering that Yunnan is a region with significant variation in the geological environment and a huge economic gap between cities and villages, further research should be conducted to develop a specific approach for predicting earthquake casualties in this region.

**Figure 13.** Prediction result of Z-SVR compared with the corresponding true values.


**Table 6.** Representative earthquakes in testing samples.

#### **6. Discussion**

#### *6.1. Comparison between Z-SVR and Other Models*

To evaluate the effectiveness of the proposed model, this study selected training samples and used a cross-validation method to evaluate the robustness of the Z-SVR model. The regression and classification performances of the proposed model were also assessed by predicting the numbers of casualties in testing samples and comparing the results in terms of numerical difference and order of magnitude. Similar experiments were also implemented on other widely used machine learning methods, including random forest (RF), back propagation neural network (BP) and logistic regression (LR). This was followed by a series of experiments and detailed analyses.

Several commonly used regression model evaluation indicators were employed in this study, including root mean square error (RMSE) and mean absolute error (MeaAE), which are defined as follows:

$$\text{RMSE} = \sqrt{\frac{1}{n} \sum\_{i=1}^{n} (y\_i - \hat{y}\_i)^2} \tag{16}$$

$$\text{MeaAE} = \frac{1}{n} \sum\_{i=1}^{n} |y\_i - \hat{y}\_i| \tag{17}$$

where *y*ˆ*<sup>i</sup>* is the predicted death toll of the *i*th sample, *yi* is the corresponding true death toll, and *n* is the number of samples.
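Equations (16) and (17) translate directly into NumPy; the toy death-toll values below are illustrative:

```python
import numpy as np

def rmse(y_true, y_pred):
    """Equation (16): root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mea_ae(y_true, y_pred):
    """Equation (17): mean absolute error ("MeaAE" in the text)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

y_true, y_pred = [3, 10, 120], [1, 14, 100]
print(rmse(y_true, y_pred))    # sqrt((4 + 16 + 400) / 3) ≈ 11.83
print(mea_ae(y_true, y_pred))  # (2 + 4 + 20) / 3 ≈ 8.67
```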

The classification model evaluation indicators that were applied in this study were *Precision*, *Recall* and *F*1, which are defined as follows:

$$Precision = \frac{TP}{TP + FP} \tag{18}$$

$$Recall = \frac{TP}{TP + FN} \tag{19}$$

$$\frac{1}{F1} = \frac{1}{2} \left( \frac{1}{Precision} + \frac{1}{Recall} \right) \tag{20}$$

where *TP* is the number of true-positive samples, *FP* is the number of false-positive samples, *TN* is the number of true-negative samples, and *FN* is the number of false-negative samples.
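Equations (18)–(20) can be wrapped in a small helper; note that Equation (20) makes *F*1 the harmonic mean of *Precision* and *Recall* (the counts in the example are made up):

```python
def precision_recall_f1(tp, fp, fn):
    """Equations (18)-(20); F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 8 correctly classified positives, 2 false alarms, 4 misses
p, r, f1 = precision_recall_f1(8, 2, 4)
print(p, r, f1)  # 0.8, 0.666..., 0.727...
```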

#### 6.1.1. Cross-Validation

The robustness of each model was evaluated using the cross-validation method. As discussed in Section 2.2.1, 113 seismic cases were selected as the training dataset, among which 49 cases were in low risk areas, 13 in moderate risk areas, and 81 in high risk areas. We randomly divided the cases in the low and high risk zones into ten groups each; considering the limited number of samples, we randomly divided the cases in the moderate risk zones into five groups. No sample appeared in more than one group. We used RMSE and MeaAE to compare the regression precision of the Z-SVR model against the other machine learning models under the spatial division method. RMSE and MeaAE were calculated for the three degrees of risk zones (L, M and H), and the average values (RMSE(A) and MeaAE(A)) are also given. The comparison of all models is shown in Figure 14.

**Figure 14.** Model performance evaluated by the cross-validation method.
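The fold construction described above — non-overlapping random groups, with fewer folds where samples are scarce — can be sketched as follows (the helper and seed are ours, not from the paper):

```python
import numpy as np

def make_folds(n_samples, n_splits, seed=0):
    """Randomly partition sample indices into n_splits non-overlapping groups."""
    idx = np.random.default_rng(seed).permutation(n_samples)
    return np.array_split(idx, n_splits)

folds_low = make_folds(49, 10)  # low risk zone cases: ten groups
folds_mod = make_folds(13, 5)   # moderate risk zone cases: only five groups
print([len(f) for f in folds_mod])  # [3, 3, 3, 2, 2]
```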

Judging from the stability of the prediction results on the training samples, all models performed relatively better in low and moderate risk zones than in high risk zones. A possible explanation is that there are 64 training samples in high risk zones, many more than in the low and moderate risk zones. In addition, the true numbers of casualties in these 64 samples vary from 1 to 748, a huge range that increases the difficulty for machine learning models to achieve accurate prediction. Among all prediction models, Z-LR performed the worst, as its RMSE and MeaAE were 83.37 and 52.72, respectively, which ranked last in the two evaluation indicators. Z-BP and Z-RF outperformed the Z-LR model, with RMSEs of 67.30 and 74.27, respectively, and MeaAEs of 42.80 and 49.17, respectively. In contrast to the above prediction methods, Z-SVR showed higher overall accuracy in the cross-validation experiments for all risk zones. Its RMSE was 59.15, and its MeaAE was 36.16, both significantly lower than those of the compared models; this indicates that the proposed Z-SVR model had the smallest dispersion and the highest stability.

#### 6.1.2. Regression Accuracy Evaluation

For samples in low, moderate and high risk zones, this study used Z-SVR and other models to predict their death tolls. Evaluation indicators of RMSE (L, M and H) and MeaAE (L, M and H) were calculated for the risk zones, and the overall regression performances (RMSE(A) and MeaAE(A)) of all models were also calculated, which are plotted in Figure 15. For samples in low and moderate risk zones, the majority of models showed relatively high regression accuracy, while for those in high risk zones, the Z-SVR and Z-BP models showed good regression performance. Among all prediction models, in terms of overall MeaAE, the Z-BP model showed the best regression accuracy with the lowest value of 16.73, and the Z-SVR model also performed well with MeaAE(A) of 17.39. In terms of the overall RMSE, the average value of Z-SVR was 35.61, which was the lowest value, followed by 35.89 for Z-BP. The precision evaluation results from Figure 15 further prove that the proposed spatial division method has the advantages of enhancing prediction accuracy and stability. For example, the RMSE of the Z-SVR model was the lowest, namely, nearly half that of the SVR model; a similar result was obtained between the Z-BP and BP models. In addition, the best fitting results were obtained by the Z-SVR and Z-BP models, while the worst results were obtained by the RF, SVR and LR models, among which the SVR and BP algorithms showed obviously improved performance with the utilization of the spatial division method. The above analysis demonstrates that spatial division is an effective method for improving the performance of machine learning algorithms in predicting earthquake casualties and that the proposed Z-SVR model showed good and stable performance in casualty prediction.

**Figure 15.** Regression performances of Z-SVR and other models.

#### 6.1.3. Classification Accuracy Evaluation

The prediction results of Z-SVR, Z-RF, Z-BP, Z-LR and their initial models were also compared with the corresponding true values in terms of classification performance, where a pair of predicted and true values with the same order of magnitude was considered correct. Based on this criterion, we calculated the evaluation indicators of *Precision*, *Recall* and *F*1 for all prediction models for the risk zones, which are presented in Table 7. In low and moderate risk zones, although the *Precision* of the LR model was 1, its *Recall* performance was unsatisfactory, which led to a low *F*1 value; compared with LR and the other models, Z-SVR showed better classification performance in low and moderate risk areas, with relatively high *Precision* values and the highest *Recall* and *F*1 values. With regard to samples in high risk zones, Z-BP was the model with the best prediction performance, with an *F*1 value of 0.87. However, the classification result of Z-SVR in high risk zones was also excellent, with the highest *Recall*, the second-highest *Precision* and the third-highest *F*1 values. In general, the Z-SVR model showed significant stability in classification prediction, with the highest values of *Recall* and *F*1 and a relatively high value of *Precision*. The *F*1 scores of Z-SVR across the risk areas ranked, from high to low, as moderate, low and high risk zones. However, only a few earthquakes with casualties occurred in moderate risk areas; hence, we obtained a limited number of historical cases for training the prediction models and verifying their performance, which makes it difficult to draw firm conclusions about this ordering.


**Table 7.** Comparison of classification performance between Z-SVR and other models for three degrees of risk zones.

We also divided the testing samples into three groups according to the number of casualties, where the division criterion was order of magnitude (1 to 9, 10 to 99, 100 and greater). We compared the classification performances of Z-SVR and other models in the groups and calculated the evaluation indicators of *Precision*, *Recall*, and *F*1 for all prediction models. Figure 16 presents the comparison results of classification performance between Z-SVR and other models on samples with various numbers of casualties. Z-SVR provided the most balanced and accurate classification into the three groups. Although models such as Z-BP and Z-LR showed better classification performance in terms of *Precision* or *Recall* in some groups, the *Precision* and *Recall* values of the Z-SVR model in the three groups were high, balanced and stable; thus, Z-SVR had the highest *F*1 values in each group. In general, the Z-SVR model was the most precise and stable model, which provided accurate classification results for earthquakes with various numbers of casualties.

**Figure 16.** Classification results of Z-SVR and other models for earthquakes with casualties of different orders of magnitude: (**a**) Comparison of *Precision*; (**b**) comparison of *Recall*; and (**c**) comparison of *F*1.
