Article

Monthly Maximum Magnitude Prediction in the North–South Seismic Belt of China Based on Deep Learning

Institute of Earthquake Forecasting, China Earthquake Administration, Beijing 100036, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 9001; https://doi.org/10.3390/app14199001
Submission received: 15 September 2024 / Revised: 29 September 2024 / Accepted: 3 October 2024 / Published: 6 October 2024
(This article belongs to the Special Issue Advanced Research in Seismic Monitoring and Activity Analysis)

Abstract
The North–South Seismic Belt is one of the major regions in China where strong earthquakes frequently occur. Predicting the monthly maximum magnitude is of significant importance for proactive seismic hazard defense. This paper uses seismic catalog data from the North–South Seismic Belt since 1970 to calculate and extract multiple seismic parameters. The monthly maximum magnitude is processed using Variational Mode Decomposition (VMD) with sample segmentation to avoid information leakage. The decomposed multiple modal data and seismic parameters together form a new dataset. Based on these datasets, this paper employs four deep learning models and four time windows to predict the monthly maximum magnitude, using prediction accuracy (PA), False Alarm Rate (FAR), and Missed Alarm Rate (MR) as evaluation metrics. It is found that a time window of 12 generally yields better prediction results, with the PA for Ms 5.0–6.0 earthquakes reaching 77.27% and for earthquakes above Ms 6.0 reaching 12.5%. Compared to data not decomposed using VMD, traditional error metrics show only a slight improvement, but the model can better predict short-term trends in magnitude changes.

1. Introduction

As a major natural disaster, earthquakes have the potential to inflict substantial damage on humans and infrastructure in a very short time. According to research, earthquakes not only cause direct damage to buildings and infrastructure but also trigger secondary disasters such as fires, tsunamis, and landslides, further exacerbating the losses [1]. Moreover, the impact of earthquakes is not limited to physical destruction but also has profound negative effects on socio-economic aspects, posing significant challenges to the recovery and reconstruction efforts in the affected areas and creating substantial safety risks for people’s lives. Therefore, in regions where strong earthquakes frequently occur, earthquake prediction and research are crucial in reducing the damage caused by earthquakes. In recent years, machine learning and deep learning have achieved promising results in the field of prediction. Against this backdrop, many researchers are dedicated to developing and improving earthquake prediction models [2,3,4,5].
In 2007, Panakkat et al. [6] developed a scientific method for short-term earthquake prediction by converting earthquake catalogs into eight seismic activity parameters based on seismic statistical parameters and fundamental statistical characteristics in seismology. They used prior statistical models, including the Gutenberg–Richter relationship (G-R) and characteristic earthquakes, and combined these parameters with neural network models to predict the magnitude of the largest seismic event expected in the coming month for a specific region. Since the emergence of these seismic activity parameters, the rapid development in fields such as machine learning has led to the derivation of multiple new parameters, which have been applied using various models for earthquake prediction research. Wang et al. [7,8] summarized the parameters and models used, along with the results, and highlighted the general applicability and importance of these parameters in earthquake prediction. Sadhukhan et al. [9] investigated the relationship between eight seismic activity parameters and earthquakes using Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and Transformer models, achieving significant positive results for earthquakes with magnitudes ranging from Ms 3.5 to 6.0 across multiple regions. Li et al. [10] used the eight indicators as explicit features, applied convolutional neural networks to extract implicit features, and effectively fused the explicit and implicit features of the earthquake data. Asim et al. [11] used machine learning techniques to study eight seismic indicators calculated for the Hindu Kush region, achieving significant results in predicting earthquakes with magnitudes greater than or equal to Ms 5.5.
As research progresses, the models, methods, and prediction indicators used for earthquake forecasting are becoming increasingly diverse. Wang et al. [12] utilized the spatial relationships between earthquakes to predict seismic events and achieved good results. Rafiei et al. [13] combined classification algorithms (CAs) with mathematical optimization algorithms (OAs) to predict earthquake magnitudes and locations, demonstrating that, with sufficient data, deep learning models can forecast severe catastrophic events. Hasan Al Banna et al. [14] used earthquake indicators from the Bangladesh earthquake catalog to predict the magnitude and location of earthquakes for the next month with an Attention-based Long Short-Term Memory (ATT-LSTM) model; the accuracy of magnitude prediction reached 74.67%, and the root mean square error (RMSE) of location prediction was 1.25. Zhang et al. [15] employed a ConvLSTM network that fully accounts for the spatial correlation of global earthquake data, achieving high-resolution earthquake predictions with higher accuracy than previous methods. Kavianpour et al. [16] proposed a CNN-BiLSTM-AM model combined with ZOH technology to predict the maximum magnitude and the number of earthquakes for the next month. Last et al. predicted the annual maximum earthquake magnitude in 10 different regions by exploiting the correlation between the number of earthquakes and the annual maximum magnitude; their Multi-objective Information Fuzzy Network (M-IFN) achieved the most accurate results (AUC = 0.698) [17]. Essam et al. [18] showed that artificial neural networks (ANNs) perform well in predicting earthquake acceleration, depth, and velocity in the Terengganu region. Zhang et al. [19] proposed a purely data-driven deep learning model called EPT, which accounts for both local and overall seismic information; experiments on datasets from five provinces in China achieved an accuracy of 90%.
In summary, it is feasible to use various machine learning and deep learning models for earthquake prediction, and many researchers have obtained meaningful results with different models and datasets [20,21,22,23,24]. These studies all use and predict the magnitude data directly. However, because magnitude series exhibit weak temporal regularity, such approaches often require more data or more complex models to achieve good results. Dragomiretskiy et al. [25] proposed a signal processing method in 2014 called VMD, which can effectively handle nonlinear and non-stationary signals. Banjade et al. [26] proposed a seismic data denoising algorithm that combines VMD and Savitzky–Golay (SG) filtering; applied to the 2015 Nepal earthquake, it improved the signal-to-noise ratio while preserving significant features. Sarlis et al. [27] applied Empirical Mode Decomposition (EMD) to the magnitude time series of global earthquakes and concluded that, on mesoscale time series, results similar to those of natural time analysis of global seismicity can be obtained.
Granda et al. [28] demonstrated that the Intrinsic Mode Functions (IMFs) obtained through VMD provide a more organized way to understand the frequency peak amplitudes and energy of accelerations related to earthquakes, thereby offering a better explanation of seismic events. Chi et al. proposed a VMD method combined with a GRU-LUBE deep learning model based on machine learning theory. This algorithm enhances data correlation and extracts more comprehensive pre-earthquake anomalies than previous methods from the borehole strain data of two major earthquakes in Sichuan’s Wenchuan and Lushan regions. Furthermore, the anomalies observed in the pre-earthquake borehole strain data for both earthquakes were similar [29].
Based on the above research, this paper applies the VMD method to the monthly maximum magnitude data in the North–South Seismic Belt region. It then processes the multiple modes obtained from VMD using four deep learning models: LSTM, BiLSTM, ATT-LSTM, and Attention-based Bidirectional Long Short-Term Memory (ATT-BiLSTM). Hyperparameter analysis is conducted for each deep learning model, and the optimal parameters for each mode are selected by comparing the minimum MSE of each model. After obtaining the results for each mode, they are summed to form the final prediction result. The effectiveness of the models is evaluated by comparing the PA, FAR, and MR among the different models. The final conclusion is that decomposing magnitude data using the VMD method is feasible.
The rest of the paper is organized as follows: Section 2 provides an overview of the study area, the data used, and the basic data processing. Section 3 describes the models and evaluation metrics used in this paper. Section 4 details the experimental process. Section 5 presents the experimental results. Section 6 discusses the results of the study. Section 7 summarizes the conclusions and outlines future work in this area.

2. Data and Data Preprocessing

2.1. Study Area

The extent of the North–South Seismic Belt has not yet been defined in a unified way [30,31,32]. Taking the various definitions into account, this paper treats the region from 21.0° N to 43.0° N and 95.0° E to 110.0° E as the North–South Seismic Belt; see Figure 1.
The North–South Seismic Belt is a large-scale, nearly north–south-oriented zone of intense seismic activity in central China, characterized by very high magnitudes and frequencies of strong earthquakes. The belt not only experiences frequent seismic events but also has a complex tectonic background: it comprises multiple fault zones and fold structures of varying orientations and natures, including the Longmenshan Fault Zone and the North China Fault Zone. Driven by crustal movements, these fault zones interweave to form a complex tectonic system. Because the belt lies within the Chinese mainland, crustal stress accumulates and is released here, leading to frequent strong earthquakes. These characteristics make the North–South Seismic Belt one of the main regions of mainland China where severe earthquakes occur, with very active seismicity that significantly affects the regional geological structure and local residents’ lives [33,34,35].
From January 1970 to December 2023, the maximum monthly earthquake magnitude in the North–South Seismic Belt exceeded Ms 5.0 in 280 of 648 months. On average, therefore, an earthquake of magnitude Ms 5.0 or higher occurs in this region roughly every three months. Earthquakes of this magnitude can be quite damaging, potentially causing severe damage to buildings, disrupting infrastructure, and even producing casualties. Because of the immense energy released, such earthquakes can significantly affect the socio-economic conditions and daily lives of people in the epicentral and surrounding areas. Earthquake prediction for this region is therefore of considerable importance. The distribution of the maximum monthly magnitudes in the North–South Seismic Belt is shown in Figure 1.

2.2. Data

The raw data used in this study are earthquake catalog data, which include the origin time, date, latitude, longitude, magnitude, and focal depth of each event. The data can be downloaded from the China Earthquake Networks Center (https://news.ceic.ac.cn). To account for the completeness of the earthquake records, the minimum magnitude of completeness was estimated from the G-R relation as Mc = 3.0, so only earthquakes of magnitude Ms 3.0 and above were selected for this study. The primary information used includes the origin time, latitude, longitude, and magnitude. From this information, several seismological parameters were calculated, which are introduced in detail below:
1. T value
This parameter represents the time elapsed over the n most recent events with magnitudes at or above the minimum magnitude of completeness before each month begins. (For example, the value for February 2000 is computed from the 100 events counted backward from the last earthquake of January 2000; T is the number of days spanned by those 100 events.) The formula is as follows:
$T = t_n - t_1$
where $t_1$ is the time of the first earthquake and $t_n$ is the time of the $n$th earthquake; this paper uses n = 100 to compute the time elapsed over the 100 events preceding each month. A smaller T indicates a higher frequency of earthquakes (events are more frequent); conversely, a larger T indicates a lower frequency (events are less frequent).
2. Average magnitude
This parameter represents the average magnitude of the n earthquake events, as given by the following formula:
$M_{mean} = \frac{\sum M_i}{n}$
where $M_i$ is the magnitude of the $i$th earthquake event. It is generally observed that the average magnitude tends to increase gradually as a major earthquake approaches.
3. Square root of the earthquake energy release rate ($dE^{1/2}$)
This parameter reflects the intensity and frequency of seismic activity and is given by the following formula:
$dE^{1/2} = \frac{\sum E^{1/2}}{T}$
where $E^{1/2}$ is the square root of the earthquake energy $E$ and $M$ is the magnitude. The earthquake energy can be calculated using the following formula:
$E = 10^{(11.8 + 1.5M)} \ \mathrm{ergs}$
4. Slope of the log earthquake frequency versus magnitude curve (b value)
This parameter is based on the G-R inverse power law relating earthquake magnitude and frequency, expressed by the following formula:
$\log_{10} N = a - bM$
where $N$ is the number of earthquakes with magnitude greater than or equal to $M$, $a$ and $b$ are constants, and $b$ is the slope of the approximately linear relationship between the logarithm of earthquake frequency and magnitude. They are computed by least squares as follows:
$a = \frac{\sum_{i=1}^{n} (\log_{10} N_i + b M_i)}{n}$
$b = \frac{n \sum_{i=1}^{n} M_i \log_{10} N_i - \sum_{i=1}^{n} M_i \sum_{i=1}^{n} \log_{10} N_i}{\left( \sum_{i=1}^{n} M_i \right)^2 - n \sum_{i=1}^{n} M_i^2}$
5. Mean square deviation (η)
The calculation formula for this parameter is as follows [36]:
$\eta = \frac{\sum_{i=1}^{n} \left[ \log_{10} N_i - (a - b M_i) \right]^2}{n - 1}$
The lower the η value, the better the magnitude–frequency distribution is described by the inverse power law; conversely, a higher η value indicates a greater degree of randomness.
6. Magnitude deficit (∆M value)
This parameter represents the difference between the maximum magnitude observed in n seismic events and the maximum magnitude expected from the G-R law. The calculation formula is as follows:
$\Delta M = M_{\max,\,\mathrm{observed}} - M_{\max,\,\mathrm{expected}}$
where $M_{\max,\,\mathrm{observed}}$ is the maximum magnitude observed in the n events and $M_{\max,\,\mathrm{expected}}$ is the maximum magnitude expected from the G-R law in the n events, given by:
$M_{\max,\,\mathrm{expected}} = \frac{a}{b}$
7. Other parameters
In addition to the seismological parameters above, we also extracted the maximum magnitude of each month from the earthquake catalog, along with its latitude and longitude. The latitude and longitude of the maximum-magnitude event help the model learn spatial relationships, while the monthly maximum magnitude series is used for VMD. An example of the seismological parameters extracted for the year 1971 is shown in Table 1.
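As an illustration, the parameter definitions above can be collected into a single routine. The sketch below is a minimal numpy implementation under stated assumptions: the function name, the magnitude-bin width `dm`, and the use of cumulative counts on a magnitude grid for the G-R fit are illustrative choices, not the paper's actual code.

```python
import numpy as np

def seismicity_parameters(mags, event_days, dm=0.1):
    """Compute the catalog-derived parameters defined above for one
    window of n events (names and bin width dm are illustrative).

    mags: magnitudes of the n most recent events (ascending time order).
    event_days: corresponding event times in days.
    """
    mags = np.asarray(mags, float)
    t = event_days[-1] - event_days[0]          # T = t_n - t_1
    m_mean = mags.mean()                        # average magnitude
    # square root of released energy, E = 10**(11.8 + 1.5*M) ergs
    de_sqrt = (10.0 ** ((11.8 + 1.5 * mags) / 2.0)).sum() / t
    # least-squares G-R fit log10(N) = a - b*M on a magnitude grid,
    # with N the cumulative count of events of magnitude >= M
    m_grid = np.arange(mags.min(), mags.max() + dm, dm)
    log_n = np.log10([np.sum(mags >= m) for m in m_grid])
    n = len(m_grid)
    sx, sy = m_grid.sum(), log_n.sum()
    b = (n * (m_grid * log_n).sum() - sx * sy) / (sx**2 - n * (m_grid**2).sum())
    a = (log_n + b * m_grid).sum() / n
    # mean square deviation about the fitted G-R line
    eta = np.sum((log_n - (a - b * m_grid)) ** 2) / (n - 1)
    # magnitude deficit: observed minus G-R-expected maximum magnitude
    delta_m = mags.max() - a / b
    return {"T": t, "M_mean": m_mean, "dE_sqrt": de_sqrt,
            "a": a, "b": b, "eta": eta, "delta_M": delta_m}
```

On a synthetic catalog whose cumulative counts follow the G-R law exactly, the fit recovers the generating a and b with zero residual η, which is a convenient sanity check for the regression arithmetic.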

2.3. Data Preprocessing

This paper extracts the monthly maximum earthquake magnitudes in the North–South Seismic Belt from January 1970 to December 2023. However, as shown in Figure 2, the monthly maximum magnitude series has weak temporal regularity, which may make it difficult for deep learning models to learn its patterns effectively. VMD can decompose such data into modes with comparatively good time series properties. In earthquake prediction, many scholars have achieved promising results using machine learning and deep learning methods based on seismological parameters calculated from earthquake catalog data [37,38]. In this study, the VMD method is employed to decompose the monthly maximum magnitudes of the North–South Seismic Belt into multiple modes, and each mode is then combined with the calculated seismological parameters to create a new dataset. For each dataset, this study predicts the maximum magnitude of the following month using a rolling time window approach (the time window is measured in months, and its size is the number of months it spans). Finally, the predictions from the multiple modes are summed to obtain the final prediction. Through experiments, this paper finds that decomposing into six modes yields better results. Figure 3 shows the decomposition results for the first two samples when the time window is set to 12, as well as the decomposition results for each mode across the entire dataset.

3. Methods

In this paper, the North–South Seismic Belt is treated as a whole. The monthly maximum magnitudes for the region are extracted and subjected to VMD, resulting in six columns of data. These six columns are combined with the seismological parameters calculated from the earthquake catalog to form new datasets. Different deep learning methods are applied to these datasets for hyperparameter analysis; for each dataset, the parameters yielding the minimum Mean Squared Error (MSE) are selected as optimal, and the corresponding predictions are obtained. Finally, the predictions from the multiple modes are summed to obtain the final prediction. The PA, FAR, and MR of each model are then calculated, and the model with the highest prediction accuracy is selected as the optimal model. In the prediction process, a rolling time window approach is used with window sizes of 6, 12, 18, and 24. The results are also compared with those obtained without VMD, i.e., predicting directly from the magnitudes. The general process is similar across time windows and models. Figure 4 illustrates the specific experimental process.

3.1. VMD

VMD is an adaptive, fully non-recursive method for mode decomposition and signal processing [25]. VMD achieves signal decomposition by solving an optimization problem, which enhances its robustness and adaptability. It can automatically determine the center frequency and bandwidth of each mode, avoiding the problem of mode mixing, and it minimizes the total bandwidth of the modes through a constrained variational problem, thereby improving the accuracy of the decomposition. Its core advantage lies in extracting intrinsic modes from the signal, revealing the underlying physical meaning of the data. At a monthly scale, earthquake magnitudes are non-stationary, and VMD can decompose these complex seismic signals into more analyzable components. By suppressing noise and capturing key signal features, VMD provides a solid foundation for subsequent magnitude prediction and other seismic analyses. Assume the original signal is composed of K finite-bandwidth modal components $v_k(t)$, each an IMF with center frequency $\omega_k$, under the constraint that the K modal components sum to the input signal. The decomposition proceeds conceptually as follows:
(1) Obtain the analytic signal of $v_k(t)$ through the Hilbert transform and compute its one-sided spectrum. By multiplying by the operator $e^{-j\omega_k t}$, shift the spectrum of each mode to its baseband:
$\left[ \left( \delta(t) + \frac{j}{\pi t} \right) * v_k(t) \right] e^{-j\omega_k t}$
(2) Compute the squared $L^2$ norm of the gradient of the demodulated signal and estimate the bandwidth of each modal component. The resulting constrained variational problem is:
$\min_{\{v_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * v_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\}$
$\mathrm{s.t.} \quad \sum_k v_k = s$
where $\{v_k\} = \{v_1, \ldots, v_K\}$ are the decomposed IMF components and $\{\omega_k\} = \{\omega_1, \ldots, \omega_K\}$ are their center frequencies.
To find the optimal solution to the constrained variational problem, we first introduce the Lagrange multiplier τ ( t ) and the second-order penalty factor α , which transform the constrained variational problem into an unconstrained variational problem. The second-order penalty factor α ensures the accuracy of signal reconstruction in the presence of Gaussian noise. The Lagrange multiplier τ ( t ) ensures the strictness of the constraint conditions. The extended Lagrangian expression is as follows:
$L(\{v_k\}, \{\omega_k\}, \tau) = \alpha \sum_k \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * v_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| s(t) - \sum_k v_k(t) \right\|_2^2 + \left\langle \tau(t),\, s(t) - \sum_k v_k(t) \right\rangle$
Using the alternating direction method of multipliers (ADMM) to iteratively update each component and its center frequency, the saddle point of the unconstrained model is ultimately obtained, which represents the optimal solution for the original data.
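The two core ADMM updates can be sketched in numpy: each mode is refreshed by a Wiener filter centered on its current frequency, and each center frequency by the power-weighted mean frequency of its mode. The function names below are illustrative; a full VMD implementation iterates these updates, together with the Lagrange-multiplier step, until convergence.

```python
import numpy as np

def vmd_mode_update(f_hat, sum_others_hat, lam_hat, omega_k, freqs, alpha):
    """One frequency-domain mode update of VMD: the residual spectrum
    (signal minus the other modes, plus lambda/2) is Wiener-filtered
    around the current center frequency omega_k."""
    return (f_hat - sum_others_hat + lam_hat / 2) / \
           (1 + 2 * alpha * (freqs - omega_k) ** 2)

def vmd_omega_update(u_hat_k, freqs):
    """Center-frequency update: power-weighted mean frequency of the mode."""
    power = np.abs(u_hat_k) ** 2
    return (freqs * power).sum() / power.sum()
```

With a flat input spectrum, the update concentrates the mode around the chosen center frequency, and the frequency update then recovers that center, which illustrates why the iteration is self-consistent at a fixed point.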

3.2. Deep Learning Model

3.2.1. LSTM

LSTM networks are a specialized type of Recurrent Neural Network (RNN) introduced by Hochreiter et al. in 1997 [39]. They are designed to address the issues of gradient vanishing and gradient explosion that traditional RNNs encounter when processing long sequences. By incorporating memory cells and gating mechanisms, LSTMs can effectively capture and retain long-term dependencies within sequences, making them widely used in time series prediction, natural language processing, and other areas.
The basic unit of an LSTM is called the LSTM cell, which consists of a cell state and three gating mechanisms: the input gate, the forget gate, and the output gate. The interplay of these components allows the LSTM to selectively remember or forget information, addressing the shortcomings of traditional RNNs. Figure 5 illustrates the structure of the LSTM cell.
The forget gate determines which information in the cell state should be discarded:
$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$
The forget gate passes the current input $x_t$ and the previous hidden state $h_{t-1}$ through a sigmoid function ($\sigma$), producing a value between 0 and 1 that controls how much information is retained. Here, $f_t$ is the output of the forget gate, $W_f$ is its weight matrix, and $x_t$ is the input at the current time step. The input gate controls the influence of the current input on the cell state:
$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$
$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$
The input gate uses a sigmoid function together with a $\tanh$ function to determine how important the new information is and to what extent it should be written to the state; $i_t$ is the output of the input gate. The cell state is then updated from the outputs of the input and forget gates:
$C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t$
The update combines the forget gate output $f_t$ with the input gate output $i_t$: the new cell state $C_t$ blends the retained old information with the new input information. Here, $C_t$ is the cell state at the current time step, $C_{t-1}$ is the cell state at the previous time step, and $\tilde{C}_t$ is the candidate cell state. The output gate controls how the cell state influences the output at the current time step:
$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$
The output gate passes $h_{t-1}$ and $x_t$ through a sigmoid function; the hidden state is then computed from the cell state and the gate output:
$h_t = o_t \cdot \tanh(C_t)$
The hidden state $h_t$, combining the cell state with the output gate output $o_t$, serves as the final output of the LSTM unit and is passed to the next time step.
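The gate equations above can be condensed into a single numpy step. The dictionary-based weight layout is an illustrative choice made here for readability; framework implementations fuse these four matrices into one.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step following the gate equations above.

    W, b: dicts with keys 'f', 'i', 'c', 'o'; each W[k] acts on the
    concatenation [h_{t-1}; x_t].
    """
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate state
    c_t = f_t * c_prev + i_t * c_tilde        # cell state update
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate
    h_t = o_t * np.tanh(c_t)                  # hidden state
    return h_t, c_t
```

With all weights zero, every gate outputs 0.5, so the cell state is simply halved each step, which is an easy way to check the update arithmetic.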

3.2.2. BiLSTM

BiLSTM builds upon regular LSTM by introducing a bidirectional processing mechanism, which means that while processing a sequence, it simultaneously considers information from both the forward direction (forward LSTM) and the backward direction (backward LSTM) [40]. This bidirectional processing allows the model to gain more contextual information at each time step, enhancing the model’s performance.
As shown in Figure 6, the forward LSTM processes the sequence from start to end, while the backward LSTM processes it from end to start. The outputs of the two LSTMs are combined at each time step to incorporate bidirectional contextual information. At time step $t$, the forward LSTM output is $\overrightarrow{h}_t$, the backward LSTM output is $\overleftarrow{h}_t$, and the combined output is $H_t$. The specific calculation formulas are as follows:
$\overrightarrow{h}_t = \overrightarrow{\mathrm{LSTM}}(x_t, \overrightarrow{h}_{t-1})$
$\overleftarrow{h}_t = \overleftarrow{\mathrm{LSTM}}(x_t, \overleftarrow{h}_{t+1})$
$H_t = W_{\overrightarrow{h}} \overrightarrow{h}_t + W_{\overleftarrow{h}} \overleftarrow{h}_t + b_H$
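The bidirectional pass can be sketched as follows. The `cell` argument stands in for any LSTM-style step function, and the per-step combination follows the weighted-sum formula above; all names here are illustrative.

```python
import numpy as np

def bilstm_layer(xs, cell, h0, c0, W_fwd, W_bwd, b):
    """Bidirectional pass: run `cell` over xs forward and over xs
    reversed, then combine per-step outputs as
    H_t = W_fwd @ h_fwd_t + W_bwd @ h_bwd_t + b.

    cell(x, h, c) -> (h, c) is any LSTM-style step function.
    """
    def run(seq):
        h, c, outs = h0, c0, []
        for x in seq:
            h, c = cell(x, h, c)
            outs.append(h)
        return outs

    h_f = run(xs)
    h_b = run(xs[::-1])[::-1]   # backward pass, re-aligned to forward time
    return [W_fwd @ hf + W_bwd @ hb + b for hf, hb in zip(h_f, h_b)]
```

A toy accumulator cell makes the bookkeeping easy to verify: each output mixes the running sums from both directions at the same time index.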

3.2.3. Attention Mechanism

The Attention Mechanism (ATT) is a technique that simulates human visual and auditory attention allocation. Essentially, it involves using a neural network to autonomously learn a set of weight coefficients and apply dynamic weighting to enhance important information while suppressing less-important information [41,42]. In magnitude prediction, by incorporating the ATT, the model can capture key features in the magnitude data and dynamically adjust the weights of different features, further improving forecast accuracy. This paper introduces the ATT into LSTM and BiLSTM models to learn the importance of each feature, enhancing the models’ ability to focus on critical information. The ATT calculates feature weights α t using the output vector h t as input, with the following calculation formula:
$u_t = \tanh(W_a h_t + b)$
$\alpha_t = \frac{\exp(u_t^{T} u_\omega)}{\sum_t \exp(u_t^{T} u_\omega)}$
$C = \sum_t \alpha_t h_t$
First, the intermediate vector $u_t$ is calculated, followed by the attention weight $\alpha_t$; the weighted context vector $C$ is then computed and passed to the next layer of the network. Here, $W_a$ is the weight coefficient matrix, $b$ is the bias vector, and $\tanh$ is the hyperbolic tangent activation function used to introduce nonlinearity. $u_\omega$ is a learned context weight vector. The numerator $\exp(u_t^T u_\omega)$ scores the importance of time step $t$, while the denominator $\sum_t \exp(u_t^T u_\omega)$ normalizes the scores over all time steps, ensuring that the attention weights sum to 1.
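A minimal numpy sketch of this attention pooling, assuming a single learned context vector as in the formulas above (function and variable names are illustrative):

```python
import numpy as np

def attention_pool(h, W_a, b, u_w):
    """Attention pooling over time steps h of shape (T, d):
    u_t = tanh(W_a h_t + b), alpha = softmax(u_t^T u_w),
    C = sum_t alpha_t h_t."""
    u = np.tanh(h @ W_a.T + b)          # (T, d_a) intermediate vectors
    scores = u @ u_w                    # (T,) importance scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                # softmax weights, sum to 1
    return alpha, alpha @ h             # weights and context vector
```

Subtracting the maximum score before exponentiating is a standard numerical-stability trick and does not change the resulting weights.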

3.2.4. ATT-LSTM/ATT-BiLSTM

To better capture important information in the data, this paper introduces the ATT into the deep learning models LSTM and BiLSTM. The principle has already been introduced earlier. We added an ATT layer between the two layers of LSTM and the two layers of BiLSTM, respectively. Here, we take the newly formed dataset as an example, and the specific processing steps of ATT-LSTM are shown in Figure 7.
Similarly, the processing steps for ATT-BiLSTM are shown in Figure 8.

3.3. Evaluation Metrics

A forecast is generally considered accurate if the predicted magnitude is within ±0.5 of the actual magnitude. A prediction more than 0.5 above the actual magnitude is defined as a false alarm, and a prediction more than 0.5 below the actual magnitude is defined as a missed alarm [43]. This paper mainly studies prediction performance for magnitudes of Ms 5.0–6.0 and above Ms 6.0. When treating the task as a regression problem, the actual magnitude range must be fixed in advance; hence the PA, FAR, and MR reported here are computed over samples whose actual magnitudes fall between Ms 5.0 and 6.0 or above Ms 6.0. Suppose the metrics are computed for earthquakes with actual magnitudes of at least Ms 5.0 but below Ms 6.0. Let N be the total number of such samples, A the number of samples whose predicted magnitude is within ±0.5 of the actual magnitude, F the number of samples over-predicted by more than 0.5, and M the number of samples under-predicted by more than 0.5. The metrics are defined as follows:
PA: the proportion of samples whose predicted magnitude is within ±0.5 of the actual magnitude:
$PA = \frac{A}{N}$
FAR: the proportion of samples whose actual magnitude is between Ms 5.0 and 6.0 but whose predicted magnitude exceeds the actual magnitude by more than 0.5:
$FAR = \frac{F}{N}$
MR: the proportion of samples whose actual magnitude is between Ms 5.0 and 6.0 but whose predicted magnitude falls below the actual magnitude by more than 0.5:
$MR = \frac{M}{N}$
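The three metrics can be computed in one function; the masking of the actual-magnitude range follows the definitions above, while the function and argument names are illustrative.

```python
import numpy as np

def forecast_metrics(y_true, y_pred, lo=5.0, hi=6.0, tol=0.5):
    """PA / FAR / MR over samples whose actual magnitude is in [lo, hi).

    PA : fraction predicted within +/- tol of the actual magnitude.
    FAR: fraction over-predicted by more than +tol (false alarms).
    MR : fraction under-predicted by more than -tol (missed alarms).
    """
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    mask = (y_true >= lo) & (y_true < hi)
    err = y_pred[mask] - y_true[mask]
    n = mask.sum()
    pa = np.sum(np.abs(err) <= tol) / n
    far = np.sum(err > tol) / n
    mr = np.sum(err < -tol) / n
    return pa, far, mr
```

By construction the three fractions partition the masked samples, so PA + FAR + MR = 1 within the selected magnitude range.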
Additionally, we calculated the MSE, Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE) of the test set when treating this as a regression problem. The specific definitions and calculation formulas are as follows:
MSE is a common metric for measuring the difference between predicted values and actual values. MSE is the average of the squared differences between the predicted and actual values and is used to quantify the overall deviation of the prediction results. The formula is as follows:
$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
where $y_i$ is the $i$th actual value, $\hat{y}_i$ is the $i$th predicted value, and $n$ is the total number of samples. The smaller the MSE, the better the predictive performance of the model. Because the errors are squared, MSE is sensitive to outliers.
RMSE is the square root of MSE. Similar to MSE, RMSE is a metric for measuring the difference between predicted values and actual values. However, RMSE has the same unit as the original data, which makes it more interpretable. The formula is as follows:
$RMSE = \sqrt{MSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
The smaller the RMSE, the better the predictive performance of the model. Like MSE, RMSE is also sensitive to outliers.
MAE is the average of the absolute values of the errors. It is another metric for measuring the difference between predicted values and actual values. The formula for calculating MAE is as follows:
$MAE = \frac{1}{n} \sum_{i=1}^{n} |y_i - \hat{y}_i|$
MAE reflects the average magnitude of the errors without considering their direction, making it less sensitive to outliers than MSE and RMSE. The smaller the MAE, the better the predictive performance of the model; however, it does not weight larger errors more heavily.

4. Experiment

4.1. VMD Processes the Data

When decomposing the magnitude data into six modes, it is important to note that VMD is not applied to the entire magnitude series at once, as this would leak future information into past samples. To avoid such leakage, a rolling time window approach is adopted, with four window sizes: 6, 12, 18, and 24. Taking a window size of 12 as an example, for the first window (data points 1–12), all decomposition results are retained. The window is then rolled forward: for the second window (data points 2–13), only the decomposition result of the last data point in the window is retained, and so on until the entire series has been decomposed. The other window sizes are handled analogously. VMD also suffers from boundary effects at both ends of each window, which we mitigate by mirror extension.
Mirror extension involves adding a mirrored part of the signal at both ends, making the signal symmetrical during analysis and reducing the impact of boundary effects on the results. As shown in Figure 3, the decomposed data are smoother than the original data, making it easier for deep learning models to learn their patterns.
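The rolling-window bookkeeping and mirror extension can be sketched as follows. This is an illustrative stand-in, not the authors' implementation: `decompose` replaces the real VMD call with a trivial two-mode trend/residual split (a real VMD routine would be substituted in practice), and `mirror_extend` reflects a few samples at each end to soften boundary effects.

```python
import numpy as np

def mirror_extend(x, pad):
    """Reflect `pad` samples at each end to reduce boundary effects."""
    return np.concatenate([x[pad - 1::-1], x, x[:-pad - 1:-1]])

def decompose(x, k=2):
    """Stand-in for VMD: split the signal into a smooth trend mode and a
    residual mode. Returns an array of shape (k, len(x))."""
    kernel = np.ones(3) / 3.0
    trend = np.convolve(x, kernel, mode="same")
    return np.stack([trend, x - trend])

def rolling_decompose(series, window=12, pad=3):
    """Decompose with a rolling window so no future sample leaks in.

    The first window keeps the modes of all its points; every later
    window keeps only the modes of its final (newest) point."""
    series = np.asarray(series, dtype=float)
    first = decompose(mirror_extend(series[:window], pad))
    modes = [first[:, pad:pad + window]]          # drop the mirrored padding
    for end in range(window + 1, len(series) + 1):
        w = decompose(mirror_extend(series[end - window:end], pad))
        modes.append(w[:, pad + window - 1:pad + window])  # last point only
    return np.concatenate(modes, axis=1)          # shape (k, len(series))
```

Because each window sees only past and current values, the decomposed features remain usable in a genuine forecasting setting.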

4.2. Model Parameter Setting

The decomposed mode data are combined with the seismological parameters to form new datasets. Each dataset is split into a training set and a test set in an 8:2 ratio, and each column of the training and test sets is then normalized separately. Normalization scales the data to a fixed range, which accelerates convergence during training. The normalized data are input into the different deep learning models for training and learning.
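A minimal sketch of the split-then-normalize step, assuming a chronological 8:2 split and column-wise min–max scaling to [0, 1]; the paper does not specify the scaler, so min–max is our assumption. Fitting the scaler on the training portion only keeps test-set statistics from leaking into training.

```python
import numpy as np

def split_and_normalize(data, train_ratio=0.8):
    """Chronological split, then column-wise min-max scaling.

    The scaling parameters are estimated from the training set only and
    then applied unchanged to the test set."""
    data = np.asarray(data, dtype=float)
    cut = int(len(data) * train_ratio)
    train, test = data[:cut], data[cut:]
    lo, hi = train.min(axis=0), train.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)   # guard against constant columns
    return (train - lo) / scale, (test - lo) / scale
```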
In this study, the deep learning models LSTM, BiLSTM, ATT-LSTM, and ATT-BiLSTM are trained on the same datasets to identify the model with the best predictive performance. Both LSTM and BiLSTM have two layers. ATT-LSTM also consists of two LSTM layers but inserts an attention (ATT) layer between them to capture key features in the data. ATT-BiLSTM is structured in the same way, with BiLSTM layers in place of the LSTM layers.
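The ATT layer between the two recurrent layers can be illustrated with simple dot-product attention over the timestep outputs. This numpy sketch is our own illustration of the general mechanism, not the authors' exact implementation; `w` stands in for a learned scoring vector.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax along the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention_reweight(h, w):
    """Score each timestep's hidden state with vector w, softmax the
    scores, and reweight the sequence so salient timesteps dominate
    what the next recurrent layer sees.

    h: (timesteps, units), w: (units,). Returns (timesteps, units)."""
    scores = h @ w                 # one relevance score per timestep
    alpha = softmax(scores)        # attention weights, summing to 1
    return h * alpha[:, None]      # emphasize the key timesteps
```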
For each model, we conducted a hyperparameter analysis with the following settings:
Epochs: 30, 50, 100;
Batch size: 16, 32, 64;
Units: 32, 64, 128.
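The grid above yields 27 configurations per model. An exhaustive sweep can be sketched as follows; `train_and_score` is a hypothetical stand-in for training a model with the given hyperparameters and returning its validation error.

```python
from itertools import product

# Hyperparameter grid from the study: 3 x 3 x 3 = 27 configurations.
EPOCHS = (30, 50, 100)
BATCH_SIZES = (16, 32, 64)
UNITS = (32, 64, 128)

def grid_search(train_and_score):
    """Try every configuration and keep the one with the lowest error."""
    best, best_score = None, float("inf")
    for epochs, batch, units in product(EPOCHS, BATCH_SIZES, UNITS):
        score = train_and_score(epochs=epochs, batch_size=batch, units=units)
        if score < best_score:
            best, best_score = (epochs, batch, units), score
    return best, best_score
```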
Finally, the PA, FAR, MR, MSE, RMSE, and MAE of the magnitude are compared across different time windows and models.

5. Results

This section first analyzes the prediction results of the model for different time window sizes. Due to space constraints, we will provide a detailed analysis only for the prediction results of the LSTM model. For the other three models, we will evaluate only the time windows with relatively better prediction results. However, all results will be compared together later.

5.1. Analysis of LSTM Model Prediction Results

The prediction results of the LSTM model under different time windows are shown in Figure 9.
When the time window is 6, there are 16 instances of earthquakes with Ms 6.0 or above in the test set, and the model predicted 0 of these events. Consequently, the PA is 0%, the FAR is 0%, and the MR is 100%. For earthquakes with Ms 5.0 or above, there are 44 instances in the test set, with 24 predicted by the model. Thus, the PA is 54.55%, the FAR is 0%, and the MR is 45.45%.
With a time window of 12, there are 16 instances of earthquakes with Ms 6.0 or above in the test set, and the model predicted 1 of these events. Consequently, the PA is 6.25%, the FAR is 0%, and the MR is 93.75%. For earthquakes with Ms 5.0 or above, there are 44 instances in the test set, with 28 predicted by the model. Thus, the PA is 63.64%, the FAR is 0%, and the MR is 36.36%.
For a time window of 18, there are 14 instances of earthquakes with Ms 6.0 or above in the test set, and the model predicted 1 of these events. Consequently, the PA is 7.14%, the FAR is 0%, and the MR is 92.86%. For earthquakes with Ms 5.0 or above, there are 43 instances in the test set, with 31 predicted by the model. Thus, the PA is 72.09%, the FAR is 0%, and the MR is 27.91%.
When the time window is 24, there are 12 instances of earthquakes with Ms 6.0 or above in the test set, and the model predicted 0 of these events. Consequently, the PA is 0%, the FAR is 0%, and the MR is 100%. For earthquakes with Ms 5.0 or above, there are 40 instances in the test set, with 24 predicted by the model. Thus, the PA is 60%, the FAR is 0%, and the MR is 40%.
Among the LSTM model predictions, the best predictive performance, in terms of correct predictions and PA, was achieved with a time window size of 18.

5.2. Analysis of BiLSTM Model Prediction Results

The prediction results of the BiLSTM model under different time windows are shown in Figure 10.
In the BiLSTM model, the best predictive performance, in terms of PA, was achieved with time windows of 12 and 18.

5.3. Analysis of ATT-LSTM Model Prediction Results

The prediction results of the ATT-LSTM model under different time windows are shown in Figure 11.
In the ATT-LSTM model, the best predictive performance was achieved with a time window size of 12.

5.4. Analysis of ATT-BiLSTM Model Prediction Results

The prediction results of the ATT-BiLSTM model under different time windows are shown in Figure 12.
In the ATT-BiLSTM model, the best predictive performance was achieved with a time window size of 12.

5.5. Overall PA, FAR, and MR

For the results of each model under different time windows, we have summarized the overall PA, FAR, and MR in Table 2 and Table 3.
From Table 2 and Table 3, it can be seen that for earthquakes of magnitude Ms 6.0 or above, the forecast accuracy is generally very low across all time windows and models. The FAR is almost zero, while the MR exceeds 85%, indicating that for earthquakes of Ms 6.0 or above the predicted values are generally lower than the actual values and the forecasting performance is not ideal. For earthquakes of magnitude Ms 5.0 or above and below Ms 6.0, the PA is generally above 60%, the FAR is again nearly zero, and the MR is around 30%, indicating better forecasting performance in this magnitude range. Overall, model performance is optimal with a time window size of 12, perhaps because the information contained within a window of this size has the greatest influence on the maximum magnitude in the following period. Under this window, the ATT-BiLSTM model achieved the best results for monthly maximum magnitude prediction in the North–South Seismic Belt: for earthquakes of magnitude Ms 5.0–6.0, 34 of 44 events were predicted accurately, giving a PA of 77.27%, a FAR of 0%, and an MR of 22.73%; for earthquakes of magnitude Ms 6.0 or above, 2 of 16 events were predicted accurately, giving a PA of 12.5%, a FAR of 0%, and an MR of 87.5%. The prediction performance of each dataset section on the test set is shown in Figure 13, and the final prediction results are shown in Figure 14.
From the prediction plots of the six test-set segments, it is evident that the model identifies the general direction of the data trends for each mode and quickly captures short-term changes, suggesting that the characteristics of individual modes are relatively easy for the model to learn.
From Figure 13, it can be seen that the model has a good prediction performance for earthquakes with magnitudes between Ms 5.0 and 6.0, but the prediction performance is poor for earthquakes with magnitudes of Ms 6.0 and above.

5.6. MSE, RMSE, and MAE

We calculate the MSE, RMSE, and MAE for the different models with different time windows during VMD. The results are shown in Table 4.
The MSE, RMSE, and MAE for the different models with different time windows without VMD are shown in Table 5.
As can be seen from Table 4, the MSE, RMSE, and MAE of different models and different time window sizes do not show significant differences. The data in Table 5 also reflect this. However, overall, the error values after VMD are smaller than those without VMD. There is a slight improvement in terms of MSE, RMSE, and MAE.

6. Discussion

6.1. Using Time Window Sampling for VMD

In this study, the data are decomposed using VMD with a rolling time window approach for single samples. The benefits of this approach are as follows:
1. Capturing Individual Characteristics: Single-sample decomposition better captures the individual characteristics of each sample, avoiding information loss that may occur with overall decomposition. This is particularly important for magnitude prediction because earthquake events are highly localized and random. According to Li et al. [44], the independent decomposition of individual samples using VMD can effectively capture local features, thereby improving the model’s prediction accuracy.
2. Enhancing Model Sensitivity: Since each sample is processed independently, single-sample decomposition enhances the model’s sensitivity to anomalous samples and extreme events. Shi et al. noted that for data with extreme variations, the single-sample decomposition strategy can significantly improve the model’s ability to respond to abnormal changes [45].
3. Reducing Information Leakage: Overall decomposition might lead to information leakage. The single-sample decomposition method effectively avoids the problem of information leakage that can occur with overall decomposition. Information leakage refers to the unintentional use of future sample information in the decomposition of current samples. Although this might lead to high accuracy in predictions, the lack of future data in actual applications could result in poor performance. Research by Proaño et al. [46] shows that single-sample decomposition can effectively avoid information leakage, thus improving the model’s prediction reliability.

6.2. Comparative Analysis of the Effectiveness of VMD in Magnitude Prediction

This paper uses PA, FAR, and MR as the criteria for selecting the optimal model. When applying VMD to earthquake prediction, we found that, compared with direct prediction without VMD, this method significantly improves both the predicted trends and the evaluation metrics for earthquakes of magnitude Ms 5.0 and above, and it learns the patterns underlying magnitude variations more effectively.
In addition, we compared the overall error between methods using VMD and those not using VMD. The experimental results (see Table 4 and Table 5) show that although traditional error metrics such as MSE, RMSE, and MAE did not show significant improvement with VMD compared to direct prediction, the VMD method exhibits a clear advantage in prediction accuracy.
Observations of predictions without VMD reveal that the predicted values fluctuate within a narrow range (see Figure 15), indicating that the model has limited learning capabilities regarding the data. In contrast, predictions made after VMD show a wider range. Although there was no significant improvement in evaluation metrics, VMD better predicts the trends in magnitude changes and is closer to the actual magnitude variations. This suggests that VMD is effective in reducing noise and extracting key features, enabling the model to better capture the inherent patterns in the data, thereby improving the reliability and stability of the predictions.
In addition, traditional error metrics have certain limitations when applied to earthquake magnitude prediction:
1. Sensitivity to Outliers: MSE and RMSE are very sensitive to outliers. The distribution of earthquake magnitudes is often highly irregular and volatile, with a few large-magnitude events potentially causing significant increases in these error metrics. For example, if a predicted value deviates greatly from the actual value, it will substantially increase the overall error.
2. MAE Limitations: MAE applies linear penalties to all error values, which may not fully reflect the model’s performance in handling large-magnitude events [47,48].
In the prediction results of this study, the ATT-BiLSTM model with a time window of 12 achieved the best prediction performance, yet its MSE, RMSE, and MAE were not the lowest. Without VMD, the model's limited learning ability confined its predictions to a narrow range, which kept the errors uniformly moderate. Our model, by contrast, learned some of the underlying patterns, and this is reflected in its predictions. We also observed that adjacent data points in Mode 1 are sometimes strongly correlated; when adjacent monthly maximum magnitudes differ by a large, infrequent amount, the model struggles to learn such patterns, so its predictions lag and the numerical errors grow. Such data points act as “outliers” in the calculation of these metrics, which explains why MSE, RMSE, and MAE showed no significant improvement. We also tried differencing, moving averages, and filtering to mitigate this effect, but did not achieve better results.

6.3. Limitations and Future Development

From the prediction results for each mode, we can see that the model’s ability to capture data trends improved for decomposed data, achieving our initial experimental goal: for nonlinear data with poor time series characteristics, we can first decompose them into relatively stable data and then learn their patterns. Although decomposing the data into multiple columns has reduced the difficulty of finding data patterns, it also introduces the possibility of error accumulation, which may lead to suboptimal final results. While this method has shown some effectiveness in predicting earthquakes between magnitudes Ms 5.0 and 6.0, it also shows limited sensitivity to earthquakes of magnitude Ms 6.0 and above.
1. Seismological Parameters and Data Sources: The seismological parameters used in this study are calculated from earthquake catalog data. Therefore, we are still relying only on catalog data, which may not fully reflect the occurrence of large earthquakes. Prior research has shown that anomalies in thermal infrared data, ionosphere, gravity and magnetic data, and groundwater level changes occur before earthquakes [49,50,51,52,53,54]. Therefore, combining earthquake catalog data with these other types of data might better reflect the impending arrival of large earthquakes and improve forecast accuracy.
2. Limited Samples of Large Earthquakes: The number of earthquakes of magnitude Ms 6.0 and above in our dataset is relatively small. This limited sample of large earthquakes results in poorer learning of the patterns that precede their occurrence.
3. Model Limitations: The deep learning model used may not effectively capture underlying patterns. Future research could explore more complex and powerful models, such as Transformer, to better learn these patterns.
4. Decomposition Methods: While the VMD method has been effective in transforming poorly timed monthly maximum magnitude data into more stable data, potentially enhancing forecast accuracy, other decomposition methods like Successive Variational Mode Decomposition (SVMD), Feature Mode Decomposition (FMD), and Wavelet Transform might achieve even better results in earthquake prediction research [55,56,57].
5. Regional Segmentation: The current study focuses on the entire North–South Seismic Belt. Future research could segment this region into southern, northern, and central sections for separate analysis. This approach would effectively reduce the impact of inter-regional heterogeneity on the model, clarifying seismic activity characteristics within each sub-region. Independent analysis and prediction for each segment could lead to the more accurate identification of local seismic patterns and improved forecast accuracy.

7. Conclusions

Earthquakes can cause severe damage to people’s lives and property. They have strong nonlinear characteristics, and their patterns are difficult to discern, as their occurrence has long been regarded as random. This paper combines deep learning with VMD to predict the monthly maximum magnitude in the North–South Seismic Belt based on earthquake catalog data. The results show that a new dataset composed of the relatively stable mode data obtained through VMD, together with seismological parameters, can enhance the prediction performance of deep learning models: the models effectively capture short-term magnitude changes and are sensitive to trends in earthquake magnitudes. With a time window size of 12, ATT-BiLSTM achieved the best prediction results, performing well for earthquakes of magnitude Ms 5.0–6.0. Although the results for earthquakes above magnitude Ms 6.0 are less satisfactory, satellite technology is providing ever more earthquake-related data, deep learning continues to advance and performs well in data mining, and signal decomposition methods are becoming more refined. Integrating these approaches may eventually yield some capability to learn the patterns of earthquake occurrence.
The study area of this paper is the entire North–South Seismic Belt, which is too broad. Future research could consider the impact on specific local areas. For example, when predicting future earthquakes in a particular province, it would be beneficial to not only consider the earthquakes occurring in that specific region but also incorporate data from the North–South Seismic Belt and even the entirety of China to study the seismic patterns of that area. Currently, the focus is on predicting the magnitude of earthquakes. However, predicting earthquake occurrence time, epicenter location, and even focal depth is also crucial for a more comprehensive earthquake prediction task. Future work should involve incorporating multiple parameters, considering more spatial relationships, and using more refined models to better understand the patterns of earthquake occurrences.

Author Contributions

Conceptualization, N.M. and K.S.; methodology, N.M.; software, N.M. and J.Z.; validation, N.M. and K.S.; formal analysis, N.M.; investigation, J.Z.; resources, N.M.; data curation, N.M.; writing—original draft preparation, N.M.; writing—review and editing, N.M., J.Z. and K.S.; visualization, J.Z.; supervision, K.S.; project administration, K.S.; funding acquisition, K.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant No. U2039202 and the National Key Research and Development Program of China under Grant No. 2023YFC3007303.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this paper are earthquake catalog data, which can be downloaded from the China Earthquake Networks Center: https://news.ceic.ac.cn/index.html.

Acknowledgments

We thank the China Earthquake Networks Center for providing data download access.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Urlainis, A.; Shohet, I.M. A Comprehensive Approach to Earthquake-Resilient Infrastructure: Integrating Maintenance with Seismic Fragility Curves. Buildings 2023, 13, 2265.
2. Zhang, L.; Si, L.; Yang, H.; Hu, Y.; Qiu, J. Precursory pattern based feature extraction techniques for earthquake prediction. IEEE Access 2019, 7, 30991–31001.
3. Mignan, A.; Broccardo, M. Neural network applications in earthquake prediction (1994–2019): Meta-analytic and statistical insights on their limitations. Seismol. Res. Lett. 2019, 91, 2330–2342.
4. Uyeda, S.; Kamogawa, M. The Prediction of Two Large Earthquakes in Greece. Eos Trans. AGU 2008, 89, 363.
5. Rundle, J.B.; Donnellan, A.; Fox, G.; Crutchfield, J.P.; Granat, R. Nowcasting earthquakes: Imaging the earthquake cycle in California with machine learning. Earth Space Sci. 2021, 8, e2021EA001757.
6. Panakkat, A.; Adeli, H. Neural network models for earthquake magnitude prediction using multiple seismicity indicators. Int. J. Neural Syst. 2007, 17, 13–33.
7. Wang, J.H.; Jiang, H.K. Research progress in field of earthquake prediction by machine learning based on seismic data. J. Seismol. Res. 2023, 46, 173–187.
8. Ridzwan, N.S.M.; Yusoff, S.H.M. Machine learning for earthquake prediction: A review (2017–2021). Earth Sci. Inform. 2023, 16, 1133–1149.
9. Sadhukhan, B.; Chakraborty, S.; Mukherjee, S. Predicting the magnitude of an impending earthquake using deep learning techniques. Earth Sci. Inform. 2023, 16, 803–823.
10. Li, R.; Lu, X.; Li, S.; Yang, H.; Qiu, J.; Zhang, L. DLEP: A Deep Learning Model for Earthquake Prediction. In Proceedings of the International Joint Conference on Neural Networks, Glasgow, UK, 19–24 July 2020.
11. Asim, K.M.; Martínez-Álvarez, F.; Basit, A.; Iqbal, T. Earthquake magnitude prediction in Hindukush region using machine learning techniques. Nat. Hazards 2017, 85, 471–486.
12. Wang, Q.; Guo, Y.; Yu, L.; Li, P. Earthquake Prediction Based on Spatio-Temporal Data Mining: An LSTM Network Approach. IEEE Trans. Emerg. Top. Comput. 2020, 8, 148–158.
13. Rafiei, M.H.; Adeli, H. NEEWS: A novel earthquake early warning model using neural dynamic classification and neural dynamic optimization. Soil Dyn. Earthq. Eng. 2017, 100, 417–427.
14. Banna, M.H.; Ghosh, T.; Nahian, M.J.; Taher, K.A.; Kaiser, M.S.; Mahmud, M.; Hossain, M.S.; Andersson, K. Attention-Based Bi-Directional Long-Short Term Memory Network for Earthquake Prediction. IEEE Access 2021, 9, 56589–56603.
15. Zhang, Z.; Wang, Y. A Spatiotemporal Model for Global Earthquake Prediction Based on Convolutional LSTM. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5915712.
16. Kavianpour, P.; Kavianpour, M.; Jahani, E.; Ramezani, A. A CNN-BiLSTM model with attention mechanism for earthquake prediction. J. Supercomput. 2023, 79, 19194–19226.
17. Last, M.; Rabinowitz, N.; Leonard, G.; Ebrahimi, M. Predicting the Maximum Earthquake Magnitude from Seismic Data in Israel and Its Neighboring Countries. PLoS ONE 2016, 11, e0146101.
18. Essam, Y.; Kumar, P.; Ahmed, A.N.; Murti, M.A.; El-Shafie, A. Exploring the reliability of different artificial intelligence techniques in predicting earthquakes for Malaysia. Soil Dyn. Earthq. Eng. 2021, 147, 106826.
19. Zhang, B.; Hu, Z.; Wu, P.; Huang, H.W.; Xiang, J.S. EPT: A data-driven transformer model for earthquake prediction. Eng. Appl. Artif. Intell. 2023, 123, 106176.
20. Öncel Çekim, H.; Karakavak, H.N.; Özel, G.; Tekin, S. Earthquake magnitude prediction in Turkey: A comparative study of deep learning methods, ARIMA and singular spectrum analysis. Environ. Earth Sci. 2023, 82, 387.
21. Cekim, H.O.; Tekin, S.; Özel, G. Prediction of the earthquake magnitude by time series methods along the East Anatolian Fault, Turkey. Earth Sci. Inform. 2021, 14, 1339–1348.
22. Debnath, P.; Chittora, P.; Chakrabarti, T.; Chakrabarti, P.; Leonowicz, Z.; Jasinski, M.; Gono, R.; Jasińska, E. Analysis of Earthquake Forecasting in India Using Supervised Machine Learning Classifiers. Sustainability 2021, 13, 971.
23. Khawaja, M.; Asim, A.; Idris, A.; Iqbal, T.; Martínez-Álvarez, F. Seismic indicators based earthquake predictor system using Genetic Programming and AdaBoost classification. Soil Dyn. Earthq. Eng. 2018, 111, 1–7.
24. Shan, W.; Zhang, M.; Wang, M.; Chen, H.; Zhang, R.; Yang, G.; Tang, Y.; Teng, Y.; Chen, J. EPM–DCNN: Earthquake Prediction Models Using Deep Convolutional Neural Networks. Bull. Seismol. Soc. Am. 2022, 112, 2933–2945.
25. Dragomiretskiy, K.; Zosso, D. Variational Mode Decomposition. IEEE Trans. Signal Process. 2014, 62, 531–544.
26. Banjade, T.P.; Liu, J.; Li, H.; Ma, J.M. Enhancing earthquake signal based on variational mode decomposition and S-G filter. J. Seismol. 2021, 25, 41–54.
27. Sarlis, N.V.; Skordas, E.S.; Mintzelas, A.; Papadopoulou, K.A. Micro-scale, mid-scale, and macro-scale in global seismicity identified by empirical mode decomposition and their multifractal characteristics. Sci. Rep. 2018, 8, 9206.
28. Granda, F.; Benítez, D.S.; Yépez, F. On the analysis of strong earthquake seismic signals using variational-mode decomposition. In Proceedings of the IEEE Conference on Electrical, Electronics Engineering, Information and Communication Technologies, Valparaiso, Chile, 13–27 November 2019; pp. 1–6.
29. Chi, C.; Li, C.; Han, Y.; Yu, Z.N.; Zhang, D. Pre-earthquake anomaly extraction from borehole strain data based on machine learning. Sci. Rep. 2023, 13, 20095.
30. Wang, Y.; Ma, J.; Li, C. Strong earthquake migration characteristics in the North–South seismic belt and their relationship with the South Asian seismic belt. Seismol. Geol. 2007, 1, 1–14.
31. Tian, W.X.; Zhang, Y.X. Earthquake prediction in the North–South seismic belt based on image information methods. Earthquake 2023, 43, 159–177.
32. Kang, L.X. Discussion on the basic characteristics and formation mechanism of the North–South seismic belt in China. Geod. Geodyn. 1991, 4, 76–85.
33. Deng, Q.D.; Zhang, P.Z.; Ran, Y.K.; Yang, X.P.; Min, W.; Chu, Q.Z. Basic characteristics of active tectonics in China. Sci. China Ser. D Earth Sci. 2002, 12, 1020–1030, 1057.
34. Xie, M.Y.; Meng, L.Y. Application of the maximum aftershock magnitude estimation method in the North–South seismic belt. Seismol. Res. 2022, 45, 424–433.
35. Kagan, Y.Y.; Jackson, D.D.; Rong, Y. A testable five-year forecast of moderate and large earthquakes in southern California based on smoothed seismicity. Seismol. Res. Lett. 2007, 78, 94–98.
36. Kanamori, H. Quantification of Earthquakes. Nature 1978, 271, 411–414.
37. Narayanakumar, S.; Raja, K. A BP artificial neural network model for earthquake magnitude prediction in Himalayas, India. Circuits Syst. 2016, 7, 3456–3468.
38. Kail, R.; Burnaev, E.; Zaytsev, A. Recurrent convolutional neural networks help to predict location of earthquakes. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8019005.
39. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
40. Graves, A.; Jaitly, N. Bidirectional LSTM networks for improved phoneme classification and recognition. In Proceedings of the ICASSP, Vancouver, BC, Canada, 26–31 May 2013; pp. 4155–4159.
41. Tay, Y.; Luu, A.T.; Hui, S.C. Multi-pointer co-attention networks for recommendation. In Proceedings of the KDD, New York, NY, USA, 19–23 August 2018; pp. 2309–2318.
42. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62.
43. Chen, Y.T. Earthquake prediction: Review and prospects. Sci. China Earth Sci. 2009, 39, 1633–1658.
44. Li, Y.; Xu, F.Y. Acoustic emission sources localization of laser cladding metallic panels using improved fruit fly optimization algorithm-based independent variational mode decomposition. Mech. Syst. Signal Process. 2022, 166, 108514.
45. Shi, G.; Qin, C.; Tao, J.; Liu, C. A VMD-EWT-LSTM-based multi-step prediction approach for shield tunneling machine cutterhead torque. Knowl. Based Syst. 2021, 228, 107213.
46. Proaño, E.; Benítez, D.S.; Lara-Cueva, R.; Ruiz, M. On the use of variational mode decomposition for seismic event detection. In Proceedings of the 2018 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), Ixtapa, Mexico, 14–16 November 2018; pp. 1–6.
47. MSE and RMSE: A Clear Guide to Understanding These Evaluation Metrics in Machine Learning. Available online: https://thecontentfarm.net/a-clear-guide-to-understanding-mse-rmse-evaluation-metrics/ (accessed on 2 October 2024).
48. Choosing between MAE, MSE and RMSE. Available online: https://hmatalonga.com/blog/choosing-between-mae-mse-and-rmse/ (accessed on 29 March 2023).
49. Zhang, J.; Sun, K.; Zhu, J.; Mao, N.; Ouzounov, D. Application of Model-Based Time Series Prediction of Infrared Long-Wave Radiation Data for Exploring the Precursory Patterns Associated with the 2021 Madoi Earthquake. Remote Sens. 2023, 15, 4748.
50. Cui, Y.; Ouzounov, D.; Hatzopoulos, N.; Sun, K.; Zou, Z.; Du, J. Satellite observation of CH4 and CO anomalies associated with the Wenchuan Ms 8.0 and Lushan Ms 7.0 earthquakes in China. Chem. Geol. 2017, 469, 185–191.
51. Xiong, B.; Li, X.; Wang, Y.Q.; Zhang, H.M.; Liu, Z.J.; Ding, F.; Zhao, B.Q. Prediction of ionospheric TEC over China based on long and short-term memory neural network. Chin. J. Geophys. 2022, 65, 2365–2377.
52. Varotsos, P.; Sarlis, N.; Skordas, E. Magnetic field variations associated with the SES before the 6.6 Grevena-Kozani earthquake. Proc. Jpn. Acad. Ser. B Phys. Biol. Sci. 2001, 77, 93–97.
53. Kang, C.L.; Han, Y.B.; Liu, D.F.; Cao, Z.Q. The OLR anomaly and mechanism before Tibet earthquake (M6.9). Prog. Geophys. 2008, 23, 1703–1708.
54. Senturk, E.; Saqib, M.; Adil, M.A. A multi-network based hybrid LSTM model for ionospheric anomaly detection: A case study of the Mw 7.8 Nepal earthquake. Adv. Space Res. 2022, 70, 440–455.
55. Nazari, M.; Sakhaei, S.M. Successive Variational Mode Decomposition. Signal Process. 2020, 174, 107582.
56. Miao, Y.; Zhang, B.; Li, C.; Lin, J.; Zhang, D. Feature Mode Decomposition: New Decomposition Theory for Rotating Machinery Fault Diagnosis. IEEE Trans. Ind. Electron. 2023, 70, 1949–1960.
57. Daubechies, I. Orthonormal Bases of Compactly Supported Wavelets. Commun. Pure Appl. Math. 1988, 41, 909–996.
Figure 1. A sketch map of the geological structure and the magnitude distribution in the study area.
Figure 1. A sketch map of the geological structure and the magnitude distribution in the study area.
Applsci 14 09001 g001
Figure 2. Original magnitude diagram of the North–South Seismic Belt.
Figure 2. Original magnitude diagram of the North–South Seismic Belt.
Applsci 14 09001 g002
Figure 3. VMD results with a time window size of 12: (a) decomposition results for the first sample; (b) decomposition results for the second sample; (c) decomposition results for the entire dataset.
Figure 3. VMD results with a time window size of 12: (a) decomposition results for the first sample; (b) decomposition results for the second sample; (c) decomposition results for the entire dataset.
Applsci 14 09001 g003
Figure 4. Flow chart of the experiment.
Figure 5. Basic structure of an LSTM.
Figure 6. Basic structure of a BiLSTM.
Figure 7. ATT-LSTM processing steps.
Figure 8. ATT-BiLSTM processing steps.
Figure 9. Earthquake prediction classification chart at different time windows using LSTM. “M” stands for the number of missed detections, “A” stands for the number of correct detections, and “F” stands for the number of false alarms. (ad) represent the earthquake prediction classification results at time windows of 6, 12, 18, and 24, respectively.
Figure 10. Earthquake prediction classification chart at different time windows using BiLSTM. “M” stands for the number of missed detections, “A” stands for the number of correct detections, and “F” stands for the number of false alarms. (ad) represent the earthquake prediction classification results at time windows of 6, 12, 18, and 24, respectively.
Figure 11. Earthquake prediction classification chart at different time windows using ATT-LSTM. “M” stands for the number of missed detections, “A” stands for the number of correct detections, and “F” stands for the number of false alarms. (ad) represent the earthquake prediction classification results at time windows of 6, 12, 18, and 24, respectively.
Figure 12. Earthquake prediction classification chart at different time windows using ATT-BiLSTM. “M” stands for the number of missed detections, “A” stands for the number of correct detections, and “F” stands for the number of false alarms. (ad) represent the earthquake prediction classification results at time windows of 6, 12, 18, and 24, respectively.
Figure 13. Prediction result graph of six modes for the ATT-BiLSTM model with a time window of 12. Blue represents the actual values, and red represents the predicted values. (a–f) show the predicted and actual values for modes 1 to 6, respectively.
Figure 14. Final comparison of the predicted results and the original magnitudes. The black color represents the actual magnitudes, the red color represents the final predicted results, and the gray area represents the region corresponding to the actual magnitudes ±0.5.
Figure 15. Prediction result chart without VMD.
Table 1. Seismological parameters calculated for the 12 months of 1971.

| Time Period | T (Days) | Mmean | dE^(1/2) (10^8 ergs) | a | b | η | ΔM | Lat (°N) | Lon (°E) | Mmax |
|---|---|---|---|---|---|---|---|---|---|---|
| 1971.01 | 294 | 3.645 | 3.73 | 3.60 | 0.544 | 0.0036 | −1.12 | 29.03 | 95.02 | 4.8 |
| 1971.02 | 321 | 3.633 | 3.03 | 3.71 | 0.578 | 0.0057 | −0.94 | 25.27 | 99.5 | 5.8 |
| 1971.03 | 332 | 3.599 | 3.22 | 3.72 | 0.586 | 0.0035 | −0.55 | 35.5 | 98.1 | 6.3 |
| 1971.04 | 330 | 3.614 | 4.44 | 3.63 | 0.557 | 0.0023 | −0.21 | 22.8 | 101.1 | 6.7 |
| 1971.05 | 310 | 3.683 | 10.5 | 3.34 | 0.469 | 0.0041 | −0.42 | 41.0 | 108.0 | 4.0 |
| 1971.06 | 300 | 3.589 | 10.3 | 3.34 | 0.480 | 0.0093 | −0.26 | 25.12 | 105.48 | 4.9 |
| 1971.07 | 283 | 3.63 | 10.9 | 3.38 | 0.486 | 0.0059 | −0.25 | 35.83 | 105.9 | 3.8 |
| 1971.08 | 290 | 3.639 | 10.7 | 3.39 | 0.487 | 0.0054 | −0.25 | 28.8 | 103.6 | 5.8 |
| 1971.09 | 164 | 3.676 | 21.1 | 3.24 | 0.441 | 0.0072 | −0.63 | 22.95 | 100.55 | 5.4 |
| 1971.10 | 149 | 3.552 | 8.58 | 3.49 | 0.525 | 0.0085 | −0.84 | 38.0 | 102.08 | 4.5 |
| 1971.11 | 175 | 3.586 | 7.55 | 3.47 | 0.516 | 0.0077 | −0.92 | 28.82 | 103.58 | 4.9 |
| 1971.12 | 163 | 3.615 | 8.39 | 3.46 | 0.509 | 0.0069 | −0.99 | 39.98 | 96.57 | 4.5 |
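Among the seismological parameters in Table 1, the b-value of the Gutenberg–Richter relation is conventionally estimated by maximum likelihood. The paper's exact estimation procedure is not given in this excerpt; the sketch below uses the standard Aki/Utsu estimator, assuming 0.1-unit magnitude binning (hence the 0.05 half-bin correction):

```python
import numpy as np

def b_value_mle(mags, mc):
    """Maximum-likelihood b-value (Aki/Utsu) for events at or above the
    completeness magnitude mc, with a half-bin correction for 0.1-unit bins."""
    m = np.asarray(mags, dtype=float)
    m = m[m >= mc]  # discard events below the completeness threshold
    return np.log10(np.e) / (m.mean() - (mc - 0.05))
```

Applied month by month to the catalog, this yields one b-value per row, analogous to the b column in Table 1.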
Table 2. Summary results for time windows of 6 and 12. Ms ∈ [5,6): the magnitude is greater than or equal to 5 and less than 6; Ms ∈ [6,8]: the magnitude is greater than or equal to 6 and less than or equal to 8.

| Magnitude | Model | PA (6) | FAR (6) | MR (6) | PA (12) | FAR (12) | MR (12) |
|---|---|---|---|---|---|---|---|
| Ms ∈ [5,6) | LSTM | 54.55% | 0 | 45.45% | 63.64% | 0 | 36.36% |
| | BiLSTM | 59.09% | 0 | 40.91% | 65.91% | 0 | 34.09% |
| | ATT-LSTM | 47.73% | 0 | 52.27% | 72.73% | 0 | 27.27% |
| | ATT-BiLSTM | 63.64% | 0 | 36.36% | 77.27% | 0 | 22.73% |
| Ms ∈ [6,8] | LSTM | 0 | 0 | 100% | 0 | 0 | 100% |
| | BiLSTM | 0 | 0 | 100% | 6.25% | 0 | 93.75% |
| | ATT-LSTM | 0 | 0 | 100% | 12.5% | 0 | 87.5% |
| | ATT-BiLSTM | 0 | 0 | 100% | 12.5% | 0 | 87.5% |
Table 3. Summary results for time windows of 18 and 24. Ms ∈ [5,6): the magnitude is greater than or equal to 5 and less than 6; Ms ∈ [6,8]: the magnitude is greater than or equal to 6 and less than or equal to 8.

| Magnitude | Model | PA (18) | FAR (18) | MR (18) | PA (24) | FAR (24) | MR (24) |
|---|---|---|---|---|---|---|---|
| Ms ∈ [5,6) | LSTM | 72.09% | 0 | 27.91% | 60% | 0 | 40% |
| | BiLSTM | 67.44% | 0 | 32.56% | 47.5% | 2.5% | 50% |
| | ATT-LSTM | 62.79% | 0 | 37.21% | 57.5% | 0 | 42.5% |
| | ATT-BiLSTM | 65.12% | 0 | 34.88% | 62.5% | 0 | 37.5% |
| Ms ∈ [6,8] | LSTM | 6.25% | 0 | 93.75% | 0 | 0 | 100% |
| | BiLSTM | 0 | 0 | 100% | 0 | 0 | 100% |
| | ATT-LSTM | 0 | 0 | 100% | 0 | 0 | 100% |
| | ATT-BiLSTM | 6.25% | 0 | 93.75% | 0 | 0 | 100% |
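The PA, FAR, and MR values in Tables 2 and 3 can be recomputed from monthly predictions once the counting rules for correct ("A"), missed ("M"), and false ("F") alarms are fixed. The sketch below is a hypothetical implementation, not the authors' exact definition: the function name `alarm_metrics` and the rule that an alarm counts as correct only when the predicted magnitude agrees with the truth within ±0.5 (the tolerance band shown in Figure 14) are assumptions.

```python
import numpy as np

def alarm_metrics(y_true, y_pred, lo=5.0, hi=6.0, tol=0.5):
    """Count correct (A), missed (M), and false (F) alarms for one
    magnitude class, then derive PA, FAR, and MR.

    Assumed rules (hypothetical): a month is an actual event if
    lo <= y_true < hi; it is an alarm if lo <= y_pred < hi; an alarm on
    an event month counts as correct when |y_pred - y_true| <= tol."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    event = (y_true >= lo) & (y_true < hi)
    alarm = (y_pred >= lo) & (y_pred < hi)
    close = np.abs(y_pred - y_true) <= tol
    A = int(np.sum(event & alarm & close))   # correct detections
    M = int(np.sum(event)) - A               # missed detections
    F = int(np.sum(alarm & ~event))          # false alarms
    pa = A / (A + M) if (A + M) else 0.0     # prediction accuracy
    far = F / (A + F) if (A + F) else 0.0    # false alarm rate
    mr = M / (A + M) if (A + M) else 0.0     # missed alarm rate
    return pa, far, mr
```

With these rules PA + MR = 1 within each class, which matches the pattern of the rows in Tables 2 and 3.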
Table 4. Calculation results with VMD.

| Model | MSE (6) | RMSE (6) | MAE (6) | MSE (12) | RMSE (12) | MAE (12) |
|---|---|---|---|---|---|---|
| LSTM | 0.699 | 0.836 | 0.638 | 0.683 | 0.827 | 0.622 |
| BiLSTM | 0.666 | 0.816 | 0.623 | 0.717 | 0.847 | 0.650 |
| ATT-LSTM | 0.718 | 0.846 | 0.645 | 0.710 | 0.842 | 0.639 |
| ATT-BiLSTM | 0.708 | 0.841 | 0.666 | 0.678 | 0.824 | 0.635 |

| Model | MSE (18) | RMSE (18) | MAE (18) | MSE (24) | RMSE (24) | MAE (24) |
|---|---|---|---|---|---|---|
| LSTM | 0.685 | 0.828 | 0.622 | 0.711 | 0.843 | 0.625 |
| BiLSTM | 0.674 | 0.821 | 0.623 | 0.745 | 0.863 | 0.658 |
| ATT-LSTM | 0.679 | 0.824 | 0.609 | 0.681 | 0.826 | 0.621 |
| ATT-BiLSTM | 0.695 | 0.834 | 0.622 | 0.690 | 0.831 | 0.638 |
Table 5. Calculation results without VMD.

| Model | MSE (6) | RMSE (6) | MAE (6) | MSE (12) | RMSE (12) | MAE (12) |
|---|---|---|---|---|---|---|
| LSTM | 0.758 | 0.871 | 0.658 | 0.725 | 0.851 | 0.640 |
| BiLSTM | 0.782 | 0.884 | 0.670 | 0.706 | 0.840 | 0.655 |
| ATT-LSTM | 0.707 | 0.841 | 0.632 | 0.705 | 0.840 | 0.625 |
| ATT-BiLSTM | 0.738 | 0.859 | 0.652 | 0.724 | 0.851 | 0.641 |

| Model | MSE (18) | RMSE (18) | MAE (18) | MSE (24) | RMSE (24) | MAE (24) |
|---|---|---|---|---|---|---|
| LSTM | 0.724 | 0.850 | 0.644 | 0.755 | 0.869 | 0.657 |
| BiLSTM | 0.760 | 0.871 | 0.673 | 0.725 | 0.851 | 0.653 |
| ATT-LSTM | 0.731 | 0.855 | 0.653 | 0.700 | 0.836 | 0.637 |
| ATT-BiLSTM | 0.734 | 0.857 | 0.649 | 0.694 | 0.833 | 0.636 |
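The error metrics in Tables 4 and 5 follow their standard definitions; a minimal sketch:

```python
import numpy as np

def regression_errors(y_true, y_pred):
    """MSE, RMSE, and MAE between predicted and actual monthly maximum magnitudes."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    mse = float(np.mean(err ** 2))
    return mse, float(np.sqrt(mse)), float(np.mean(np.abs(err)))
```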

Share and Cite

MDPI and ACS Style

Mao, N.; Sun, K.; Zhang, J. Monthly Maximum Magnitude Prediction in the North–South Seismic Belt of China Based on Deep Learning. Appl. Sci. 2024, 14, 9001. https://doi.org/10.3390/app14199001
