Article

A Meta-Learning-Based Train Dynamic Modeling Method for Accurately Predicting Speed and Position

1 The School of Electronics and Information Engineering, Beijing Jiaotong University, Beijing 100044, China
2 The State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China
3 The National Research Center of Railway Safety Assessment, Beijing Jiaotong University, Beijing 100044, China
4 The School of Data Science and Media Intelligence, Communication University of China, Beijing 100024, China
* Author to whom correspondence should be addressed.
Sustainability 2023, 15(11), 8731; https://doi.org/10.3390/su15118731
Submission received: 16 January 2023 / Revised: 3 May 2023 / Accepted: 15 May 2023 / Published: 29 May 2023

Abstract

The train dynamics modeling problem is a challenging task due to the complex dynamic characteristics and the complicated operating environment. The flexible formations, heavy carriage loads, and nonlinear behavior of air braking further increase the difficulty of modeling the dynamics of heavy haul trains. In this study, a novel data-driven train dynamics modeling method is designed by combining the attention mechanism (AM) with the gated recurrent unit (GRU) neural network. The proposed learning network consists of coding, decoding, attention, and context layers to capture the relationship between the train states and the control command, the line condition, and other influencing factors. To address the data insufficiency problem for new types of heavy haul trains to be deployed, the model-agnostic meta-learning (MAML) framework is adopted to transfer knowledge from tasks supported by large amounts of field data to data-insufficient tasks. Effective knowledge transfer improves the efficiency of data resource utilization, reduces data requirements, and lowers computational costs, demonstrating considerable potential for sustainable development applications. The simulation results validate the effectiveness of the proposed MAML-based method in enhancing prediction accuracy.

1. Introduction

1.1. Background

Nowadays, automatic train operation (ATO) systems are widely used in train control systems, such as the Chinese Train Control System (CTCS) and the European Train Control System (ETCS) [1]. As a crucial part of the ATO system, the train dynamics model plays an important role in ensuring the performance of ATO algorithms [2]; at the same time, train dynamics modeling is key for the monitoring of infrastructures [3]. The train composition has a direct impact on the dynamic performance of high-speed trains and subways [4,5]. Compared with high-speed trains and subways, the dynamic characteristics of heavy haul trains are more complex due to the flexible formations, the heavy carriage load, and the application of air braking on steep lines [6,7], all of which make the dynamic modeling of heavy haul trains a challenging task [8,9].
Existing studies of train dynamics models use mechanics to derive the vehicle motion model on the track, and such physically driven models usually involve the estimation and identification of parameters. In [10,11], the acceleration and deceleration parameters during vehicle travel were estimated, and in [12], a nonlinear function was proposed that takes into account the acceleration, deceleration, and frictional resistance of the train. An efficient and comprehensive alternative set-valued friction model based on Coulomb's law was defined in [13] to quantify train braking dynamics; the authors of [14] and others identified the friction coefficient of the track as well as the drag and gravity components to represent the internal train forces. The main drawback of the physically driven model is that the accurate identification of the model parameters is very complex due to the complexity of the train operating environment [15,16]. The presence of nonlinear factors imposes strong constraints on problem solving, which requires linearization in the modeling process [17,18,19,20] as well as extensive experience and repeated validation by railroad engineers in field experiments [21]. In addition, the robustness and generalization capability of such models are limited by theoretical assumptions and capricious scenarios. In practice, when a certain condition changes, railroad engineers must dynamically adjust a series of model parameters. Because of these problems, traditional dynamics modeling methods are difficult to apply to heavy haul trains with complex operating conditions.

1.2. Literature Review

In recent years, data-driven techniques based on neural networks have been used with success in a variety of challenging prediction problems [22,23,24,25], and they have also performed well for urban rail train speed prediction.
It is worth mentioning that some data-driven algorithms have been introduced in the field of train operation and control modeling [26,27]. In [28], three data-driven train operation models were proposed, and a heuristic train parking algorithm was used to improve the models to ensure parking accuracy. Yin et al. used three data-driven methods to build train control models based on explicit train operation data of the Beijing subway over a three-year period and compared the different models [29]. Li et al. developed a long short-term memory network to construct a train dynamics model in a nonparametric manner [30]. Wang et al. developed a hybrid learning model based on deep learning to model the velocity trajectory within a finite horizon [31]. Based on the above inspiration, we try to use data-driven techniques for modeling heavy haul train dynamics. Existing deep learning techniques excel at using back propagation algorithms to discover complex structures in large data sets [32]. At the same time, we find that when heavy haul trains of different load classes perform their transportation tasks between the same origin and destination, the on-board recording system records the status of the vehicle and the driving instructions in real time, and we can obtain specific historical operation data of heavy haul trains through the relevant railroad departments. This provides the primary conditions for using data-driven techniques to solve heavy haul train modeling problems. The actual operating data of heavy haul trains are highly complex and irregular, making it difficult to achieve high-accuracy predictions using traditional methods, whereas deep learning, a common means of dealing with uncertainty, shows advantages in the face of complex data and uncertain inputs [33,34]. Meanwhile, we note that the input variables of the dynamics modeling problem are time ordered and correlated with one another, which indicates that dynamics modeling is a time-series-related problem. Further, because realistic train dynamics modeling is influenced by many factors, heavy haul train dynamics modeling is a time-series-correlated prediction problem with multivariate inputs.
The gated recurrent unit (GRU) is often used for time-series-related problems because it can handle long-term dependencies well [35]. Wang et al. proposed a GRU-network-based method for short-term PV generation forecasting [36]. Dang et al. proposed a novel GRU-based neural network framework to predict stock price movements from historical financial information combined with a sentiment dictionary and verified that the model is helpful for the stock industry [37]. In recent years, the attention mechanism (AM) has proven successful in many research tasks and has been implemented by building neural networks around it. The attention mechanism allows for better feature extraction by focusing attention on the differences among the input features. Yang et al. proposed a deep-learning-based attentional bearing fault diagnosis model that is more interpretable than other models by introducing an attention mechanism into the bearing fault diagnosis [38]. Ran et al. placed the AM in the output layer of a long short-term memory neural network to improve network performance and used the network for travel time prediction [39]. Hamdan et al. applied self-attention to a single-target model for text generation to improve the realism of synthetic single-target images and to compensate for the shortcomings of previous models [40]. Therefore, to solve the problem of modeling heavy haul train dynamics, a multivariate time series prediction model based on the GRU is developed, and an attention mechanism is then introduced into the encoding–decoding framework. The introduced attention mechanism can quantify the importance of each specific time step in the data sequence features, mitigating the scattered attention of the traditional GRU.
Heavy haul trains have a variety of formations, train types, and load types; these elements are combined to form multiple styles of heavy haul trains. Due to the limitations of actual heavy haul train operation, the study of dynamics modeling for new types of heavy haul trains often faces the problem of insufficient data, whereas deep learning requires large datasets to support learning [32]. Therefore, it is essential to understand how to model the dynamics of new types of heavy haul trains by drawing on the experience of trains already operating in large numbers. Meta-learning has received much attention for its excellent ability to solve few-shot learning problems. Few-shot learning originated in machine vision and aims to identify new data types from a minimal number of labeled examples [41]. Data augmentation and regularization techniques can alleviate model overfitting caused by limited data by expanding the original dataset, but data augmentation techniques perform well only on specific datasets and are not universally applicable. Subsequently, knowledge-transferring few-shot learning gradually entered the picture: a neural network is usually allowed to learn prior knowledge in a meta-assisted learning phase on tasks supported by a large amount of data, where the goal is to learn transferable knowledge in the form of good model initialization parameters [42], feature embeddings [43,44], optimization strategies [45], or update functions and learning rules [46,47]. The learned network architecture with prior knowledge is then used in the fine-tuning phase for the target task [48,49]. The key feature of meta-learning is to train the model in advance on learning tasks with sufficient data and then use only a small amount of training sample data to fine-tune the model to adapt to the new learning task. In particular, model agnostic meta-learning (MAML) is of interest because it places no restrictions on the training model.

1.3. Contributions

Motivated by the above discussion, this study aims to design a novel neural network dynamics model that can implement multivariate time series prediction with a limited amount of data to build a heavy haul train dynamics model. Firstly, to improve the accuracy of the heavy haul train dynamics model, a multivariate time series prediction model combining the GRU network with the attention mechanism, namely AMGRU, is designed and implemented for the multivariate input problem of heavy haul trains. Moreover, to address the issue of insufficient data during the actual model building process, a model-agnostic meta-learner is constructed to learn the initialization parameters of the AMGRU network model across source tasks so that it can adapt to the destination task quickly. Specifically, the main contributions of this work are presented as follows:
1. By introducing an attention mechanism into the GRU network, the GRU network can focus on the differences between multiple input time series features and thus quickly concentrate on the main influences of dynamics modeling. In this way, the shortcomings of traditional GRU networks in terms of attention scattering are improved, and the accuracy of model training is enhanced.
2. A meta-learner for inter-task transfer learning is constructed to improve the accuracy of building dynamics models under low-data conditions. At the same time, the MAML framework is introduced, which focuses more on the potential of model initialization parameters than traditional transfer learning. An AMGRU network that acquires prior knowledge through the MAML framework can quickly adapt to tasks supported by small amounts of data, which is more reliable than traditional generative data-enhancement methods.

1.4. Scope and Assumptions

The focus of the study is primarily on accurately predicting the speed and position of the train at the next moment, based on its current operating state and real-time track conditions.
In this study, the following three assumptions are made. First, the train is abstracted as a mass point: the research employs data-driven methods to construct the train model, and given the characteristics of the dataset, the focus is placed on the longitudinal motion of the train. Second, the interaction forces between the track system and the train are not taken into consideration; the data are collected from the onboard microcomputer, and data related to these interaction forces are not included in the collected dataset. Third, the track system is not modeled as a beam system: because data related to the track-train interaction forces are lacking, further information, such as material properties and support conditions, cannot be obtained, so the track is not treated as a beam system.
The rest of this paper is organized as follows. Section 2 gives the detailed design process of the AMGRU network. Section 3 describes the design process of the meta-learner, and the identification of the meta-task. Section 4 gives numerical examples to illustrate the effectiveness of the proposed approach. Finally, Section 5 concludes the study.

2. The GRU Network Based on the AM

In this section, we first present the problem to be solved, then design and implement an AMGRU network to achieve the dynamic modeling of the heavy haul train.
The information we obtain can be divided into two parts concerning the problem of modeling heavy haul train dynamics. The first part is a large amount of historical field data collected by the on-board equipment relating to the state of the heavy haul train in real time, including position, speed, direction of travel, running time, pipe pressure, and motor speed. The second part is the track surface information, including the track surface's gradient, curvature, speed limits, etc. Using these two parts of information as the basic information for modeling, we define the variable $E_t$, which represents the train operating state at time $t$. With the combined effect of nonlinear factors, it can be expressed as $E_t = [x_t, v_t, g_t, z_t, l_t, q_t, p_t, w_t, d_t]$, where $x_t$ represents the train position, $v_t$ is the train speed, $g_t$ and $z_t$ are the train tube pressure state and the train motor speed rating, respectively, $q_t$ and $p_t$ are the track surface curvature and track surface gradient of the section the train is on, respectively, $w_t$ indicates the total weight of the train, $d_t$ indicates the direction of travel for the heavy haul train, and $l_t$ indicates the current train speed limit. The speed limit information also includes the impact of turnouts on train dynamics: speed restriction areas are set according to the specific types of the turnouts and the length of a turnout area, and they enter the input sequence $E_t$ through the speed limit term. Define the variable $y_t = [x_t, v_t]$ to represent the speed and position information of the train at moment $t$. Using $N$ to denote the mapping function from the current train state to the next train state, we obtain
$[x_{t+1}, v_{t+1}] = N(E_{t-n+1}, \ldots, E_{t-2}, E_{t-1}, E_t)$, (1)
where $n$ denotes the length of the sliding-window input sequence. Research has shown that it is feasible to regress $N$ using historical data from trains [29].
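To make the construction of training pairs concrete, the following minimal NumPy sketch assembles sliding-window samples for regressing the mapping $N$ in Equation (1). The helper name `build_windows`, the assumed column order of the state array, and the default window length of 12 (the step size later selected in Section 4.2.1) are illustrative assumptions rather than the exact implementation used in the experiments.

```python
import numpy as np

def build_windows(states, n=12):
    """Assemble sliding-window samples for regressing the mapping N in Equation (1).

    `states` is assumed to have shape (T, 9) with columns ordered as
    E_t = [x, v, g, z, l, q, p, w, d]; n is the window length.
    """
    X, Y = [], []
    for t in range(n - 1, states.shape[0] - 1):
        X.append(states[t - n + 1 : t + 1])    # window E_{t-n+1}, ..., E_t
        Y.append(states[t + 1, :2])            # target y_{t+1} = [x_{t+1}, v_{t+1}]
    return np.array(X), np.array(Y)
```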
Therefore, after the above discussion, it is clear that our task is to build a heavy haul train dynamics model in which train speed and position can be predicted from information such as the train position, speed, cylinder pressure, motor speed, rail gradient, curvature, section speed limit, load, and direction of travel. Afterward, the discrete signal generated by the deep learning model is converted into a continuous signal using a zero-order hold model to obtain the final dynamics model.
Because various factors influence the actual operation of heavy haul trains, we define the heavy haul train dynamics modeling problem as a time-series problem with multivariate inputs. We first consider using a GRU network to solve this time-series-related problem. Although traditional GRU networks can capture long-term dependencies, they do not assign different levels of attention to features. As a result, the standard GRU cannot detect which influencing factors are significant in the dynamics modeling problem, and failure to differentiate the importance of the influencing factors directly affects the effectiveness of model training. To address this problem and achieve more accurate modeling of heavy haul train dynamics, we propose an AMGRU model that improves the accuracy of model building by enhancing the ability to focus on different influencing factors. To make the model building more precise, the AMGRU model is formulated as a network architecture containing an encoding layer, an attention mechanism layer, a context layer, and a decoding layer.

2.1. GRU Networks

Recurrent neural networks (RNNs) are often used for time series forecasting because they have looped internal storage units. However, as the depth of the inner loop increases, the gradient becomes dominated by nearby gradients, which makes it difficult for the model to learn long-distance dependencies [50]. The GRU and long short-term memory (LSTM) are excellent variants of the RNN, both of which improve the network's ability to learn long-term dependencies by using feedback connections to mitigate the vanishing gradient problem [51]. Studies show that the learning effects of the LSTM and GRU are comparable [52,53], while the GRU uses fewer parameters. Assuming that the input vector of a problem has $m$ dimensions and the hidden state has $n$ dimensions, the number of network parameters required for modeling with an RNN is $n^2 + mn + n$, with the LSTM it is $4(n^2 + mn + n)$, and with the GRU it is $3(n^2 + mn + n)$ [54]. During the training process, the different gate structures of the GRU can store long sequences of information that have a large impact on the modeling of heavy haul train dynamics, while useless information can be filtered out in time to improve the learning performance of the network model.
According to [52,53], and combined with the connections shown in Figure 1, the update process of the GRU network is as follows. First, the gating states of the reset gate $R_t$ and the update gate $Z_t$ are obtained by applying the sigmoid function to the previous transmission state $H_{t-1}$ and the current input $E_t$. The gating states are given by
$R_t = \sigma(W_R \cdot [H_{t-1}, E_t])$, (2)
$Z_t = \sigma(W_Z \cdot [H_{t-1}, E_t])$. (3)
Afterward, the “reset” data are obtained through the reset gate by concatenating the product of $H_{t-1}$ and $R_t$ with $E_t$, and a hyperbolic tangent activation function then deflates the data to the range $-1$ to $1$ to obtain the hidden variable $\tilde{H}_t$. Here, $\tilde{H}_t$ mainly contains the data of the current input $E_t$. The “memorizing the current state” function is achieved by selectively adding $\tilde{H}_t$ to the current hidden state. The above process is expressed by the following equation:
$\tilde{H}_t = \tanh(W \cdot [R_t \cdot H_{t-1}, E_t])$. (4)
Finally, we introduce one of the most critical steps of the GRU, which can be called the “update memory” phase, where the update gate $Z_t$ represents the coefficient of the hidden variable; as $Z_t$ becomes larger, more of the hidden variable is retained in the final output:
$H_t = (1 - Z_t) \cdot H_{t-1} + Z_t \cdot \tilde{H}_t$. (5)
The state of the GRU cell $H_t$ is generated by a linear combination of the hidden variable $\tilde{H}_t$ and the historical information $H_{t-1}$. This linear combination is inherited from the storage state update mode of the LSTM [54] and balances the transformation effect and the computational load of the network well. In Equations (2)–(5), $W_R$, $W_Z$, and $W$ are the weight matrices, $\sigma$ is the sigmoid function, and $\tanh$ is the hyperbolic tangent function. They are defined in Equations (6) and (7), respectively:
$\sigma(x) = \frac{1}{1 + e^{-x}}$, (6)
$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$. (7)
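As an illustration of Equations (2)–(7), the following minimal NumPy sketch performs a single GRU update step. The weight shapes and the omission of bias terms are simplifying assumptions made here for readability, not the exact implementation used in the experiments.

```python
import numpy as np

def sigmoid(x):
    # Equation (6)
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(E_t, H_prev, W_R, W_Z, W):
    """One GRU update following Equations (2)-(5).

    E_t is the current input, H_prev the previous state H_{t-1}; W_R, W_Z, and W
    act on the concatenation [H_{t-1}, E_t]. Bias terms are omitted, as in the
    equations above.
    """
    concat = np.concatenate([H_prev, E_t])
    R_t = sigmoid(W_R @ concat)                                   # reset gate, Eq. (2)
    Z_t = sigmoid(W_Z @ concat)                                   # update gate, Eq. (3)
    H_tilde = np.tanh(W @ np.concatenate([R_t * H_prev, E_t]))    # candidate state, Eq. (4)
    H_t = (1.0 - Z_t) * H_prev + Z_t * H_tilde                    # new hidden state, Eq. (5)
    return H_t
```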

2.2. Attention Networks

The attention mechanism has recently shown success in many tasks [55,56,57]. It is achieved by constructing neural networks based on the corresponding functions. The attentional neural network model helps select only the previous layers’ crucial inputs for each subsequent step in the model.
Once we have calculated the importance of each coding vector, we normalize these vectors using the softmax function and multiply each coding vector according to its weight to obtain a time-dependent input code, which is then fed to each step of the decoder GRU.
Several types of attention exist, depending on how the information is selected. The mechanism that takes a weighted average of all the input information when selecting the data is called "soft attention." The mechanism that selects only the information at a specific position in the input sequence, for example by choosing a message at random or selecting the message with the highest probability, is called "hard attention." The difference is that a soft attention model is smooth and differentiable, but the computational cost grows as the input increases. Hard attention reduces the computational cost, but because the resulting model is usually non-differentiable, it typically requires more sophisticated techniques to train.
In our approach, we use soft attention as the implementation method for attention. Unlike other common soft attention mechanisms, the output vector is not obtained directly from the last hidden state. Instead, it is embedded into the GRU network and weighted across all hidden states to focus attention on the more important hidden state information within the entire input sequence before obtaining the output vector.
Let $H \in \mathbb{R}^{d \times N}$ be a matrix consisting of the hidden vectors produced by the GRU, where $d$ denotes the hidden layer size and $N$ denotes the length of the data. Let $v_a$ represent the attention embedding, and let $e_N \in \mathbb{R}^{N}$ be a vector of 1s. $\alpha$ denotes the vector of attention weights over the features $H$, and $r$ denotes the final output of the attentional neural network, representing the attention-weighted sum of the features $H$. The attentional neural network is shown in Figure 2.
Inspired by [38], the transfer function can then be expressed in Equations (8)–(10) as follows:
$M = \tanh\left( \begin{bmatrix} W_h H \\ W_v v_a \otimes e_N \end{bmatrix} \right)$, (8)
$\alpha = \mathrm{softmax}(w^{T} M)$, (9)
$r = H \alpha^{T}$, (10)
where $M \in \mathbb{R}^{(d + d_a) \times N}$, $r \in \mathbb{R}^{d}$, $W_h \in \mathbb{R}^{d \times d}$, $W_v \in \mathbb{R}^{d_a \times d_a}$, and $w \in \mathbb{R}^{d + d_a}$ are projection parameters. The $\otimes$ operator means that $v_a$ is repeated $N$ times.
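A minimal NumPy sketch of the soft attention computation in Equations (8)–(10) is given below; the variable names follow the notation above, and the concrete parameter values are placeholders to be learned in practice.

```python
import numpy as np

def soft_attention(H, v_a, W_h, W_v, w):
    """Soft attention over GRU hidden states, following Equations (8)-(10).

    H: (d, N) hidden vectors; v_a: (d_a,) attention embedding;
    W_h: (d, d), W_v: (d_a, d_a), w: (d + d_a,) projection parameters.
    """
    d, N = H.shape
    Wv_va = np.repeat((W_v @ v_a)[:, None], N, axis=1)   # v_a repeated N times (the ⊗ operator)
    M = np.tanh(np.vstack([W_h @ H, Wv_va]))             # Eq. (8), shape (d + d_a, N)
    scores = w @ M                                       # one score per time step, shape (N,)
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()                          # softmax, Eq. (9)
    r = H @ alpha                                        # attention-weighted sum of states, Eq. (10)
    return r, alpha
```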

2.3. Attention-Based GRU Networks

Based on the encoder–decoder architecture, we design the AMGRU. Compared with the traditional GRU structure, the AMGRU network can capture the key factors affecting the dynamics modeling, and its structure is more complex. The network structure of AMGRU is shown in Figure 3.
The input data for the first $n$ moments of any task are fed into the AMGRU network. Take the input data $E_t = [x_t, v_t, t_t, g_t, z_t, l_t, e_t, p_t, q_t, d_t]$ at time $t$, where $x_t$, $v_t$, and $t_t$ represent the train position, train speed, and pipe pressure, and $g_t$, $z_t$, $l_t$, $e_t$, $p_t$, $q_t$, and $d_t$ represent the cylinder pressure size, cylinder pressure state, speed limit value, motor speed, slope value of the road surface, curvature of the road section, and the up/down direction information, in that order. The hidden state of the $i$-th influencing factor at moment $t$ is calculated by the GRU network coding, after which the attention mechanism layer is set up to compare the hidden state $h_t^i$ of the $i$-th influencing factor at moment $t$ with the output $q_{t-1}^i$ at moment $t-1$; the degree of matching between the two is calculated by the alignment model to derive the attention score. The higher the degree of matching, the higher the attention score of the current influencing factor. The attention score is calculated as in Equation (11):
$m_t^i = \eta(q_{t-1}^i, h_t^i)$, (11)
where the $\eta(\cdot)$ function is the alignment model, which calculates the degree of matching, and $m_t^i$ denotes the attention score of the $i$-th influencing factor at moment $t$. After obtaining the attention scores of each influencing factor, each influencing factor's attention distribution weight is obtained by normalizing the attention scores with the softmax function, as in Equation (12):
$\alpha_t^i = \frac{\exp(m_t^i)}{\sum_{j=1}^{n}\exp(m_t^j)}$, (12)
where $j$ indexes the influencing factors and $n$ is their number. We set up a context layer, in which the obtained attention distribution weights of the influencing factors and the hidden states of the influencing factors are weighted and summed to obtain the context vector $C_t$, i.e., the attention value over all influencing factors:
$C_t = \sum_{i=1}^{n} \alpha_t^i h_t^i$, (13)
where n is the number of influencing factors.
Afterwards, the same process is performed on the data of the other moments within the first $n$ moments to obtain the attention values of the influencing factors at each moment. The decoding layer is then set up, and the context vectors obtained over the first $n$ moments are fed into the fully connected (FC) layers of the decoding layer to predict the position and velocity information at moment $n+1$. The AMGRU uses Equations (11)–(13) to perform the transitions between states and thus achieve the model construction. Furthermore, to prevent overfitting and ensure the generalization ability of the model, dropout is used during the training of the AMGRU network to randomly set some of the neuron outputs to 0, so that the network trains on different subsets in each iteration, thereby reducing the AMGRU network's reliance on certain specific features and improving the generalization ability of the model.
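The following Keras sketch illustrates how the encoding, attention, context, and decoding layers could be stacked. It is a simplified rendering under stated assumptions rather than the exact network used in the paper: the alignment model $\eta(\cdot)$ is approximated by a small dense scoring layer, and the layer sizes (two GRU layers with 128 and 32 units, window length 12) follow the values reported later in Section 4.2.1.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_amgru(n_steps=12, n_features=9, d1=128, d2=32, dropout=0.2):
    """Sketch of the AMGRU encoder/attention/context/decoder stack.

    The two GRU layers (128 and 32 units) and the window length of 12 follow
    Section 4.2.1; the dense scoring layer standing in for the alignment model
    eta(.) and the dropout rate are assumptions made for this sketch.
    """
    inputs = layers.Input(shape=(n_steps, n_features))
    # Encoding layer: stacked GRUs returning the hidden state at every step.
    h = layers.GRU(d1, return_sequences=True)(inputs)
    h = layers.GRU(d2, return_sequences=True)(h)
    h = layers.Dropout(dropout)(h)
    # Attention layer: score each time step and normalize with softmax (Eq. (12)).
    scores = layers.Dense(1, activation="tanh")(h)
    alpha = layers.Softmax(axis=1)(scores)
    # Context layer: attention-weighted sum of hidden states (Eq. (13)).
    context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, alpha])
    # Decoding layer: fully connected layers predicting [x_{t+1}, v_{t+1}].
    out = layers.Dense(16, activation="relu")(context)
    out = layers.Dense(2)(out)
    return Model(inputs, out)

# Typical usage for ordinary (non-meta) training:
# model = build_amgru()
# model.compile(optimizer="adam", loss="mse")
# model.fit(X_train, Y_train, epochs=50, validation_split=0.1)
```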

3. The Meta-Learner Design

The actual situation of heavy haul train operations shows that a large amount of operational data exist for existing types of heavy haul trains, whereas only little data exist for newly introduced types. Still, we must conduct exploratory research on new types of heavy haul trains, which motivates us to design an effective deep learning scheme that improves the prediction accuracy of the dynamics models of new train types by learning prior knowledge from existing heavy haul train tasks supported by a large amount of available data and then transferring it to the new types. To address the problem that insufficient training data for the dynamic modeling of new types of heavy haul trains limit the prediction accuracy of the learning algorithm, we use the MAML framework. Firstly, the meta-learner trains the model initialization parameters on tasks supported by a large amount of data. The trained model then attains maximal performance on a task supported by a small amount of data after updating the parameters in one or a few gradient steps. This is in line with the goal of meta-learning, which is to train the model in advance on learning tasks with sufficient data and later fine-tune the model to suit the new learning task using only a small amount of training sample data.

3.1. Design of the Meta-Task

Traditional machine learning is trained by feeding the model data, which can be divided into a training set, a test set and a validation set, depending on the purpose of the data. Meta-learning differs from traditional machine learning in that it gives the model many tasks, which can be divided into training and testing tasks, called the support set and query set in meta-learning. The differences are described in the following Table 1.
As there are various formations, train types, and load types for heavy haul trains, we want to draw on the experience of heavy haul trains already operating in large numbers to model the dynamics of new types of heavy haul trains. Therefore, for the selection of the meta-tasks, we use the heavy haul train dynamics modeling tasks with a large amount of operational data as the source of the dataset for the training stage and the new train modeling task with a small amount of operational data as the source of the dataset for the testing stage. The training process of the MAML is shown in Figure 4. In the training phase, the network model is trained on a dataset consisting of existing types of heavy haul train modeling tasks to obtain the optimal network model of the training phase, denoted as $w^{*}_{train}$, which carries the “prior knowledge” of the network model. Later, in the testing phase, the model is further adapted using data points sampled from the new type of heavy haul train modeling task to obtain the network model $w^{*}_{new}$, and the final model performance is evaluated by the loss function $L_{test}(w^{*}_{new})$.

3.2. The AMGRU Network Based on the MAML Framework

Because the MAML framework does not impose specific model constraints, we can choose the AMGRU model, which fits the heavy haul train data well, for modeling the dynamics. Our goal is to pre-train the regression model with learning tasks consisting of data from existing types of heavy haul trains and eventually use only a small amount of training data to obtain a dynamics model for new types of heavy haul trains.
The AMGRU model is represented by a function $f_\theta$ with parameters $\theta$, which aims to adapt to a new task $n$ via stochastic gradient descent (SGD). When adapting to a new task $n$, the model's parameters change from $\theta$ to $\theta_n$ by gradient descent. This process can be expressed as
$\min_{\theta} L_{T_{test}(n)}(f_{\theta_n}) = L_{T_{test}(n)}\left(f_{\theta - \alpha \nabla_{\theta} L_{T_{train}(n)}(f_{\theta})}\right)$, (14)
and the training process of the MAML based on AMGRU with SGD is shown in Figure 5.
The optimization between tasks uses stochastic gradient descent, and the optimization process uses the following equation:
$\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{T_n \sim P(T)} L_{T_n}(f_{\theta_n})$, (15)
where $\beta$ is the learning rate between tasks (the meta step size). The two commonly used loss functions are cross entropy and mean squared error (MSE); for the regression task considered here, the loss function chosen is the MSE, which has the following form:
$L_{T_n}(f_{\psi}) = \sum_{(in_m, out_m) \sim P(T)} \left\| f_{\psi}(in_m) - out_m \right\|_2^2$, (16)
where $(in_m, out_m)$ is an input/output pair sampled from task $T_n$. In the case of a classification task, the loss function is the cross-entropy loss:
$L_{T_n}(f_{\psi}) = -\sum_{(in_m, out_m) \sim P(T)} \left[ out_m \log f_{\psi}(in_m) + (1 - out_m) \log\left(1 - f_{\psi}(in_m)\right) \right]$. (17)
Equations (14)–(17) describe the principle of the MAML algorithm [43]. In the general case, the complete Algorithm 1 is outlined below:
Algorithm 1 The principle of the meta-learning algorithm
Step 1. Define the within-task learning rate as $\alpha$ and the between-task learning rate as $\beta$;
Step 2. Initialize the task distribution $P(T)$ and randomly initialize the parameters $\theta$, including $W_G$, $W_A$, and $W_F$, which are the weight sets for the GRU, AM, and FC networks, respectively;
while not done do
  Sample a batch of tasks $T_n \sim P(T)$;
  for all $T_n$ do
    Step 2.1. Sample $K$ datapoints $D = \{(in_m, out_m)\}$ from $T_n$;
    Step 2.2. Evaluate $\nabla_{\theta} L_{T_n}(f_{\theta})$ using $D$ and $L_{T_n}$ in Equation (16);
    Step 2.3. Compute the adapted parameters with gradient descent: $\theta_n = \theta - \alpha \nabla_{\theta} L_{T_n}(f_{\theta})$;
    Step 2.4. Sample datapoints $D'_n = \{(in_m, out_m)\}$ from $T_n$ for the meta-update;
  end for
  Step 2.5. Update $\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_{T_n \sim P(T)} L_{T_n}(f_{\theta_n})$ using each $D'_n$ and $L_{T_n}$ in Equation (16);
end while
Step 3. Output the weight sets $W_G$, $W_A$, and $W_F$ as the parameters for the AMGRU model.
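A compact sketch of Algorithm 1 is shown below. It uses the first-order approximation of the meta-gradient (ignoring the second-derivative terms of Equation (15)) for brevity, and the `tasks` interface with a `sample(k)` method is a hypothetical placeholder for batching data from the heavy haul train modeling tasks.

```python
import tensorflow as tf

def maml_train(model, tasks, inner_lr=0.01, outer_lr=0.001, meta_iters=1000, k=32):
    """First-order sketch of Algorithm 1 for the AMGRU model.

    `tasks` is assumed to be a list of objects exposing a sample(k) method that
    returns an (inputs, targets) batch from one heavy haul train modeling task;
    this interface is hypothetical. Second-order terms of Equation (15) are
    dropped (first-order MAML) to keep the sketch short.
    """
    loss_fn = tf.keras.losses.MeanSquaredError()       # Equation (16)
    meta_opt = tf.keras.optimizers.Adam(outer_lr)

    for _ in range(meta_iters):
        theta = [tf.identity(w) for w in model.trainable_weights]        # save theta
        meta_grads = [tf.zeros_like(w) for w in model.trainable_weights]

        for task in tasks:
            # Inner update (Step 2.3): theta_n = theta - alpha * grad.
            x_s, y_s = task.sample(k)
            with tf.GradientTape() as tape:
                inner_loss = loss_fn(y_s, model(x_s, training=True))
            grads = tape.gradient(inner_loss, model.trainable_weights)
            for w, g in zip(model.trainable_weights, grads):
                if g is not None:
                    w.assign_sub(inner_lr * g)

            # Outer gradient on fresh samples (Steps 2.4-2.5), evaluated at theta_n.
            x_q, y_q = task.sample(k)
            with tf.GradientTape() as tape:
                outer_loss = loss_fn(y_q, model(x_q, training=True))
            task_grads = tape.gradient(outer_loss, model.trainable_weights)
            meta_grads = [mg + (tg if tg is not None else tf.zeros_like(mg))
                          for mg, tg in zip(meta_grads, task_grads)]

            # Restore theta before adapting to the next task.
            for w, t in zip(model.trainable_weights, theta):
                w.assign(t)

        # Meta-update of the shared initialization (Step 2.5).
        meta_opt.apply_gradients(zip(meta_grads, model.trainable_weights))
```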

4. Experiment Results

In this section, we use actual heavy haul train data to verify the validity of the dynamics model. Specifically, we use field data from the operation of an 11,600-ton Harmony heavy haul train from Shenchinan Station to Suning Station. The collected field data are first selected to remove irrelevant information, after which the data are normalized. In this part of the numerical experiments, we repeat each experiment 50 times and calculate the mean and standard deviation of the MAE (mean absolute error), RMSE (root mean square error), and $R^2$ (R-square) to evaluate the experimental results:
$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_{t_i} - \hat{y}_{t_i}\right)^2}$,
$MAE = \frac{1}{N}\sum_{i=1}^{N}\left| y_{t_i} - \hat{y}_{t_i} \right|$,
$R^2 = 1 - \frac{\sum_{i}\left(y_{t_i} - \hat{y}_{t_i}\right)^2}{\sum_{i}\left(y_{t_i} - \bar{y}_{t_i}\right)^2}$,
where $N$ is the total number of predicted moments, $\hat{y}_{t_i}$ is the predicted value of $y_{t_i}$, and $\bar{y}_{t_i}$ is the average value of $y_{t_i}$.
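A small helper such as the following (a sketch, assuming NumPy arrays of aligned predictions and ground truth) computes the three metrics for a single run; the means and standard deviations reported later would then be taken over the 50 repetitions.

```python
import numpy as np

def evaluate(y_true, y_pred):
    """RMSE, MAE, and R^2 as defined above, for one experiment run."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - np.mean(y_true)) ** 2)
    return rmse, mae, r2
```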
In the first set of experiments, we focus on comparing models with and without attention injection under limited training sample conditions, and the AM network shows its superiority even under low-data conditions. We also conduct comparative experiments between the GRU and other baseline network models and find that the GRU network is clearly superior in solving time-series-related problems. In the second set of experiments, we compare the experimental results of the AMGRU using the MAML framework with those of the AMGRU without the MAML framework. The experimental results show that transfer learning through meta-tasks can effectively improve the accuracy of prediction models under deficient data conditions. The following is a description of the environment used in the experiments: computer operating system, Windows 10; graphics processor, GeForce GTX 1060; programming environment, Python 3.6.0; frameworks, TensorFlow 2.6.2 and Keras 2.6.0; and data operations, NumPy 1.19.5.

4.1. Data Processing

Before modeling the dynamics of heavy haul trains, data preprocessing is required. Firstly, data cleaning is performed to correct the raw data. Then, based on the features, the data are filtered to remove irrelevant data that are not useful for modeling the train dynamics; for example, according to the train operation characteristics, we removed data from runs that exceeded the planned timetable. Finally, missing data are filled in by linear interpolation to ensure data completeness, yielding the data shown in Table 2. The sample data are arranged chronologically from top to bottom, with each record containing the state of the train at the corresponding moment. In the experiment, the train states of the first $n$ moments were fed to the AMGRU for training, the running state of the train at the next moment was predicted, and the window was then slid forward one moment at a time until the prediction for the last moment was obtained.
At the same time, we note that in our input data, different types of data take a wide range of values, which distributes the neural network model weights unevenly and affects the training effect; therefore, to eliminate the influence of the different magnitudes of the data, it is necessary to normalize the data before training. Considering the influence of extreme anomalous data in the experimental data, this experiment adopts the Z-Score normalization method, which indirectly avoids the effect of outliers and extreme values through centering. Each training sample $s$ consists of the position $x$, speed $v$, cylinder pressure $g$, motor speed $z$, speed limit $l$, gradient $p$ of the track surface, curvature $q$, total train weight $w$, and up/down information $d$. Among all the information signals, except for the total train weight and the up/down information, the other signals show strong uncertainty and are nonlinear. The total train weight signal $w$ has a nominal attribute and is represented by a constant; the up/down information signal $d$ has a binary attribute and is represented by 0 for downward and 1 for upward, which are discrete values; the remaining signals have numerical properties, where the speed limit signal $l$ is discrete and the others are continuous. Meanwhile, it should be noted that in the motor speed signal $z$, positive values indicate traction and negative values indicate braking, and the magnitude corresponds to the speed of the motor. In the gradient signal $p$, a positive value indicates an uphill section, a negative value indicates a downhill section, and the magnitude indicates the size of the slope. We use $D^s = \{D_i^s\}_{i=1,\ldots,9} = [x_t, v_t, g_t, z_t, l_t, q_t, p_t, w_t, d_t]$ to denote the $s$-th data sample, where $i$ denotes the $i$-th element of the $s$-th data sample, and the Z-Score normalization rule is given by
$\tilde{D}_i^s = \frac{D_i^s - \bar{D}_i}{\varpi_i}, \quad i \in \{1, \ldots, 9\}, \; s \in S$,
$\bar{D}_i = \frac{1}{|S|}\sum_{s \in S} D_i^s$,
$\varpi_i = \sqrt{\frac{1}{|S|}\sum_{s \in S}\left(D_i^s - \bar{D}_i\right)^2}$.
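A minimal sketch of this normalization step is given below; the guard for constant columns (e.g., the total train weight within one run) is an addition made here so that the division is well defined, not part of the formulas above.

```python
import numpy as np

def z_score_normalize(D):
    """Z-Score normalization of the (S, 9) sample matrix described above.

    Each column [x, v, g, z, l, q, p, w, d] is centered by its mean and scaled
    by its standard deviation; columns with zero variance are left centered
    only (an added guard, not part of the original rule).
    """
    mean = D.mean(axis=0)
    std = D.std(axis=0)
    std[std == 0] = 1.0          # guard against division by zero
    return (D - mean) / std
```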

4.2. Performance of the AMGRU Model

In this section, we first determine the parameters of the AMGRU network by grid search to obtain its specific network structure and then compare it with other networks using field data to predict the experimental results to demonstrate the better performance of AMGRU in modeling heavy haul train dynamics.

4.2.1. The AMGRU Model Parameters Selection

The deep learning network structure parameters directly affect the accuracy of the dynamics model, so we conduct experiments using field data to determine the network structure. The literature [58,59] shows that the main network structure parameters are the network depth and the number of neurons in each layer of the network. In our experiments, we set these network parameters manually. The experiments in [58] show that the depth of the network model and the number of neurons in each layer depend on each other and together affect the performance of the network model. Generally, the numbers of neurons in the layers of a neural network are 128, 64, 32, and 16 [59].
Therefore, to obtain the proper depth of the network model more intuitively, we first fix the number of neurons per layer to 64 during the numerical experiments. The model performance is evaluated using the RMSE and MAE loss metrics together with their standard deviations. The results of the network depth experiments are shown in Table 3. The experimental results show that as the number of layers in the network increases, the results first improve and then worsen. The network model performs best when the GRU network has two layers.
Afterward, we determine the number of neurons per layer of the network based on the two-layer network. We determine that the setting range of neurons is from 16 to 128. To facilitate finding the number of neurons most suitable for dynamics modeling, we start with 16 and increase by 4 neurons in each experiment up to 128. Further, we obtain 225 (i.e., 15 × 15) possible combinations of neurons. We train and test the model using a small number of samples and record the values of the performance metrics MAE and RMSE in each experiment. Because the network performance is similar under the different metrics, we only report the model's prediction performance with different numbers of neurons under the RMSE metric, visualized in Figure 6.
By observing the experimental results, the network performs best when the first layer has 128 neurons and the second layer has 32 neurons. The step size determines the length of the sliding-window input sequence. The larger the step size of the neural network model, the more input data from previous states are associated with the model, and the accuracy and training time of the model change accordingly. Wang et al. argue that the step size of neural network models has a great influence on their performance [59]. According to existing research, it is feasible to determine the optimal step size of the model through numerical experiments [29]. Therefore, we designed a numerical experiment to determine the optimal step size and, before the experiment, predefined a range of test step sizes from 4 to 22. During the experiments, the step size is increased by two units each time, and the performance of the network model is recorded for each experiment. By comparing the results, the final step size for the network model is determined to be 12. The detailed simulation results are shown in Table 4.
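For illustration, the step-size search described above could be scripted as follows; `build_amgru` and `build_windows` are the sketches introduced earlier, and `states_train`/`states_val` are assumed preprocessed state arrays, so the snippet is a sketch of the procedure rather than the exact experimental code.

```python
import numpy as np

def search_step_size(states_train, states_val, steps=range(4, 23, 2)):
    """Evaluate candidate window lengths (step sizes) from 4 to 22 in steps of 2.

    Layer sizes stay fixed at the 128/32 configuration chosen above; the
    training settings (epochs, optimizer) are placeholder values.
    """
    results = {}
    for n in steps:
        X_tr, Y_tr = build_windows(states_train, n)
        X_va, Y_va = build_windows(states_val, n)
        model = build_amgru(n_steps=n, n_features=states_train.shape[1])
        model.compile(optimizer="adam", loss="mse")
        model.fit(X_tr, Y_tr, epochs=20, verbose=0)
        results[n] = float(np.sqrt(model.evaluate(X_va, Y_va, verbose=0)))
    best_n = min(results, key=results.get)
    return best_n, results
```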
This series of experiments yields the prediction model's specific network structure and parameters.

4.2.2. Comparison with Other Prediction Methods

Several standard deep learning networks are selected as baseline models to validate the advantages of the designed models in implementing heavy haul train dynamics model building. As with the experimental models, all baseline models are trained using data collected in the field, and all models are characterized as follows:
Model 1: The RNN integrated with the FC. Model 1 consists of an RNN network for encoding and an FC network for decoding. RNNs are very effective for learning data with sequence characteristics, time-dependent mining sequences, and recording short-term information. Based on this feature, we use this network as the baseline model and set the RNN to two layers, where the number of neurons in the first layer is 128 and the second layer is 32.
Model 2: The RNN integrated with the AM and the FC. The Model 2 network integrates an AM layer into the Model 1 network: the RNN layer is used for encoding, the AM layer for attention injection, and the FC for decoding. The network depth and the number of neurons are set in the same way as in the Model 1 network.
Model 3: The LSTM combined with the FC. A two-layer LSTM network is set up, where the first layer has 128 neurons and the second layer has 32 neurons. The LSTM layer is responsible for encoding and the FC layer for decoding. The training data are a small amount of running data from new types of heavy haul trains.
Model 4: The LSTM combined with the FC and the AM. The LSTM layer is responsible for encoding, the AM network for attention injection, and finally the FC layer for decoding. Training uses the same data as Model 3.
Model 5: The GRU combined with the FC. The Model 5 network consists of a GRU network and an FC network, where the GRU is used for encoding and the FC is used for decoding. To verify the superiority of the attention mechanism, we construct Model 5 to be identical to the AMGRU network parameters except for the removal of the attention mechanism, with the number of GRU layers set to two and the numbers of neurons to 128 and 32, respectively.
Model 6: The GRU combined with the FC and the AM. This is the AMGRU network proposed in this paper, in which the GRU layer is responsible for coding, the AM layer performs attention injection, and finally the FC layer performs decoding. To test the effectiveness of the proposed task-based meta-learner in improving few-shot learning performance, we still use the AMGRU model for this experiment; however, the model is trained only with the insufficient data, and no meta-learner is utilized for prior knowledge learning and transfer.
Model 7: The AMGRU network based on the MAML framework. We add the MAML framework to the AMGRU network to enable transfer learning between meta-tasks.
Model 8: The classical dynamics model for heavy haul trains, which is based on Newton's second law and takes into account factors such as traction, resistance, inertia, and gravity. By modeling the acceleration of the train, it predicts the speed and position changes of the train under different operating conditions [60].
First, we perform the first set of experiments, focusing on the superiority of the AM network in modeling the dynamics of heavy haul trains. Figure 7 shows the dynamics modeling of Model 5 and Model 6 using insufficient field data. The results show that (1) both Model 5 and Model 6 can predict position and velocity reasonably well. With limited data, Model 6 performs better due to its ability to identify and focus on the primary target, especially where frequent train operation changes are seen in the partially magnified portion of Figure 7a; the use of insufficient data for training impacts the robustness and flexibility of the dynamics model. (2) The superiority of Model 6 can be seen in the first partial enlargement in Figure 7b. The second partial enlargement shows that neither model predicts well there, but Model 6 learns the trend more accurately than Model 5. Although the addition of the AM network improves the accuracy of heavy haul train dynamics modeling to some extent, we still need to address the lack of accuracy caused by the lack of data.
We then perform a second set of experiments to observe the performance of the MAML framework in solving few-shot learning problems. In the Model 7 training, we first train the model using heavy haul train tasks with an existing type of a large amount of data. We then migrate it to heavy haul train tasks in a new type with only a small amount of data. We use three network prediction models, Model 5, Model 6, and Model 7, to model the dynamics of high-load heavy haul trains simultaneously, and the performance of the three models is shown in Figure 7. Our proposed Model 7 can transfer a priori knowledge learned from a large number of data tasks to data-poor tasks, thus improving the accuracy of dynamics modeling for data-poor tasks.
For all the methods mentioned above, we record the MAE, RMSE, and $R^2$ indices to evaluate the accuracy of the different models, and the results are shown in Table 5. From the comparison of the prediction results of the Model 1, Model 3, and Model 5 networks, we can see that the GRU network handles the field time-series data of heavy haul trains better. In addition to the comparison of the prediction results of Model 5 and Model 6, the superiority of the AM can also be seen from the prediction performance of Model 1 and Model 2. In terms of dynamics modeling, the AMGRU network handles the field time-series data of heavy haul trains better than the other baseline networks. However, it can be clearly seen that the prediction results of Model 1–Model 6 are not as accurate as those of Model 7 due to the effect of insufficient training data, which also indicates that the MAML framework we designed can effectively learn prior knowledge and can be used to solve few-shot learning problems.
Based on the numerical experimental results, it can be seen that the proposed method using the MAML framework and the neural network can better capture the nonlinearity and complexity in the dynamics of heavy haul trains, showing higher accuracy compared to the other baseline neural network models and the classical dynamics model. Moreover, our proposed method can quickly adapt to the dynamics modeling problems of newly introduced train types through rapid fine-tuning, which greatly enhances its efficiency and generalization ability. These results indicate that our method performs equally well or better in terms of accuracy, efficiency, and generalization ability compared to other state-of-the-art methods for heavy haul train dynamics modeling.
To increase the model interpretability, we extracted the attention weights of 20 time steps during the model training process and visualized them using a heatmap; the results are shown in Figure 8. The horizontal axis of the heatmap represents the time steps, and the vertical axis represents the input sequence, where $g$ represents the train tube pressure state, $z$ represents the train motor speed rating, $q$ and $p$ represent the track surface curvature and track surface gradient of the section the train is on, respectively, $w$ indicates the total weight of the train, $d$ indicates the direction of travel of the heavy haul train, and $l$ indicates the current train speed limit. The color of each bar shows the distribution of attention weights at each time step, where a higher weight indicates a larger impact on the dynamics model, and a lighter color indicates a smaller weight and thus a smaller impact. From Figure 8, we can see that the weights of the input sequences $d$ and $w$ have consistently remained low. This is because, in the process of modeling the train dynamics, the total weight $w$ and the direction of travel $d$ remain essentially constant within a run, so their attention weights are relatively stable and low across different time steps, which is consistent with the actual situation.

5. Conclusions

This paper presents a method for building a heavy haul train dynamics model based on historical data. Combining the GRU and AM, the AMGRU model is proposed and implemented, and its parameter training process is described in detail. The coding layer, the attention mechanism layer, the context layer, and the decoding layer are designed according to the characteristics of heavy haul train dynamics to improve the training efficiency of the model. At the same time, the embedding of the AM enables the decoding layer to identify and focus on the main target within the complex content. To address the problem that insufficient heavy haul train data under a new combination affect the model's prediction accuracy, we fully exploit the fact that sufficient heavy haul train data exist for an existing type. Using the MAML framework, the AMGRU network learns sufficiently, with prior knowledge, on the tasks with adequate data for subsequent transfer, so that the network can quickly capture the dynamics of a new type of heavy haul train and efficiently achieve high-accuracy modeling.
Simulation experiments show that the proposed MAML-based AMGRU method can establish a high-precision heavy haul train dynamics model and shows great advantages over several baseline models in the experiments. In the long run, the idea reflected in this paper of using existing task data to solve new task problems provides a new way of thinking for train dynamics modeling. However, some key issues still warrant further study. Although our experiments show that the AMGRU network has good learning performance across a large number of experiments, it can still fall into local optima under some specific parameter settings. Improving the flexibility and robustness of neural-network-based train dynamics modeling is a topic worthy of further study. In dynamics modeling, the track model is an extremely important factor; therefore, as a direction for future improvement, track and wheel models should be included in the model to predict the dynamic behavior of trains more comprehensively and accurately.

Author Contributions

Conceptualization, X.W. (Xi Wang) and Y.C.; methodology, Y.C. and L.Z.; software, H.W.; validation, X.W. (Xiaoning Wang), H.W. and X.W. (Xi Wang); formal analysis, X.W. (Xiaoning Wang); investigation, X.W. (Xiaoning Wang); resources, L.Z.; data curation, H.W. and L.Z.; writing—original draft preparation, Y.C.; writing—review and editing, X.W. (Xi Wang); visualization, X.W. (Xiaoning Wang); supervision, X.W. (Xi Wang); project administration, L.Z.; funding acquisition, X.W. (Xi Wang). All authors have read and agreed to the published version of the manuscript.

Funding

This paper was partially supported by the Beijing Natural Science Foundation (Nos. L201006, L221005), by the National Natural Science Foundation of China (No. 62073024), by the Technological Research and Development Program of China Railway Corporation under Grant P2022X013, and by the State Key Laboratory of Rail Traffic Control and Safety through Beijing Jiaotong University under Contract RCS2022K008.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The availability of these data is limited. These data are not authorized to be made public.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yin, J.; Tang, T.; Yang, L.; Xun, J.; Huang, Y.; Gao, Z. Research and development of automatic train operation for railway transportation systems: A survey. Transp. Res. Part C Emerg. Technol. 2017, 85, 548–572. [Google Scholar] [CrossRef]
  2. Katrakazas, C.; Quddus, M.; Chen, W.H.; Deka, L. Real-time motion planning methods for autonomous on-road driving: State-of-the-art and future research directions. Transp. Res. Part C Emerg. Technol. 2015, 60, 416–442. [Google Scholar] [CrossRef]
  3. Wang, Z.L.; Yang, J.P.; Shi, K.; Xu, H.; Qiu, F.Q.; Yang, Y.B. Recent advances in researches on vehicle scanning method for bridges. Int. J. Struct. Stab. Dyn. 2022, 2230005. [Google Scholar] [CrossRef]
  4. Su, S.; She, J.; Li, K.; Wang, X.; Zhou, Y. A Nonlinear Safety Equilibrium Spacing-Based Model Predictive Control for Virtually Coupled Train Set Over Gradient Terrains. IEEE Trans. Transp. Electrif. 2022, 8, 2810–2824. [Google Scholar] [CrossRef]
  5. Wang, X.; Hu, M.; Wang, H.; Dong, H.; Ying, Z. Formation Control for Virtual Coupling Trains with Parametric Uncertainty and Unknown Disturbances. IEEE Trans. Circuits Syst. II Express Briefs, 2023; early access. [Google Scholar] [CrossRef]
  6. Chou, M.; Xia, X.; Kayser, C. Modelling and model validation of heavy-haul trains equipped with electronically controlled pneumatic brake systems. Control. Eng. Pract. 2007, 15, 501–509. [Google Scholar] [CrossRef]
  7. Wang, X.; Li, S.; Tang, T.; Yang, L. Event-triggered predictive control for automatic train regulation and passenger flow in metro rail systems. IEEE Trans. Intell. Transp. Syst. 2020, 23, 1782–1795. [Google Scholar] [CrossRef]
  8. Wang, X.; Li, S.; Tang, T. Robust efficient cruise control for heavy haul train via the state-dependent intermittent control. Nonlinear Anal. Hybrid Syst. 2020, 38, 100918. [Google Scholar] [CrossRef]
  9. Wang, X.; Li, S.; Tang, T.; Wang, X.; Xun, J. Intelligent operation of heavy haul train with data imbalance: A machine learning method. Knowl.-Based Syst. 2019, 163, 36–50. [Google Scholar] [CrossRef]
  10. Bokare, P.S.; Maurya, A.K. Acceleration-deceleration behaviour of various vehicle types. Transp. Res. Procedia 2017, 25, 4733–4749. [Google Scholar] [CrossRef]
  11. Fadhloun, K.; Rakha, H.; Loulizi, A.; Abdelkefi, A. Vehicle dynamics model for estimating typical vehicle accelerations. Transp. Res. Rec. 2015, 2491, 61–71. [Google Scholar] [CrossRef]
  12. Wang, J.; Rakha, H.A. Longitudinal train dynamics model for a rail transit simulation system. Transp. Res. Part C Emerg. Technol. 2018, 86, 111–123. [Google Scholar] [CrossRef]
  13. Oprea, R.A.; Cruceanu, C.; Spiroiu, M.A. Alternative friction models for braking train dynamics. Veh. Syst. Dyn. 2013, 51, 460–480. [Google Scholar] [CrossRef]
  14. Wu, Q.; Cole, C.; Luo, S.; Spiryagin, M. A review of dynamics modelling of friction draft gear. Veh. Syst. Dyn. 2014, 52, 733–758. [Google Scholar] [CrossRef]
  15. Khmelnitsky, E. On an optimal control problem of train operation. IEEE Trans. Autom. Control. 2000, 45, 1257–1266. [Google Scholar] [CrossRef]
  16. Dong, H.R.; Gao, S.G.; Ning, B.; Li, L. Extended fuzzy logic controller for high speed train. Neural Comput. Appl. 2013, 22, 321–328. [Google Scholar] [CrossRef]
  17. Wang, X.; Su, S.; Cao, Y.; Qin, L.; Liu, W. Robust Cruise Control for the Heavy Haul Train Subject to Disturbance and Actuator Saturation. IEEE Trans. Intell. Transp. Syst. 2023; early access. [Google Scholar] [CrossRef]
  18. Cao, Y.; Zhang, Z.; Cheng, F.; Su, S. Trajectory optimization for high-speed trains via a mixed integer linear programming approach. IEEE Trans. Intell. Transp. Syst. 2022, 23, 17666–17676. [Google Scholar] [CrossRef]
  19. Liu, Y.; Zhou, Y.; Su, S.; Xun, J.; Tang, T. An analytical optimal control approach for virtually coupled high-speed trains with local and string stability. Transp. Res. Part C Emerg. Technol. 2021, 125, 102886. [Google Scholar] [CrossRef]
  20. Wang, X.; Su, S.; Cao, Y.; Wang, X. Robust control for dynamic train regulation in fully automatic operation system under uncertain wireless transmissions. IEEE Trans. Intell. Transp. Syst. 2022, 23, 20721–20734. [Google Scholar] [CrossRef]
  21. Liu, X.; Xun, J.; Ning, B.; Wang, C. Braking process identification of high-speed trains for automatic train stop control. ISA Trans. 2021, 111, 171–179. [Google Scholar] [CrossRef] [PubMed]
  22. Ai, Y.; Li, Z.; Gan, M.; Zhang, Y.; Yu, D.; Chen, W.; Ju, Y. A deep learning approach on short-term spatiotemporal distribution forecasting of dockless bike-sharing system. Neural Comput. Appl. 2019, 31, 1665–1677. [Google Scholar] [CrossRef]
  23. Li, J.; Dai, Q.; Ye, R. A novel double incremental learning algorithm for time series prediction. Neural Comput. Appl. 2019, 31, 6055–6077. [Google Scholar] [CrossRef]
  24. Zheng, J.; Fu, X.; Zhang, G. Research on exchange rate forecasting based on deep belief network. Neural Comput. Appl. 2019, 31, 573–582. [Google Scholar] [CrossRef]
  25. Zou, W.; Xia, Y. Back propagation bidirectional extreme learning machine for traffic flow time series prediction. Neural Comput. Appl. 2019, 31, 7401–7414. [Google Scholar] [CrossRef]
  26. Su, S.; Qu, J.; Cao, Y.; Li, R.; Wang, G. Adversarial training lattice lstm for named entity recognition of rail fault texts. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21201–21215. [Google Scholar] [CrossRef]
  27. Su, S.; Liu, W.; Zhu, Q.; Li, R.; Tang, T.; Lv, J. A cooperative collision-avoidance control methodology for virtual coupling trains. Accid. Anal. Prev. 2022, 173, 106703. [Google Scholar] [CrossRef]
  28. Zhang, C.Y.; Chen, D.; Yin, J.; Chen, L. Data-driven train operation models based on data mining and driving experience for the diesel-electric locomotive. Adv. Eng. Inform. 2016, 30, 553–563. [Google Scholar] [CrossRef]
  29. Yin, J.; Su, S.; Xun, J.; Tang, T.; Liu, R. Data-driven approaches for modeling train control models: Comparison and case studies. ISA Trans. 2020, 98, 349–363. [Google Scholar] [CrossRef]
  30. Li, Z.; Tang, T.; Gao, C. Long short-term memory neural network applied to train dynamic model and speed prediction. Algorithms 2019, 12, 173. [Google Scholar] [CrossRef]
  31. Wang, X.; Li, S.; Cao, Y.; Xin, T.; Yang, L. Dynamic speed trajectory generation and tracking control for autonomous driving of intelligent high-speed trains combining with deep learning and backstepping control methods. Eng. Appl. Artif. Intell. 2022, 115, 105230. [Google Scholar] [CrossRef]
  32. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  33. Dulebenets, M.A. An Adaptive Polyploid Memetic Algorithm for scheduling trucks at a cross-docking terminal. Inf. Sci. 2021, 565, 390–421. [Google Scholar] [CrossRef]
  34. Ning, F.; Jiang, G.; Lam, S.K.; Ou, C.; He, P.; Sun, Y. Passenger-centric vehicle routing for first-mile transportation considering request uncertainty. Inf. Sci. 2021, 570, 241–261. [Google Scholar] [CrossRef]
  35. Zhao, R.; Wang, D.; Yan, R.; Mao, K.; Shen, F.; Wang, J. Machine health monitoring using local feature-based gated recurrent unit networks. IEEE Trans. Ind. Electron. 2017, 65, 1539–1548. [Google Scholar] [CrossRef]
  36. Wang, Y.; Liao, W.; Chang, Y. Gated recurrent unit network-based short-term photovoltaic forecasting. Energies 2018, 11, 2163. [Google Scholar] [CrossRef]
  37. Minh, D.L.; Sadeghi-Niaraki, A.; Huy, H.D.; Min, K.; Moon, H. Deep learning approach for short-term stock trends prediction based on two-stream gated recurrent unit network. IEEE Access 2018, 6, 55392–55404. [Google Scholar] [CrossRef]
  38. Yang, Z.B.; Zhang, J.P.; Zhao, Z.B.; Zhai, Z.; Chen, X.F. Interpreting network knowledge with attention mechanism for bearing fault diagnosis. Appl. Soft Comput. 2020, 97, 106829. [Google Scholar] [CrossRef]
  39. Ran, X.; Shan, Z.; Fang, Y.; Lin, C. An LSTM-based method with attention mechanism for travel time prediction. Sensors 2019, 19, 861. [Google Scholar] [CrossRef]
  40. Alshehri, H.A.; Junath, N.; Panwar, P.; Shukla, K.; Rahin, S.A.; Martin, R.J. Self-Attention-Based Edge Computing Model for Synthesis Image to Text through Next-Generation AI Mechanism. Math. Probl. Eng. 2022, 2022, 4973535. [Google Scholar] [CrossRef]
  41. Sun, X.; Wang, B.; Wang, Z.; Li, H.; Li, H.; Fu, K. Research progress on few-shot learning for remote sensing image interpretation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 2387–2402. [Google Scholar] [CrossRef]
  42. Schweighofer, N.; Doya, K. Meta-learning in reinforcement learning. Neural Netw. 2003, 16, 5–9. [Google Scholar] [CrossRef] [PubMed]
  43. Snell, J.; Swersky, K.; Zemel, R. Prototypical networks for few-shot learning. Adv. Neural Inf. Process. Syst. 2017, 30, 4080–4090. [Google Scholar]
  44. Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. Adv. Neural Inf. Process. Syst. 2016, 29, 3630–3638. [Google Scholar]
  45. Ravi, S.; Larochelle, H. Optimization as a Model for Few-Shot Learning. 2016. Available online: https://openreview.net/forum?id=rJY0-Kcll (accessed on 4 May 2016).
  46. Bengio, S.; Bengio, Y.; Cloutier, J.; Gecsei, J. On the optimization of a synaptic learning rule. In Optimality in Biological and Artificial Networks; Routledge: London, UK, 2013; pp. 281–303. [Google Scholar]
  47. Andrychowicz, M.; Denil, M.; Colmenarejo, S.G.; Hoffman, M.W.; Pfau, D.; Schaul, T.; Shillingford, B.; de Freitas, N. Learning to learn by gradient descent by gradient descent. Adv. Neural Inf. Process. Syst. 2016, 29, 3988–3996. [Google Scholar]
  48. Weiss, K.; Khoshgoftaar, T.M.; Wang, D.D. A survey of transfer learning. J. Big Data 2016, 3, 1–40. [Google Scholar] [CrossRef]
  49. Zhuang, F.; Qi, Z.; Duan, K.; Xi, D.; Zhu, Y.; Zhu, H.; Xiong, H.; He, Q. A comprehensive survey on transfer learning. Proc. IEEE 2020, 109, 43–76. [Google Scholar] [CrossRef]
  50. Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp. Res. Part C Emerg. Technol. 2015, 54, 187–197. [Google Scholar] [CrossRef]
  51. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078. [Google Scholar]
  52. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv 2014, arXiv:1412.3555. [Google Scholar]
  53. Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU neural network methods for traffic flow prediction. In Proceedings of the 2016 31st Youth Academic Annual Conference of Chinese Association of Automation (YAC), Wuhan, China, 11–13 November 2016; pp. 324–328. [Google Scholar]
  54. Dey, R.; Salem, F.M. Gate-variants of gated recurrent unit (GRU) neural networks. In Proceedings of the 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS), Boston, MA, USA, 6–9 August 2017; pp. 1597–1600. [Google Scholar]
  55. Luong, M.T.; Pham, H.; Manning, C.D. Effective approaches to attention-based neural machine translation. arXiv 2015, arXiv:1508.04025. [Google Scholar]
  56. Mnih, V.; Heess, N.; Graves, A. Recurrent models of visual attention. Adv. Neural Inf. Process. Syst. 2014, 27, 2204–2212. [Google Scholar]
  57. Liu, G.; Guo, J. Bidirectional LSTM with attention mechanism and convolutional layer for text classification. Neurocomputing 2019, 337, 325–338. [Google Scholar] [CrossRef]
  58. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y.; Wong, W.K.; Woo, W.C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 802–810. [Google Scholar]
  59. Wang, X.; Xin, T.; Wang, H.; Zhu, L.; Cui, D. A generative adversarial network based learning approach to the autonomous decision making of high-speed trains. IEEE Trans. Veh. Technol. 2022, 71, 2399–2412. [Google Scholar] [CrossRef]
  60. Liu, W.; Su, S.; Tang, T.; Cao, Y. Study on longitudinal dynamics of heavy haul trains running on long and steep downhills. Veh. Syst. Dyn. 2022, 60, 4079–4097. [Google Scholar] [CrossRef]
Figure 1. The GRU unit structure.
Figure 2. The attention network structure.
Figure 3. The architecture of the AMGRU model.
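For readers who want to relate Figures 1–3 to code, the snippet below is a minimal PyTorch sketch of a GRU encoder-decoder with an attention layer over the encoder states. The class name AttentionGRU, the layer sizes, and the input and output shapes are illustrative assumptions for this example, not the authors' implementation.

```python
# Illustrative sketch only: a GRU encoder-decoder with Luong-style attention,
# loosely in the spirit of the AMGRU structure shown in Figures 1-3.
import torch
import torch.nn as nn


class AttentionGRU(nn.Module):
    def __init__(self, n_features: int, hidden: int = 128, horizon: int = 1, n_outputs: int = 2):
        super().__init__()
        self.encoder = nn.GRU(n_features, hidden, batch_first=True)   # coding layer
        self.decoder = nn.GRU(n_outputs, hidden, batch_first=True)    # decoding layer
        self.attn_score = nn.Linear(hidden, hidden, bias=False)       # attention scoring
        self.out = nn.Linear(2 * hidden, n_outputs)                   # context + decoder state -> outputs
        self.horizon = horizon
        self.n_outputs = n_outputs

    def forward(self, x):
        # x: (batch, window, n_features) past operation records
        enc_out, h = self.encoder(x)                                    # enc_out: (B, T, H)
        y = torch.zeros(x.size(0), 1, self.n_outputs, device=x.device)  # decoder start token
        preds = []
        for _ in range(self.horizon):
            dec_out, h = self.decoder(y, h)                             # dec_out: (B, 1, H)
            # attention weights over the encoder states
            scores = torch.bmm(self.attn_score(enc_out), dec_out.transpose(1, 2))  # (B, T, 1)
            weights = torch.softmax(scores, dim=1)
            context = torch.sum(weights * enc_out, dim=1, keepdim=True)            # (B, 1, H)
            y = self.out(torch.cat([dec_out, context], dim=-1))                    # (B, 1, n_outputs)
            preds.append(y)
        return torch.cat(preds, dim=1)                                  # (B, horizon, n_outputs)


if __name__ == "__main__":
    model = AttentionGRU(n_features=10, hidden=128, horizon=5)
    window = torch.randn(4, 12, 10)        # 4 windows, 12 time steps, 10 recorded quantities
    print(model(window).shape)             # torch.Size([4, 5, 2])
```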
Figure 4. The MAML framework.
Figure 5. The MAML-based AMGRU network training process.
Figure 6. Performance indicators with different numbers of nodes in each layer; lines of different colors represent the RMSE values of networks with different numbers of neurons.
Figure 7. Simulation results of the different methods: (a) speed prediction results of the different neural network models; (b) position prediction results of the different neural network models.
Figure 8. Attention weight distribution.
Table 1. Machine learning versus meta-learning.
Approach | Purpose | Input
Machine Learning | Find the function f. | Training data
Meta Learning | Find the function F. F can output a function f that can be used for a new task. | Training tasks and their corresponding data
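The contrast in Table 1 can be made concrete with a small amount of code. The sketch below performs one first-order MAML-style meta-update: the outer loop improves the meta-parameters (the function F), while the inner loop adapts a copy of them to one task's support data, yielding a task-specific f. The function names, learning rates, placeholder tasks, and the first-order gradient approximation are simplifying assumptions for illustration; exact MAML also back-propagates through the inner-loop updates, and none of this is the authors' training code.

```python
# Illustrative first-order MAML-style meta-update (not the authors' procedure).
import copy
import torch
import torch.nn as nn


def maml_meta_step(meta_model: nn.Module, tasks, loss_fn,
                   inner_lr: float = 1e-2, meta_lr: float = 1e-3, inner_steps: int = 1):
    """One meta-update over a batch of tasks, each given as ((x_s, y_s), (x_q, y_q))."""
    meta_grads = [torch.zeros_like(p) for p in meta_model.parameters()]
    for (x_s, y_s), (x_q, y_q) in tasks:
        adapted = copy.deepcopy(meta_model)               # start from the meta-parameters F
        inner_opt = torch.optim.SGD(adapted.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                      # inner loop: adapt to the support set
            inner_opt.zero_grad()
            loss_fn(adapted(x_s), y_s).backward()
            inner_opt.step()
        adapted.zero_grad()
        loss_fn(adapted(x_q), y_q).backward()             # query loss of the adapted model f
        for g, p in zip(meta_grads, adapted.parameters()):
            g += p.grad / len(tasks)                      # first-order meta-gradient, averaged
    with torch.no_grad():
        for p, g in zip(meta_model.parameters(), meta_grads):
            p -= meta_lr * g                              # outer loop: update the meta-parameters


# Hypothetical usage: two regression tasks with random placeholder data
net = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 2))
make_task = lambda: ((torch.randn(16, 10), torch.randn(16, 2)),
                     (torch.randn(16, 10), torch.randn(16, 2)))
maml_meta_step(net, [make_task(), make_task()], nn.MSELoss())
```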
Table 2. The raw data of the experiment.
Time Unit | Distance (km) | Speed (km/h) | Cylinder Pressure | Speed Limit (km/h) | Motor Speed | Curvature | Gradient | Weight | Up/Down
0076004550001.511,6001
10.09476004550001.511,6001
20.13676004550001.511,6001
259151.326060045−4200−511,6001
260151.486060045−4200−511,6001
261157.856060045−420800−511,6001
933408.2732550800−8.970−4011,6001
3934408.2742550800−8.970011,6001
3935408.2760550800−8.970011,6001
Table 3. Experimental results for the number of network layers with different metrics.
Depth | MAE | RMSE | R²
One-layer | 0.747 ± 0.0231 | 0.959 ± 0.0173 | 0.779 ± 0.0843
Two-layer | 0.683 ± 0.0018 | 0.947 ± 0.0002 | 0.847 ± 0.0701
Three-layer | 0.695 ± 0.0513 | 0.975 ± 0.0407 | 0.831 ± 0.0731
Four-layer | 0.704 ± 0.1052 | 1.017 ± 0.0579 | 0.822 ± 0.0937
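The MAE, RMSE, and R² columns in Tables 3–5 are standard regression metrics. The minimal sketch below shows how they can be computed for a predicted speed or position sequence; the array contents are made-up placeholders, not experimental data.

```python
# Standard regression metrics used in Tables 3-5 (illustrative values only).
import numpy as np


def regression_metrics(y_true: np.ndarray, y_pred: np.ndarray):
    err = y_pred - y_true
    mae = np.mean(np.abs(err))                       # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2))                # root mean squared error
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                       # coefficient of determination
    return mae, rmse, r2


# Example: compare a measured and a predicted speed profile (km/h)
measured = np.array([0.0, 10.2, 20.5, 30.1, 35.0])
predicted = np.array([0.3, 10.0, 21.0, 29.5, 34.2])
print(regression_metrics(measured, predicted))
```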
Table 4. Simulation results with different step sizes.
Step Size | MAE | RMSE | R²
22 | 0.702 ± 0.1137 | 0.910 ± 0.1028 | 0.854 ± 0.0873
20 | 0.701 ± 0.1241 | 0.901 ± 0.1485 | 0.833 ± 0.0240
18 | 0.695 ± 0.1129 | 0.924 ± 0.1174 | 0.798 ± 0.0507
16 | 0.681 ± 0.1322 | 0.914 ± 0.1478 | 0.785 ± 0.0699
14 | 0.929 ± 0.1247 | 1.217 ± 0.1317 | 0.831 ± 0.0398
12 | 0.615 ± 0.0821 | 0.874 ± 0.0907 | 0.865 ± 0.0122
10 | 0.741 ± 0.1425 | 0.955 ± 0.1604 | 0.849 ± 0.0398
8 | 1.154 ± 0.1333 | 1.352 ± 0.1542 | 0.841 ± 0.0529
6 | 0.739 ± 0.1268 | 0.935 ± 0.1209 | 0.827 ± 0.0155
4 | 1.476 ± 0.1221 | 1.643 ± 0.1325 | 0.795 ± 0.0347
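Assuming the step size in Table 4 denotes the number of past records fed to the network as one input window, such windows could be built from per-time-step records like those in Table 2 with a simple sliding-window routine. The sketch below is illustrative only; the feature layout and array names are placeholders, not the authors' preprocessing.

```python
# Illustrative sliding-window construction, assuming "step size" is the input window length.
import numpy as np


def make_windows(features: np.ndarray, targets: np.ndarray, step_size: int):
    """features: (T, n_features) per-time-step records; targets: (T, 2) speed and position."""
    X, Y = [], []
    for t in range(step_size, len(features)):
        X.append(features[t - step_size:t])   # the past `step_size` records
        Y.append(targets[t])                  # the state to predict at time t
    return np.stack(X), np.stack(Y)


# Example with random stand-ins for the 10 recorded quantities of Table 2
records = np.random.rand(1000, 10)
speed_position = np.random.rand(1000, 2)
X, Y = make_windows(records, speed_position, step_size=12)
print(X.shape, Y.shape)   # (988, 12, 10) (988, 2)
```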
Table 5. The comparison results of different methods.
Model | Method | MAE | RMSE | R²
Model 1 | RNN(128,32) | 1.902 ± 0.1329 | 2.203 ± 0.1497 | 0.718 ± 0.0449
Model 2 | RNN(128,32) + AM | 1.870 ± 0.1024 | 2.050 ± 0.1218 | 0.823 ± 0.0781
Model 3 | LSTM(128,32) | 1.871 ± 0.1157 | 2.173 ± 0.1409 | 0.727 ± 0.0643
Model 4 | LSTM(128,32) + AM | 0.622 ± 0.0947 | 0.901 ± 0.0898 | 0.831 ± 0.0732
Model 5 | GRU(128,32) | 1.964 ± 0.1047 | 2.131 ± 0.1421 | 0.743 ± 0.0296
Model 6 | GRU(128,32) + AM | 0.615 ± 0.0983 | 0.874 ± 0.0769 | 0.865 ± 0.0107
Model 7 | Model 6 + MAML | 0.523 ± 0.0591 | 0.759 ± 0.0694 | 0.913 ± 0.0057
Model 8 | Classical dynamics | 3.432 ± 0.1277 | 4.0131 ± 0.2964 | 0.634 ± 0.2071