1. Introduction
Heavy-haul train transportation has been widely valued around the world due to its advantages of large capacity, high efficiency and low transportation cost. When heavy-haul trains run on long and steep downhill railway sections, cyclic air braking is required to control the train speed [1]. At present, the use of air braking relies mainly on the driving experience of the driver. However, existing braking methods based on drivers' experience cannot meet the safety and efficiency requirements of heavy-haul train operation [2]. Therefore, it is of great importance to develop an intelligent control strategy for the cyclic braking of heavy-haul trains running on long and steep downhill sections [3].
To date, air braking methods for heavy-haul trains running on long and steep downhill railway sections have been studied in depth by many researchers. The main solutions can be classified into imitation learning methods based on expert data, numerical solution methods based on optimal train control models, and reinforcement learning methods based on a Markov decision process.
In imitation learning based on expert data, expert driving records must be provided during the training stage so that the expert's driving behavior can be imitated in a supervised manner. For example, Ref. [4] combined expert data with the concept of generative adversarial learning and proposed a representative generative adversarial imitation learning algorithm. Ref. [5] integrated expert data with reinforcement learning to train a new intelligent driving control strategy; the reinforcement learning algorithm, supervised by the expert model, made operation more efficient and stable. However, such imitation learning methods do not explicitly establish safety assessments and constraints, and the expert data they use fail to fully cover rare emergency traffic situations.
In research on numerical solutions based on optimal train control models, Pontryagin's Maximum Principle (PMP) was employed in Ref. [6] to determine the most effective driving strategy through generalized equations of motion. By considering line conditions, such as varying slopes and speed limits, a version of the key equation for train traction energy consumption was proposed and used to calculate the speed for cyclic braking. Ref. [7] established an optimization model based on train dynamics and integrated the artificial bee colony algorithm to find appropriate switching points between operating states, developing an optimal operation strategy for heavy-haul trains on long and steep downhill railway sections. Ref. [8] combined an approximate data-driven system model with model predictive control to solve planning problems with safety constraints. Ref. [9] considered the energy consumption and comfort of a heavy-haul train when determining the optimal cyclic braking strategy, implementing both a pseudo-spectral method and a mixed-integer linear programming method to develop an optimal driving strategy. However, such methods require comprehensive data to build the model, and control performance depends strongly on model accuracy; the problem of model bias therefore becomes prominent in complex, uncertain scenarios.
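As a rough, hypothetical illustration of the kind of train dynamics underlying these optimal control models (not the models used in the cited works), the following sketch integrates a point-mass train on a constant downhill grade with a Davis-type running resistance and a simple hysteresis braking rule. All coefficients are illustrative assumptions.

```python
G = 9.81  # gravitational acceleration, m/s^2

def davis_resistance(v):
    """Davis-type running resistance per unit mass (m/s^2); the
    coefficients are illustrative, not fitted to any real train."""
    return (0.6 + 0.02 * v + 0.0009 * v ** 2) * G / 1000.0

class CyclicBrake:
    """Hypothetical hysteresis rule: brake above v_hi, release below v_lo."""
    def __init__(self, v_lo, v_hi, decel):
        self.v_lo, self.v_hi, self.decel = v_lo, v_hi, decel
        self.braking = False

    def __call__(self, v, t):
        if v >= self.v_hi:
            self.braking = True
        elif v <= self.v_lo:
            self.braking = False
        return self.decel if self.braking else 0.0

def simulate(v0, grade, brake_decel, dt=1.0, steps=600):
    """Forward-Euler integration of the point-mass equation of motion
    dv/dt = g*i - r(v) - b(v, t), with downhill grade i as a fraction."""
    v, traj = v0, [v0]
    for k in range(steps):
        accel = G * grade - davis_resistance(v) - brake_decel(v, k * dt)
        v = max(0.0, v + accel * dt)
        traj.append(v)
    return traj
```

With, e.g., `simulate(15.0, 0.012, CyclicBrake(12.0, 18.0, 0.3))`, the speed oscillates between roughly 12 and 18 m/s, producing the sawtooth profile characteristic of cyclic braking on a long downhill grade.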
Reinforcement learning algorithms based on a Markov decision process have been increasingly applied to the intelligent control of trains in recent years. Ref. [10] proposed an optimal model of operational energy consumption leveraging a Q-learning algorithm; a cyclic braking strategy was then developed based on the train state value function to address the problems of train punctuality and energy-saving operation optimization. Ref. [11] designed an intelligent train control method using a policy-gradient reinforcement learning algorithm, in which the performance of the agent was continuously optimized to realize the self-learning of the controller. An improved Q-learning algorithm was proposed in Ref. [12], in which the target rewards for energy consumption and time are updated in different ways. In Ref. [13], a Q-SARSA algorithm was proposed by combining Q-learning with the SARSA update rule; combined with a deep fully connected neural network, it considerably improved the efficiency of subway operations. These methods can achieve cyclic braking, energy savings and emission reduction for a heavy-haul train without a pre-designed reference speed curve. However, their state spaces are discretized, which slows the convergence of the optimization during learning and leads to the curse of dimensionality. Thereafter, Ref. [14] proposed a reinforcement learning algorithm based on two-stage action sampling to solve the combinatorial optimization problem in heavy-haul railway scheduling; this approach not only alleviates the curse of dimensionality but also naturally satisfies the optimization objective and complex constraints. Ref. [15] proposed a reinforcement learning method for multi-objective speed trajectory optimization that simultaneously achieves energy efficiency, punctuality and accurate stopping. Ref. [16] employed a double-switch Q-network (DSQ network) architecture to achieve fast approximation of the action value function and enhance parameter sharing between states and actions. However, the methods in Refs. [14,15,16] still cannot make full use of the large amount of unstructured data generated during train operation; that is, data utilization remains low. Furthermore, their train models do not consider the environmental characteristics of heavy-haul trains running on long and steep downhill railway sections, making them difficult to apply to the optimal control of heavy-haul trains on such sections.
Building on the above works, this paper proposes an intelligent DQN-based algorithm for the cyclic air braking of heavy-haul trains traversing long and steep downhill railway sections. The main contributions of this work are as follows:
(1) A model with operation constraints is constructed for heavy-haul trains equipped with a conventional pneumatic braking system and traversing lengthy and steep downhill railway sections. In addition, the performance indexes of a train running on a long and steep downhill section are introduced to evaluate the control performance of the heavy-haul train.
(2) The action value function is approximated by a neural network. In accordance with the Q-learning algorithm, the neural network is combined with reinforcement learning to avoid the curse of dimensionality inherent in tabular Q-learning. The method is therefore suitable for train control problems with a continuous state space.
(3) A prioritized experience replay mechanism is introduced. Samples are prioritized and selected according to their importance, so that important, high-reward samples are trained more frequently. This improves the learning efficiency on important samples and accelerates the convergence of the algorithm, thereby improving its overall performance.
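A minimal sketch of the proportional prioritized replay idea in contribution (3) might look as follows. The hyper-parameters `alpha` and `eps` are assumptions, and a production implementation would also use importance-sampling weights and a sum-tree for efficient sampling; this sketch is not the implementation used in this paper.

```python
import random

class PrioritizedReplay:
    """Proportional prioritized experience replay (illustrative sketch)."""

    def __init__(self, capacity=10000, alpha=0.6, eps=1e-3):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.buffer, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.buffer) >= self.capacity:  # drop the oldest transition
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size):
        """Sample indices with probability proportional to priority, so
        high-TD-error transitions are replayed more often."""
        idx = random.choices(range(len(self.buffer)),
                             weights=self.priorities, k=batch_size)
        return idx, [self.buffer[i] for i in idx]

    def update_priorities(self, idx, td_errors):
        """Refresh priorities after the sampled batch has been re-evaluated."""
        for i, e in zip(idx, td_errors):
            self.priorities[i] = (abs(e) + self.eps) ** self.alpha
```

After each training step, the newly computed TD errors of the sampled batch are written back via `update_priorities`, so the sampling distribution keeps tracking which transitions the network currently predicts worst.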
The rest of this paper is organized as follows: Section 2 presents the design of the control model for heavy-haul trains, introduces the operation constraints and control performance indexes, and describes in detail the control problem of a heavy-haul train running on a long and steep downhill section. Section 3 establishes the cyclic air braking method based on the DQN algorithm. Section 4 verifies the validity and robustness of the proposed method via simulation. Finally, Section 5 summarizes the conclusions of this study.
5. Conclusions
In this article, we explore the optimal transition between working conditions (braking and brake release) during cyclic braking when a heavy-haul train is running on a long and steep downhill section. Aiming at the shortest air braking distance and the highest operating efficiency, various practical operating constraints are considered simultaneously, including the air-refilling time of the auxiliary reservoir, the operating speed and the switching of operating actions. The main conclusions are as follows:
(1) To optimize multiple objectives, a mathematical model is established for a heavy-haul train running on a long and steep downhill section, and an intelligent cyclic braking system based on the DQN algorithm is designed to adapt to a variety of complex operating environments and line conditions. To improve the convergence speed of the algorithm, prioritized experience replay is used instead of ordinary experience replay: by prioritizing experiences, the agent selects higher-priority experiences for learning and thus learns important experiences more quickly, which is conducive to rapid convergence and improves the performance of the control algorithm.
(2) To verify the performance of the proposed DQN algorithm, comparative simulations with different parameters were carried out. The simulation results show that the proposed DQN algorithm exhibits better optimization performance and can effectively generate train driving speed curves that fulfill the specified constraints, providing a valuable reference for the application of cyclic braking in heavy-haul trains running on long and steep downhill sections.
This study focused primarily on the intelligent control of a cyclic air braking strategy for heavy-haul trains; it did not comprehensively consider all environmental factors that could affect braking effectiveness. In particular, weather conditions (such as temperature, humidity and wind speed), track conditions (such as track flatness and friction coefficient) and variations in train load were outside the scope of this research. In future work, we plan to study these environmental factors in depth to evaluate the braking performance of heavy-haul trains more comprehensively under different conditions. In addition, the proposed DQN algorithm could be further improved and a more efficient network structure developed, which in turn could improve the performance of cyclic air braking for heavy-haul trains.