Article

A Stock Market Decision-Making Framework Based on CMR-DQN

1 School of Information and Communication Engineering, Hainan University, Haikou 570228, China
2 School of Electronic Information, Central South University, Changsha 410083, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 6881; https://doi.org/10.3390/app14166881
Submission received: 13 July 2024 / Revised: 2 August 2024 / Accepted: 4 August 2024 / Published: 6 August 2024

Abstract

In the dynamic and uncertain stock market, precise forecasting and decision-making are crucial for profitability. Traditional deep neural networks (DNNs) often struggle to capture long-term dependencies and multi-scale features in complex financial time series data. To address these challenges, we introduce CMR-DQN, an innovative framework that integrates the discrete wavelet transform (DWT) for multi-scale data analysis, the temporal convolutional network (TCN) for extracting deep temporal features, and a GRU–LSTM–Attention mechanism to enhance the model’s focus and memory. Additionally, CMR-DQN employs the Rainbow DQN reinforcement learning strategy to learn optimal trading strategies in a simulated environment. CMR-DQN significantly improved the total return rate on six selected stocks, with increases ranging from 20.37% to 55.32%. It also demonstrated substantial improvements over the baseline model in terms of Sharpe ratio and maximum drawdown, indicating increased excess returns per unit of total risk and reduced investment risk. These results underscore the efficiency and effectiveness of CMR-DQN in handling multi-scale time series data and optimizing stock market decisions.

1. Introduction

In the current global financial market, the ultimate goal pursued by investors is not only the appreciation of capital but also determining how to effectively manage and control investment risks during this process. The core of this goal lies in finding the optimal balance between risk and reward to ensure that investments can bring sustainable and long-term returns. Traditionally, investors have pursued this goal through manual analysis and experience-driven judgment. These methods are not only time-consuming and labor-intensive but are also prone to the influence of human emotions and biases when processing and analyzing large-scale financial data, thus affecting the objectivity and effectiveness of decision-making. Especially when faced with rapidly changing market information, human traders find it difficult to react accurately in real time, which poses significant challenges to traditional trading methods in the modern financial market [1]. In this context, algorithmic trading, as an innovative trading method, automates trading decision-making and execution through computer programs and, with its efficient and objective characteristics, brings new possibilities to modern financial transactions.
Financial decision-making problems have traditionally been modeled using stochastic processes and techniques from stochastic control. The selection of models often aims to balance simplicity with practical applicability. Simple models yield tractable and implementable strategies that can be expressed in closed-form or resolved through traditional numerical methods. However, these models may oversimplify market behaviors, potentially resulting in suboptimal strategies that could lead to financial losses. Conversely, models that aim to capture more realistic market features tend to be complex and often prove to be mathematically and computationally intractable using classical stochastic control tools. In recent years, the surge of financial data on transactions, quotes, and order flows in electronic order-driven markets has revolutionized data processing and statistical modeling techniques in finance, posing new theoretical and computational challenges [2].
The core driving force for the exploration and application of algorithmic trading lies in the use of advanced computing technology to overcome the limitations of traditional trading methods. In the practice of algorithmic trading, rule-based methods and machine-learning-based strategies are the two main directions. Rule-based algorithmic trading relies on an in-depth understanding of market rules and automates buying and selling operations through a fixed set of trading rules. The advantage of this approach is that it is straightforward and easy to implement, but it is also inflexible and may not be able to adapt in time to rapid changes in the market [3]. In contrast, machine-learning-based algorithmic trading offers greater data-processing capability and adaptability by training models to identify and learn complex patterns in the market. Deep learning models, in particular, are able to automatically extract valuable features from large amounts of historical data to support more accurate trading decisions [4]. However, both rule-based and machine-learning methods ultimately have to execute strategies under the high complexity and uncertainty of financial markets, and developing trading strategies that can effectively capture market opportunities while adapting to dynamic market changes remains a major challenge in the current algorithmic trading field [5]. In the past decade, deep learning and reinforcement learning (RL) technology have shown great potential and application value in the optimization of financial trading systems. Deep learning provides a new perspective for the analysis and prediction of financial markets through its ability to automatically identify complex patterns and features in massive amounts of data. Reinforcement learning, particularly its advanced branch, deep reinforcement learning (DRL), has emerged as a leading technology for solving sequential decision-making problems by learning optimal action strategies through continuous interaction with the environment [6]. DRL’s notable breakthroughs, such as AlphaGo’s victory over the world Go champion, have validated its capability to handle complex decision-making challenges. Compared to classical stochastic control methods, RL offers innovative approaches that leverage information through iterative interactions between the agent and the system. The financial industry has seen numerous successes with RL algorithms in areas like order execution, market making, and portfolio optimization, generating significant interest and accelerating the development of RL technologies. This progress aims to enhance trading decisions in various financial markets, particularly when participants have limited information about the market and other competitors.
In recent years, deep learning and reinforcement learning technologies have achieved remarkable breakthroughs in many fields; however, many challenges remain in applying these cutting-edge technologies to financial trading, especially in the development of efficient trading algorithms. The high degree of uncertainty in financial markets, the non-stationary nature of the data, and the need to make precise decisions in a very short period of time greatly increase the complexity of designing and implementing effective trading strategies. In addition, an ideal financial trading system not only needs to pursue maximum returns but must also fully consider the importance of risk control in the strategy to ensure that the adopted trading strategy has a high degree of robustness and reliability [7]. The recent study by Mugerman et al. [8] sheds light on strategic decision-making and dynamics in competitive environments, which are directly relevant to financial markets; integrating these insights helps position our work within the broader context of economics and finance research.
Despite these challenges, initial progress has been made in the application of deep reinforcement learning (DRL) techniques, especially Deep Q Networks (DQNs) [9] and their variants, to financial transactions. Researchers have begun to use DQN technology to model and analyze the financial market and have designed intelligent algorithms to find and exploit trading opportunities in the market automatically. The goal of these algorithms is to maximize return on investment and Sharpe ratios in a volatile market environment, thereby improving the overall performance of trading strategies [10].
DQN and its variants make decisions directly from high-dimensional raw data by learning optimal action strategies, which provides a new perspective for analysis and forecasting in financial markets. Especially when working with financial time series data, DQN is able to identify and exploit hidden patterns and trends in historical information to predict future market dynamics. Compared with traditional financial analysis methods, this model-based learning method can more flexibly adapt to market changes and provide more accurate and stable trading decision support.
The main contribution of this paper is to propose an innovative deep reinforcement learning (DRL) algorithm, the Composite Multi-Role Deep Q Network (CMR-DQN), which is designed to improve the accuracy and efficiency of stock market forecasting and decision-making. By combining the discrete wavelet transform (DWT), the temporal convolutional network (TCN), and the GRU–LSTM–Attention mechanism, CMR-DQN optimizes the processing power of traditional deep neural networks (DNNs) in many aspects, especially in the ability to parse complex financial time series data. This algorithmic framework not only enhances the performance of the model in capturing long-term dependencies and multi-scale features but also effectively improves the quality of stock market decision-making through advanced learning strategies.
In summary, our research makes the following major contributions:
First, the CMR-DQN model proposed in this paper innovatively addresses the limitations of traditional deep neural networks in processing complex financial time series data by combining the discrete wavelet transform (DWT), the temporal convolutional network (TCN), and the GRU–LSTM–Attention mechanism. This combination not only demonstrates excellent capabilities in multi-scale analysis but also significantly improves the accuracy of stock market forecasts and the quality of decision-making through fine-grained feature extraction and efficient use of key information points.
Second, CMR-DQN introduces the Rainbow DQN reinforcement learning strategy to achieve self-learning and strategy optimization in a simulated stock trading environment. The adoption of this strategy not only enhances the model’s adaptability to market dynamics but also ensures that the model can continue to optimize trading decisions in the face of changing market conditions to achieve long-term profitability and risk control.
Third, in addition to the fundamental indicators, such as Close, Open, High, Low, and Volume, our CMR-DQN algorithm incorporates a diverse set of technical indicators, including RSI, ROC, CCI, and MACD. Notably, our innovation extends to the utilization of EXPMA and VMACD indicators, which enrich the depth and sophistication of our trading model.
Finally, this study also incorporates six stocks from both international and domestic markets and thoroughly demonstrates the superior performance of CMR-DQN through comparisons with traditional strategies as well as various strategies based on deep reinforcement learning.

2. Related Work

This section reviews related work in three areas: the application of deep reinforcement learning to trading, algorithmic improvements to deep reinforcement learning, and denoising techniques for stock data.

2.1. Deep Reinforcement Learning in Trading

In the field of financial transactions, the revolutionary application of deep reinforcement learning (DRL) is attracting increasing attention from industry and academia. DRL combines the data processing capabilities of deep learning with the decision-making capabilities of reinforcement learning, showing great potential in solving complex financial decision-making problems. Since the first successful applications of the Deep Q Network (DQN) algorithm by Mnih et al. and Silver et al., which reached or exceeded human-level performance on multiple tasks, DRL technology has made remarkable progress in the research and application of financial trading strategies [11,12]. Beyond this foundational work, DRL technology is also being applied to a wider range of financial transactions. For example, Li et al. demonstrated the ability of DRL to manage market shocks and transaction costs by employing an improved DRL model to solve the optimal execution problem in high-frequency trading [13]. At the same time, Ma et al. proposed a parallelized deep reinforcement learning (DRL) framework for stock trading, which combines multiple neural network models and reinforcement learning strategies to analyze real-time market data and long-term historical trends simultaneously [14]. In addition, DRL has also shown its unique value in risk control and management.
Liu and his team developed a new algorithmic trading strategy based on deep reinforcement learning (DRL), which combines financial market data, news sentiment analysis, and technical indicators. They employ an actor–critic approach and a dueling network architecture that effectively captures the key features of time series data and candlestick charts, with the aim of optimizing the trading decision-making process [15]. Similarly, Fan and Peng further explored the application of DRL in financial stock investment management by combining DQN and LSTM networks, demonstrating its potential to improve investment efficiency and decision-making quality [16]. The DADE-DQN model proposed by Huang et al. demonstrates the effectiveness of DRL in improving the performance of stock trading strategies by introducing dual-action selection and dual-environment mechanisms, as well as by combining long short-term memory (LSTM) and attention mechanisms [17]. These innovative DRL applications not only optimize the trading decision-making process but also improve the overall return of the strategy.

2.2. Optimization of Deep Reinforcement Learning Algorithms

Advancements in DRL technology are a key driver of continuous innovation in the financial trading space. With the continuous evolution of algorithms, learning mechanisms, and network structures, DRL provides new perspectives and tools for solving complex financial decision-making problems. For example, the double Q-learning approach proposed by Hasselt addresses the overestimation of the action-value function in traditional Q-learning and significantly improves estimation accuracy and model performance by decoupling action selection from action evaluation [18]. In addition, the prioritized experience replay technique introduced by Schaul et al. improved the learning efficiency and stability of models in complex environments by assigning a higher resampling priority to important learning experiences [19].
The introduction of the dueling network structure is an important innovation in the architecture of DRL models. The work by Hessel et al. demonstrated unprecedented data efficiency and final performance on the Atari 2600 benchmark by extending and combining six independent DQN improvements into an ensemble model, the Rainbow DQN [20]. Wang et al. optimized the model’s estimation of the value of behavior in different states by dividing the network into two independent paths representing the state-value function and the state-action advantage function, further enhancing fine-grained control in decision-making [21]. These technological advances not only improve the theoretical robustness of the models but also provide practical benefits for applications such as the development of financial trading strategies. In addition, the Deep Recurrent Q Network (DRQN) enables DRL models to better process and memorize sequence information by introducing recurrent neural networks (RNNs), especially LSTM units, which is particularly important for financial trading scenarios rich in time series data [22]. Chen et al. researched a deep reinforcement learning algorithm based on the Dueling Deep Recurrent Q Network (Dueling DRQN) to improve the accuracy of stock price forecasting by combining the advantages of recurrent neural networks in processing sequence data with the ability of Q-learning to adaptively learn stock price patterns and trends [23]. Ye and Schuller proposed a trading model that mimics human trader behavior, which aims to improve the consistency of machine trading algorithms with human behavior by combining supervised learning, single-step and multi-step Q-learning, and imitation learning, using the discrete wavelet transform (DWT) to process data, and verifying through backtesting that its performance exceeds baseline models on a variety of U.S. stocks [24]. Shah et al. employ a range of integrated methods, including Rainbow DQN, GRU, and LSTM, for real-time stock market forecasting and trading signal generation, measuring effectiveness through forecast accuracy and return on investment (ROI). In their comparisons, which also involve deep learning models such as GRU and LSTM, Rainbow DQN was found to perform best among Rainbow DQN, DQN, and double DQN [25].

2.3. Stock Data Denoising

In conventional methods, the Fourier transform (FT) is commonly used for analyzing the frequency-domain information of signals [26]. However, it is not well suited for direct application to non-stationary signals such as stock price data [27]. The wavelet transform (WT), on the other hand, overcomes the drawbacks of FT in analyzing and reconstructing non-stationary signals [28]. Researchers have successfully applied the discrete wavelet transform (DWT) method to stock prediction tasks, achieving promising results [29]. Wang et al. introduced a neural network architecture named the multiscale long short-term memory network (mLSTM), which integrates the wavelet transform (WT) and long short-term memory (LSTM). The mLSTM incorporates the DWT method into the deep learning framework to obtain multiscale subsequences of different frequencies, which are independently modeled by LSTM [30].
In addition, by introducing methods such as dilated convolutions and residual connections into convolutional neural networks (CNNs), temporal convolutional networks (TCNs) have gained popularity in sequence modeling tasks. Bai, Kolter, and Koltun demonstrated that TCNs have surpassed the performance of RNNs in a range of sequence modeling tasks [31]. Gone proposed a multi-stage TCN-LSTM hybrid attentive network (MSHAN) for stock trend prediction, leveraging historical stock data, technical indicators, and weighted social media information to outperform baselines and enhance trading profits through accurate directional movement prediction of daily stock prices [32]. Dechun Wen introduced MWDINet, a stock price prediction framework integrating wavelet decomposition, Hull moving average, and autocorrelation correction modules. Experimental results demonstrate its superiority over existing models, showcasing its potential for accurate stock price forecasting [33].

3. Models and Methods

In the financial markets, traders face the daunting task of reducing risk while simultaneously increasing profitability. Inspired by the DQN algorithm, we propose a novel approach called CMR-DQN (Compound Multiscale Representation Deep Q Network) aimed at addressing this challenge.
CMR-DQN is an extension of the DQN algorithm specifically designed for financial trading problems. Typically, DQN-based reinforcement learning agents require a large number of training episodes to learn the optimal strategy and maximize cumulative rewards. However, in stock trading, high data noise, randomness, and the limited amount of real stock data may lead to overfitting and degraded performance on the test set. Our goal is to overcome overfitting and effectively explore the best strategies with limited data. The model flowchart is illustrated in Figure 1.

3.1. DWT-TCN

DWT-TCN (denoising wavelet transform temporal convolutional network) is a deep learning model designed specifically for denoising tasks in time series data. It integrates the advantages of discrete wavelet transform (DWT) and temporal convolutional network (TCN). Its core objective is to effectively capture both long-term and short-term dependencies in time series data to enhance denoising performance.
The processing pipeline of the model involves three key steps. First, the input time series data undergo processing in the wavelet transform layer, where it is decomposed into approximation coefficients and detail coefficients of different scales to obtain feature representations. Subsequently, these feature representations are passed to the temporal convolutional network for further processing. The temporal convolutional network consists of stacked convolutional layers and activation functions, aimed at extracting long-term and short-term dependencies in the time series data. Finally, the output of TCN is processed through a global average pooling layer to aggregate features, generating the final denoised time series data. The entire process aims to denoise time series data efficiently by combining the processing capabilities of wavelet transform and temporal convolutional networks. The specific process is illustrated in Figure 2.

3.1.1. Discrete Wavelet Transform (DWT)

The discrete wavelet transform (DWT) is a commonly used technique in signal processing [34], which provides a method for multiscale analysis of signals by decomposing them into approximation parts A_j and detail parts D_j. In the DWT layer, the input time series data x(t) are decomposed into approximation coefficients A_{j+1} and detail coefficients D_{j+1} at different levels, expressed as Equation (1):
A_{j+1}, D_{j+1} = \mathrm{DWT}(x(t)).
These coefficients represent the time–frequency features of the signal at different scales. Through this decomposition, we can better understand the patterns of change in time series data at different time scales, thus more effectively capturing the data’s characteristic information.
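As an illustration, the following minimal Python sketch performs the decomposition in Equation (1) with the PyWavelets library; the choice of library, the db4 mother wavelet, the three-level decomposition, and the soft-thresholding step are assumptions for this example rather than settings reported in the paper.

```python
import numpy as np
import pywt

# Toy close-price series standing in for x(t); in practice this would be the
# stock features described in Section 4.
prices = np.cumsum(np.random.randn(256)) + 100.0

# Single-level DWT: approximation (A1) and detail (D1) coefficients, as in Equation (1).
A1, D1 = pywt.dwt(prices, "db4")

# Multi-level decomposition yields coefficients at several scales:
# [A_J, D_J, D_{J-1}, ..., D_1] for level J.
coeffs = pywt.wavedec(prices, "db4", level=3)

# A simple denoising step (illustrative): soft-threshold the detail coefficients
# and reconstruct a smoothed series from them.
threshold = 0.5 * np.std(coeffs[-1])
denoised_coeffs = [coeffs[0]] + [pywt.threshold(d, threshold, mode="soft") for d in coeffs[1:]]
denoised = pywt.waverec(denoised_coeffs, "db4")
```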

3.1.2. Temporal Convolutional Network (TCN)

TCN represents a significant advancement in deep learning-based forecasting, building upon enhancements to CNN architectures. While CNNs find widespread use in image processing, adapting them effectively for time series prediction necessitated tailored refinements. Through the integration of specific requirements and features of time series prediction, researchers successfully achieved superior results with TCN [35]. By incorporating dilated convolutions alongside causal convolutions, TCN expands the receptive field and refines accuracy. Its structural diagram is depicted in Figure 2.
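To make the dilated causal convolution concrete, here is a minimal PyTorch sketch of a TCN residual block; the channel sizes, kernel width, and dilation schedule are illustrative assumptions and do not reproduce the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """1D convolution that only looks at past time steps (causal padding)."""
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size,
                              padding=self.pad, dilation=dilation)

    def forward(self, x):                 # x: (batch, channels, time)
        out = self.conv(x)
        return out[:, :, :-self.pad]      # trim the "future" padding on the right

class TCNBlock(nn.Module):
    """Residual block with two dilated causal convolutions, as in a standard TCN."""
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.net = nn.Sequential(
            CausalConv1d(in_ch, out_ch, kernel_size, dilation), nn.ReLU(),
            CausalConv1d(out_ch, out_ch, kernel_size, dilation), nn.ReLU(),
        )
        # 1x1 convolution matches channel counts for the residual connection.
        self.downsample = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()

    def forward(self, x):
        return torch.relu(self.net(x) + self.downsample(x))

# Stacking blocks with exponentially growing dilations (1, 2, 4, ...) enlarges
# the receptive field; channel sizes here are purely illustrative.
tcn = nn.Sequential(
    TCNBlock(1, 16, dilation=1),
    TCNBlock(16, 16, dilation=2),
    TCNBlock(16, 16, dilation=4),
)
features = tcn(torch.randn(8, 1, 64))     # (batch=8, channels=16, time=64)
```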

3.2. Rainbow DQN

Rainbow DQN (Rainbow Deep Q Network) stands as an innovative advancement in deep reinforcement learning, aiming to overcome the limitations of standard Deep Q Network (DQN) algorithms [18]. It integrates various enhancements such as prioritized experience replay, double Q-learning, dueling architecture, multi-step learning, distributional RL, and noisy networks, offering improved sample efficiency, faster learning, and enhanced generalization capabilities. Rainbow DQN is particularly notable for its application in predicting buy/sell signals for stock prices using integrated deep learning techniques. This comprehensive approach makes Rainbow DQN a potent tool for addressing a wide range of reinforcement learning challenges. The specific working diagram of Rainbow DQN is shown in Figure 3.
Prioritized Experience Replay Buffer: Schaul et al. proposed using a prioritized experience replay buffer instead of an ordinary replay buffer, assigning higher priority to important experiences based on the TD error δ [19]. The temporal difference error, as in Equation (2), measures the discrepancy between predicted and actual outcomes in reinforcement learning.
\delta_t = R_{t+1} + \gamma \max_{a} \hat{q}(S_{t+1}, a, w) - \hat{q}(S_t, A_t, w)
The priority p_t of an experience is taken to be the magnitude of its TD error plus a small positive constant e (so that no transition is starved of replay) and is stored in the buffer, as shown in Equation (3).
p_t = |\delta_t| + e
Sampling experience tuples with probabilities P(i) during batch creation operates akin to a softmax over the stored priorities. The exponent α controls how strongly prioritization is applied and blends prioritized sampling with uniform random sampling (α = 0 recovers uniform sampling), so that low-priority tuples are not starved of attention. Because the resulting sampling distribution is non-uniform, the update rule must also be adjusted (via importance-sampling weights) to counteract the bias stemming from the uneven distribution of priority values, as shown in Equation (4).
P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}
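A small NumPy sketch of Equations (2)–(4) is shown below: priorities from TD-error magnitudes, sampling probabilities with exponent α, and the importance-sampling correction used in the original prioritized replay paper. The constants e, α, and β are assumed example values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical TD errors for 8 stored transitions (Equation (2)).
td_errors = rng.normal(size=8)

e, alpha = 1e-3, 0.6                                     # assumed small constant and exponent
priorities = np.abs(td_errors) + e                       # Equation (3): p_t = |delta_t| + e
probs = priorities**alpha / np.sum(priorities**alpha)    # Equation (4)

# Sample a minibatch according to these probabilities.
batch_idx = rng.choice(len(priorities), size=4, p=probs)

# Importance-sampling weights correct the bias introduced by non-uniform sampling;
# beta is annealed toward 1 in the original PER paper.
beta = 0.4
weights = (len(priorities) * probs[batch_idx]) ** (-beta)
weights /= weights.max()
```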
Dueling Architecture: Dueling networks introduced a novel approach utilizing two separate streams in the neural network architecture [21]. One stream, known as the value stream V(s), is dedicated to directly estimating the value of each state; this is particularly effective because, in many states, the choice of action has little effect on the outcome. The other stream, termed the advantage stream A(s, a), focuses on estimating the advantage of each action, thereby capturing the differences in action outcomes within each state. By aggregating the outputs from both streams, the model derives the action values Q(s, a) effectively. This framework enhances the network's ability to understand and represent the nuanced relationships between states and actions. The architecture and formula are shown in Figure 4 and Equation (5).
Q(s, a) = V(s) + \left( A(s, a) - \frac{1}{|\mathcal{A}|} \sum_{a'} A(s, a') \right)
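The aggregation in Equation (5) can be written as a small PyTorch module; the hidden width of 64, the 128-dimensional feature input, and the three-action output (matching the three possible trading actions mentioned in Algorithm 1) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Splits a shared feature vector into value V(s) and advantage A(s, a) streams
    and recombines them as in Equation (5)."""
    def __init__(self, feature_dim, n_actions):
        super().__init__()
        self.value = nn.Sequential(nn.Linear(feature_dim, 64), nn.ReLU(), nn.Linear(64, 1))
        self.advantage = nn.Sequential(nn.Linear(feature_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))

    def forward(self, features):
        v = self.value(features)                       # (batch, 1)
        a = self.advantage(features)                   # (batch, n_actions)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)     # Q(s, a)

q_values = DuelingHead(feature_dim=128, n_actions=3)(torch.randn(8, 128))
```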
Double DQN: The main difference between double DQN and traditional DQN lies in how they calculate the target Q-values [36]. Compared to standard DQN, double DQN is better equipped to overcome the problem of overestimation present in DQN. This is because DQN relies solely on the current network when selecting the optimal action, which can lead to an overestimation of Q-values, thereby affecting training effectiveness and stability. In contrast, Double DQN addresses this issue by utilizing the target network to evaluate the Q-values of the optimal actions, effectively reducing the impact of overestimation and enhancing training stability and effectiveness. Specifically, the formula for calculating the target Q-values is as shown in Equation (6).
Y_t = R_{t+1} + \gamma\, Q_{\theta^{-}}\!\left( S_{t+1}, \arg\max_{a} Q_{\theta}(S_{t+1}, a) \right),
where Y_t represents the target Q-value, R_{t+1} is the reward from the environment, γ is the discount factor, S_{t+1} is the next state, Q_{\theta^{-}} is the output of the target network, Q_{\theta} is the output of the current network, and \arg\max_{a} Q_{\theta}(S_{t+1}, a) denotes selecting the action with the highest Q-value according to the current network. This approach helps reduce overestimation, enhancing learning stability and making the algorithm more robust.
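A minimal sketch of the target computation in Equation (6) follows, assuming `online_net` and `target_net` are hypothetical Q-networks that map a batch of states to per-action Q-values and `dones` is a 0/1 float mask marking terminal transitions.

```python
import torch

def double_dqn_targets(rewards, next_states, dones, online_net, target_net, gamma=0.99):
    """Compute Y_t as in Equation (6): the online network selects the action,
    the target network evaluates it."""
    with torch.no_grad():
        best_actions = online_net(next_states).argmax(dim=1, keepdim=True)   # argmax_a Q_theta
        next_q = target_net(next_states).gather(1, best_actions).squeeze(1)  # Q_theta_minus
        return rewards + gamma * next_q * (1.0 - dones)   # bootstrap only for non-terminal states
```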
Distributional RL: Distributional reinforcement learning (distributional RL) is a class of value-based reinforcement learning (RL) algorithms. Classical value-based RL methods attempt to model the cumulative return using expected values, represented by value functions V ( x ) or action-value functions Q ( x , a ) . However, in this modeling process, complete distributional information is largely lost. As illustrated in Equation (7), value distributional reinforcement learning aims to address this issue by modeling the distribution Z ( x , a ) of the random variable representing the cumulative return, rather than only modeling its expectation [37]. This approach provides a more comprehensive understanding of uncertainty and variability in the RL problem, enabling agents to make more informed decisions.
Q(x_t, a_t) = \mathbb{E}\left[ Z(x_t, a_t) \right] = \mathbb{E}\left[ \sum_{i=0}^{\infty} \gamma^{i} R(x_{t+i}, a_{t+i}) \right]
NoisyNets: In Rainbow DQN, the NoisyNets module introduces noise parameters into the weights and biases of the neural network to increase its exploration and stability [38]. NoisyNets parameterizes noise to introduce randomness into the network, aiding better exploration of the state space during training and enhancing convergence speed and generalization capability. Specifically, for a linear layer with input x, output y, weights W, and biases b, NoisyNets adds parameterized noise to these weights and biases. The learnable noise parameters are denoted by μ and Σ, and ϵ is a zero-mean noise vector with a fixed statistical distribution. The formal representation is shown in Equation (8).
y \overset{\mathrm{def}}{=} \left( \mu^{W} + \Sigma^{W} \odot \epsilon^{W} \right) x + \mu^{b} + \Sigma^{b} \odot \epsilon^{b}
Here, \mu^{W} and \mu^{b} represent the mean parameters for the weights and biases, respectively; \Sigma^{W} and \Sigma^{b} denote the standard deviation parameters; \epsilon^{W} and \epsilon^{b} represent noise vectors with zero mean and fixed statistical properties; and ⊙ denotes element-wise multiplication. By adjusting μ and Σ, the distribution and intensity of the noise can be controlled, thus affecting the exploration and generalization capabilities of the network.
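The following PyTorch sketch implements a noisy linear layer with factorized Gaussian noise in the spirit of Equation (8); the initialization bounds and σ₀ = 0.5 follow the NoisyNets paper's recommendations and are assumptions here, not values reported for CMR-DQN.

```python
import math
import torch
import torch.nn as nn

class NoisyLinear(nn.Module):
    """Linear layer with factorized Gaussian noise on weights and biases (Equation (8))."""
    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features))
        self.sigma_w = nn.Parameter(torch.full((out_features, in_features), sigma0 / math.sqrt(in_features)))
        self.mu_b = nn.Parameter(torch.empty(out_features))
        self.sigma_b = nn.Parameter(torch.full((out_features,), sigma0 / math.sqrt(in_features)))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.mu_w, -bound, bound)
        nn.init.uniform_(self.mu_b, -bound, bound)

    @staticmethod
    def _f(x):
        # Factorized-noise transform f(x) = sign(x) * sqrt(|x|) from the NoisyNets paper.
        return x.sign() * x.abs().sqrt()

    def forward(self, x):
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        eps_w = eps_out.unsqueeze(1) * eps_in.unsqueeze(0)   # epsilon^W
        eps_b = eps_out                                      # epsilon^b
        weight = self.mu_w + self.sigma_w * eps_w            # mu^W + Sigma^W ⊙ eps^W
        bias = self.mu_b + self.sigma_b * eps_b              # mu^b + Sigma^b ⊙ eps^b
        return nn.functional.linear(x, weight, bias)

out = NoisyLinear(32, 3)(torch.randn(8, 32))   # (batch, actions)
```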
Multi-Step Q-Learning: Multi-step Q-learning introduces multi-step TD targets, which accumulate rewards from multiple future time steps to estimate the long-term value of state–action pairs more accurately [39]. These targets cover rewards from the current time step t up to t + n and use a discount factor γ to control the importance of future rewards. Based on these targets, multi-step Q-learning updates Q-values to more accurately reflect the influence of long-term rewards. The update rule is shown in Equation (9):
Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ G_{t:t+n} - Q(S_t, A_t) \right].
Q(S_t, A_t) represents the estimated Q-value after taking action A_t in state S_t, α is the learning rate controlling the magnitude of updates, and G_{t:t+n} is the multi-step return from time step t to t + n, as in Equation (10):
G_{t:t+n} = R_{t+1} + \gamma R_{t+2} + \gamma^{2} R_{t+3} + \cdots + \gamma^{n-1} R_{t+n} + \gamma^{n} Q(S_{t+n}, A_{t+n}).
By employing this approach, multi-step Q-Learning enhances the learning efficiency and performance of agents, enabling them to better cope with the complex dynamics of their environment.
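A short sketch of the n-step return in Equation (10) is given below; the reward values and bootstrap estimate are hypothetical.

```python
def n_step_return(rewards, bootstrap_q, gamma=0.99):
    """Compute G_{t:t+n} from Equation (10): n observed rewards plus a
    discounted bootstrap value Q(S_{t+n}, A_{t+n})."""
    g = 0.0
    for k, r in enumerate(rewards):          # rewards R_{t+1}, ..., R_{t+n}
        g += (gamma ** k) * r
    return g + (gamma ** len(rewards)) * bootstrap_q

# Example with n = 3 hypothetical rewards and a bootstrap estimate of 1.5.
target = n_step_return([0.2, -0.1, 0.4], bootstrap_q=1.5)
```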

3.3. GRU-LSTM-Attention

In Rainbow DQN, we use the GRU–LSTM–Attention model instead of the original DNN as the Q-value estimator. This model combines GRU, LSTM, and attention mechanisms to better process sequence data. It can effectively capture long-term dependencies in sequence data and improve the model's ability to understand such data. Within the model architecture, the outputs of the GRU layer serve as inputs to the LSTM layer, and the outputs and final state of the LSTM layer are passed to the attention mechanism. Specifically, the output h_t from the GRU layer, representing the hidden state at the current time step, is used as input to the LSTM layer. In turn, the outputs and final state h_n from the LSTM layer are utilized as inputs to the attention mechanism, where h_n denotes the hidden state at the last time step.
For the GRU, the input x_t and the previous hidden state h_{t-1} undergo a series of operations to produce the current hidden state h_t, as shown in Equation (11):
r_t = \sigma\left( W_r \cdot [h_{t-1}, x_t] + b_r \right)
z_t = \sigma\left( W_z \cdot [h_{t-1}, x_t] + b_z \right)
\tilde{h}_t = \tanh\left( W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h \right)
h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t.
For the LSTM, the input x_t and the previous hidden state h_{t-1} are processed to produce the current hidden state h_t and cell state C_t, as shown in Equation (12):
f_t = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right)
i_t = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right)
\tilde{C}_t = \tanh\left( W_C \cdot [h_{t-1}, x_t] + b_C \right)
C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t
o_t = \sigma\left( W_o \cdot [h_{t-1}, x_t] + b_o \right)
h_t = o_t \odot \tanh(C_t).
The LSTM outputs and final state h_n are passed to the attention mechanism. The attention mechanism calculates attention scores e_{ti} between each LSTM output h_i and h_n. These scores are normalized to obtain attention weights \alpha_{ti}, representing the importance of each LSTM output. Finally, the context vector c_t at time step t is computed as a weighted sum of the LSTM outputs using the attention weights \alpha_{ti}, as shown in Equation (13):
e_{ti} = \mathrm{score}(h_i, h_n)
\alpha_{ti} = \frac{\exp(e_{ti})}{\sum_{j=1}^{T} \exp(e_{tj})}
c_t = \sum_{i=1}^{T} \alpha_{ti} \cdot h_i.
By embedding the GRU–LSTM–Attention model into Rainbow DQN, we leverage its ability to capture long-term dependencies and dynamically attend to important parts of the sequence, thereby enhancing the performance and effectiveness of Rainbow DQN.
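A compact PyTorch sketch of a GRU–LSTM–Attention Q-value estimator following Equations (11)–(13) is given below. The additive scoring function, the hidden width, the 30-step window, and the 11 input features are illustrative assumptions; the paper does not specify these dimensions here.

```python
import torch
import torch.nn as nn

class GRULSTMAttentionQNet(nn.Module):
    """GRU -> LSTM -> attention Q-value estimator in the spirit of Equations (11)-(13).
    Layer sizes are illustrative, not the paper's published configuration."""
    def __init__(self, n_features, hidden_dim, n_actions):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden_dim, batch_first=True)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.score = nn.Linear(2 * hidden_dim, 1)   # additive-style score(h_i, h_n) (assumption)
        self.q_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, x):                        # x: (batch, time, features)
        gru_out, _ = self.gru(x)                 # h_t sequence from the GRU layer
        lstm_out, (h_n, _) = self.lstm(gru_out)  # outputs h_1..h_T and final state h_n
        h_n = h_n[-1]                            # (batch, hidden_dim)
        # Attention scores e_{ti} between each LSTM output h_i and h_n.
        h_n_exp = h_n.unsqueeze(1).expand_as(lstm_out)
        scores = self.score(torch.cat([lstm_out, h_n_exp], dim=-1)).squeeze(-1)
        weights = torch.softmax(scores, dim=1)                      # alpha_{ti}
        context = (weights.unsqueeze(-1) * lstm_out).sum(dim=1)     # c_t, weighted sum of h_i
        return self.q_head(context)              # Q-values over the action set

q = GRULSTMAttentionQNet(n_features=11, hidden_dim=64, n_actions=3)(torch.randn(8, 30, 11))
```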

3.4. CMR-DQN Training

In summary, Algorithm 1 presents the pseudocode of the entire CMR-DQN algorithm, which forms the foundation of our proposed trading strategy. Throughout the trading process, this strategy undergoes continual updates based on daily observations, ensuring ongoing learning and refinement of the trading approach.
Algorithm 1 CMR-DQN Algorithm
  • Data: time series data denoised using DWT; each trading signal represents one of three possible actions
  • Initialize CMR-DQN network parameters θ and target network parameters θ target
  • Initialize replay buffer D
  • Initialize exploration rate ϵ
  • Initialize data and other parameters
  • for  episode = 1 to M do
  •       Initialize state s
  •       for  time step t = 1 to T max  do
  •           Choose action a based on current state s and exploration rate ϵ
  •           if random number < exploration rate ϵ  then
  •                Randomly choose action a
  •           else
  •                Choose action a = argmax ( Q ( s , a ; θ ) )
  •           end if
  •           Execute action a, observe reward r and new state s′
  •           Store (s, a, r, s′) tuple in replay buffer D and set s ← s′
  •           Sample minibatch of transitions (s_j, a_j, r_j, s′_j) from replay buffer D
  •           for each sampled transition (s_j, a_j, r_j, s′_j) do
  •                if state s′_j is a terminal state then
  •                      Set target value y_j = r_j
  •                else
  •                      Set target value y_j = r_j + γ · max_{a′} Q(s′_j, a′; θ_target)
  •                end if
  •           end for
  •           Compute loss L = (1/N) Σ_j (y_j − Q(s_j, a_j; θ))² and update CMR-DQN network parameters θ via gradient descent
  •           if the current time step is the time to update the target network then
  •                Update target network parameters θ target = θ
  •           end if
  •       end for
  • end for
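For illustration, a single update step matching the inner loop of Algorithm 1 might look like the sketch below. It uses plain DQN-style targets and a mean-squared-error loss for brevity; the full CMR-DQN agent adds the Rainbow components described in Section 3.2 (prioritized replay, double and distributional targets, noisy layers, multi-step returns). The tensor layout of `batch` is an assumption.

```python
import torch
import torch.nn.functional as F

def cmr_dqn_update(batch, q_net, target_net, optimizer, gamma=0.99):
    """One gradient step following the inner loop of Algorithm 1 (simplified)."""
    states, actions, rewards, next_states, dones = batch       # pre-batched tensors

    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)   # Q(s_j, a_j; theta)
    with torch.no_grad():
        next_max = target_net(next_states).max(dim=1).values            # max_a' Q(s'_j, a'; theta_target)
        y = rewards + gamma * next_max * (1.0 - dones)                   # y_j = r_j for terminal states

    loss = F.mse_loss(q_pred, y)        # L = (1/N) * sum_j (y_j - Q(s_j, a_j; theta))^2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```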

4. Experiments and Results

In this section, we provide a detailed overview of the experimental environment, setup, and execution. We begin with the selection of the dataset and associated features. We then discuss the experiment settings in detail. Finally, we analyze and discuss the experimental results, evaluating the effectiveness of the proposed model across six stock and index datasets.

4.1. Datasets

To assess the performance and robustness of our model comprehensively, we have selected a range of stock indices from different regions and economic backgrounds as our experimental subjects. These include the Dow Jones Industrial Average (DJI), the CAC 40 Index (FCHI) from France, the Nasdaq Composite Index (IXIC), and the Nikkei 225 (N225), as well as two Chinese stocks: Ping An Bank (SZ000001, abbreviated as PA) and Ziguang Co., Ltd., Beijing, China (SZ000938, abbreviated as ZG). Our dataset spans from 1 July 2011 to 20 December 2021.

4.2. Trading Experimental Setup

The trading experiment setup includes the following key components:
  • Processing Initialization: We conducted normalization on each input variable to ensure consistent scaling and prevent gradient explosions. This normalization resulted in a mean of 0 and a standard deviation of 1 for the normalized data, thereby enhancing stability during the training process. In the CMR-DQN framework, model initialization is a critical step to ensure network weights start training from a reasonable point. Proper initialization facilitates faster convergence and helps avoid undesirable local optima;
  • Evaluation metrics: The performance metrics for the trading experiment include several key indicators. The annualized return (AR) measures the average annual return on investment. The maximum drawdown (MDD) evaluates the largest peak-to-trough decline in investment value over a specific period. The Sharpe ratio (SR) quantifies the risk-adjusted return of the investment strategy, reflecting its efficiency. The cumulative return (CR) represents the total return on investment over the entire trading period (a short computation sketch of these metrics follows this list);
  • Comparison Experiment: To objectively evaluate the strengths and weaknesses of the CMR-DQN algorithm, we set the initial capital to 200,000 and compared it with other trading methods (such as B&H and DQN). Additionally, by comparing it with Rainbow DQN and CMR-DQN without the Rainbow components (i.e., DQN-GLA), we conducted an ablation study to further validate its performance.
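Below is a minimal sketch of how the four metrics can be computed from a daily equity curve; the 252-trading-day year, zero risk-free rate, and simple daily returns are conventional assumptions rather than choices stated in the paper.

```python
import numpy as np

def performance_metrics(equity, trading_days=252, risk_free=0.0):
    """Compute CR, AR, SR, and MDD from a daily equity curve."""
    equity = np.asarray(equity, dtype=float)
    daily_ret = np.diff(equity) / equity[:-1]

    cr = equity[-1] / equity[0] - 1.0                               # cumulative return
    years = len(daily_ret) / trading_days
    ar = (1.0 + cr) ** (1.0 / years) - 1.0                          # annualized return
    sr = (daily_ret.mean() - risk_free / trading_days) / daily_ret.std() * np.sqrt(trading_days)
    running_peak = np.maximum.accumulate(equity)
    mdd = np.max((running_peak - equity) / running_peak)            # maximum drawdown
    return {"CR": cr, "AR": ar, "SR": sr, "MDD": mdd}

# Example on a toy equity curve starting from the 200,000 initial capital used here.
rng = np.random.default_rng(1)
toy_equity = 200_000 * np.cumprod(1 + rng.normal(0.0005, 0.01, size=500))
metrics = performance_metrics(toy_equity)
```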
During the training process of our CMR-DQN model, we utilized graphical representations to visualize different functions. Figure 5 illustrates both the variation of cumulative rewards and the fluctuation of the loss function during training. This reflects the model’s progress in learning and optimizing its decision-making strategies, as well as its performance in parameter optimization and data fitting, across six distinct datasets.
Observing these two plots enables us to understand the model’s performance in reward acquisition and parameter optimization simultaneously, while also allowing for a comparative analysis of its performance across diverse datasets. These visualizations provide comprehensive monitoring of the training process, aiding in the assessment of the model’s performance across various environments and guiding further adjustments in training strategies.
By observing the results from these two charts, it is evident that the CMR-DQN model has converged in both cumulative reward and loss functions, indicating stable and promising performance during training. This convergence not only showcases the model’s capability in exploring and exploiting potential returns in the stock market but also reflects its flexibility in dynamically adjusting and optimizing trading strategies. The model can adeptly adapt to evolving market conditions and adjust decisions promptly to maximize portfolio efficiency.

4.3. Feature Selection

In the CMR-DQN algorithm, we utilize a variety of features for stock prediction. These features encompass fundamental stock indicators, such as closing price, opening price, highest price, lowest price, and trading volume, along with a series of technical indicators including relative strength index (RSI), rate of change (ROC), commodity channel index (CCI), and moving average convergence divergence (MACD). Particularly noteworthy is our innovative inclusion of exponential moving average (EXPMA) and volume-weighted moving average convergence divergence (VMACD) in the feature selection process.

4.3.1. Exponential Weighted Moving Average (EXPMA)

The exponential moving average (EXPMA), also known as the daily average index, is a variant of the moving average line. It smooths out the fast and slow moving average lines at different speeds and assesses market buy and sell signals based on the crossover of these two lines. Compared to the traditional moving average (MA) indicator, EXPMA places more emphasis on daily price changes. While the traditional MA relies on the arithmetic average of price fluctuations N days ago, influenced by the highs and lows of N days prior, it may not immediately reflect real-time price changes. EXPMA, on the other hand, is a trend-based indicator that gives more weight to the daily price performance, thereby overcoming lag to some extent, making it an effective medium-term indicator. The calculation formula for EXPMA is as shown in Equation (14):
\mathrm{EXPMA} = \alpha \times \mathrm{Close} + (1 - \alpha) \times \mathrm{EXPMA}_{\mathrm{prev}}.
Here, Close represents the closing price of the day, α is the smoothing coefficient, typically ranging from 0 to 1, and EXPMA_prev denotes the EXPMA value of the previous day. The primary purpose of EXPMA is to identify buy and sell opportunities and predict future price trends. However, it is essential to note that while EXPMA reflects daily price changes, it may not accurately predict price peaks and bottoms.
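In pandas, Equation (14) corresponds to an exponentially weighted moving average of the closing price; the 12-day period below is an assumed example, and the explicit loop reproduces the recursion in Equation (14) directly.

```python
import pandas as pd

# Toy closing prices; in practice these come from the Yahoo Finance data described in Section 4.1.
close = pd.Series([10.0, 10.2, 10.1, 10.4, 10.6, 10.5, 10.8])

# A 12-day EXPMA corresponds to alpha = 2 / (N + 1); N = 12 is an assumed period.
expma_12 = close.ewm(span=12, adjust=False).mean()

# Equivalent explicit recursion, matching EXPMA = alpha*Close + (1 - alpha)*EXPMA_prev.
alpha = 2 / (12 + 1)
expma_manual = close.copy()
for t in range(1, len(close)):
    expma_manual.iloc[t] = alpha * close.iloc[t] + (1 - alpha) * expma_manual.iloc[t - 1]
```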

4.3.2. Volume-Weighted Moving Average Convergence Divergence (VMACD)

In financial market technical analysis, the moving average convergence divergence (MACD) indicator is commonly used. However, MACD typically relies solely on price data, particularly the closing price, without considering volume data. To address this limitation and incorporate volume data into the analysis, the volume-weighted moving average convergence and divergence (VMACD) indicator is introduced. The calculation formula for VMACD is as shown in Equation (15):
\mathrm{DIFF} = \mathrm{EMA}(\mathrm{VOL}, \mathrm{SHORT}) - \mathrm{EMA}(\mathrm{VOL}, \mathrm{LONG})
\mathrm{DEA} = \mathrm{EMA}(\mathrm{DIFF}, M)
\mathrm{VMACD} = \mathrm{DIFF} - \mathrm{DEA}.
Here, DIFF represents the difference between two volume exponential moving averages (EMAs), one with a shorter period (SHORT) and the other with a longer period (LONG). DEA is the EMA of the DIFF values over a specified period (M). Finally, the VMACD line is derived by subtracting DEA from DIFF.
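A small pandas sketch of Equation (15) follows; the 12/26/9 periods are the conventional MACD defaults and are assumptions here, not parameters reported in the paper.

```python
import numpy as np
import pandas as pd

def vmacd(volume: pd.Series, short: int = 12, long: int = 26, m: int = 9) -> pd.DataFrame:
    """Volume-based MACD following Equation (15)."""
    ema_short = volume.ewm(span=short, adjust=False).mean()
    ema_long = volume.ewm(span=long, adjust=False).mean()
    diff = ema_short - ema_long                     # DIFF = EMA(VOL, SHORT) - EMA(VOL, LONG)
    dea = diff.ewm(span=m, adjust=False).mean()     # DEA = EMA(DIFF, M)
    return pd.DataFrame({"DIFF": diff, "DEA": dea, "VMACD": diff - dea})

# Toy daily volume series.
vol = pd.Series(np.random.default_rng(2).integers(1_000, 5_000, size=60), dtype=float)
signals = vmacd(vol)
```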
In practical applications, similar to MACD, VMACD signals potential market trends. However, VMACD may anticipate market movements a few days earlier than MACD. Specifically, when trading volume significantly deviates from its normal level and prices increase, it often signals the beginning of a market reversal. Crossing the zero line is also considered a signal of a market reversal.
It is essential to note that while VMACD provides valuable insights into market dynamics, it should be used in conjunction with price-based indicators for comprehensive analysis. Additionally, VMACD is commonly utilized alongside other indicators like KD and KDJ for enhanced decision-making in trading strategies.

4.4. Experimental Results

This section presents a comparison between the proposed CMR-DQN framework and four benchmark methods. The experimental results on six stock and index datasets are shown in Table 1. Trading performance is assessed using four performance metrics: cumulative return (CR), Sharpe ratio (SR), annualized return (AR), and maximum drawdown (MDD).
Table 1 highlights the performance of different trading methods across six assets. Among the listed metrics, the CMR-DQN strategy achieved the highest scores, indicating its superior performance. Compared to other strategies, CMR-DQN attained significantly higher cumulative returns and demonstrated the smallest maximum drawdown, underscoring its robust profitability and risk mitigation across the six datasets. For instance, on the IXIC dataset, the CMR-DQN strategy achieved a cumulative return of 49.15%, outperforming all other methods. Its Sharpe ratio was 0.99, reflecting excellent risk-adjusted returns. Additionally, the CMR-DQN strategy had the lowest maximum drawdown of 16.35% on the IXIC dataset, further highlighting its effectiveness.
Similar observations can be made for the other assets listed in the table. In terms of cumulative returns, Sharpe ratio, and annualized returns, CMR-DQN consistently outperformed benchmark methods, demonstrating its superior performance. These results validate the proposed CMR-DQN framework’s effectiveness in generating profitable trading strategies with low risk.
Figure 6 displays the cumulative asset growth curves for various strategies, including B&H, DQN, Rainbow DQN, DQN-GLA, and the newly proposed CMR-DQN. It is evident from the graph that the performance of the CMR-DQN strategy surpasses that of the others. CMR-DQN not only significantly reduces loss risk but also achieves higher returns. Moreover, under the CMR-DQN strategy, total asset growth is smoother compared to the benchmark strategy.
The performance shown in Figure 6 aligns with the performance metrics in Table 1. Across different asset categories, CMR-DQN consistently outperforms other benchmark methods, demonstrating its effectiveness in generating profitable trading strategies. This strategy not only avoids substantial losses but also delivers additional returns compared to benchmark and alternative methods.
These findings further validate the superiority of the CMR-DQN framework in creating profitable trading strategies. By successfully balancing risk and return, CMR-DQN offers investors a more stable and potentially more profitable investment approach than traditional strategies and other deep reinforcement learning methods.

4.5. Discussion

This paper introduces and explores the CMR-DQN algorithm, a DQN extension designed to enhance stock trading strategies. Through a comprehensive performance evaluation of different trading methods across six assets, the results in Table 1 clearly demonstrate that the CMR-DQN strategy consistently excels across all assets. Notably, the CMR-DQN strategy consistently achieved the highest cumulative returns among all comparative methods, highlighting its advantages in optimizing trading decisions and capturing market opportunities.
The CMR-DQN strategy excels not only in returns but also in risk management. By maintaining consistently low maximum drawdowns, the CMR-DQN strategy effectively protects investors’ capital amid market volatility while achieving stable returns under controlled risk. This capability is repeatedly validated in this study, supported by mathematical models and empirical data. Specifically, the CMR-DQN strategy showcased its profitability and risk mitigation abilities across six datasets, which holds significant implications for policymakers, traders, and financial experts.
Additionally, we compared multiple deep reinforcement learning (DRL) strategies with traditional strategies (such as B&H, DQN, Rainbow DQN, and DQN-GLA) under the same conditions, further affirming the superiority of the CMR-DQN algorithm. Regardless of the trading objectives, the CMR-DQN algorithm consistently outperformed benchmark methods, achieving notable cumulative returns and Sharpe ratios. For instance, in the FCHI dataset, the CMR-DQN strategy achieved a cumulative return of 55.23% and a Sharpe ratio of 0.73, highlighting its ability to generate profitable trading strategies while effectively managing risk.
In conclusion, the CMR-DQN algorithm has demonstrated its effectiveness and superiority in stock trading strategies through empirical research. The algorithm not only excels in cumulative returns and risk management but also showcases strong adaptability across various market conditions. These results lay a solid foundation for further research and application of DRL-based trading strategies, providing valuable new tools and methodologies for traders and financial decision-makers.

5. Conclusions

The CMR-DQN model integrates the discrete wavelet transform (DWT), the temporal convolutional network (TCN), and a GRU–LSTM–Attention mechanism to enhance traditional deep neural networks for complex financial time series analysis. This approach excels in multi-scale analysis, significantly improving stock market predictions and decision-making by effectively extracting features and utilizing key information points. Additionally, CMR-DQN incorporates the Rainbow DQN reinforcement learning strategy, optimizing simulated stock trading through self-learning and dynamic strategy optimization. The model leverages fundamental price and volume data (Close, Open, High, Low, and Volume) together with technical indicators such as RSI, ROC, CCI, and MACD, and introduces the EXPMA and VMACD indicators for more sophisticated trading decisions.
Comprehensive evaluations on six international and domestic stocks demonstrate the superior performance of CMR-DQN in stock market prediction and trading decisions. However, there is potential for improvement by diversifying feature indicators and refining complex trading mechanisms. Future research could explore additional technical indicators, market signals, and macroeconomic data to enhance the model’s sensitivity and accuracy. Further optimization of trading decision mechanisms, incorporating sophisticated strategy combinations and refined risk management models, could improve its stability and profitability. Extending the model’s application to broader markets and asset classes like futures and forex would also verify its generalizability and applicability. Continuous refinement and in-depth research could establish CMR-DQN as a vital tool for market analysis and trading strategy support.

Author Contributions

Data curation, C.W.; Writing—original draft, Q.W.; Supervision, X.C.; Funding acquisition, C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by Hainan University under Grant (Project No: KYQD(ZR)-21014), the National Key Research and Development Program of China (Project No: 2021YFC3340800), the National Natural Science Foundation of China (Project No: 62177046), and the High Performance Computing Center of Central South University (HPC).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study were obtained from Yahoo Finance and are available at https://finance.yahoo.com/ (accessed on 1 August 2024).

Acknowledgments

The author expresses sincere gratitude to Chen for his valuable support in accessing the High-Performance Computing Center of Central South University (HPC) resources. The author deeply appreciates his generous contribution.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Prime, S. Forecasting the changes in daily stock prices in Shanghai Stock Exchange using Neural Network and Ordinary Least Squares Regression. Investig. Manag. Financ. Innov. 2020, 17, 292–307. [Google Scholar] [CrossRef]
  2. Dixon, F.; Halperin, I.; Bilokon, P. Machine Learning in Finance: From Theory to Practice; Springer International Publishing: Cham, Switzerland, 2020; pp. 3–46. [Google Scholar]
  3. Hambly, B.; Xu, R.; Yang, H. Recent advances in reinforcement learning in finance. Math. Financ. 2023, 33, 437–503. [Google Scholar] [CrossRef]
  4. João, C.; Rui, N.; Nuno, H. Reinforcement learning applied to Forex trading. Appl. Soft Comput. 2018, 73, 783–794. [Google Scholar]
  5. Koyano, S.; Ikeda, K. Online portfolio selection based on the posts of winners and losers in stock microblogs. In Proceedings of the 2017 IEEE Symposium Series on Computational Intelligence (SSCI), Honolulu, HI, USA, 27 November–1 December 2017; pp. 1–4. [Google Scholar]
  6. Chou, J.S.; Nguyen, T. Forward Forecast of Stock Price Using Sliding-Window Metaheuristic-Optimized Machine-Learning Regression. IEEE Trans. Ind. Inform. 2018, 14, 3132–3142. [Google Scholar] [CrossRef]
  7. Tsai, M.C.; Cheng, C.H.; Tsai, M.T.; Shiu, H.Y. Forecasting leading industry stock prices based on a hybrid time-series forecast model. PloS ONE 2019, 13, e0209922. [Google Scholar] [CrossRef] [PubMed]
  8. Mugerman, Y.; Winter, E.; Yafeh, T. Herding and Divergent Behaviors in Competition: An Experimental Study. SSRN Electron. J. 2023. [Google Scholar] [CrossRef]
  9. Volodymyr, M.; Koray, K.; David, S.; Alex, G.; Ioannis, A.; Daan, W.; Martin, A.R. Playing Atari with Deep Reinforcement Learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
  10. Tsantekidis, A.; Passalis, N.; Toufa, A.; Saitas-Zarkias, K.; Chairistanidis, S.; Tefas, A. Price Trailing for Financial Trading Using Deep Reinforcement Learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 2837–2846. [Google Scholar] [CrossRef] [PubMed]
  11. Silver, D.; Huang, A.; Maddison, C.J.; Guez, A.; Sifre, L.; Van Den Driessche, G.; Schrittwieser, J.; Antonoglou, I.; Panneershelvam, V.; Lanctot, M.; et al. Mastering the game of Go with deep neural networks and tree search. Nature 2016, 529, 484–489. [Google Scholar] [CrossRef]
  12. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
  13. Li, Y.; Zheng, W.; Zheng, Z. Deep Robust Reinforcement Learning for Practical Algorithmic Trading. IEEE Access 2019, 7, 108014–108022. [Google Scholar] [CrossRef]
  14. Ma, C.; Zhang, J.; Liu, J.; Ji, L.; Gao, F. A parallel multi-module deep reinforcement learning algorithm for stock trading. Neurocomputing 2023, 449, 290–302. [Google Scholar] [CrossRef]
  15. Liu, P.; Zhang, Y.; Bao, F.; Yao, X.; Zhang, C. Multi-type data fusion framework based on deep reinforcement learning for algorithmic trading. Appl. Intell. 2021, 53, 1683–1706. [Google Scholar] [CrossRef]
  16. Jianjuan, F.; Shen, P. Financial Stock Investment Management Using Deep Learning Algorithm in the Internet of Things. Comput. Intell. Neurosci. 2022, 2022, 1687–5265. [Google Scholar]
  17. Huang, Y.; Lu, X.; Zhou, C.; Song, Y. DADE-DQN: Dual Action and Dual Environment Deep Q-Network for Enhancing Stock Trading Strategy. Mathematics 2023, 11, 3626. [Google Scholar] [CrossRef]
  18. Hasselt, H. Double Q-Learning; MIT Press: Cambridge, MA, USA, 2010; Volume 23, pp. 2613–2621. [Google Scholar]
  19. Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized Experience Replay. arXiv 2015, arXiv:1511.05952. [Google Scholar]
  20. Hessel, M.; Modayil, J.; Van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Horgan, D.; Piot, B.; Azar, M.; Silver, D. Rainbow: Combining Improvements in Deep Reinforcement Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  21. Wang, Z.; Schaul, T.; Hessel, M.; Hasselt, H.V.; Lanctot, M.; Freitas, N. Dueling Network Architectures for Deep Reinforcement Learning. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 20–22 June 2015; Abstract Number 9. pp. 1995–2003. [Google Scholar]
  22. Hausknecht, M.; Stone, P. Deep Recurrent Q-Learning for Partially Observable MDPs. arXiv 2015, arXiv:1511.06527. [Google Scholar]
  23. Chen, X.; Wang, Q.; Yuxin, L.; Hu, C.; Wang, C.; Yan, Q. Stock Price Forecast Based on Dueling Deep Recurrent Q-network. In Proceedings of the 2023 IEEE 6th International Conference on Pattern Recognition and Artificial Intelligence (PRAI), Haikou, China, 18–20 August 2023; pp. 1091–1096. [Google Scholar]
  24. Ye, Z.J.; Björn, W.S. Human-aligned trading by imitative multi-loss reinforcement learning. Expert Syst. Appl. 2023, 234, 120939. [Google Scholar] [CrossRef]
  25. Raj, S.; Ashutosh, T.; Tej, B.; Uday, R. Real-Time Stock Market Forecasting Using Ensemble Deep Learning and Rainbow DQN. In Proceedings of the 3rd International Conference on Advances in Science & Technology (ICAST), Bahir Dar, Ethiopia, 2–4 October 2020. EngRN: Operations Research (Topic). [Google Scholar]
  26. Ma, X.; Li, X.; Zhou, Y.; Zhang, C. Image smoothing based on global sparsity decomposition and a variable parameter. Comput. Vis. Media 2021, 7, 483–497. [Google Scholar] [CrossRef]
  27. Sifuzzaman, M.; Islam, M.R.; Ali, M.Z. Application of Wavelet Transform and Its Advantages Compared to Fourier Transform. J. Phys. Sci. 2009, 13, 121–134. [Google Scholar]
  28. Wang, L.; Gupta, S. Neural Networks and Wavelet De-Noising for Stock Trading and Prediction. Intell. Syst. Ref. Libr. 2013, 47, 229–247. [Google Scholar]
  29. Fang, Y.; Fataliyev, K.; Wang, L.; Fu, X.; Wang, Y. Improving the genetic-algorithm-optimized wavelet neural network for stock market prediction. In Proceedings of the 2014 International Joint Conference on Neural Networks (IJCNN), Beijing, China, 6–11 July 2014; pp. 3038–3042. [Google Scholar]
  30. Wang, J.; Wang, Z.; Li, J.; Wu, J. Multilevel Wavelet Decomposition Network for Interpretable Time Series Analysis. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery, London, UK, 19–23 August 2018. [Google Scholar]
  31. Bai, S.; Kolter, J.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv 2018, arXiv:1803.01271. [Google Scholar]
  32. Jiaying, G.; Hoda, E. Multi-Stage Hybrid Attentive Networks for Knowledge-Driven Stock Movement Prediction. In Proceedings of the International Conference on Neural Information Processing, Sanur, Bali, Indonesia, 8–12 December 2021. [Google Scholar]
  33. Dechun, W.; Tianlong, Z.; Lexin, F.; Caiming, Z.; Xuemei, L. MWDINet: A multilevel wavelet decomposition interaction network for stock price prediction. Expert Syst. Appl. 2024, 238, 122091. [Google Scholar]
  34. Michael, R.C. Wavelet Methods for Time Series Analysis. Technometrics 2001, 43, 491. [Google Scholar]
  35. Chen, Z.; Zhu, Z.; Jiang, H.; Sun, S. Estimating daily reference evapotranspiration based on limited meteorological data using deep learning and classical machine learning methods. J. Hydrol. 2020, 591, 125286. [Google Scholar] [CrossRef]
  36. Hasselt, H.V.; Guez, A.; Silve, D. Deep Reinforcement Learning with Double Q-Learning. In Proceedings of the AAAI’16: Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016. [Google Scholar]
  37. Bellemare, M.G.; Dabney, W.; Rémi, A. A Distributional Perspective on Reinforcement Learning. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
  38. Fortunato, M.; Azar, M.G.; Piot, B.; Menick, J.; Osband, I.; Graves, A.; Mnih, V.; Munos, R.; Hassabis, D.; Pietquin, O.; et al. Noisy Networks for Exploration. arXiv 2017, arXiv:1706.10295. [Google Scholar]
  39. Kristopher, D.A.; Fernando, H.; Zach, H.; Richard, S. Multi-step Reinforcement Learning: A Unifying Algorithm. arXiv 2017, arXiv:1703.01327. [Google Scholar]
Figure 1. The architecture of CMR-DQN framework.
Figure 2. The structural diagram of DWT-TCN.
Figure 3. Working diagram of Rainbow DQN.
Figure 4. Dueling architecture network.
Figure 5. The accumulation of rewards and the variation trend of the loss function during the training process of the CMR-DQN model on six datasets.
Figure 6. Results of different models on six datasets.
Table 1. The trading performance of six assets across different models.

Asset | Metric | B&H | DQN | Rainbow DQN | DQN-GLA | CMR-DQN
PA | CR [%] | −19.07 | −24.46 | 8.16 | 14.62 | 28.22
PA | SR | −0.22 | −0.28 | 0.11 | 0.16 | 0.29
PA | AR [%] | −33.24 | 43.29 | 13.76 | 25.15 | 50.48
PA | MDD [%] | 56.42 | 59.97 | 33.98 | 29.84 | 20.92
ZG | CR [%] | −15.58 | 12.13 | 0.63 | 23.23 | 31.26
ZG | SR | −0.16 | 0.14 | 0.01 | 0.27 | 0.33
ZG | AR [%] | −27.52 | 18.03 | 1.04 | 40.97 | 48.30
ZG | MDD [%] | 52.35 | 31.97 | 39.70 | 23.86 | 18.73
DJI | CR [%] | 29.03 | 5.54 | 45.05 | 45.09 | 52.32
DJI | SR | 0.30 | 0.09 | 0.48 | 0.49 | 0.69
DJI | AR [%] | 51.98 | 9.38 | 84.28 | 84.39 | 99.73
DJI | MDD [%] | 20.70 | 36.02 | 12.85 | 12.79 | 7.91
FCHI | CR [%] | −1.54 | 45.03 | 42.70 | 53.16 | 55.23
FCHI | SR | −0.02 | 0.48 | 0.46 | 0.72 | 0.73
FCHI | AR [%] | −2.03 | 84.29 | 79.42 | 101.5 | 106.04
FCHI | MDD [%] | 41.03 | 12.88 | 13.61 | 6.52 | 6.18
IXIC | CR [%] | 4.82 | −0.62 | 17.54 | 8.81 | 20.37
IXIC | SR | 0.03 | −0.01 | 0.21 | 0.11 | 0.24
IXIC | AR [%] | 8.17 | −1.02 | 30.43 | 14.89 | 35.64
IXIC | MDD [%] | 37.34 | 40.71 | 27.58 | 34.06 | 25.59
N225 | CR [%] | 11.52 | 35.86 | 36.32 | 37.02 | 41.96
N225 | SR | 0.13 | 0.35 | 0.38 | 0.41 | 0.45
N225 | AR [%] | 19.64 | 65.50 | 66.42 | 67.83 | 77.89
N225 | MDD [%] | 32.05 | 17.42 | 16.33 | 15.78 | 14.02
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
