2. Literature Review
The development of algorithmic trading systems presents numerous challenges and opportunities. As discussed by Aldridge (2013), these challenges include the design and implementation of algorithms that can adapt to rapidly changing market conditions, manage large volumes of data, and execute trades with high precision and speed [10]. Algorithmic trading has become a part of most major financial institutions’ strategies; hence, the role of automated trading systems in managing volatility and balancing portfolios has become increasingly critical. However, their use has also faced criticism for amplifying market volatility during the European energy crisis of 2022 [11]. Such cases emphasise the importance of developing sophisticated algorithmic strategies and robust trading systems capable of adapting to market uncertainties.
The successful application of Q-Learning to financial trading was first demonstrated by Neuneier (1996), who applied it to trading the DM-USD currency pair and the DAX index. This pioneering work showcased the potential of Q-Learning to enhance financial trading [12,13]. Subsequently, the success of Mnih et al.’s Deep Q-Network (DQN) algorithm popularised the use of experience replay in deep reinforcement learning research and applications [12]. This can be seen in Jin and El-Saawy’s (2016) extension of Neuneier’s trading strategy [14], where the authors employed ε-greedy exploration along with experience replay, reporting that their DQL agent surpassed both buy-and-hold and rebalance benchmarks. Other studies have explored the application of deep reinforcement learning to various portfolio management problems. Liang et al. (2018), for example, showed how it can be used to hedge portfolio risk [15], while Ding et al. (2015) developed a deep learning system for event-driven stock prediction [16]. Similarly, Zhu and Liu (2020) applied Q-learning, using consecutive days of stock price data to enhance predictive power and achieve stable returns [17], while Dai and Zhang (2021) demonstrated that their reinforcement learning model outperformed both the buy-and-hold and MACD strategies in stock selection [18].
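For readers unfamiliar with the two mechanisms highlighted above, the sketch below illustrates ε-greedy action selection combined with a simple experience-replay buffer. It is a minimal, self-contained Python illustration; the three discrete actions (short, hold, long), the buffer capacity, and the state dimension are assumptions made for the example rather than the configuration used in any of the cited studies.

```python
import random
from collections import deque

import numpy as np

ACTIONS = [-1, 0, 1]   # assumed discrete actions: short, hold, long
STATE_DIM = 8          # assumed number of state features

class ReplayBuffer:
    """Fixed-size store of (state, action, reward, next_state, done) transitions."""
    def __init__(self, capacity=10_000):
        self.memory = deque(maxlen=capacity)

    def push(self, transition):
        self.memory.append(transition)

    def sample(self, batch_size=32):
        # Uniform random mini-batches break the temporal correlation
        # between consecutive market observations during training.
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)

def epsilon_greedy(q_values, epsilon):
    """Explore a random action with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(ACTIONS))
    return int(np.argmax(q_values))

# Illustrative interaction loop; the Q-network and environment are stubbed out.
buffer = ReplayBuffer()
state = np.zeros(STATE_DIM)
for step in range(200):
    epsilon = max(0.05, 1.0 - step / 100)        # linearly decaying exploration rate
    q_values = np.random.randn(len(ACTIONS))     # stand-in for a Q-network forward pass
    action = epsilon_greedy(q_values, epsilon)
    next_state, reward, done = np.random.randn(STATE_DIM), 0.0, False  # stand-in env step
    buffer.push((state, action, reward, next_state, done))
    if len(buffer) >= 32:
        batch = buffer.sample(32)                # would be used to update the Q-network
    state = next_state
```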
A known issue with the traditional Q-Learning and DQN algorithms is the overestimation of Q-values, which can lead to suboptimal policies and reduced performance. Ning et al. (2018) demonstrated the benefits of a reduced overestimation bias in their approach to optimising trade execution with the Double Deep Q-Network (DDQN) algorithm, reporting that their model outperformed a standard benchmark on several stocks [19].
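To make the overestimation issue concrete, the sketch below contrasts the standard DQN target with the Double DQN target, in which action selection (online network) is decoupled from action evaluation (target network). It is a schematic NumPy illustration under assumed array shapes and discount factor, not the implementation used by Ning et al.

```python
import numpy as np

GAMMA = 0.99  # assumed discount factor

def dqn_target(reward, next_q_target, done):
    """Standard DQN: the target network both selects and evaluates the next action,
    so the max operator tends to propagate positively biased Q-value estimates."""
    return reward + (1.0 - done) * GAMMA * np.max(next_q_target, axis=1)

def double_dqn_target(reward, next_q_online, next_q_target, done):
    """Double DQN: the online network selects the next action and the target network
    evaluates it, which reduces the overestimation bias of the max operator."""
    best_action = np.argmax(next_q_online, axis=1)
    evaluated_q = next_q_target[np.arange(len(best_action)), best_action]
    return reward + (1.0 - done) * GAMMA * evaluated_q

# Toy batch of 4 transitions with 3 actions (short, hold, long).
rng = np.random.default_rng(0)
reward = rng.normal(size=4)
done = np.zeros(4)
next_q_online = rng.normal(size=(4, 3))
next_q_target = rng.normal(size=(4, 3))
print(dqn_target(reward, next_q_target, done))
print(double_dqn_target(reward, next_q_online, next_q_target, done))
```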
To ensure that the algorithm can predict actions with better-than-random accuracy, it is essential to include features in the state representation that are predictive in nature [20]. Nevmyvaka et al. (2006), for instance, emphasised the significance of including the agent’s current position in the state representation, as this facilitates the consideration of transaction costs [21]. However, upon examining the existing literature, we found that only a few studies employing value-based approaches have incorporated the agent’s current position in the state representation. A closer examination revealed that some studies either disregarded transaction costs, as in the study by Bertoluzzo and Corazza (2012) [22], or designed RL trading systems in which position switching played no role, as in the study by Eilers et al. (2014), where positions were closed after a maximum of two days [12,23]. In this study, therefore, we included the agent’s current trading position in the state representation.
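As an illustration of this design choice, the sketch below builds a state vector that appends the agent’s current position to a window of recent returns and charges a transaction cost only when the chosen action changes that position. The window length, cost rate, and action encoding are assumptions made for the example, not the exact configuration used in this study.

```python
import numpy as np

WINDOW = 5            # assumed number of recent returns kept in the state
COST_RATE = 0.001     # assumed proportional transaction cost per position change

def build_state(returns, position):
    """State = last WINDOW returns plus the current position (-1 short, 0 flat, 1 long).
    Including the position lets the agent learn that switching positions incurs a cost."""
    window = returns[-WINDOW:]
    return np.concatenate([window, [position]])

def step_reward(price_return, old_position, new_position):
    """Reward = P&L from holding the new position minus the cost of changing position."""
    trade_cost = COST_RATE * abs(new_position - old_position)
    return new_position * price_return - trade_cost

# Toy usage with synthetic returns.
returns = np.array([0.002, -0.001, 0.004, 0.000, -0.003, 0.001])
print(build_state(returns, position=1))                      # five returns plus the position
print(step_reward(0.004, old_position=0, new_position=1))    # cost charged for opening a long
```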
To provide the agent with a comprehensive view of the environment, Eilers et al. modelled the state using eleven variables to depict market conditions, including daily open, high, low, and close values and technical indicators [12,23]. To avoid providing the agent with redundant information, a potential improvement to their approach could involve simplifying the state representation to the closing price alone: the closing price is typically highly correlated with the open, high, and low prices of an asset and therefore carries similar information for the model. Conversely, we also found that oversimplifying the state representation can lead to suboptimal results. Sherstov and Stone’s (2004) model included only a stock’s recent price trend, and their RL trading agent was consequently outperformed by two benchmark models, particularly on days with high price fluctuations [12,24]. This suggests that the state representation was too simplistic to capture stock price complexities and that enriching the state with additional information could enhance the agent’s performance.
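One lightweight way to check for such redundancy before fixing the state representation is to inspect the correlation between the open, high, low, and close series; if the first three are near-perfectly correlated with the close, keeping only the close discards little information. The sketch below does this with pandas on synthetic data; the column names and the way the synthetic prices are generated are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic daily OHLC data: open/high/low are generated around the close,
# mimicking the strong co-movement typically observed in real price series.
rng = np.random.default_rng(1)
close = 100 * np.exp(np.cumsum(rng.normal(0, 0.01, 250)))
ohlc = pd.DataFrame({
    "open": close * (1 + rng.normal(0, 0.002, 250)),
    "high": close * (1 + np.abs(rng.normal(0, 0.004, 250))),
    "low": close * (1 - np.abs(rng.normal(0, 0.004, 250))),
    "close": close,
})

# Correlation of each price column with the close; values near 1.0 suggest
# the extra columns add little independent information to the state.
print(ohlc.corr()["close"].round(3))
```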
Examining the existing literature, we found the agent’s state to be modelled primarily with price-based features. Price-based features can therefore be viewed as the minimum information required for modelling the state, typically the latest price history alongside a set of technical indicators intended to capture the most likely evolution of the stock price. However, only a limited number of studies incorporate non-price-based data. Kaur (2017), as one of the few exceptions, included sentiment scores in the state and reported a Sharpe ratio increase from 1.4 to 2.4, highlighting the value of considering information on a stock beyond its historical prices [12,25]. Expanding on this, Rong (2020) designed a deep reinforcement learning method that combines time-series stock price data with opinion mining of news headlines [26], while Li et al. (2021) demonstrated an improved reinforcement learning model based on sentiment analysis, combining a deep Q-network with the sentiment quantitative indicator ARBR to build a high-frequency stock trading model for the share market, and reported a maximum annualised rate of return of 54.5% [27].
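A common way to operationalise such non-price information is simply to append a per-period sentiment score to the price-based state vector. The sketch below shows this composition; the specific price features, the sentiment scale in [-1, 1], and the function names are assumptions for illustration, not the pipelines used by the cited authors.

```python
import numpy as np

def price_features(closes):
    """Simple price-based features: the last three returns and a short moving-average gap."""
    closes = np.asarray(closes, dtype=float)
    returns = np.diff(closes) / closes[:-1]
    ma_gap = closes[-1] / closes[-5:].mean() - 1.0   # distance from the 5-period average
    return np.append(returns[-3:], ma_gap)

def augmented_state(closes, sentiment_score, position):
    """Price features plus a sentiment score in [-1, 1] and the current position."""
    return np.concatenate([price_features(closes), [sentiment_score, position]])

# Toy usage with synthetic closing prices and a mildly positive sentiment score.
closes = [100.0, 101.2, 100.8, 102.5, 103.0, 102.2]
print(augmented_state(closes, sentiment_score=0.3, position=0))
```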
The evaluation of algorithmic trading systems is a crucial aspect of the development and deployment process. As Pardo (2011) emphasised, it is imperative to ensure that one’s strategy compares favourably with other potential investments, and its performance must be compared to commercially available trading strategies to justify continued investment and use [28]. One of the most critical measures of risk for a trading strategy is its maximum drawdown, the largest percentage drop in a portfolio’s value from a peak to a trough during a specific period, which is fundamental to understanding the strategy’s risk profile. In short, the profit and risk profile of a trading model must either outperform or be sufficiently appealing compared with other potential investments to justify its existence [28].
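Since maximum drawdown is central to the evaluation discussed above, the short sketch below computes it from an equity curve using a running peak; the synthetic equity values are purely illustrative.

```python
import numpy as np

def max_drawdown(equity_curve):
    """Largest peak-to-trough percentage decline of a portfolio value series."""
    equity = np.asarray(equity_curve, dtype=float)
    running_peak = np.maximum.accumulate(equity)   # highest value seen so far
    drawdowns = (equity - running_peak) / running_peak
    return drawdowns.min()                         # most negative value, e.g. -0.25 = -25%

# Toy equity curve: peaks at 120, falls to 90 (a 25% drawdown), then recovers.
print(max_drawdown([100, 110, 120, 105, 90, 115, 130]))
```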
Author Contributions
Conceptualization, L.T. and J.M.V.K.; methodology, L.T. and J.M.V.K.; software, L.T.; validation, L.T. and J.M.V.K.; formal analysis, L.T., J.M.V.K. and A.J.R.-S.; investigation, L.T., J.M.V.K. and D.M.-V.; resources, L.T. and P.J.E.-A.; data curation, L.T. and J.M.V.K.; writing—original draft preparation, L.T. and J.M.V.K.; writing—review and editing, L.T.; visualization, L.T., J.C.S.-R., E.R.-D. and D.M.-V.; supervision, P.J.E.-A. and A.J.R.-S.; project administration, J.C.S.-R. and E.R.-D. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
All research data are available in the References at [30,31,32,33,37,38].
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Chan, E. Quantitative Trading: How to Build Your Own Algorithmic Trading Business; Wiley Trading; Wiley: Hoboken, NJ, USA, 2009; ISBN 9780470466261. [Google Scholar]
- Chan, E. Algorithmic Trading: Winning Strategies and Their Rationale; Wiley Trading; Wiley: Hoboken, NJ, USA, 2013; ISBN 9781118460146. [Google Scholar]
- Zimmermann, H. Intraday Trading with Neural Networks and Deep Reinforcement Learning; Imperial College London: London, UK, 2021. [Google Scholar]
- Maven. Machine Learning in Algorithmic Trading, Maven Securities. 2023. Available online: https://www.mavensecurities.com/machine-learning-in-algorithmic-trading/ (accessed on 17 May 2024).
- Spooner, T. Algorithmic Trading and Reinforcement Learning: Robust Methodologies for AI in Finance. Ph.D. Thesis, The University of Liverpool Repository, Liverpool, UK, 2021. Available online: https://livrepository.liverpool.ac.uk/3130139/ (accessed on 17 May 2024).
- Bellman, R. A Markovian Decision Process. J. Math. Mech. 1957, 6, 679–684. [Google Scholar] [CrossRef]
- van Hasselt, H.; Guez, A.; Silver, D. Deep reinforcement learning with double Q-learning. arXiv 2016, arXiv:1509.06461. [Google Scholar] [CrossRef]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Graves, A.; Antonoglou, I.; Wierstra, D.; Riedmiller, M. Playing Atari with deep reinforcement learning. arXiv 2013, arXiv:1312.5602. [Google Scholar]
- Zejnullahu, F.; Moser, M.; Osterrieder, J. Applications of reinforcement learning in Finance—Trading with a double deep Q-Network. arXiv 2022, arXiv:2206.14267. [Google Scholar]
- Aldridge, I. High-Frequency Trading: A Practical Guide to Algorithmic Strategies and Trading Systems; John Wiley & Sons: Hoboken, NJ, USA, 2013; Volume 604. [Google Scholar]
- Savcenko, K. The ‘A’ Factor: The Role of Algorithmic Trading during an Energy Crisis; S&P Global Commodity Insights: London, UK, 2022; Available online: https://www.spglobal.com/commodityinsights/en/market-insights/blogs/electric-power/110322-algorithm-trading-europe-energy-crisis (accessed on 25 July 2024).
- Fischer, T.G. Reinforcement Learning in Financial Markets—A Survey; FAU Discussion Papers in Economics; No. 12/2018; Friedrich-Alexander-Universität Erlangen-Nürnberg, Institute for Economics: Erlangen, Germany, 2018. [Google Scholar]
- Neuneier, R. Optimal asset allocation using adaptive dynamic programming. In Advances in Neural Information Processing Systems; 1996; pp. 952–958. Available online: https://proceedings.neurips.cc/paper/1995/hash/3a15c7d0bbe60300a39f76f8a5ba6896-Abstract.html (accessed on 1 August 2024).
- Jin, O.; El-Saawy, H. Portfolio Management Using Reinforcement Learning; Working Paper; Stanford University: Stanford, CA, USA, 2016. [Google Scholar]
- Liang, Z.; Chen, H.; Zhu, J.; Jiang, K.; Li, Y. Adversarial deep reinforcement learning in portfolio management. arXiv 2018, arXiv:1808.09940. [Google Scholar]
- Ding, X.; Zhang, Y.; Liu, T.; Duan, J. Deep learning for event-driven stock prediction. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015. [Google Scholar]
- Zhu, K.; Liu, R. The selection of reinforcement learning state and value function applied to portfolio optimization. J. Fuzhou Univ. (Nat. Sci. Ed.) 2020, 48, 146–151. [Google Scholar]
- Dai, S.X.; Zhang, S.L. An application of reinforcement learning based approach to stock trading. Bus. Manag. 2021, 3, 23–27. [Google Scholar] [CrossRef]
- Ning, B.; Lin, F.H.T.; Jaimungal, S. Double deep Q-learning for optimal execution. arXiv 2018, arXiv:1812.06600. [Google Scholar] [CrossRef]
- Starke, T. Trading with Deep Reinforcement Learning. Machine Learning Trading, YouTube, 2020. Available online: https://www.youtube.com/watch?v=H-c49jQxGbs (accessed on 2 February 2024).
- Nevmyvaka, Y.; Feng, Y.; Kearns, M. Reinforcement learning for optimized trade execution. In Proceedings of the 23rd International Conference on Machine Learning, ACM, Pittsburgh, PA, USA, 25–29 June 2006; pp. 673–680. [Google Scholar]
- Bertoluzzo, F.; Corazza, M. Testing different reinforcement learning configurations for financial trading: Introduction and applications. Procedia Econ. Financ. 2012, 3, 68–77. [Google Scholar] [CrossRef]
- Eilers, D.; Dunis, C.L.; von Mettenheim, H.-J.; Breitner, M.H. Intelligent trading of seasonal effects: A decision support algorithm based on reinforcement learning. Decis. Support Syst. 2014, 64, 100–108. [Google Scholar] [CrossRef]
- Sherstov, A.A.; Stone, P. Three automated stock-trading agents: A comparative study. In Proceedings of the International Workshop on Agent-Mediated Electronic Commerce; Springer: Berlin/Heidelberg, Germany, 2004; pp. 173–187. [Google Scholar]
- Kaur, S. Algorithmic Trading Using Sentiment Analysis and Reinforcement Learning; Working Paper; Stanford University: Stanford, CA, USA, 2017. [Google Scholar]
- Rong, Z.H. Deep reinforcement learning stock algorithm trading system application. J. Comput. Knowl. Technol. 2020, 16, 75–76. [Google Scholar]
- Li, Y.; Zhou, P.; Li, F.; Yang, X. An improved reinforcement learning model based on sentiment analysis. arXiv 2021, arXiv:2111.15354. [Google Scholar]
- Pardo, R. The Evaluation and Optimization of Trading Strategies; John Wiley & Sons: Hoboken, NJ, USA, 2011. [Google Scholar]
- Hu, G. Advancing algorithmic trading: A multi-technique enhancement of deep Q-network models. arXiv 2023, arXiv:2311.05743. [Google Scholar]
- Tesla, Inc. (TSLA). Stock Historical Prices Data—Yahoo Finance—finance.yahoo.com. Available online: https://finance.yahoo.com/quote/TSLA/history?p=TSLA (accessed on 2 April 2024).
- SEC.gov—EDGAR Full Text Search—sec.gov. Available online: https://www.sec.gov/edgar/search/#/q=(Annual%2520report)&dateRange=all&ciks=0001318605&entityName=Tesla%252C%2520Inc.%2520(TSLA)%2520(CIK%25200001318605) (accessed on 2 May 2024).
- Loughran-McDonald Master Dictionary w/Sentiment Word Lists. Software Repository for Accounting and Finance, University of Notre Dame. Available online: https://sraf.nd.edu/loughranmcdonald-master-dictionary/ (accessed on 2 May 2024).
- Loughran-McDonald Master Dictionary w/Sentiment Word Lists. Software Repository for Accounting and Finance, University of Notre Dame. Available online: https://sraf.nd.edu/loughranmcdonald-master-dictionary/ (accessed on 2 May 2024).
- Brockman, G.; Cheung, V.; Pettersson, L.; Schneider, J.; Schulman, J.; Tang, J.; Zaremba, W. OpenAI Gym. arXiv 2016, arXiv:1606.01540. [Google Scholar]
- Carapuço, J.M.B. Reinforcement Learning Applied to Forex Trading, Scribd. 2017. Available online: https://www.scribd.com/document/449849827/Corrected-Thesis-JoaoMaria67923 (accessed on 12 February 2024).
- Young, T.W. Calmar ratio: A smoother tool. Futures 1991, 20, 40. [Google Scholar]
- Edgar Filing Documents for 0001564590-17-015705. Available online: https://www.sec.gov/Archives/edgar/data/1318605/000156459017015705/0001564590-17-015705-index.htm (accessed on 2 May 2024).
- Edgar Filing Documents for 0001564590-15-001031. Available online: https://www.sec.gov/Archives/edgar/data/1318605/000156459015001031/0001564590-15-001031-index.htm (accessed on 2 May 2024).
- Murphy, J.J. Technical Analysis of the Financial Markets: A Comprehensive Guide to Trading Methods and Applications; Penguin Publishing Group: New York, NY, USA, 1999; 228p, Available online: https://www.google.com/books/edition/_/5zhXEqdr_IcC?hl=en&gbpv=0 (accessed on 24 July 2024).
- Wilder, J.W., Jr. New Concepts in Technical Trading Systems; Trend Research: Edmonton, AB, Canada, 1978; 6p, Available online: https://archive.org/details/newconceptsintec00wild/page/n151/mode/2up (accessed on 24 July 2024).
- Jahn, M. What Is the Haurlan Index? Investopedia. 2022. Available online: https://www.investopedia.com/terms/h/haurlanindex.asp (accessed on 24 July 2024).
- Ushman, D. What Is the SMA Indicator (Simple Moving Average). TrendSpider Learning Center, 2023. Available online: https://trendspider.com/learning-center/what-is-the-sma-indicator-simple-moving-average/ (accessed on 24 July 2024).
- Livshin, I. Balance Of Power. Tech. Anal. Stock. Commod. 2001, 19, 18–32. Available online: https://c.mql5.com/forextsd/forum/90/balance_of_market_power.pdf (accessed on 24 July 2024).
- Mitchell, C. Aroon Oscillator: Definition, Calculation Formula, Trade Signals, Investopedia. 2022. Available online: https://www.investopedia.com/terms/a/aroonoscillator.asp (accessed on 24 July 2024).