Feature Paper Special Issue: Reinforcement Learning

A special issue of Stats (ISSN 2571-905X).

Deadline for manuscript submissions: closed (30 September 2024) | Viewed by 14952

Special Issue Editors


Prof. Dr. Wei Zhu
Guest Editor
Department of Applied Mathematics and Statistics, State University of New York at Stony Brook, Stony Brook, NY 11794, USA
Interests: biostatistics; quantitative finance; errors-in-variables regression (EIV); structural equation modeling (SEM); experimental design; statistical learning

Dr. Sourav Sen
Guest Editor
Machine Learning Team, Upstart Network Inc., San Carlos, CA 94070, USA
Interests: interpretable machine learning; natural language processing; computational linguistics; reinforcement learning; applied machine learning

Dr. Keli Xiao
Guest Editor
College of Business, Stony Brook University, Stony Brook, NY 11794, USA
Interests: business analytics; data mining; real estate/urban computing; economic bubbles and crises; asset pricing

Special Issue Information

Dear Colleagues,

Machine-learning methods can be classified into three general categories: unsupervised learning, supervised learning, and reinforcement learning. Of the three, reinforcement learning holds the promise of creating artificial intelligence that can surpass human capacity in specialized tasks such as game playing (e.g., Google's AlphaGo, AlphaGo Zero, and AlphaZero) or scientific research (e.g., Google's AlphaFold).

Reinforcement learning spans diverse academic areas, including statistics, operations research, computational mathematics, and computer science. We are organizing this Special Issue to help promote the development of this research frontier, and we are honored to feature incoming papers from leaders in the field, including Professor Richard Sutton, co-author of the first textbook on the subject [1], and Professor Jiaqiao Hu, co-developer of the Monte Carlo tree search algorithm [2] that underlies the success of Google's AlphaGo. We welcome colleagues from all related fields to contribute to this Special Issue as authors and/or Guest Editors. Accepted papers will be published continuously, without delay and without a publication fee, to help advance this important research topic.

[1] Sutton, Richard S.; Barto, Andrew G. (2018). Reinforcement Learning: An Introduction (2nd ed.). MIT Press. ISBN 978-0-262-03924-6.

[2] Chang, Hyeong Soo; Fu, Michael C.; Hu, Jiaqiao; Marcus, Steven I. (2005). "An Adaptive Sampling Algorithm for Solving Markov Decision Processes". Operations Research. 53: 126–139.

Procedure

All submissions will be rigorously reviewed according to the Stats journal guidelines.

Authors of manuscripts that are not suitable for this Special Issue will be notified as soon as possible, after consultation with the Editorial Board Members. Authors of these manuscripts may still consider submitting to other Special Issues or as regular papers. All other manuscripts will be forwarded for review.

Authors of manuscripts that are not selected as feature papers will be notified after the first round of reviews; the selection is based on the reviewers' reports. Authors of manuscripts not selected for the Special Issue may choose to revise and submit their work as a regular paper in Stats. Please note that these authors will need to cover the publication fees.

The remaining manuscripts will be sent for a second round of reviews. However, entering the second round does not guarantee that a manuscript will be published as a feature paper; we will still seek comments and suggestions from the reviewers.

Prof. Dr. Wei Zhu
Dr. Sourav Sen
Dr. Keli Xiao
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Stats is an international peer-reviewed open access quarterly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • reinforcement learning
  • dynamic programming
  • Markov decision process
  • multi-agent system

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (5 papers)


Research

11 pages, 3767 KiB  
Article
Point Cloud Registration via Heuristic Reward Reinforcement Learning
by Bingren Chen
Stats 2023, 6(1), 268-278; https://doi.org/10.3390/stats6010016 - 6 Feb 2023
Cited by 1 | Viewed by 2276
Abstract
This paper proposes a heuristic reward reinforcement learning framework for point cloud registration. As an essential step of many 3D computer vision tasks such as object recognition and 3D reconstruction, point cloud registration has been well studied in the existing literature. This paper contributes to the literature by addressing the limitations of embedding and reward functions in existing methods. An improved state-embedding module and a stochastic reward function are proposed. While the embedding module enriches the captured characteristics of states, the newly designed reward function follows a time-dependent searching strategy, which allows aggressive attempts at the beginning and tends to be conservative in the end. We assess our method based on two public datasets (ModelNet40 and ScanObjectNN) and real-world data. The results confirm the strength of the new method in reducing errors in object rotation and translation, leading to more precise point cloud registration.
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)
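
The time-dependent reward schedule described in the abstract can be illustrated with a minimal sketch. The exact functional form is not reproduced here; the error-difference reward, the Gaussian noise term, and its linear decay are assumptions made for illustration only.

    import numpy as np

    def heuristic_reward(prev_error, new_error, step, total_steps,
                         noise_scale=0.1, rng=None):
        # Deterministic part: reward any reduction in registration error
        # (e.g., a combined rotation/translation error between point clouds).
        base = prev_error - new_error
        # Stochastic part (assumed form): zero-mean noise whose scale decays
        # with training progress -- aggressive attempts early, conservative later.
        rng = rng or np.random.default_rng()
        progress = min(step / total_steps, 1.0)
        noise = rng.normal(0.0, noise_scale * (1.0 - progress))
        return base + noise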

14 pages, 384 KiB  
Article
An ϵ-Greedy Multiarmed Bandit Approach to Markov Decision Processes
by Isa Muqattash and Jiaqiao Hu
Stats 2023, 6(1), 99-112; https://doi.org/10.3390/stats6010006 - 1 Jan 2023
Cited by 1 | Viewed by 1840
Abstract
We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ-greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimation of the “reward-to-go” value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2^10 states. The results show that REGA and RASA achieved similar performance. Moreover, REGA and RASA empirically outperformed an implementation of the algorithm that uses the “original” ϵ-greedy algorithm that commonly appears in the literature.
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)
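
A minimal sketch of the recursive ϵ-greedy idea is given below. It is not the REGA implementation; the callbacks sample_next and reward, the per-stage sample budget, and the running-average bookkeeping are illustrative assumptions.

    import random
    from collections import defaultdict

    def reward_to_go(state, stage, horizon, actions, sample_next, reward,
                     n_samples=50, eps=0.1):
        # Estimate the value of `state` at the given stage by treating each
        # action as a bandit arm and recursing on sampled next states.
        if stage == horizon:
            return 0.0
        totals, counts = defaultdict(float), defaultdict(int)
        for _ in range(n_samples):
            if not counts or random.random() < eps:
                a = random.choice(actions)                             # explore
            else:
                a = max(counts, key=lambda x: totals[x] / counts[x])   # exploit
            nxt = sample_next(state, a)
            q = reward(state, a) + reward_to_go(nxt, stage + 1, horizon, actions,
                                                sample_next, reward, n_samples, eps)
            totals[a] += q
            counts[a] += 1
        best = max(counts, key=lambda x: totals[x] / counts[x])
        return totals[best] / counts[best]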

17 pages, 745 KiB  
Article
Do Deep Reinforcement Learning Agents Model Intentions?
by Tambet Matiisen, Aqeel Labash, Daniel Majoral, Jaan Aru and Raul Vicente
Stats 2023, 6(1), 50-66; https://doi.org/10.3390/stats6010004 - 28 Dec 2022
Cited by 1 | Viewed by 2379
Abstract
Inferring other agents’ mental states, such as their knowledge, beliefs and intentions, is thought to be essential for effective interactions with other agents. Recently, multi-agent systems trained via deep reinforcement learning have been shown to succeed in solving various tasks. Still, how each agent models or represents other agents in their environment remains unclear. In this work, we test whether deep reinforcement learning agents trained with the multi-agent deep deterministic policy gradient (MADDPG) algorithm explicitly represent other agents’ intentions (their specific aims or plans) during a task in which the agents have to coordinate the covering of different spots in a 2D environment. In particular, we tracked over time the performance of a linear decoder trained to predict the final targets of all agents from the hidden-layer activations of each agent’s neural network controller. We observed that the hidden layers of agents represented explicit information about other agents’ intentions, i.e., the target landmark the other agent ended up covering. We also performed a series of experiments in which some agents were replaced by others with fixed targets to test the levels of generalization of the trained agents. We noticed that during the training phase, the agents developed a preference for each landmark, which hindered generalization. To alleviate the above problem, we evaluated simple changes to the MADDPG training algorithm which lead to better generalization against unseen agents. Our method for confirming intention modeling in deep learning agents is simple to implement and can be used to improve the generalization of multi-agent systems in fields such as robotics, autonomous vehicles and smart cities.
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)
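
The decoding analysis can be sketched as follows. The file names, array shapes, and the choice of logistic regression as the linear decoder are assumptions; the paper's own decoder and data pipeline may differ.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Hypothetical inputs: hidden-layer activations of one agent's controller,
    # and the landmark index the *other* agent ended up covering in each episode.
    hidden = np.load("hidden_activations.npy")      # shape (n_episodes, n_hidden)
    targets = np.load("other_agent_landmarks.npy")  # shape (n_episodes,)

    # Hold out the last 20% of episodes to measure decoding generalization.
    split = int(0.8 * len(hidden))
    decoder = LogisticRegression(max_iter=1000)
    decoder.fit(hidden[:split], targets[:split])
    print("intention decoding accuracy:", decoder.score(hidden[split:], targets[split:]))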

14 pages, 2977 KiB  
Article
Deriving the Optimal Strategy for the Two Dice Pig Game via Reinforcement Learning
by Tian Zhu and Merry H. Ma
Stats 2022, 5(3), 805-818; https://doi.org/10.3390/stats5030047 - 17 Aug 2022
Cited by 2 | Viewed by 3731
Abstract
Games of chance have historically played a critical role in the development and teaching of probability theory and game theory, and, in the modern age, computer programming and reinforcement learning. In this paper, we derive the optimal strategy for playing the two-dice game Pig, both the standard version and its variant with doubles, coined “Double-Trouble”, using certain fundamental concepts of reinforcement learning, especially the Markov decision process and dynamic programming. We further compare the newly derived optimal strategy to other popular play strategies in terms of the winning chances and the order of play. In particular, we compare to the popular “hold at n” strategy, which is considered to be close to the optimal strategy, especially for the best n, for each type of Pig Game. For the standard two-player, two-dice, sequential Pig Game examined here, we found that “hold at 23” is the best choice, with the average winning chance against the optimal strategy being 0.4747. For the “Double-Trouble” version, we found that the “hold at 18” is the best choice, with the average winning chance against the optimal strategy being 0.4733. Furthermore, time in terms of turns to play each type of game is also examined for practical purposes. For optimal vs. optimal or optimal vs. the best “hold at n” strategy, we found that the average number of turns is 19, 23, and 24 for one-die Pig, standard two-dice Pig, and the “Double-Trouble” two-dice Pig games, respectively. We hope our work will inspire students of all ages to invest in the field of reinforcement learning, which is crucial for the development of artificial intelligence and robotics and, subsequently, for the future of humanity.
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)
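
The “hold at n” baseline mentioned in the abstract is easy to simulate. The Monte Carlo sketch below does so for the simpler one-die Pig game (rolling a 1 forfeits the turn total), not for the full dynamic program the authors use to derive the optimal policy; the thresholds shown are illustrative.

    import random

    def play_turn(hold_at, score, goal=100):
        # Keep rolling until the turn total reaches hold_at (or the game is won);
        # rolling a 1 forfeits the entire turn total.
        turn_total = 0
        while turn_total < hold_at and score + turn_total < goal:
            roll = random.randint(1, 6)
            if roll == 1:
                return 0
            turn_total += roll
        return turn_total

    def first_player_win_rate(hold_a, hold_b, games=100_000, goal=100):
        # Estimate P(first player wins) when "hold at hold_a" plays "hold at hold_b".
        wins = 0
        for _ in range(games):
            scores, holds, player = [0, 0], [hold_a, hold_b], 0
            while max(scores) < goal:
                scores[player] += play_turn(holds[player], scores[player], goal)
                player = 1 - player
            wins += scores[0] >= goal
        return wins / games

    # Example: compare two thresholds for one-die Pig.
    # print(first_player_win_rate(20, 25))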

15 pages, 2751 KiB  
Article
Quantitative Trading through Random Perturbation Q-Network with Nonlinear Transaction Costs
by Tian Zhu and Wei Zhu
Stats 2022, 5(2), 546-560; https://doi.org/10.3390/stats5020033 - 10 Jun 2022
Cited by 7 | Viewed by 3958
Abstract
In recent years, reinforcement learning (RL) has seen increasing applications in the financial industry, especially in quantitative trading and portfolio optimization when the focus is on the long-term reward rather than short-term profit. Sequential decision making and Markov decision processes are rather suited for this type of application. Through trial and error based on historical data, an agent can learn the characteristics of the market and evolve an algorithm to maximize the cumulative returns. In this work, we propose a novel RL trading algorithm utilizing random perturbation of the Q-network and account for the more realistic nonlinear transaction costs. In summary, we first design a new near-quadratic transaction cost function considering the slippage. Next, we develop a convolutional deep Q-learning network (CDQN) with multiple price inputs based on this cost function. We further propose a random perturbation (rp) method to modify the learning network to solve the instability issue intrinsic to the deep Q-learning network. Finally, we use this newly developed CDQN-rp algorithm to make trading decisions based on the daily stock prices of Apple (AAPL), Meta (FB), and Bitcoin (BTC) and demonstrate its strengths over other quantitative trading methods.
(This article belongs to the Special Issue Feature Paper Special Issue: Reinforcement Learning)
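
Two ingredients of the abstract lend themselves to a short sketch: a near-quadratic transaction cost with a slippage term, and a random perturbation of Q-network weights. The coefficients and the Gaussian form of the perturbation below are illustrative assumptions, not the values or the exact scheme calibrated in the paper.

    import numpy as np

    def transaction_cost(trade_value, commission=1e-3, slippage=1e-6):
        # Linear commission plus a quadratic market-impact (slippage) term,
        # so large trades are penalized more than proportionally.
        v = np.abs(trade_value)
        return commission * v + slippage * v ** 2

    def perturb_weights(weights, scale=1e-2, rng=None):
        # Add small Gaussian noise to each parameter array of the Q-network --
        # one assumed way to realize the "random perturbation" (rp) step.
        rng = rng or np.random.default_rng()
        return [w + rng.normal(0.0, scale, size=w.shape) for w in weights]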
