Markov Decision Processes with Applications

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "Probability and Statistics".

Deadline for manuscript submissions: closed (31 May 2024)

Special Issue Editors

Guest Editor: Dr. You Liang
Department of Mathematics, Toronto Metropolitan University, 350 Victoria Street, Toronto, ON M5B 2K3, Canada
Interests: bandit processes; Markov decision processes; dynamic data science; financial time series

Guest Editor: Prof. Dr. Xikui Wang
Warren Centre for Actuarial Studies and Research, I.H. Asper School of Business, University of Manitoba, Winnipeg, MB R3T 5V4, Canada
Interests: risk management; mathematical finance; dynamic pricing; Markov decision processes; response adaptive clinical trials (ethics, statistical design and analysis); sequential design of experiments; bandit processes; biostatistics; reinforcement learning

Special Issue Information

Dear Colleagues,

A Markov decision process (MDP), also known as stochastic dynamic programming, is a mathematical framework for dynamic, sequential decision-making under stochastic uncertainty. MDPs have been investigated extensively and have informed sequential decision-making in a variety of application areas, including reinforcement learning, finance, inventory control, scheduling, and clinical trials, to name just a few. The purpose of this Special Issue is to collect and present state-of-the-art developments in the theory, methodology, and applications of MDPs.
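
For readers who want the framework in symbols, the following is a minimal sketch in standard textbook notation; it is not taken from any particular contribution to this Special Issue.

```latex
% Standard discounted MDP, in textbook notation: a tuple of states, actions,
% a transition kernel, a reward function, and a discount factor.
\[
  \bigl(\mathcal{S},\, \mathcal{A},\, P,\, r,\, \gamma\bigr),
  \qquad P(s' \mid s, a) = \Pr\bigl(S_{t+1} = s' \mid S_t = s,\, A_t = a\bigr),
  \qquad \gamma \in [0, 1).
\]
% The optimal value function satisfies the Bellman optimality equation, and
% dynamic programming (value or policy iteration) recovers an optimal policy from it.
\[
  V^{*}(s) \;=\; \max_{a \in \mathcal{A}} \Bigl[\, r(s, a)
    \;+\; \gamma \sum_{s' \in \mathcal{S}} P(s' \mid s, a)\, V^{*}(s') \Bigr].
\]
```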

Dr. You Liang
Prof. Dr. Xikui Wang
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Markov processes
  • stochastic control
  • dynamic programming
  • decision making
  • reinforcement learning

Published Papers (2 papers)


Research

16 pages, 475 KiB  
Article
To Exit or Not to Exit: Cost-Effective Early-Exit Architecture Based on Markov Decision Process
by Kyu-Sik Kim and Hyun-Suk Lee
Mathematics 2024, 12(14), 2263; https://doi.org/10.3390/math12142263 - 19 Jul 2024
Abstract
Recently, studies on early-exit mechanisms have emerged to reduce the computational cost during the inference process of deep learning models. However, most existing early-exit architectures simply determine early exiting based only on a target confidence level in the prediction, without any consideration of the computational cost. Such an early-exit criterion fails to balance accuracy and cost, making it difficult to use in various environments. To address this problem, we propose a novel, cost-effective early-exit architecture in which an early-exit criterion is designed based on the Markov decision process (MDP). Since the early-exit decisions within an early-exit model are sequential, we model them as an MDP problem to maximize accuracy as much as possible while minimizing the computational cost. Then, we develop a cost-effective early-exit algorithm using reinforcement learning that solves the MDP problem. For each input sample, the algorithm dynamically makes early-exit decisions considering the relative importance of accuracy and computational cost in a given environment, thereby balancing the trade-off between accuracy and cost regardless of the environment. Consequently, it can be used in various environments, even in a resource-constrained environment. Through extensive experiments, we demonstrate that our proposed architecture can effectively balance the trade-off in different environments, while the existing architectures fail to do so since they focus only on reducing their cost while preventing the degradation of accuracy.
(This article belongs to the Special Issue Markov Decision Processes with Applications)
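
To make the early-exit idea concrete, here is a minimal, hypothetical sketch of how exit decisions can be framed as sequential actions with a reward that trades accuracy against computational cost. It is an illustration only and does not reproduce the authors' state, reward, or learning algorithm; names such as `cost_weight` and `exit_threshold_policy` are assumptions.

```python
# Illustrative sketch only: a toy sequential early-exit decision with a
# cost-aware reward, in the spirit of an MDP formulation. Not the authors' method.

from dataclasses import dataclass


@dataclass
class ExitState:
    branch: int        # index of the current early-exit branch
    confidence: float  # prediction confidence at this branch


def reward(correct: bool, layers_used: int, cost_weight: float) -> float:
    """Trade off accuracy (hit or miss) against computational cost (layers used)."""
    return (1.0 if correct else 0.0) - cost_weight * layers_used


def rollout(policy, confidences, branch_correct, cost_weight):
    """Run one inference: at each branch the policy chooses 'exit' or 'continue'."""
    for branch, (conf, correct) in enumerate(zip(confidences, branch_correct)):
        state = ExitState(branch=branch, confidence=conf)
        if policy(state) == "exit" or branch == len(confidences) - 1:
            return reward(correct, layers_used=branch + 1, cost_weight=cost_weight)


def exit_threshold_policy(state: ExitState, threshold: float = 0.9) -> str:
    """Naive confidence-only criterion, i.e., the kind of rule that ignores cost."""
    return "exit" if state.confidence >= threshold else "continue"


if __name__ == "__main__":
    confs = [0.6, 0.8, 0.95]       # per-branch confidences for one input sample
    correct = [False, True, True]  # whether each branch's prediction is correct
    print(rollout(exit_threshold_policy, confs, correct, cost_weight=0.05))
```

An RL agent trained against such a reward would replace the fixed confidence threshold with a learned, cost-aware exit rule, which is the kind of trade-off the paper's MDP formulation targets.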

34 pages, 76174 KiB  
Article
Cooperative Multi-Agent Reinforcement Learning for Data Gathering in Energy-Harvesting Wireless Sensor Networks
by Efi Dvir, Mark Shifrin and Omer Gurewitz
Mathematics 2024, 12(13), 2102; https://doi.org/10.3390/math12132102 - 4 Jul 2024
Abstract
This study introduces a novel approach to data gathering in energy-harvesting wireless sensor networks (EH-WSNs) utilizing cooperative multi-agent reinforcement learning (MARL). In addressing the challenges of efficient data collection in resource-constrained WSNs, we propose and examine a decentralized, autonomous communication framework where sensors function as individual agents. These agents employ an extended version of the Q-learning algorithm, tailored for a multi-agent setting, enabling independent learning and adaptation of their data transmission strategies. We introduce therein a specialized ϵ-p-greedy exploration method which is well suited for MAS settings. The key objective of our approach is the maximization of report flow, aligning with specific applicative goals for these networks. Our model operates under varying energy constraints and dynamic environments, with each sensor making decisions based on interactions within the network, devoid of explicit inter-sensor communication. The focus is on optimizing the frequency and efficiency of data report delivery to a central collection point, taking into account the unique attributes of each sensor. Notably, our findings present a surprising result: despite the known challenges of Q-learning in MARL, such as non-stationarity and the lack of guaranteed convergence to optimality due to multi-agent related pathologies, the cooperative nature of the MARL protocol in our study obtains high network performance. We present simulations and analyze key aspects contributing to coordination in various scenarios. A noteworthy feature of our system is its perpetual learning capability, which fosters network adaptiveness in response to changes such as sensor malfunctions or new sensor integrations. This dynamic adaptability ensures sustained and effective resource utilization, even as network conditions evolve. Our research lays grounds for learning-based WSNs and offers vital insights into the application of MARL in real-world EH-WSN scenarios, underscoring its effectiveness in navigating the intricate challenges of large-scale, resource-limited sensor networks.
(This article belongs to the Special Issue Markov Decision Processes with Applications)
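
As background for readers new to MARL, the sketch below shows standard independent tabular Q-learning with plain ϵ-greedy exploration, i.e., the baseline that such decentralized schemes extend. The authors' specialized ϵ-p-greedy method and the EH-WSN environment itself are not reproduced here, and the environment interface in the usage comment is hypothetical.

```python
# Illustrative sketch: independent tabular Q-learning agents with plain
# epsilon-greedy exploration. Placeholder names only; not the paper's algorithm.

import random
from collections import defaultdict


class IndependentQAgent:
    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(lambda: [0.0] * n_actions)  # Q-table: state -> action values
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        values = self.q[state]
        return max(range(self.n_actions), key=values.__getitem__)

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update; each agent learns only from its own
        # observations, without explicit inter-agent communication.
        td_target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])


# Usage sketch: one agent per sensor, each choosing between (say) staying idle
# and transmitting a report, driven by a hypothetical environment `env`.
# agents = [IndependentQAgent(n_actions=2) for _ in range(num_sensors)]
# states = env.reset()
# for _ in range(num_steps):
#     actions = [agent.act(s) for agent, s in zip(agents, states)]
#     next_states, rewards = env.step(actions)
#     for agent, s, a, r, s2 in zip(agents, states, actions, rewards, next_states):
#         agent.update(s, a, r, s2)
#     states = next_states
```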
