Search Results (73)

Search Parameters:
Keywords = partially observable Markov decision process (POMDP)

40 pages, 33004 KB  
Article
Sampling-Based Path Planning and Semantic Navigation for Complex Large-Scale Environments
by Shakeeb Ahmad and James Sean Humbert
Robotics 2025, 14(11), 149; https://doi.org/10.3390/robotics14110149 - 24 Oct 2025
Viewed by 236
Abstract
This article proposes a multi-agent path planning and decision-making solution for high-tempo field robotic operations, such as search-and-rescue, in large-scale unstructured environments. As a representative example, subterranean environments can span many kilometers and present challenges such as limited or no communication, hazardous terrain, passages blocked by collapses, and vertical structures. The time-sensitive nature of these operations requires solutions that are reliably deployable in practice. Moreover, a human-supervised multi-robot team is required to ensure that the mobility and cognitive capabilities of the various agents are leveraged for mission efficiency. Therefore, this article proposes a solution that suits both air and ground vehicles and is well adapted to information sharing between different agents. The article first details a sampling-based autonomous exploration solution that brings significant improvements over the current state of the art. These improvements include an occupancy grid-based sample-and-project approach to terrain assessment and the formulation of the solution-search problem as a constraint-satisfaction problem to further enhance the computational efficiency of the planner. In addition, the demonstration of the exploration planner by team MARBLE at the DARPA Subterranean Challenge finals is presented. The inevitable interaction of heterogeneous autonomous robots with human operators demands common semantics for reasoning across robot and human teams that use different geometric map capabilities suited to their mobility and computational resources. To this end, the path planner is further extended to include semantic mapping and decision-making in the framework. First, the proposed solution generates a semantic map of the exploration environment by labeling the position history of a robot in the form of probability distributions of observations. The semantic reasoning solution then uses higher-level cues from the semantic map to bias exploration behaviors toward a semantic of interest. This objective is achieved by using a particle filter to localize a robot on a given semantic map, followed by a Partially Observable Markov Decision Process (POMDP)-based controller that guides the exploration direction of the sampling-based exploration planner. Hence, this article aims to bridge the understanding gap between a human and a heterogeneous robot team, not only through transfer of a common semantic map among the agents but also by enabling a robot to use such information to guide its lower-level reasoning when such abstract information is transferred to it. Full article
(This article belongs to the Special Issue Autonomous Robotics for Exploration)
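Not part of the article, but as a rough illustration of the localization step the abstract describes: a minimal particle filter over grid cells whose weights come from a semantic map. The map contents, labels, and noise parameters below are invented for the sketch; this is not the authors' planner.

```python
import random

# Hypothetical semantic map: grid cell -> probability distribution over labels.
SEMANTIC_MAP = {
    (0, 0): {"corridor": 0.8, "room": 0.2},
    (0, 1): {"corridor": 0.3, "room": 0.7},
    (1, 0): {"corridor": 0.6, "room": 0.4},
    (1, 1): {"corridor": 0.1, "room": 0.9},
}

def motion_model(cell, move):
    """Noisily apply a grid move; stay in place with small probability."""
    if random.random() < 0.1:
        return cell
    return (cell[0] + move[0], cell[1] + move[1])

def observation_likelihood(cell, observed_label):
    """P(observed semantic label | particle is in this cell)."""
    dist = SEMANTIC_MAP.get(cell, {})
    return dist.get(observed_label, 1e-3)  # small floor for unmapped cells

def particle_filter_step(particles, move, observed_label):
    """One predict-update-resample cycle over equally weighted particles."""
    predicted = [motion_model(p, move) for p in particles]
    weights = [observation_likelihood(p, observed_label) for p in predicted]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Resample with replacement according to the observation weights.
    return random.choices(predicted, weights=weights, k=len(particles))

particles = [(0, 0)] * 100
particles = particle_filter_step(particles, move=(0, 1), observed_label="room")
```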

27 pages, 2189 KB  
Article
Miss-Triggered Content Cache Replacement Under Partial Observability: Transformer-Decoder Q-Learning
by Hakho Kim, Teh-Jen Sun and Eui-Nam Huh
Mathematics 2025, 13(19), 3217; https://doi.org/10.3390/math13193217 - 7 Oct 2025
Viewed by 290
Abstract
Content delivery networks (CDNs) face steadily rising, uneven demand, straining heuristic cache replacement. Reinforcement learning (RL) is promising, but most work assumes a fully observable Markov Decision Process (MDP), unrealistic under delayed, partial, and noisy signals. We model cache replacement as a Partially Observable MDP (POMDP) and present the Miss-Triggered Cache Transformer (MTCT), a Transformer-decoder Q-learning agent that encodes recent histories with self-attention. MTCT invokes its policy only on cache misses to align compute with informative events and uses a delayed-hit reward to propagate information from hits. A compact, rank-based action set (12 actions by default) captures popularity–recency trade-offs with complexity independent of cache capacity. We evaluate MTCT on a real trace (MovieLens) and two synthetic workloads (Mandelbrot–Zipf, Pareto) against Adaptive Replacement Cache (ARC), Windowed TinyLFU (W-TinyLFU), classical heuristics, and Double Deep Q-Network (DDQN). MTCT achieves the best or statistically comparable cache-hit rates on most cache sizes; e.g., on MovieLens at M=600, it reaches 0.4703 (DDQN 0.4436, ARC 0.4513). Miss-triggered inference also lowers mean wall-clock time per episode; Transformer inference is well suited to modern hardware acceleration. Ablations support CL=50 and show that finer action grids improve stability and final accuracy. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
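As a rough, framework-free sketch of the miss-triggered, rank-based idea (not the MTCT implementation): the policy is queried only on a cache miss, and each of the 12 actions evicts the entry at a given recency rank; a random placeholder stands in for the learned Transformer-decoder Q-network.

```python
from collections import OrderedDict
import random

class RankActionCache:
    def __init__(self, capacity, n_actions=12):
        self.capacity = capacity
        self.n_actions = n_actions
        self.store = OrderedDict()   # key -> value, ordered least- to most-recently used
        self.history = []            # recent request keys that would feed the policy

    def policy(self):
        # Placeholder for the learned Q-network over self.history.
        return random.randrange(self.n_actions)

    def get(self, key):
        self.history = (self.history + [key])[-50:]   # keep a short context window
        if key in self.store:
            self.store.move_to_end(key)               # hit: refresh recency, no inference
            return True
        if len(self.store) >= self.capacity:          # miss: invoke the policy
            rank = min(self.policy(), len(self.store) - 1)
            victim = list(self.store.keys())[rank]    # rank 0 = least recently used
            del self.store[victim]
        self.store[key] = None                        # admit the missed item
        return False

cache = RankActionCache(capacity=100)
hit = cache.get("video_42")
```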

36 pages, 1495 KB  
Review
Decision-Making for Path Planning of Mobile Robots Under Uncertainty: A Review of Belief-Space Planning Simplifications
by Vineetha Malathi, Pramod Sreedharan, Rthuraj P R, Vyshnavi Anil Kumar, Anil Lal Sadasivan, Ganesha Udupa, Liam Pastorelli and Andrea Troppina
Robotics 2025, 14(9), 127; https://doi.org/10.3390/robotics14090127 - 15 Sep 2025
Viewed by 2100
Abstract
Uncertainty remains a central challenge in robotic navigation, exploration, and coordination. This paper examines how Partially Observable Markov Decision Processes (POMDPs) and their decentralized variants (Dec-POMDPs) provide a rigorous foundation for decision-making under partial observability across tasks such as Active Simultaneous Localization and Mapping (A-SLAM), adaptive informative path planning, and multi-robot coordination. We review recent advances that integrate deep reinforcement learning (DRL) with POMDP formulations, highlighting improvements in scalability and adaptability as well as unresolved challenges of robustness, interpretability, and sim-to-real transfer. To complement learning-driven methods, we discuss emerging strategies that embed probabilistic reasoning directly into navigation, including belief-space planning, distributionally robust control formulations, and probabilistic graph models such as enhanced probabilistic roadmaps (PRMs) and Canadian Traveler Problem-based roadmaps. These approaches collectively demonstrate that uncertainty can be managed more effectively by coupling structured inference with data-driven adaptation. The survey concludes by outlining future research directions, emphasizing hybrid learning–planning architectures, neuro-symbolic reasoning, and socially aware navigation frameworks as critical steps toward resilient, transparent, and human-centered autonomy. Full article
(This article belongs to the Section Sensors and Control in Robotics)
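For readers new to belief-space planning, the POMDP belief update underlying the surveyed methods is the standard Bayes filter (notation: transition model T, observation model O, normalizer η):

```latex
% After taking action a and receiving observation o, the belief over states s' becomes
\[
b'(s') \;=\; \eta \, O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s),
\qquad
\eta = \Big( \sum_{s'' \in S} O(o \mid s'', a) \sum_{s \in S} T(s'' \mid s, a)\, b(s) \Big)^{-1}.
\]
```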

24 pages, 1233 KB  
Article
DRL-Based Scheduling for AoI Minimization in CR Networks with Perfect Sensing
by Juan Sun, Shubin Zhang and Xinjie Yu
Entropy 2025, 27(8), 855; https://doi.org/10.3390/e27080855 - 11 Aug 2025
Viewed by 637
Abstract
Age of Information (AoI) is a newly introduced metric that quantifies the freshness and timeliness of data, playing a crucial role in applications reliant on time-sensitive information. Minimizing AoI through optimal scheduling is challenging, especially in energy-constrained Internet of Things (IoT) networks. In this work, we begin by analyzing a simplified cognitive radio network (CRN) where a single secondary user (SU) harvests RF energy from the primary user (PU) and transmits status update packets when the PU spectrum is available. Time is divided into equal time slots, and the SU performs either energy harvesting, spectrum sensing, or status update transmission in each slot. To optimize the AoI within the CRN, we formulate the sequential decision-making process as a partially observable Markov decision process (POMDP) and employ dynamic programming to determine optimal actions. Then, we extend our investigation to evaluate the long-term average weighted sum of AoIs for a multi-SU CRN. Unlike the single-SU scenario, decisions must be made regarding which SU performs sensing and which SU forwards the status update packets. Given the partially observable nature of the PU spectrum, we propose an enhanced Deep Q-Network (DQN) algorithm. Simulation results demonstrate that the proposed policies significantly outperform the myopic policy. Additionally, we analyze the effect of various parameter settings on system performance. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
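A toy, self-contained sketch of the per-slot Age of Information bookkeeping assumed in this line of work: the age grows by one every slot and resets only when a status update is delivered. The action names and probabilities are illustrative, and spectrum sensing is folded into the transmission-success check for brevity.

```python
import random

def simulate_aoi(policy, n_slots=1000, p_spectrum_free=0.6, p_tx_success=0.9):
    aoi, battery, total_aoi = 1, 0, 0
    for _ in range(n_slots):
        action = policy(aoi, battery)          # 'harvest' or 'transmit'
        delivered = False
        if action == "harvest":
            battery += 1
        elif action == "transmit" and battery > 0:
            battery -= 1
            if random.random() < p_spectrum_free and random.random() < p_tx_success:
                delivered = True
        aoi = 1 if delivered else aoi + 1      # AoI resets on a fresh update
        total_aoi += aoi
    return total_aoi / n_slots

# A naive threshold policy: transmit whenever energy is available.
avg_aoi = simulate_aoi(lambda aoi, battery: "transmit" if battery > 0 else "harvest")
```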

27 pages, 1523 KB  
Article
Reinforcement Learning-Based Agricultural Fertilization and Irrigation Considering N2O Emissions and Uncertain Climate Variability
by Zhaoan Wang, Shaoping Xiao, Jun Wang, Ashwin Parab and Shivam Patel
AgriEngineering 2025, 7(8), 252; https://doi.org/10.3390/agriengineering7080252 - 7 Aug 2025
Cited by 2 | Viewed by 1329
Abstract
Nitrous oxide (N2O) emissions from agriculture are rising due to increased fertilizer use and intensive farming, posing a major challenge for climate mitigation. This study introduces a novel reinforcement learning (RL) framework to optimize farm management strategies that balance crop productivity with environmental impact, particularly N2O emissions. By modeling agricultural decision-making as a partially observable Markov decision process (POMDP), the framework accounts for uncertainties in environmental conditions and observational data. The approach integrates deep Q-learning with recurrent neural networks (RNNs) to train adaptive agents within a simulated farming environment. A Probabilistic Deep Learning (PDL) model was developed to estimate N2O emissions, achieving a high Prediction Interval Coverage Probability (PICP) of 0.937 within a 95% confidence interval on the available dataset. While the PDL model’s generalizability is currently constrained by the limited observational data, the RL framework itself is designed for broad applicability, capable of extending to diverse agricultural practices and environmental conditions. Results demonstrate that RL agents reduce N2O emissions without compromising yields, even under climatic variability. The framework’s flexibility allows for future integration of expanded datasets or alternative emission models, ensuring scalability as more field data becomes available. This work highlights the potential of artificial intelligence to advance climate-smart agriculture by simultaneously addressing productivity and sustainability goals in dynamic real-world settings. Full article
(This article belongs to the Special Issue Implementation of Artificial Intelligence in Agriculture)
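An illustrative PyTorch skeleton of a recurrent Q-network of the kind the abstract pairs with deep Q-learning: an LSTM summarizes the history of partial observations before Q-values over management actions are produced. The observation dimension, hidden size, and action count are placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn

class RecurrentQNet(nn.Module):
    def __init__(self, obs_dim=8, hidden_dim=64, n_actions=5):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.q_head = nn.Linear(hidden_dim, n_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim) sequence of daily observations
        x = torch.relu(self.encoder(obs_seq))
        out, hidden = self.lstm(x, hidden)
        return self.q_head(out[:, -1]), hidden  # Q-values at the last step

q_net = RecurrentQNet()
q_values, _ = q_net(torch.randn(1, 30, 8))     # one month of observations
action = int(q_values.argmax(dim=-1))          # index of the chosen management action
```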

21 pages, 4738 KB  
Article
Research on Computation Offloading and Resource Allocation Strategy Based on MADDPG for Integrated Space–Air–Marine Network
by Haixiang Gao
Entropy 2025, 27(8), 803; https://doi.org/10.3390/e27080803 - 28 Jul 2025
Cited by 1 | Viewed by 826
Abstract
This paper investigates the problem of computation offloading and resource allocation in an integrated space–air–sea network based on unmanned aerial vehicles (UAVs) and low Earth orbit (LEO) satellites supporting Maritime Internet of Things (M-IoT) devices. In the complex, dynamic environment comprising M-IoT devices, UAVs, and LEO satellites, traditional optimization methods encounter significant limitations due to non-convexity and the combinatorial explosion of possible solutions. A multi-agent deep deterministic policy gradient (MADDPG)-based optimization algorithm is proposed to address these challenges. The algorithm is designed to minimize total system cost, balancing energy consumption and latency through partial task offloading within a cloud–edge–device collaborative mobile edge computing (MEC) system. A comprehensive system model is proposed, with the problem formulated as a partially observable Markov decision process (POMDP) that integrates association control, power control, computing resource allocation, and task distribution. Each M-IoT device and UAV acts as an intelligent agent, collaboratively learning optimal offloading strategies through the centralized training and decentralized execution framework inherent in MADDPG. Numerical simulations validate the effectiveness of the proposed MADDPG-based approach, which demonstrates rapid convergence, significantly outperforms baseline methods, and reduces the total system cost by 15–60%. Full article
(This article belongs to the Special Issue Space-Air-Ground-Sea Integrated Communication Networks)
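A hedged sketch of the energy-latency cost that partial-offloading problems of this kind typically minimize: a task is split between local execution and the edge, and the system cost weights energy against delay. The constants (chip coefficient, transmit power, weights) are assumed values, not the paper's parameters.

```python
def offloading_cost(bits, split, f_local, f_edge, rate, w_energy=0.5, w_delay=0.5,
                    cycles_per_bit=1000, kappa=1e-27, tx_power=0.5):
    local_bits, off_bits = bits * (1 - split), bits * split
    t_local = local_bits * cycles_per_bit / f_local
    e_local = kappa * (f_local ** 2) * local_bits * cycles_per_bit  # standard CPU energy model
    t_tx = off_bits / rate
    t_edge = off_bits * cycles_per_bit / f_edge
    e_tx = tx_power * t_tx
    delay = max(t_local, t_tx + t_edge)        # local and offloaded parts run in parallel
    energy = e_local + e_tx
    return w_energy * energy + w_delay * delay

# Sweep the offloading ratio to see the energy/latency trade-off.
costs = {s / 10: offloading_cost(1e6, s / 10, f_local=1e9, f_edge=10e9, rate=5e6)
         for s in range(11)}
```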

31 pages, 1576 KB  
Article
Joint Caching and Computation in UAV-Assisted Vehicle Networks via Multi-Agent Deep Reinforcement Learning
by Yuhua Wu, Yuchao Huang, Ziyou Wang and Changming Xu
Drones 2025, 9(7), 456; https://doi.org/10.3390/drones9070456 - 24 Jun 2025
Viewed by 1184
Abstract
Intelligent Connected Vehicles (ICVs) impose stringent requirements on real-time computational services. However, limited onboard resources and the high latency of remote cloud servers restrict traditional solutions. Unmanned Aerial Vehicle (UAV)-assisted Mobile Edge Computing (MEC), which deploys computing and storage resources at the network edge, offers a promising solution. In UAV-assisted vehicular networks, jointly optimizing content and service caching, computation offloading, and UAV trajectories to maximize system performance is a critical challenge. This requires balancing system energy consumption and resource allocation fairness while maximizing cache hit rate and minimizing task latency. To this end, we introduce system efficiency as a unified metric, aiming to maximize overall system performance through joint optimization. This metric comprehensively considers cache hit rate, task computation latency, system energy consumption, and resource allocation fairness. The problem involves discrete decisions (caching, offloading) and continuous variables (UAV trajectories), exhibiting high dynamism and non-convexity, making it challenging for traditional optimization methods. Concurrently, existing multi-agent deep reinforcement learning (MADRL) methods often encounter training instability and convergence issues in such dynamic and non-stationary environments. To address these challenges, this paper proposes a MADRL-based joint optimization approach. We precisely model the problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and adopt the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm, which follows the Centralized Training Decentralized Execution (CTDE) paradigm. Our method aims to maximize system efficiency by achieving a judicious balance among multiple performance metrics, such as cache hit rate, task delay, energy consumption, and fairness. Simulation results demonstrate that, compared to various representative baseline methods, the proposed MAPPO algorithm exhibits significant superiority in achieving higher cumulative rewards and an approximately 82% cache hit rate. Full article
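As an illustration only, a unified score combining cache hit rate, delay, energy, and Jain's fairness index, in the spirit of the "system efficiency" metric the abstract introduces; the normalizers and weights are assumptions.

```python
def jain_fairness(shares):
    """Jain's fairness index over per-vehicle resource shares, in (0, 1]."""
    return sum(shares) ** 2 / (len(shares) * sum(s ** 2 for s in shares))

def system_efficiency(hit_rate, mean_delay, energy, shares,
                      delay_ref=1.0, energy_ref=100.0, weights=(0.3, 0.3, 0.2, 0.2)):
    w_hit, w_delay, w_energy, w_fair = weights
    delay_term = max(0.0, 1.0 - mean_delay / delay_ref)     # lower delay is better
    energy_term = max(0.0, 1.0 - energy / energy_ref)       # lower energy is better
    return (w_hit * hit_rate + w_delay * delay_term
            + w_energy * energy_term + w_fair * jain_fairness(shares))

score = system_efficiency(hit_rate=0.82, mean_delay=0.4, energy=60.0,
                          shares=[1.0, 0.8, 1.2, 0.9])
```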

20 pages, 690 KB  
Article
Using Graph-Enhanced Deep Reinforcement Learning for Distribution Network Fault Recovery
by Yueran Liu, Peng Liao and Yang Wang
Machines 2025, 13(7), 543; https://doi.org/10.3390/machines13070543 - 23 Jun 2025
Viewed by 982
Abstract
Fault recovery in distribution networks is a complex, high-dimensional decision-making task characterized by partial observability, dynamic topology, and strong interdependencies among components. To address these challenges, this paper proposes a graph-based multi-agent deep reinforcement learning (DRL) framework for intelligent fault restoration in power distribution networks. The restoration problem is modeled as a partially observable Markov decision process (POMDP), where each agent employs graph neural networks to extract topological features and enhance environmental perception. To address the high-dimensionality of the action space, an action decomposition strategy is introduced, treating each switch operation as an independent binary classification task, which improves convergence and decision efficiency. Furthermore, a collaborative reward mechanism is designed to promote coordination among agents and optimize global restoration performance. Experiments on the PG&E 69-bus system demonstrate that the proposed method significantly outperforms existing DRL baselines. Specifically, it achieves up to 2.6% higher load recovery, up to 0.0 p.u. lower recovery cost, and full restoration in the midday scenario, with statistically significant improvements (p<0.05 or p<0.01). These results highlight the effectiveness of graph-based learning and cooperative rewards in improving the resilience, efficiency, and adaptability of distribution network operations under varying conditions. Full article
(This article belongs to the Section Machines Testing and Maintenance)
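A small PyTorch sketch of the action-decomposition idea: each switch gets its own binary (keep-closed/open) head over a shared state encoding, rather than one head over the exponential space of switch combinations. A plain MLP stands in for the graph neural network encoder used in the paper, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class DecomposedSwitchPolicy(nn.Module):
    def __init__(self, state_dim=69, hidden_dim=128, n_switches=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.ReLU())
        self.heads = nn.ModuleList([nn.Linear(hidden_dim, 2) for _ in range(n_switches)])

    def forward(self, state):
        h = self.encoder(state)
        # One (keep-closed, open) logit pair per switch; decisions are taken independently.
        return torch.stack([head(h) for head in self.heads], dim=1)

policy = DecomposedSwitchPolicy()
logits = policy(torch.randn(1, 69))            # shape: (batch, n_switches, 2)
actions = logits.argmax(dim=-1)                # per-switch binary decision
```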

24 pages, 19686 KB  
Article
Enhancing Geomagnetic Navigation with PPO-LSTM: Robust Navigation Utilizing Observed Geomagnetic Field Data
by Xiaohui Zhang, Wenqi Bai, Jun Liu, Songnan Yang, Ting Shang and Haolin Liu
Sensors 2025, 25(12), 3699; https://doi.org/10.3390/s25123699 - 13 Jun 2025
Cited by 1 | Viewed by 914
Abstract
Geospatial navigation in GPS-denied environments presents significant challenges, particularly for autonomous vehicles operating in complex, unmapped regions. We explore the Earth’s geomagnetic field, a globally distributed and naturally occurring resource, as a reliable alternative for navigation. Since vehicles can only observe the geomagnetic field along their traversed paths, they must rely on incomplete information to infer the navigation strategy; therefore, we formulate the navigation problem as a partially observed Markov decision process (POMDP). To address this POMDP, we employ proximal policy optimization with long short-term memory (PPO-LSTM), a deep reinforcement learning framework that captures temporal dependencies and mitigates the effects of noise. Using real-world geomagnetic data from the international geomagnetic reference field (IGRF) model, we validate our approach through experiments under noisy conditions. The results demonstrate that PPO-LSTM outperforms baseline algorithms, achieving smoother trajectories and higher heading accuracy. This framework effectively handles the uncertainty and partial observability inherent in geomagnetic navigation, enabling robust policies that adapt to complex gradients and offering a robust solution for geospatial navigation. Full article
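A minimal recurrent actor-critic in the PPO-LSTM style the abstract employs: the LSTM keeps a memory of the field readings observed along the path, the actor outputs a distribution over discrete heading changes, and the critic outputs a value estimate. Sizes and the action set are placeholders; this sketches the architecture class, not the authors' network.

```python
import torch
import torch.nn as nn

class PPOLSTMPolicy(nn.Module):
    def __init__(self, obs_dim=3, hidden_dim=64, n_headings=8):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.actor = nn.Linear(hidden_dim, n_headings)
        self.critic = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq, hidden=None):
        out, hidden = self.lstm(obs_seq, hidden)
        h = out[:, -1]                                  # summary of the observed path so far
        dist = torch.distributions.Categorical(logits=self.actor(h))
        return dist, self.critic(h), hidden

policy = PPOLSTMPolicy()
obs_seq = torch.randn(1, 20, 3)                         # 20 noisy 3-axis field readings
dist, value, _ = policy(obs_seq)
heading = dist.sample()                                 # one of 8 discrete heading changes
```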

20 pages, 1778 KB  
Article
Energy Management for Distributed Carbon-Neutral Data Centers
by Wenting Chang, Chuyi Liu, Guanyu Ren and Jianxiong Wan
Energies 2025, 18(11), 2861; https://doi.org/10.3390/en18112861 - 30 May 2025
Cited by 1 | Viewed by 829
Abstract
With the continuous expansion of data centers, their carbon emissions have become a serious issue, and a number of studies are committed to reducing them. Carbon trading, carbon capture, and power-to-gas technologies are promising emission-reduction techniques that are, however, seldom applied to data centers. To bridge this gap, we propose a carbon-neutral architecture for distributed data centers, where each data center consists of three subsystems: an energy subsystem for energy supply, a thermal subsystem for data center cooling, and a carbon subsystem for carbon trading. We then formulate the energy management problem as a Decentralized Partially Observable Markov Decision Process (Dec-POMDP) and develop a distributed solution framework using Multi-Agent Deep Deterministic Policy Gradient (MADDPG). Finally, simulations using real-world data show a cost saving of 20.3%. Full article
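A toy single-step cost in the spirit of such a formulation: electricity for IT load plus cooling, and a carbon-trading term that charges for emissions above a free quota and credits a surplus. Prices, carbon intensity, and the cooling factor are assumed values, not the paper's data.

```python
def data_center_cost(it_power_kw, cooling_factor=1.4, hours=1.0,
                     grid_price=0.12, carbon_intensity=0.4,
                     carbon_quota_kg=100.0, carbon_price=0.05):
    energy_kwh = it_power_kw * cooling_factor * hours      # IT load plus cooling overhead
    energy_cost = energy_kwh * grid_price
    emissions_kg = energy_kwh * carbon_intensity
    # Positive when buying allowances, negative when selling surplus quota.
    carbon_cost = (emissions_kg - carbon_quota_kg) * carbon_price
    return energy_cost + carbon_cost

cost = data_center_cost(it_power_kw=200.0)
```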

27 pages, 5560 KB  
Article
A Stackelberg Trust-Based Human–Robot Collaboration Framework for Warehouse Picking
by Yang Liu, Fuqiang Guo and Yan Ma
Systems 2025, 13(5), 348; https://doi.org/10.3390/systems13050348 - 3 May 2025
Cited by 1 | Viewed by 1281
Abstract
The warehouse picking process is one of the most critical components of logistics operations. Human–robot collaboration (HRC) is seen as an important trend in warehouse picking, as it combines the strengths of both humans and robots in the picking process. However, in current human–robot collaboration frameworks, there is a lack of effective communication between humans and robots, which results in inefficient task execution during the picking process. To address this, this paper considers trust as a communication bridge between humans and robots and proposes the Stackelberg trust-based human–robot collaboration framework for warehouse picking, aiming to achieve efficient and effective human–robot collaborative picking. In this framework, HRC with trust for warehouse picking is defined as the Partially Observable Stochastic Game (POSG) model. We model human fatigue with the logistic function and incorporate its impact on the efficiency reward function of the POSG. Based on the POSG model, belief space is used to assess human trust, and human strategies are formed. An iterative Stackelberg trust strategy generation (ISTSG) algorithm is designed to achieve the optimal long-term collaboration benefits between humans and robots, which is solved by the Bellman equation. The generated human–robot decision profile is formalized as a Partially Observable Markov Decision Process (POMDP), and the properties of human–robot collaboration are specified as PCTL (probabilistic computation tree logic) with rewards, such as efficiency, accuracy, trust, and human fatigue. The probabilistic model checker PRISM is exploited to verify and analyze the corresponding properties of the POMDP. We take the popular human–robot collaboration robot TORU as a case study. The experimental results show that our framework improves the efficiency of human–robot collaboration for warehouse picking and reduces worker fatigue while ensuring the required accuracy of human–robot collaboration. Full article
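As a concrete illustration of the logistic fatigue model the abstract mentions, and of how such a term can discount an efficiency reward: the growth rate, midpoint, and penalty weight below are invented for the example, not the paper's calibration.

```python
import math

def fatigue(t, growth=0.05, midpoint=120.0):
    """Human fatigue in [0, 1] as a logistic function of time-on-task (minutes)."""
    return 1.0 / (1.0 + math.exp(-growth * (t - midpoint)))

def efficiency_reward(picks_completed, t, base_reward=1.0, fatigue_penalty=0.5):
    """Per-step reward: picking throughput discounted by accumulated fatigue."""
    return picks_completed * base_reward * (1.0 - fatigue_penalty * fatigue(t))

r = efficiency_reward(picks_completed=3, t=150.0)
```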

21 pages, 9553 KB  
Article
Assisted-Value Factorization with Latent Interaction in Cooperate Multi-Agent Reinforcement Learning
by Zhitong Zhao, Ya Zhang, Siying Wang, Yang Zhou, Ruoning Zhang and Wenyu Chen
Mathematics 2025, 13(9), 1429; https://doi.org/10.3390/math13091429 - 27 Apr 2025
Cited by 1 | Viewed by 816
Abstract
With the development of value decomposition methods, multi-agent reinforcement learning (MARL) has made significant progress in balancing autonomous decision making with collective cooperation. However, the collaborative dynamics among agents are continuously changing. The current value decomposition methods struggle to adeptly handle these dynamic changes, thereby impairing the effectiveness of cooperative policies. In this paper, we introduce the concept of latent interaction, upon which an innovative method for generating weights is developed. The proposed method derives weights from the history information, thereby enhancing the accuracy of value estimations. Building upon this, we further propose a dynamic masking mechanism that recalibrates history information in response to the activity level of agents, improving the precision of latent interaction assessments. Experimental results demonstrate the improved training speed and superior performance of the proposed method in both a multi-agent particle environment and the StarCraft Multi-Agent Challenge. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
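A heavily simplified sketch to make the idea concrete: a GRU encodes each agent's history, a small head turns the encoding into a positive mixing weight, and inactive agents are masked out before the weighted sum of individual Q-values. This illustrates the concept of history-derived weights with dynamic masking, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HistoryWeightedMixer(nn.Module):
    def __init__(self, hist_dim=16, hidden_dim=32):
        super().__init__()
        self.gru = nn.GRU(hist_dim, hidden_dim, batch_first=True)
        self.weight_head = nn.Linear(hidden_dim, 1)

    def forward(self, histories, agent_qs, active_mask):
        # histories: (batch, n_agents, time, hist_dim); agent_qs, active_mask: (batch, n_agents)
        b, n, t, d = histories.shape
        _, h = self.gru(histories.reshape(b * n, t, d))
        weights = F.softplus(self.weight_head(h[-1])).reshape(b, n)
        weights = weights * active_mask                        # dynamic masking of idle agents
        return (weights * agent_qs).sum(dim=1, keepdim=True)   # joint Q estimate

mixer = HistoryWeightedMixer()
q_tot = mixer(torch.randn(2, 4, 10, 16), torch.randn(2, 4), torch.ones(2, 4))
```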

30 pages, 3310 KB  
Article
Enhancing Scalability and Network Efficiency in IOTA Tangle Networks: A POMDP-Based Tip Selection Algorithm
by Mays Alshaikhli, Somaya Al-Maadeed and Moutaz Saleh
Computers 2025, 14(4), 117; https://doi.org/10.3390/computers14040117 - 24 Mar 2025
Cited by 4 | Viewed by 1627
Abstract
The fairness problem in the IOTA (Internet of Things Application) Tangle network has significant implications for transaction efficiency, scalability, and security, particularly concerning orphan transactions and lazy tips. Traditional tip selection algorithms (TSAs) struggle to ensure fair tip selection, leading to inefficient transaction confirmations and network congestion. This research proposes a novel partially observable Markov decision process (POMDP)-based TSA, which dynamically prioritizes tips with lower confirmation likelihood, reducing orphan transactions and enhancing network throughput. By leveraging probabilistic decision making and the Monte Carlo tree search, the proposed TSA efficiently selects tips based on long-term impact rather than immediate transaction weight. The algorithm is rigorously evaluated against seven existing TSAs, including Random Walk, Unweighted TSA, Weighted TSA, Hybrid TSA-1, Hybrid TSA-2, E-IOTA, and G-IOTA, under various network conditions. The experimental results demonstrate that the POMDP-based TSA achieves a confirmation rate of 89–94%, reduces the orphan tip rate to 1–5%, and completely eliminates lazy tips (0%). Additionally, the proposed method ensures stable scalability and high security resilience, making it a robust and efficient solution for decentralized ledger networks. These findings highlight the potential of reinforcement learning-driven TSAs to enhance fairness, efficiency, and robustness in DAG-based blockchain systems. This work paves the way for future research into adaptive and scalable consensus mechanisms for the IOTA Tangle. Full article
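A toy illustration of the selection bias the abstract describes: estimate each tip's confirmation likelihood and favor the tips least likely to be confirmed so they do not become orphans. The logistic score over cumulative weight and age is a placeholder for the paper's POMDP/Monte Carlo tree search machinery.

```python
import math

def confirmation_likelihood(cum_weight, age, w=0.3, a=0.1):
    """Toy estimate: heavier, younger tips are more likely to be confirmed."""
    return 1.0 / (1.0 + math.exp(-(w * cum_weight - a * age)))

def select_tips(tips, k=2):
    """tips: dict tip_id -> (cumulative_weight, age_in_slots)."""
    scored = sorted(tips, key=lambda t: confirmation_likelihood(*tips[t]))
    return scored[:k]  # the k tips most at risk of being orphaned

chosen = select_tips({"t1": (12, 3), "t2": (2, 9), "t3": (5, 1)})
```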

22 pages, 2176 KB  
Article
Deep Reinforcement Learning-Based Multi-Agent System with Advanced Actor–Critic Framework for Complex Environment
by Zihao Cui, Kailian Deng, Hongtao Zhang, Zhongyi Zha and Sayed Jobaer
Mathematics 2025, 13(5), 754; https://doi.org/10.3390/math13050754 - 25 Feb 2025
Cited by 2 | Viewed by 2673
Abstract
The development of artificial intelligence (AI) game agents that use deep reinforcement learning (DRL) algorithms to process visual information for decision-making has emerged as a key research focus in both academia and industry. However, previous game agents have struggled to execute multiple commands simultaneously in a single decision, failing to accurately replicate the complex control patterns that characterize human gameplay. In this paper, we utilize the ViZDoom environment as the DRL research platform and transform the agent–environment interactions into a Partially Observable Markov Decision Process (POMDP). We introduce an advanced multi-agent deep reinforcement learning (DRL) framework, specifically a Multi-Agent Proximal Policy Optimization (MA-PPO), designed to optimize target acquisition while operating within defined ammunition and time constraints. In MA-PPO, each agent handles distinct parallel tasks with custom reward functions for performance evaluation. The agents make independent decisions while simultaneously executing multiple commands to mimic human-like gameplay behavior. Our evaluation compares MA-PPO against other DRL algorithms, showing a 30.67% performance improvement over the baseline algorithm. Full article
(This article belongs to the Special Issue Application of Machine Learning and Data Mining, 2nd Edition)
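A short sketch of a policy head that issues several commands in one decision, the "multiple commands simultaneously" behavior the abstract targets: each command (move, turn, shoot, ...) gets an independent Bernoulli head over a shared feature vector. The visual encoder, command set, and feature size are placeholders.

```python
import torch
import torch.nn as nn

class MultiCommandHead(nn.Module):
    def __init__(self, feat_dim=256, n_commands=4):
        super().__init__()
        self.logits = nn.Linear(feat_dim, n_commands)

    def forward(self, features):
        dist = torch.distributions.Bernoulli(logits=self.logits(features))
        commands = dist.sample()                       # e.g. [1, 0, 1, 0]: move and shoot together
        return commands, dist.log_prob(commands).sum(dim=-1)

head = MultiCommandHead()
commands, log_prob = head(torch.randn(1, 256))         # features from some visual encoder
```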

19 pages, 21354 KB  
Article
Asymmetric Deep Reinforcement Learning-Based Spacecraft Approaching Maneuver Under Unknown Disturbance
by Shibo Shao, Dong Zhou, Guanghui Sun, Weizhao Ma and Runran Deng
Aerospace 2025, 12(3), 170; https://doi.org/10.3390/aerospace12030170 - 20 Feb 2025
Viewed by 1230
Abstract
Spacecraft approaching maneuver control normally relies on traditional methods such as Proportional–Integral–Derivative (PID) control or Model Predictive Control (MPC), which require meticulous system design and lack robustness against unknown disturbances. To address these limitations, we propose an end-to-end asymmetric Deep Reinforcement Learning (DRL)-based spacecraft approaching maneuver (ADSAM) algorithm, which significantly enhances the robustness of the approaching maneuver under large-scale unknown disturbances and partially observable Markov decision processes (POMDPs). We present a numerical simulation environment based on the linear Clohessy–Wiltshire (CW) model, incorporating the fourth-order Runge–Kutta method (RK4) to ensure more accurate and efficient state transitions. Experimental results demonstrate that the proposed algorithm outperforms state-of-the-art methods. Full article
(This article belongs to the Special Issue Space Navigation and Control Technologies)
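For context, the relative-motion model the abstract builds on can be written down directly: the linear Clohessy-Wiltshire equations integrated with a classical RK4 step. The mean-motion value, control input, and initial state below are illustrative.

```python
import numpy as np

def cw_derivative(state, u, n=0.0011):
    """state = [x, y, z, vx, vy, vz] in the target's LVLH frame; u = thrust acceleration."""
    x, y, z, vx, vy, vz = state
    ax = 3 * n**2 * x + 2 * n * vy + u[0]
    ay = -2 * n * vx + u[1]
    az = -n**2 * z + u[2]
    return np.array([vx, vy, vz, ax, ay, az])

def rk4_step(state, u, dt=1.0):
    """One classical fourth-order Runge-Kutta step of the CW dynamics."""
    k1 = cw_derivative(state, u)
    k2 = cw_derivative(state + 0.5 * dt * k1, u)
    k3 = cw_derivative(state + 0.5 * dt * k2, u)
    k4 = cw_derivative(state + dt * k3, u)
    return state + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

state = np.array([100.0, 0.0, 0.0, 0.0, -0.2, 0.0])   # 100 m radial offset, small drift
state = rk4_step(state, u=np.zeros(3))
```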
