Search Results (454)

Search Parameters:
Keywords = deep reinforcement learning (DRL) method

28 pages, 3341 KB  
Article
Research on Dynamic Energy Management Optimization of Park Integrated Energy System Based on Deep Reinforcement Learning
by Xinjian Jiang, Lei Zhang, Fuwang Li, Zhiru Li, Zhijian Ling and Zhenghui Zhao
Energies 2025, 18(19), 5172; https://doi.org/10.3390/en18195172 - 29 Sep 2025
Abstract
Against the backdrop of the energy transition, the park-level Integrated Energy System (IES) has become a key carrier for enhancing renewable energy consumption capacity thanks to its multi-energy complementary characteristics. However, the high share of wind and solar resources and the fluctuation of diverse loads expose the system to dual uncertainty challenges, and traditional optimization methods struggle to adapt to dynamic, complex dispatching requirements. To this end, this paper proposes a new dynamic energy management method based on Deep Reinforcement Learning (DRL) and constructs an IES mixed-integer nonlinear programming model covering wind power, photovoltaics, combined heat and power generation, and electric and thermal energy storage, with the goal of minimizing the system's operating cost. By expressing the dispatching process as a Markov decision process, a state space covering wind and solar output, multiple loads, and energy storage states is defined, a continuous action space for unit output and energy storage control is constructed, and a reward function integrating economic cost and the penalty for renewable energy consumption is designed. The Deep Deterministic Policy Gradient (DDPG) and Deep Q-Network (DQN) algorithms are adopted for policy optimization. This study is based on simulation rather than experimental validation, which aligns with its exploratory scope. The simulation results show that the DDPG algorithm achieves an average weekly operating cost of 532,424 yuan when scheduling in the continuous action space, which is 8.6% lower than that of the DQN algorithm, and reduces the standard deviation of the cost by 19.5%, indicating better robustness. Under source-load fluctuations of 10% to 30%, the DQN algorithm still keeps the cost fluctuation below 4.5%, highlighting the strong adaptability of DRL to uncertain environments. Therefore, this method has significant theoretical and practical value for promoting the intelligent transformation of the energy system. Full article
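The MDP formulation described in this abstract (a state of renewable output, loads, and storage level; continuous dispatch actions; a reward combining operating cost and a renewable penalty) can be illustrated with a toy gym-style environment. This is a minimal sketch with invented cost coefficients and a single-bus energy balance, not the authors' model.

```python
import numpy as np

class ParkIESEnv:
    """Toy park-IES dispatch MDP: illustrative coefficients, not the paper's model."""

    def __init__(self, horizon=24):
        self.horizon = horizon
        self.soc_max = 2.0        # MWh, assumed storage capacity
        self.chp_max = 1.5        # MW, assumed CHP capacity
        self.reset()

    def reset(self):
        self.t = 0
        self.soc = 0.5 * self.soc_max
        return self._state()

    def _state(self):
        wind = 0.8 + 0.4 * np.sin(2 * np.pi * self.t / self.horizon)
        pv = max(0.0, np.sin(np.pi * self.t / self.horizon))
        load = 1.2 + 0.3 * np.cos(2 * np.pi * self.t / self.horizon)
        return np.array([wind, pv, load, self.soc / self.soc_max], dtype=np.float32)

    def step(self, action):
        # action[0]: CHP output in [0, 1]; action[1]: storage power in [-1, 1] (charge > 0)
        wind, pv, load, _ = self._state()
        chp = np.clip(action[0], 0.0, 1.0) * self.chp_max
        p_store = np.clip(action[1], -1.0, 1.0)
        self.soc = np.clip(self.soc + p_store, 0.0, self.soc_max)
        supply = wind + pv + chp - p_store
        grid_import = max(0.0, load - supply)     # shortfall bought from the grid
        spilled = max(0.0, supply - load)         # renewable energy not consumed
        cost = 0.6 * chp + 1.0 * grid_import      # assumed fuel and grid prices
        reward = -(cost + 0.8 * spilled)          # economic cost plus renewable penalty
        self.t += 1
        return self._state(), reward, self.t >= self.horizon, {}

env = ParkIESEnv()
s = env.reset()
s, r, done, _ = env.step(np.array([0.5, 0.1]))
print(s, r, done)
```

A DDPG or DQN agent would then be trained against `step()` exactly as in any continuous- or discrete-action control loop.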

22 pages, 17573 KB  
Article
Robust UAV Path Planning Using RSS in GPS-Denied and Dense Environments Based on Deep Reinforcement Learning
by Kyounghun Kim, Joonho Seon, Jinwook Kim, Jeongho Kim, Youngghyu Sun, Seongwoo Lee, Soohyun Kim, Byungsun Hwang, Mingyu Lee and Jinyoung Kim
Electronics 2025, 14(19), 3844; https://doi.org/10.3390/electronics14193844 - 28 Sep 2025
Abstract
A wide range of research has been conducted on path planning and collision avoidance to enhance the operational efficiency of unmanned aerial vehicles (UAVs). Existing works have mainly assumed environments with static obstacles and available global positioning system (GPS) signals. However, practical environments often involve dynamic obstacles, dense areas with numerous obstacles in confined spaces, and blocked GPS signals. To address these issues for practical implementation, a deep reinforcement learning (DRL)-based method is proposed for path planning and collision avoidance in GPS-denied and dense environments. In the proposed method, robust path planning and collision avoidance are conducted by using the received signal strength (RSS) value with an extended Kalman filter (EKF). Additionally, the attitude of the UAV is adopted as part of the action space to enable the generation of smooth trajectories. Performance was evaluated under single- and multi-target scenarios with numerous dynamic obstacles. Simulation results demonstrated that the proposed method can generate smoother trajectories and shorter path lengths while consistently maintaining a lower collision rate compared to conventional methods. Full article
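The RSS-plus-EKF idea can be sketched as an EKF position update driven by a range derived from received signal strength via a log-distance path-loss model. The anchor position, noise levels, and path-loss exponent below are illustrative assumptions, not the paper's filter design.

```python
import numpy as np

def rss_to_range(rss_dbm, tx_power_dbm=-30.0, path_loss_exp=2.2):
    # Log-distance path-loss model (assumed parameters).
    return 10 ** ((tx_power_dbm - rss_dbm) / (10 * path_loss_exp))

def ekf_step(x, P, rss_dbm, anchor, dt=0.1):
    F = np.eye(4); F[0, 2] = F[1, 3] = dt          # state: [px, py, vx, vy]
    Q = 0.01 * np.eye(4)                            # process noise (assumed)
    R = np.array([[0.5]])                           # range noise (assumed)
    # Predict with a constant-velocity model
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the RSS-derived range to a known anchor
    d = np.linalg.norm(x[:2] - anchor)
    H = np.zeros((1, 4)); H[0, :2] = (x[:2] - anchor) / max(d, 1e-6)
    y = np.array([rss_to_range(rss_dbm) - d])       # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + (K @ y).ravel()
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.array([0.0, 0.0, 1.0, 0.5]), np.eye(4)
x, P = ekf_step(x, P, rss_dbm=-62.0, anchor=np.array([10.0, 5.0]))
print(x)
```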

26 pages, 3838 KB  
Article
DRL-Based UAV Autonomous Navigation and Obstacle Avoidance with LiDAR and Depth Camera Fusion
by Bangsong Lei, Wei Hu, Zhaoxu Ren and Shude Ji
Aerospace 2025, 12(9), 848; https://doi.org/10.3390/aerospace12090848 - 20 Sep 2025
Viewed by 483
Abstract
With the growing application of unmanned aerial vehicles (UAVs) in complex, stochastic environments, autonomous navigation and obstacle avoidance represent critical technical challenges requiring urgent solutions. This study proposes an innovative deep reinforcement learning (DRL) framework that leverages multimodal perception through the fusion of LiDAR and depth camera data. A sophisticated multi-sensor data preprocessing mechanism is designed to extract multimodal features, significantly enhancing the UAV’s situational awareness and adaptability in intricate, stochastic environments. In the high-level decision-maker of the framework, to overcome the intrinsic limitation of low sample efficiency in DRL algorithms, this study introduces an advanced decision-making algorithm, Soft Actor-Critic with Prioritization (SAC-P), which markedly accelerates model convergence and enhances training stability through optimized sample selection and utilization strategies. Validated within a high-fidelity Robot Operating System (ROS) and Gazebo simulation environment, the proposed framework achieved a task success rate of 81.23% in comparative evaluations, surpassing all baseline methods. Notably, in generalization tests conducted in previously unseen and highly complex environments, it maintained a success rate of 72.08%, confirming its robust and efficient navigation and obstacle avoidance capabilities in complex, densely cluttered environments with stochastic obstacle distributions. Full article
(This article belongs to the Section Aeronautics)
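The "optimized sample selection" that SAC-P builds on can be illustrated with a generic proportional prioritized replay buffer (in the spirit of prioritized experience replay). The exact SAC-P prioritization rule is not reproduced here; this is a standard sketch with assumed hyperparameters.

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized replay sketch (not the SAC-P implementation)."""

    def __init__(self, capacity=10000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prios, self.pos = [], [], 0

    def add(self, transition, td_error=1.0):
        prio = (abs(td_error) + 1e-6) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition); self.prios.append(prio)
        else:
            self.data[self.pos] = transition; self.prios[self.pos] = prio
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = np.asarray(self.prios); p = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=p)
        weights = (len(self.data) * p[idx]) ** (-beta)   # importance-sampling correction
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors):
        for i, e in zip(idx, td_errors):
            self.prios[i] = (abs(e) + 1e-6) ** self.alpha

buf = PrioritizedReplay()
for t in range(100):
    buf.add((t, "state", "action"), td_error=np.random.rand())
batch, idx, w = buf.sample(8)
buf.update_priorities(idx, np.random.rand(8))
```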

15 pages, 891 KB  
Article
Reinforced Model Predictive Guidance and Control for Spacecraft Proximity Operations
by Lorenzo Capra, Andrea Brandonisio and Michèle Roberta Lavagna
Aerospace 2025, 12(9), 837; https://doi.org/10.3390/aerospace12090837 - 17 Sep 2025
Viewed by 324
Abstract
An increased level of autonomy is attractive above all in the framework of proximity operations, and researchers are focusing more and more on artificial intelligence techniques to improve spacecraft capabilities in these scenarios. This work presents an autonomous AI-based guidance algorithm that plans the path of a chaser spacecraft for the map reconstruction of an artificial uncooperative target, coupled with Model Predictive Control for tracking the generated trajectory. Deep reinforcement learning is particularly interesting for enabling autonomous spacecraft guidance, since the problem can be formulated as a Partially Observable Markov Decision Process and since domain randomization, exploiting the generalizing capabilities of neural networks, copes well with model uncertainty. The main drawback of this method is that its optimality is difficult to verify mathematically, and constraints can only be added as part of the reward function, so the solution is not guaranteed to satisfy them. To this end, a convex Model Predictive Control formulation is employed to track the DRL-based trajectory while simultaneously enforcing compliance with the constraints. Two neural network architectures are proposed and compared: a recurrent one and the more recent transformer. The trained reinforcement learning agent is then tested in an end-to-end AI-based pipeline with image generation in the loop, and the results are presented. The computational effort of the entire guidance and control strategy is also verified on a Raspberry Pi board. This work represents a viable solution for applying artificial intelligence methods to autonomous spacecraft motion while retaining a higher level of explainability and safety than more classical guidance and control approaches. Full article
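The division of labor described here (a learned reference trajectory tracked by a constraint-enforcing convex MPC) can be sketched with cvxpy, assuming that library is available. Double-integrator dynamics and a synthetic reference stand in for the relative orbital dynamics and the DRL-generated path; the thrust limit and corridor constraint are illustrative assumptions.

```python
import numpy as np
import cvxpy as cp

dt, T = 1.0, 20
A = np.block([[np.eye(2), dt * np.eye(2)], [np.zeros((2, 2)), np.eye(2)]])
B = np.block([[0.5 * dt**2 * np.eye(2)], [dt * np.eye(2)]])

# Synthetic stand-in for the DRL-generated reference trajectory
x_ref = np.vstack([np.linspace(0, 10, T + 1), np.linspace(0, 5, T + 1),
                   np.ones(T + 1), 0.5 * np.ones(T + 1)])

x = cp.Variable((4, T + 1))
u = cp.Variable((2, T))
cost = cp.sum_squares(x - x_ref) + 0.1 * cp.sum_squares(u)
constr = [x[:, 0] == np.array([0.0, 0.0, 1.0, 0.5])]
for k in range(T):
    constr += [x[:, k + 1] == A @ x[:, k] + B @ u[:, k],
               cp.norm(u[:, k], "inf") <= 0.2,                    # thrust limit (assumed)
               cp.abs(x[1, k + 1] - x_ref[1, k + 1]) <= 1.0]      # corridor constraint (assumed)
cp.Problem(cp.Minimize(cost), constr).solve()
print("first control:", u.value[:, 0])
```

Only the first control of each solved horizon would be applied before re-solving, which is how the MPC keeps the constraints satisfied while following the learned guidance.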

36 pages, 1495 KB  
Review
Decision-Making for Path Planning of Mobile Robots Under Uncertainty: A Review of Belief-Space Planning Simplifications
by Vineetha Malathi, Pramod Sreedharan, Rthuraj P R, Vyshnavi Anil Kumar, Anil Lal Sadasivan, Ganesha Udupa, Liam Pastorelli and Andrea Troppina
Robotics 2025, 14(9), 127; https://doi.org/10.3390/robotics14090127 - 15 Sep 2025
Viewed by 958
Abstract
Uncertainty remains a central challenge in robotic navigation, exploration, and coordination. This paper examines how Partially Observable Markov Decision Processes (POMDPs) and their decentralized variants (Dec-POMDPs) provide a rigorous foundation for decision-making under partial observability across tasks such as Active Simultaneous Localization and Mapping (A-SLAM), adaptive informative path planning, and multi-robot coordination. We review recent advances that integrate deep reinforcement learning (DRL) with POMDP formulations, highlighting improvements in scalability and adaptability as well as unresolved challenges of robustness, interpretability, and sim-to-real transfer. To complement learning-driven methods, we discuss emerging strategies that embed probabilistic reasoning directly into navigation, including belief-space planning, distributionally robust control formulations, and probabilistic graph models such as enhanced probabilistic roadmaps (PRMs) and Canadian Traveler Problem-based roadmaps. These approaches collectively demonstrate that uncertainty can be managed more effectively by coupling structured inference with data-driven adaptation. The survey concludes by outlining future research directions, emphasizing hybrid learning–planning architectures, neuro-symbolic reasoning, and socially aware navigation frameworks as critical steps toward resilient, transparent, and human-centered autonomy. Full article
(This article belongs to the Section Sensors and Control in Robotics)

18 pages, 780 KB  
Article
Multi-Source Energy Storage Day-Ahead and Intra-Day Scheduling Based on Deep Reinforcement Learning with Attention Mechanism
by Enren Liu, Song Gao, Xiaodi Chen, Jun Li, Yuntao Sun and Meng Zhang
Appl. Sci. 2025, 15(18), 10031; https://doi.org/10.3390/app151810031 - 14 Sep 2025
Viewed by 632
Abstract
With the rapid integration of high-penetration renewable energy, its inherent uncertainty complicates power system day-ahead/intra-day scheduling, leading to challenges like wind curtailment and high operational costs. Existing methods either rely on inflexible physical models or use deep reinforcement learning (DRL) without prioritizing critical variables or synergizing multi-source energy storage and demand response (DR). This study develops a multi-time scale coordination scheduling framework to balance cost minimization and renewable energy utilization, with strong adaptability to real-time uncertainties. The framework integrates a day-ahead optimization model and an intra-day rolling model powered by an attention-enhanced DRL Actor–Critic network—where the attention mechanism dynamically focuses on critical variables to correct real-time deviations. Validated on an East China regional grid, the framework significantly enhances renewable energy absorption and system flexibility, providing a robust technical solution for the economical and stable operation of high-renewable power systems. Full article
(This article belongs to the Special Issue Control and Security of Industrial Cyber–Physical Systems)
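The attention mechanism said to "dynamically focus on critical variables" can be sketched as a self-attention layer over per-variable tokens feeding an actor head. Treating each scalar input (forecasts, loads, storage states) as one token, and all dimensions below, are illustrative assumptions rather than the authors' network.

```python
import torch
import torch.nn as nn

class AttentionActor(nn.Module):
    """Sketch of an attention-enhanced actor for intra-day rolling corrections."""

    def __init__(self, n_vars=12, d_model=32, n_actions=4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                    # each scalar variable -> token
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        self.head = nn.Sequential(nn.Linear(n_vars * d_model, 64), nn.ReLU(),
                                  nn.Linear(64, n_actions), nn.Tanh())

    def forward(self, state):                                 # state: (batch, n_vars)
        tokens = self.embed(state.unsqueeze(-1))              # (batch, n_vars, d_model)
        attended, weights = self.attn(tokens, tokens, tokens) # weights reveal variable focus
        return self.head(attended.flatten(1)), weights

actor = AttentionActor()
actions, attn_weights = actor(torch.randn(8, 12))
print(actions.shape, attn_weights.shape)   # torch.Size([8, 4]) torch.Size([8, 12, 12])
```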

27 pages, 1701 KB  
Article
A DRL Framework for Autonomous Pursuit-Evasion: From Multi-Spacecraft to Multi-Drone Scenarios
by Zhenyang Xu, Shuyi Shao and Zengliang Han
Drones 2025, 9(9), 636; https://doi.org/10.3390/drones9090636 - 10 Sep 2025
Viewed by 436
Abstract
To address the challenges of autonomous pursuit-evasion in aerospace, particularly in achieving cross-domain generalizability and handling complex terminal constraints, this paper proposes a generalizable deep reinforcement learning (DRL) framework. The core of the method is a self-play Proximal Policy Optimization (PPO) architecture enhanced by two key innovations. First, a dynamics-agnostic curriculum learning (CL) strategy is employed to accelerate training and enhance policy robustness by structuring the learning process from simple to complex. Second, a transferable prediction-based reward function is designed to provide dense, forward-looking guidance, utilizing forward-state projection to effectively satisfy mission-specific terminal conditions. Comprehensive simulations were conducted in both multi-spacecraft and multi-drone scenarios. In the primary spacecraft validation, the proposed method achieved a 90.7% success rate, significantly outperforming baseline algorithms like traditional PPO and Soft Actor-Critic (SAC). Furthermore, it demonstrated superior robustness, with a performance drop of only 8.3% under stochastic perturbations, a stark contrast to the over 18% degradation seen in baseline methods. The successful application in a multi-drone scenario, including an obstacle-rich environment, confirms the framework’s potential as a unified and robust solution for diverse autonomous adversarial systems. Full article
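The "transferable prediction-based reward" built on forward-state projection can be illustrated with a simple shaping term: project the pursuer forward with a constant-velocity model and reward reductions in the predicted terminal miss distance. The projection model, horizon, and weight are assumptions for illustration, not the paper's reward.

```python
import numpy as np

def predicted_miss(pos, vel, target_pos, horizon=10.0):
    # Forward-state projection under an assumed constant-velocity model.
    return np.linalg.norm((pos + horizon * vel) - target_pos)

def prediction_reward(prev_state, state, target_pos, horizon=10.0, w=1.0):
    prev_miss = predicted_miss(prev_state[:3], prev_state[3:], target_pos, horizon)
    miss = predicted_miss(state[:3], state[3:], target_pos, horizon)
    return w * (prev_miss - miss)        # dense, forward-looking guidance signal

prev = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0])   # [position, velocity]
curr = np.array([1.0, 0.0, 0.0, 1.2, 0.1, 0.0])
print(prediction_reward(prev, curr, target_pos=np.array([20.0, 2.0, 0.0])))
```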

16 pages, 2943 KB  
Article
Robust Collision Avoidance for ASVs Using Deep Reinforcement Learning with Sim2Real Methods in Static Obstacle Environments
by Changgyu Han, Sekil Park and Joohyun Woo
J. Mar. Sci. Eng. 2025, 13(9), 1727; https://doi.org/10.3390/jmse13091727 - 8 Sep 2025
Viewed by 353
Abstract
When a policy trained with deep reinforcement learning (DRL) in simulation is deployed in the real world, its performance often deteriorates due to the Sim2Real gap. This study addresses this problem for Autonomous Surface Vessels (ASVs) by developing a robust collision-avoidance framework. We integrate a MATLAB-based ship dynamics model with ROS and Gazebo, and employ the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. To enhance robustness and generalization, we combine domain randomization and curriculum learning. As a result, the trained agent consistently achieved a high success rate of over 90% in unseen environments, significantly outperforming a baseline TD3 agent and a conventional PID controller. This demonstrates that the proposed Sim2Real methods are highly effective for creating robust control policies for ASVs. For future work, we plan to validate the learned policy through real-world experiments. Full article
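The two Sim2Real ingredients named in this abstract, per-episode domain randomization and curriculum learning, can be sketched as follows. The randomized parameter ranges and curriculum stages are illustrative assumptions, not the values used in the study.

```python
import random

def randomize_dynamics():
    """Per-episode randomization of vessel dynamics (assumed ranges)."""
    return {
        "mass_scale":    random.uniform(0.9, 1.1),   # +/-10% mass uncertainty
        "drag_scale":    random.uniform(0.8, 1.2),
        "thrust_delay":  random.uniform(0.0, 0.3),   # seconds of actuation lag
        "current_speed": random.uniform(0.0, 0.5),   # m/s ambient current
    }

def curriculum_stage(episode, stages=((0, 2), (500, 6), (1500, 12))):
    """Obstacle count grows with training progress (assumed schedule)."""
    n_obstacles = stages[0][1]
    for start_ep, n in stages:
        if episode >= start_ep:
            n_obstacles = n
    return n_obstacles

for ep in (0, 600, 2000):
    print(ep, curriculum_stage(ep), randomize_dynamics())
```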

26 pages, 1127 KB  
Article
LSTM-Enhanced TD3 and Behavior Cloning for UAV Trajectory Tracking Control
by Yuanhang Qi, Jintao Hu, Fujie Wang and Gewen Huang
Biomimetics 2025, 10(9), 591; https://doi.org/10.3390/biomimetics10090591 - 4 Sep 2025
Viewed by 579
Abstract
Unmanned aerial vehicles (UAVs) often face significant challenges in trajectory tracking within complex dynamic environments, where uncertainties, external disturbances, and nonlinear dynamics hinder accurate and stable control. To address this issue, a bio-inspired deep reinforcement learning (DRL) algorithm is proposed, integrating behavior cloning (BC) and long short-term memory (LSTM) networks. This method can achieve autonomous learning of a high-precision control policy without establishing an accurate system dynamics model. Motivated by the memory and prediction functions of biological neural systems, an LSTM module is embedded into the policy network of the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm. This structure captures temporal state patterns more effectively, enhancing adaptability to trajectory variations and resilience to delays or disturbances. Compared to memoryless networks, the LSTM-based design better replicates biological time-series processing, improving tracking stability and accuracy. In addition, behavior cloning is employed to pre-train the DRL policy using expert demonstrations, mimicking the way animals learn from observation. This biomimetically plausible initialization accelerates convergence by reducing inefficient early-stage exploration. By combining offline imitation with online learning, the TD3-LSTM-BC framework balances expert guidance and adaptive optimization, analogous to innate and experience-based learning in nature. Simulation results confirm the superior robustness and tracking accuracy of the proposed method, demonstrating its potential as a control solution for autonomous UAVs. Full article
(This article belongs to the Special Issue Bio-Inspired Robotics and Applications 2025)
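The two components named here, an LSTM-based policy and behavior-cloning pre-training, can be sketched in a few lines. Dimensions, sequence length, and the synthetic "expert" data are placeholders; this is not the authors' TD3-LSTM-BC implementation.

```python
import torch
import torch.nn as nn

class LSTMActor(nn.Module):
    """LSTM policy over a short history of states (dimensions assumed)."""

    def __init__(self, state_dim=12, hidden=64, action_dim=4):
        super().__init__()
        self.lstm = nn.LSTM(state_dim, hidden, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden, action_dim), nn.Tanh())

    def forward(self, seq):                 # seq: (batch, time, state_dim)
        out, _ = self.lstm(seq)
        return self.head(out[:, -1])        # act on the last hidden state

actor = LSTMActor()
opt = torch.optim.Adam(actor.parameters(), lr=1e-3)

# Behavior-cloning pre-training on (state-sequence, expert-action) pairs
expert_seq = torch.randn(32, 10, 12)        # placeholder demonstrations
expert_act = torch.rand(32, 4) * 2 - 1
for _ in range(5):
    loss = nn.functional.mse_loss(actor(expert_seq), expert_act)
    opt.zero_grad(); loss.backward(); opt.step()
print("BC loss:", loss.item())
```

After this warm start, the actor would be handed to the TD3 loop for online fine-tuning, which is where the offline imitation and online learning are combined.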

23 pages, 3818 KB  
Article
Energy Regulation-Aware Layered Control Architecture for Building Energy Systems Using Constraint-Aware Deep Reinforcement Learning and Virtual Energy Storage Modeling
by Siwei Li, Congxiang Tian and Ahmed N. Abdalla
Energies 2025, 18(17), 4698; https://doi.org/10.3390/en18174698 - 4 Sep 2025
Viewed by 742
Abstract
In modern intelligent buildings, the control of Building Energy Systems (BES) faces increasing complexity in balancing energy costs, thermal comfort, and operational flexibility. Traditional centralized or flat deep reinforcement learning (DRL) methods often fail to effectively handle the multi-timescale dynamics, large state–action spaces, and strict constraint satisfaction required for real-world energy systems. To address these challenges, this paper proposes an energy policy-aware layered control architecture that combines Virtual Energy Storage System (VESS) modeling with a novel Dynamic Constraint-Aware Policy Optimization (DCPO) algorithm. The VESS is modeled based on the thermal inertia of building envelope components, quantifying flexibility in terms of virtual power, capacity, and state of charge, thus enabling BES to behave as if it had embedded, non-physical energy storage. Building on this, the BES control problem is structured using a hierarchical Markov Decision Process, in which the upper level handles strategic decisions (e.g., VESS dispatch, HVAC modes), while the lower level manages real-time control (e.g., temperature adjustments, load balancing). The proposed DCPO algorithm extends actor–critic learning by incorporating dynamic policy constraints, entropy regularization, and adaptive clipping to ensure feasible and efficient policy learning under both operational and comfort-related constraints. Simulation experiments demonstrate that the proposed approach outperforms established algorithms like Deep Q-Networks (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed DDPG (TD3). Specifically, it achieves a 32.6% reduction in operational costs and over a 51% decrease in thermal comfort violations compared to DQN, while ensuring millisecond-level policy generation suitable for real-time BES deployment. Full article
(This article belongs to the Section C: Energy Economics and Policy)
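The VESS idea, reading building thermal inertia as virtual storage with a power, capacity, and state of charge, can be sketched with a first-order RC zone model whose position inside the comfort band acts as the virtual state of charge. The R, C values, comfort band, and time step below are illustrative assumptions, not the paper's envelope model.

```python
import numpy as np

R, C, dt = 2.0, 5.0, 0.25            # K/kW, kWh/K, hours (assumed)
T_MIN, T_MAX = 20.0, 24.0            # comfort band in deg C (assumed)

def zone_step(T_in, T_out, hvac_kw):
    """One step of the first-order RC model; positive hvac_kw heats the zone."""
    dT = ((T_out - T_in) / R + hvac_kw) * dt / C
    return T_in + dT

def virtual_soc(T_in):
    """Fraction of the comfort band 'stored' as thermal energy (0 = coldest, 1 = warmest)."""
    return float(np.clip((T_in - T_MIN) / (T_MAX - T_MIN), 0.0, 1.0))

def virtual_capacity_kwh():
    return C * (T_MAX - T_MIN)       # energy needed to sweep the comfort band

T = 21.0
for hvac in (8.0, 8.0, 0.0, 0.0):    # charge the virtual store, then coast
    T = zone_step(T, T_out=10.0, hvac_kw=hvac)
    print(round(T, 2), round(virtual_soc(T), 2), virtual_capacity_kwh())
```

The upper-level agent in such a hierarchy would dispatch this virtual storage alongside HVAC modes, while the lower level handles the fast temperature adjustments.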

25 pages, 1388 KB  
Article
Multi-Agent Deep Reinforcement Learning-Based HVAC and Electrochromic Window Control Framework
by Hongjian Chen, Duoyu Sun, Yuyu Sun, Yong Zhang and Huan Yang
Buildings 2025, 15(17), 3114; https://doi.org/10.3390/buildings15173114 - 31 Aug 2025
Viewed by 695
Abstract
Deep reinforcement learning (DRL)-based HVAC control has shown clear advantages over rule-based and model predictive methods. However, most prior studies remain limited to HVAC-only optimization or simple coordination with operable windows. Such approaches do not adequately address buildings with fixed glazing systems—a common feature in high-rise offices—where the lack of operable windows restricts adaptive envelope interaction. To address this gap, this study proposes a multi-zone control framework that integrates HVAC systems with electrochromic windows (ECWs). The framework leverages the Q-value Mixing (QMIX) algorithm to dynamically coordinate ECW transmittance with HVAC setpoints, aiming to enhance energy efficiency and thermal comfort, particularly in high-consumption buildings such as offices. Its performance is evaluated using EnergyPlus simulations. The results show that the proposed approach reduces HVAC energy use by 19.8% compared to the DQN-based HVAC-only control and by 40.28% relative to conventional rule-based control (RBC). In comparison with leading multi-agent deep reinforcement learning (MADRL) algorithms, including MADQN, VDN, and MAPPO, the framework reduces HVAC energy consumption by 1–5% and maintains a thermal comfort violation rate (TCVR) of less than 1% with an average temperature variation of 0.35 °C. Moreover, the model demonstrates strong generalizability, achieving 16.58–58.12% energy savings across six distinct climatic regions—ranging from tropical (Singapore) to temperate (Beijing)—with up to 48.2% savings observed in Chengdu. Our framework indicates the potential of coordinating HVAC systems with ECWs in simulation, while also identifying limitations that need to be addressed for real-world deployment. Full article
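The QMIX coordination used here can be illustrated with a minimal monotonic mixing network: per-zone agent Q-values (HVAC and ECW agents) are combined into a joint Q through state-conditioned, non-negative mixing weights. Sizes and the number of agents are assumptions; this is a generic QMIX-style sketch, not the paper's network.

```python
import torch
import torch.nn as nn

class QMixer(nn.Module):
    """QMIX-style monotonic mixing of per-agent Q-values into a joint Q."""

    def __init__(self, n_agents=4, state_dim=16, embed=32):
        super().__init__()
        self.n_agents, self.embed = n_agents, embed
        self.w1 = nn.Linear(state_dim, n_agents * embed)
        self.b1 = nn.Linear(state_dim, embed)
        self.w2 = nn.Linear(state_dim, embed)
        self.b2 = nn.Sequential(nn.Linear(state_dim, embed), nn.ReLU(), nn.Linear(embed, 1))

    def forward(self, agent_qs, state):            # agent_qs: (B, n_agents), state: (B, state_dim)
        B = agent_qs.size(0)
        w1 = torch.abs(self.w1(state)).view(B, self.n_agents, self.embed)  # abs() enforces monotonicity
        b1 = self.b1(state).view(B, 1, self.embed)
        hidden = torch.relu(agent_qs.unsqueeze(1) @ w1 + b1)               # (B, 1, embed)
        w2 = torch.abs(self.w2(state)).view(B, self.embed, 1)
        return (hidden @ w2).squeeze(-1).squeeze(-1) + self.b2(state).squeeze(-1)

mixer = QMixer()
q_tot = mixer(torch.randn(8, 4), torch.randn(8, 16))
print(q_tot.shape)                                  # torch.Size([8])
```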

27 pages, 4949 KB  
Article
Resolving the Classic Resource Allocation Conflict in On-Ramp Merging: A Regionally Coordinated Nash-Advantage Decomposition Deep Q-Network Approach for Connected and Automated Vehicles
by Linning Li and Lili Lu
Sustainability 2025, 17(17), 7826; https://doi.org/10.3390/su17177826 - 30 Aug 2025
Viewed by 515
Abstract
To improve the traffic efficiency of connected and automated vehicles (CAVs) in on-ramp merging areas, this study proposes a novel region-level multi-agent reinforcement learning framework, Regionally Coordinated Nash-Advantage Decomposition Deep Q-Network with Conflict-Aware Q Fusion (RC-NashAD-DQN). Unlike existing vehicle-level control methods, which suffer from high computational overhead and poor scalability, our approach abstracts on-ramp and main road areas as region-level control agents, achieving coordinated yet independent decision-making while maintaining control precision and merging efficiency comparable to fine-grained vehicle-level approaches. Each agent adopts a value–advantage decomposition architecture to enhance policy stability and distinguish action values, while sharing state–action information to improve inter-agent awareness. A Nash equilibrium solver is applied to derive joint strategies, and a conflict-aware Q-fusion mechanism is introduced as a regularization term rather than a direct action-selection tool, enabling the system to resolve local conflicts—particularly at region boundaries—without compromising global coordination. This design reduces training complexity, accelerates convergence, and improves robustness against communication imperfections. The framework is evaluated using the SUMO simulator at the Taishan Road interchange on the S1 Yongtaiwen Expressway under heterogeneous traffic conditions involving both passenger cars and container trucks, and is compared with baseline models including C-DRL-VSL and MADDPG. Extensive simulations demonstrate that RC-NashAD-DQN significantly improves average traffic speed by 17.07% and reduces average delay by 12.68 s, outperforming all baselines in efficiency metrics while maintaining robust convergence performance. These improvements enhance cooperation and merging efficiency among vehicles, contributing to sustainable urban mobility and the advancement of intelligent transportation systems. Full article
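The "value–advantage decomposition architecture" each region-level agent adopts corresponds to a dueling-style Q-network, which can be sketched as below. The Nash equilibrium solver and the conflict-aware Q-fusion regularizer are not reproduced; observation and action sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ValueAdvantageQNet(nn.Module):
    """Value-advantage (dueling) decomposition for one region-level agent."""

    def __init__(self, obs_dim=24, n_actions=5, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)            # V(s): quality of the region's state
        self.adv = nn.Linear(hidden, n_actions)      # A(s, a): relative value of each action

    def forward(self, obs):
        h = self.trunk(obs)
        v, a = self.value(h), self.adv(h)
        return v + a - a.mean(dim=-1, keepdim=True)  # Q(s, a) with an identifiable split

ramp_agent = ValueAdvantageQNet()
q = ramp_agent(torch.randn(8, 24))
print(q.shape, q.argmax(dim=-1))                     # per-sample greedy regional action
```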

49 pages, 1694 KB  
Review
Analysis of Deep Reinforcement Learning Algorithms for Task Offloading and Resource Allocation in Fog Computing Environments
by Endris Mohammed Ali, Jemal Abawajy, Frezewd Lemma and Samira A. Baho
Sensors 2025, 25(17), 5286; https://doi.org/10.3390/s25175286 - 25 Aug 2025
Viewed by 1256
Abstract
Fog computing is increasingly preferred over cloud computing for processing tasks from Internet of Things (IoT) devices with limited resources. However, placing tasks and allocating resources in distributed and dynamic fog environments remains a major challenge, especially when trying to meet strict Quality of Service (QoS) requirements. Deep reinforcement learning (DRL) has emerged as a promising solution to these challenges, offering adaptive, data-driven decision-making in real-time and uncertain conditions. While several surveys have explored DRL in fog computing, most focus on traditional centralized offloading approaches or emphasize reinforcement learning (RL) with limited integration of deep learning. To address this gap, this paper presents a comprehensive and focused survey on the full-scale application of DRL to the task offloading problem in fog computing environments involving multiple user devices and multiple fog nodes. We systematically analyze and classify the literature based on architecture, resource allocation methods, QoS objectives, offloading topology and control, optimization strategies, DRL techniques used, and application scenarios. We also introduce a taxonomy of DRL-based task offloading models and highlight key challenges, open issues, and future research directions. This survey serves as a valuable resource for researchers by identifying unexplored areas and suggesting new directions for advancing DRL-based solutions in fog computing. For practitioners, it provides insights into selecting suitable DRL techniques and system designs to implement scalable, efficient, and QoS-aware fog computing applications in real-world environments. Full article
(This article belongs to the Section Sensor Networks)

16 pages, 4253 KB  
Article
Collision Avoidance of Multi-UUV Systems Based on Deep Reinforcement Learning in Complex Marine Environments
by Fuyu Cao, Hongli Xu, Jingyu Ru, Zhengqi Li, Haopeng Zhang and Hao Liu
J. Mar. Sci. Eng. 2025, 13(9), 1615; https://doi.org/10.3390/jmse13091615 - 24 Aug 2025
Viewed by 431
Abstract
For systems of multiple unmanned underwater vehicles (UUVs), obstacle avoidance during cooperative operation in complex marine environments remains a challenging issue. Recent studies demonstrate the effectiveness of deep reinforcement learning (DRL) for obstacle avoidance in unknown marine environments. However, existing methods struggle in marine environments with complex non-convex obstacles, especially during multi-UUV cooperative operation, as they typically simplify environmental obstacles to sparsely distributed convex shapes and ignore the dynamic coupling between cooperative operation and collision avoidance. To address these limitations, we propose a centralized-training, decentralized-execution framework with a novel multi-agent dynamic encoder based on an efficient self-attention mechanism. To our knowledge, the framework is the first that dynamically processes observations from an arbitrary number of neighbors and effectively addresses multi-UUV collision avoidance in marine environments with complex non-convex obstacles while satisfying additional constraints derived from cooperative operation. Experimental results show that the proposed method effectively avoids obstacles and satisfies cooperative constraints in both simulated and real-world scenarios with complex non-convex obstacles. Our method outperforms typical collision avoidance baselines and enables policy transfer from simulation to real-world scenarios without additional training, demonstrating practical application potential. Full article
(This article belongs to the Section Ocean Engineering)
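A dynamic encoder that handles "an arbitrary number of neighbors" can be sketched with cross-attention from the ego observation to a padded, masked set of neighbor observations, yielding a fixed-size policy input. Observation sizes and the single-query design are assumptions, not the authors' encoder.

```python
import torch
import torch.nn as nn

class NeighborEncoder(nn.Module):
    """Attention pooling over a variable-length set of neighbor observations."""

    def __init__(self, obs_dim=10, d_model=32):
        super().__init__()
        self.ego = nn.Linear(obs_dim, d_model)
        self.nbr = nn.Linear(obs_dim, d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)

    def forward(self, ego_obs, neighbor_obs, pad_mask):
        # ego_obs: (B, obs_dim); neighbor_obs: (B, N, obs_dim); pad_mask: (B, N), True = padded slot
        q = self.ego(ego_obs).unsqueeze(1)               # (B, 1, d_model) query
        kv = self.nbr(neighbor_obs)                      # (B, N, d_model) keys/values
        fused, _ = self.attn(q, kv, kv, key_padding_mask=pad_mask)
        return fused.squeeze(1)                          # fixed-size encoding for the policy

enc = NeighborEncoder()
nbrs = torch.randn(4, 6, 10)                             # up to 6 neighbors per UUV
mask = torch.tensor([[False]*2 + [True]*4,               # UUV 0 currently sees 2 neighbors
                     [False]*6,                          # UUV 1 sees all 6
                     [False]*1 + [True]*5,
                     [False]*3 + [True]*3])
print(enc(torch.randn(4, 10), nbrs, mask).shape)         # torch.Size([4, 32])
```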

21 pages, 6890 KB  
Article
SOAR-RL: Safe and Open-Space Aware Reinforcement Learning for Mobile Robot Navigation in Narrow Spaces
by Minkyung Jun, Piljae Park and Hoeryong Jung
Sensors 2025, 25(17), 5236; https://doi.org/10.3390/s25175236 - 22 Aug 2025
Viewed by 1026
Abstract
As human–robot shared service environments become increasingly common, autonomous navigation in narrow space environments (NSEs), such as indoor corridors and crosswalks, becomes challenging. Mobile robots must go beyond reactive collision avoidance and interpret surrounding risks to proactively select safer routes in dynamic and spatially constrained environments. This study proposes a deep reinforcement learning (DRL)-based navigation framework that enables mobile robots to interact with pedestrians while identifying and traversing open and safe spaces. The framework fuses 3D LiDAR and RGB camera data to recognize individual pedestrians and estimate their position and velocity in real time. Based on this, a human-aware occupancy map (HAOM) is constructed, combining both static obstacles and dynamic risk zones, and used as the input state for DRL. To promote proactive and safe navigation behaviors, we design a state representation and reward structure that guide the robot toward less risky areas, overcoming the limitations of traditional approaches. The proposed method is validated through a series of simulation experiments, including straight, L-shaped, and cross-shaped layouts, designed to reflect typical narrow space environments. Various dynamic obstacle scenarios were incorporated during both training and evaluation. The results demonstrate that the proposed approach significantly improves navigation success rates and reduces collision incidents compared to conventional navigation planners across diverse NSE conditions. Full article
(This article belongs to the Section Navigation and Positioning)
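The human-aware occupancy map (HAOM) that fuses static obstacles with dynamic risk zones can be sketched as a grid where each pedestrian contributes a Gaussian risk lobe centered on their projected future position. Grid resolution, risk spread, and the look-ahead time are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def build_haom(static_grid, pedestrians, res=0.1, lookahead=1.0, sigma=0.4):
    """static_grid: (H, W) in {0, 1}; pedestrians: list of (pos_xy, vel_xy) in metres."""
    H, W = static_grid.shape
    ys, xs = np.mgrid[0:H, 0:W]
    world_x, world_y = xs * res, ys * res
    risk = np.zeros((H, W))
    for pos, vel in pedestrians:
        center = np.asarray(pos) + lookahead * np.asarray(vel)    # projected future position
        d2 = (world_x - center[0]) ** 2 + (world_y - center[1]) ** 2
        risk = np.maximum(risk, np.exp(-d2 / (2 * sigma ** 2)))   # Gaussian risk lobe
    return np.clip(static_grid + risk, 0.0, 1.0)                  # 1 = blocked or high risk

static = np.zeros((40, 60))
static[:, 0] = static[:, -1] = 1.0                                # corridor walls
haom = build_haom(static, [((2.0, 1.5), (0.8, 0.0)), ((4.0, 2.5), (-0.5, 0.2))])
print(haom.shape, round(haom.max(), 2))
```

Such a map would then be fed to the DRL policy as its state, so the reward can steer the robot toward the less risky open space rather than merely reacting to imminent collisions.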
