Search Results (228)

Search Parameters:
Keywords = proximal policy optimization (PPO)

26 pages, 1515 KB  
Article
From Key Role to Core Infrastructure: Platforms as AI Enablers in Hospitality Management
by Antonio Grieco, Pierpaolo Caricato and Paolo Margiotta
Platforms 2025, 3(3), 16; https://doi.org/10.3390/platforms3030016 - 4 Sep 2025
Abstract
The increasing complexity of managing maintenance activities across geographically dispersed hospitality facilities necessitates advanced digital solutions capable of effectively balancing operational costs and service quality. This study addresses this challenge by designing and validating an intelligent Prescriptive Maintenance module, leveraging advanced Reinforcement Learning (RL) techniques within a Digital Twin (DT) infrastructure, specifically tailored for luxury hospitality networks characterized by high standards and demanding operational constraints. The proposed framework is based on an RL agent trained through Proximal Policy Optimization (PPO), which allows the system to dynamically prescribe preventive and corrective maintenance interventions. Under such an AI-driven approach, platforms act as enablers that minimize service disruptions, optimize operational efficiency, and proactively manage resources in dynamic and extended operational contexts. Experimental validation highlights the potential of the developed solution to significantly enhance resource allocation strategies and operational planning compared to traditional preventive approaches, particularly under varying resource availability conditions. By providing a comprehensive and generalizable representation model of maintenance management, this study delivers valuable insights for both researchers and industry practitioners aiming to leverage digital transformation and AI for sustainable and resilient hospitality operations.
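
As a rough illustration of the PPO-based prescriptive-maintenance loop described above, the sketch below trains a stable-baselines3 PPO agent on a toy facility-degradation environment; the state, costs, and dynamics are invented stand-ins for the paper's digital twin, not the authors' model.

```python
import gymnasium as gym
import numpy as np
from stable_baselines3 import PPO

class MaintenanceEnv(gym.Env):
    """Toy prescriptive-maintenance task: facility conditions degrade
    stochastically; the agent may idle or service one facility per step."""
    def __init__(self, n_facilities=5):
        super().__init__()
        self.n = n_facilities
        self.observation_space = gym.spaces.Box(0.0, 1.0, (self.n,), np.float32)
        self.action_space = gym.spaces.Discrete(self.n + 1)  # 0 = idle

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.cond = np.ones(self.n, dtype=np.float32)  # 1.0 = perfect condition
        return self.cond.copy(), {}

    def step(self, action):
        self.cond -= self.np_random.uniform(0.0, 0.05, self.n).astype(np.float32)
        cost = 0.0
        if action > 0:               # cheap preventive intervention
            self.cond[action - 1] = 1.0
            cost += 0.2
        failed = self.cond <= 0.0    # failures force costly corrective work
        cost += 1.0 * failed.sum()
        self.cond[failed] = 1.0
        return self.cond.copy(), -cost, False, False, {}

model = PPO("MlpPolicy", MaintenanceEnv(), verbose=0)
model.learn(total_timesteps=10_000)
```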

21 pages, 2516 KB  
Article
Risk-Aware Reinforcement Learning with Dynamic Safety Filter for Collision Risk Mitigation in Mobile Robot Navigation
by Bingbing Guo, Guina Wang, Yiyang Chen, Yue Gao and Qian Xie
Sensors 2025, 25(17), 5488; https://doi.org/10.3390/s25175488 - 3 Sep 2025
Abstract
Mobile robots face collision-avoidance challenges in dynamic environments, which requires addressing the safety and adaptability shortcomings of traditional navigation methods. Traditional methods rely on predefined rules, making it difficult to achieve flexible, safe, and real-time obstacle avoidance in complex, dynamic environments. To address this issue, a risk-aware, dynamic, adaptive regulation barrier policy optimization (RADAR-BPO) method is proposed, combining proximal policy optimization (PPO) with a control barrier function (CBF). RADAR-BPO generates exploratory actions using PPO and constructs a real-time safety filter using the CBF. The method uses quadratic programming to minimally correct risky actions, thereby ensuring safe obstacle avoidance while maintaining navigation efficiency. Testing across three phased learning environments in the ROS Gazebo simulator demonstrated that the proposed method achieves an obstacle avoidance success rate of nearly 90% in complex, dynamic, multi-obstacle environments and improves the overall mission success rate, validating its robustness and effectiveness in complex dynamic scenarios.
(This article belongs to the Special Issue Indoor Localization Technologies and Applications)
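
For a single obstacle constraint, the CBF safety filter at the heart of such a PPO+CBF scheme reduces to a closed-form projection of the RL action onto the safe half-space. A minimal sketch under assumed single-integrator dynamics; the obstacle geometry and gain are illustrative, not the paper's setup:

```python
import numpy as np

def cbf_filter(u_rl, x, obstacle, radius, alpha=1.0):
    """Project the RL velocity command onto the CBF-safe half-space.
    h(x) = ||x - obstacle||^2 - radius^2; the QP
    min ||u - u_rl||^2  s.t.  grad_h . u >= -alpha * h(x)
    has this closed-form solution for one constraint."""
    h = np.dot(x - obstacle, x - obstacle) - radius**2
    grad_h = 2.0 * (x - obstacle)
    if grad_h @ u_rl >= -alpha * h:
        return u_rl                      # already safe: keep exploration
    # minimum-norm correction onto the constraint boundary
    lam = (-alpha * h - grad_h @ u_rl) / (grad_h @ grad_h)
    return u_rl + lam * grad_h

u_safe = cbf_filter(np.array([1.0, 0.0]), x=np.array([0.5, 0.0]),
                    obstacle=np.array([1.5, 0.0]), radius=0.5)
```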

31 pages, 7088 KB  
Article
Cascade Hydropower Plant Operational Dispatch Control Using Deep Reinforcement Learning on a Digital Twin Environment
by Erik Rot Weiss, Robert Gselman, Rudi Polner and Riko Šafarič
Energies 2025, 18(17), 4660; https://doi.org/10.3390/en18174660 - 2 Sep 2025
Abstract
In this work, we propose the use of a reinforcement learning (RL) agent for the control of a cascade hydropower plant system. Generally, this job is handled by power plant dispatchers who manually adjust power plant electricity production to meet the changing demand set by energy traders. This work explores a more fundamental problem in cascade hydropower plant operation: flow control for power production in a highly nonlinear setting, studied on a data-based digital twin. Using deep deterministic policy gradient (DDPG), twin delayed DDPG (TD3), soft actor-critic (SAC), and proximal policy optimization (PPO) algorithms, we can generalize the characteristics of the system and reach a human-dispatcher level of control over the entire system of eight hydropower plants on the river Drava in Slovenia. The creation of an RL agent that makes decisions similarly to a human dispatcher is interesting not only in terms of control but also in terms of long-term decision-making analysis in an ever-changing energy portfolio. The specific novelty of this work lies in training an RL agent on an accurate testing environment of eight real-world cascade hydropower plants on the river Drava in Slovenia and comparing the agent's performance to human dispatchers. The results show that the RL agent's absolute mean error of 7.64 MW is comparable to the general human dispatcher's absolute mean error of 5.8 MW at a peak installed power of 591.95 MW.
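
A rough sketch of the four-algorithm comparison using stable-baselines3, with Pendulum-v1 standing in for the authors' Drava digital twin, which is not public:

```python
from stable_baselines3 import DDPG, TD3, SAC, PPO
from stable_baselines3.common.evaluation import evaluate_policy

# Train each algorithm on the same continuous-control task and compare
# mean episode return; timesteps are kept small for illustration.
for algo in (DDPG, TD3, SAC, PPO):
    model = algo("MlpPolicy", "Pendulum-v1", verbose=0)
    model.learn(total_timesteps=20_000)
    mean_r, std_r = evaluate_policy(model, model.get_env(), n_eval_episodes=10)
    print(f"{algo.__name__}: {mean_r:.1f} +/- {std_r:.1f}")
```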

35 pages, 2863 KB  
Article
DeepSIGNAL-ITS—Deep Learning Signal Intelligence for Adaptive Traffic Signal Control in Intelligent Transportation Systems
by Mirabela Melinda Medvei, Alin-Viorel Bordei, Ștefania Loredana Niță and Nicolae Țăpuș
Appl. Sci. 2025, 15(17), 9396; https://doi.org/10.3390/app15179396 - 27 Aug 2025
Viewed by 477
Abstract
Urban traffic congestion remains a major contributor to vehicle emissions and travel inefficiency, prompting the need for adaptive and intelligent traffic management systems. In response, we introduce DeepSIGNAL-ITS (Deep Learning Signal Intelligence for Adaptive Lights in Intelligent Transportation Systems), a unified framework that leverages real-time traffic perception and learning-based control to optimize signal timing and reduce congestion. The system integrates vehicle detection via the YOLOv8 architecture at roadside units (RSUs) and manages signal control using Proximal Policy Optimization (PPO), guided by global traffic indicators such as accumulated vehicle waiting time. Secure communication between RSUs and cloud infrastructure is ensured through Transport Layer Security (TLS)-encrypted data exchange. We validate the framework through extensive simulations in SUMO across diverse urban settings. Simulation results show an average 30.20% reduction in vehicle waiting time at signalized intersections compared to baseline fixed-time configurations derived from OpenStreetMap (OSM). Furthermore, emissions assessed via the HBEFA-based model in SUMO reveal measurable reductions across pollutant categories, underscoring the framework's dual potential to improve both traffic efficiency and environmental sustainability in simulated urban environments.
(This article belongs to the Section Transportation and Future Mobility)
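
A minimal sketch of the kind of global reward signal described above: the agent is rewarded by the decrease in accumulated waiting time between signal decisions. In practice the per-vehicle waiting times would come from SUMO/TraCI; this stand-alone version simply takes them as input, and the formulation is an assumption rather than the paper's exact reward.

```python
class WaitingTimeReward:
    """Reward PPO by the change in network-wide accumulated waiting time."""
    def __init__(self):
        self.prev_total = 0.0

    def __call__(self, waiting_times):
        """waiting_times: accumulated waiting time (s) of each vehicle
        currently in the network, e.g. queried from the simulator."""
        total = sum(waiting_times)
        reward = self.prev_total - total   # positive when congestion eases
        self.prev_total = total
        return reward
```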

24 pages, 8688 KB  
Article
Lightweight Obstacle Avoidance for Fixed-Wing UAVs Using Entropy-Aware PPO
by Meimei Su, Haochen Chai, Chunhui Zhao, Yang Lyu and Jinwen Hu
Drones 2025, 9(9), 598; https://doi.org/10.3390/drones9090598 - 26 Aug 2025
Viewed by 639
Abstract
Obstacle avoidance during high-speed, low-altitude flight remains a significant challenge for unmanned aerial vehicles (UAVs), particularly in unfamiliar environments where prior maps and heavy onboard sensors are unavailable. To address this, we present an entropy-aware deep reinforcement learning framework that enables fixed-wing UAVs to navigate safely using only monocular onboard cameras. Our system features a lightweight, single-frame depth estimation module optimized for real-time execution on edge computing platforms, followed by a reinforcement learning controller equipped with a novel reward function that balances goal-reaching performance with path smoothness under fixed-wing dynamic constraints. To enhance policy optimization, we incorporate high-quality experiences from the replay buffer into the gradient computation, introducing a soft imitation mechanism that encourages the agent to align its behavior with previously successful actions. To further balance exploration and exploitation, we integrate an adaptive entropy regularization mechanism into the Proximal Policy Optimization (PPO) algorithm. This module dynamically adjusts policy entropy during training, leading to improved stability, faster convergence, and better generalization to unseen scenarios. Extensive software-in-the-loop (SITL) and hardware-in-the-loop (HITL) experiments demonstrate that our approach outperforms baseline methods in obstacle avoidance success rate and path quality, while remaining lightweight and deployable on resource-constrained aerial platforms.
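
One plausible reading of such an adaptive entropy-regularization module is sketched below: the entropy coefficient is adjusted so that measured policy entropy tracks a decaying target, favoring exploration early and exploitation late. The update rule and all constants are assumptions, not the paper's exact mechanism.

```python
class AdaptiveEntropyCoef:
    def __init__(self, init_coef=0.01, target_entropy=1.0,
                 target_decay=0.999, step=1e-3):
        self.coef = init_coef
        self.target = target_entropy   # desired policy entropy (nats)
        self.target_decay = target_decay
        self.step = step

    def update(self, measured_entropy):
        """Call once per PPO update with the batch-mean policy entropy."""
        self.target *= self.target_decay
        # raise the bonus when the policy is more deterministic than the
        # target; lower it when the policy is too random
        self.coef = max(0.0,
                        self.coef + self.step * (self.target - measured_entropy))
        return self.coef

# inside the PPO loss: loss = policy_loss + vf_coef * value_loss
#                             - ent.update(entropy.item()) * entropy
```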

25 pages, 11784 KB  
Article
Improved PPO Optimization for Robotic Arm Grasping Trajectory Planning and Real-Robot Migration
by Chunlei Li, Zhe Liu, Liang Li, Zeyu Ji, Chenbo Li, Jiaxing Liang and Yafeng Li
Sensors 2025, 25(17), 5253; https://doi.org/10.3390/s25175253 - 23 Aug 2025
Viewed by 695
Abstract
Addressing key challenges in unstructured environments, including local optimum traps, limited real-time interaction, and convergence difficulties, this research pioneers a hybrid reinforcement learning approach that combines simulated annealing (SA) with proximal policy optimization (PPO) for robotic arm trajectory planning. The framework enables the accurate, collision-free grasping of randomly appearing objects amid dynamic obstacles through three key innovations: a probabilistically enhanced simulation environment with a 20% obstacle generation rate; an optimized state-action space featuring 12-dimensional environment coding and 6-DoF joint control; and an SA-PPO algorithm that dynamically adjusts the learning rate to balance exploration and convergence. Experimental results show a 6.52% increase in success rate (98% vs. 92%) and a 7.14% reduction in steps per set compared to the baseline PPO. A real deployment on the AUBO-i5 robotic arm enables real machine grasping, validating robust transfer from simulation to reality. This work establishes a new paradigm for adaptive robot manipulation in industrial scenarios requiring a real-time response to environmental uncertainty.
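
A hedged sketch of how a simulated-annealing acceptance rule can govern PPO's learning rate, in the spirit of SA-PPO; the perturbation range and cooling schedule are assumptions, not the paper's constants.

```python
import math, random

def accept_lr(candidate_lr, old_lr, old_return, new_return, temperature):
    """After training one epoch at candidate_lr: keep it if the mean
    return improved, keep it with Metropolis probability otherwise,
    and revert to old_lr if rejected."""
    if new_return >= old_return:
        return candidate_lr
    accept_p = math.exp((new_return - old_return) / max(temperature, 1e-8))
    return candidate_lr if random.random() < accept_p else old_lr

# per epoch: candidate_lr = lr * random.uniform(0.5, 1.5)  # local move
#            temperature *= 0.95                           # cooling
```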

24 pages, 11782 KB  
Article
Research on Joint Game-Theoretic Modeling of Network Attack and Defense Under Incomplete Information
by Yifan Wang, Xiaojian Liu and Xuejun Yu
Entropy 2025, 27(9), 892; https://doi.org/10.3390/e27090892 - 23 Aug 2025
Viewed by 403
Abstract
In the face of increasingly severe cybersecurity threats, incomplete information and environmental dynamics have become central challenges in network attack–defense scenarios. In real-world network environments, defenders often find it difficult to fully perceive attack behaviors and network states, leading to a high degree of uncertainty in the system. Traditional approaches are inadequate in dealing with the diversification of attack strategies and the dynamic evolution of network structures, making it difficult to achieve highly adaptive defense strategies and efficient multi-agent coordination. To address these challenges, this paper proposes a multi-agent network defense approach based on joint game modeling, termed JG-Defense (Joint Game-based Defense), which aims to enhance the efficiency and robustness of defense decision-making in environments characterized by incomplete information. The method integrates Bayesian game theory, graph neural networks, and a proximal policy optimization framework, and it introduces two core mechanisms. First, a Dynamic Communication Graph Neural Network (DCGNN) is used to model the dynamic network structure, improving the perception of topological changes and attack evolution trends. A multi-agent communication mechanism is incorporated within the DCGNN to enable the sharing of local observations and strategy coordination, thereby enhancing global consistency. Second, a joint game loss function is constructed to embed the game equilibrium objective into the reinforcement learning process, optimizing both the rationality and long-term benefit of agent strategies. Experimental results demonstrate that JG-Defense outperforms the Cybermonic model by 15.83% in overall defense performance. Furthermore, under the traditional PPO loss function, the DCGNN model improves defense performance by 11.81% compared to the Cybermonic model. These results verify that the proposed integrated approach achieves superior global strategy coordination in dynamic attack–defense scenarios with incomplete information.
(This article belongs to the Section Multidisciplinary Applications)
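
One way to read a joint game loss of this kind is as the PPO clipped objective plus a divergence term pulling the policy toward an equilibrium strategy. The sketch below takes that reading; the equilibrium target and its weight are assumptions, not the paper's formulation.

```python
import torch
import torch.nn.functional as F

def joint_game_loss(ratio, advantage, policy_logits, equilibrium_probs,
                    clip_eps=0.2, game_weight=0.1):
    # standard PPO clipped surrogate objective
    ppo_loss = -torch.min(
        ratio * advantage,
        torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantage,
    ).mean()
    # KL toward an (e.g. Bayesian) equilibrium strategy embeds the
    # game objective into the policy update
    game_loss = F.kl_div(
        F.log_softmax(policy_logits, dim=-1), equilibrium_probs,
        reduction="batchmean",
    )
    return ppo_loss + game_weight * game_loss
```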

24 pages, 11770 KB  
Article
Secure Communication and Resource Allocation in Double-RIS Cooperative-Aided UAV-MEC Networks
by Xi Hu, Hongchao Zhao, Dongyang He and Wujie Zhang
Drones 2025, 9(8), 587; https://doi.org/10.3390/drones9080587 - 19 Aug 2025
Viewed by 368
Abstract
In complex urban wireless environments, unmanned aerial vehicle–mobile edge computing (UAV-MEC) systems face challenges like link blockage and single-antenna eavesdropping threats. The traditional single reconfigurable intelligent surface (RIS), limited in collaboration, struggles to address these issues. This paper proposes a double-RIS cooperative UAV-MEC optimization scheme, leveraging their joint reflection to build multi-dimensional signal paths, boosting legitimate link gains while suppressing eavesdropping channels. It considers double-RIS phase shifts, ground user (GU) transmission power, UAV trajectories, resource allocation, and receiving beamforming, aiming to maximize secure energy efficiency (EE) while ensuring long-term stability of GU and UAV task queues. Given random task arrivals and high-dimensional variable coupling, a dynamic model integrating queue stability and secure transmission constraints is built using Lyapunov optimization, transforming long-term stochastic optimization into slot-by-slot deterministic decisions via the drift-plus-penalty method. To handle high-dimensional continuous spaces, an end-to-end proximal policy optimization (PPO) framework is designed for online learning of multi-dimensional resource allocation and direct acquisition of joint optimization strategies. Simulation results show that compared with benchmark schemes (e.g., single RIS, non-cooperative double RIS) and reinforcement learning algorithms (e.g., advantage actor–critic (A2C), deep deterministic policy gradient (DDPG), deep Q-network (DQN)), the proposed scheme achieves significant improvements in secure EE and queue stability, with faster convergence and better optimization effects, fully verifying its superiority and robustness in complex scenarios.
(This article belongs to the Section Drone Communications)
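
The drift-plus-penalty transformation referenced above reduces the long-term stochastic problem to a per-slot score that trades queue backlogs against the (negative) secure EE. A minimal sketch under assumed per-user scalar queues; the penalty here is a stand-in for minus the slot's secure energy efficiency:

```python
def drift_plus_penalty_score(queues, arrivals, services, penalty, V=10.0):
    """Per-slot objective (smaller is better): Lyapunov drift plus
    V times the penalty. queues/arrivals/services are per-user lists
    for one slot; V weights EE against queue stability."""
    drift = sum(q * (a - s) for q, a, s in zip(queues, arrivals, services))
    return drift + V * penalty

def update_queues(queues, arrivals, services):
    """Standard queue recursion Q(t+1) = max(Q(t) + A(t) - S(t), 0)."""
    return [max(q + a - s, 0.0) for q, a, s in zip(queues, arrivals, services)]
```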

52 pages, 15058 KB  
Article
Optimizing Autonomous Vehicle Navigation Through Reinforcement Learning in Dynamic Urban Environments
by Mohammed Abdullah Alsuwaiket
World Electr. Veh. J. 2025, 16(8), 472; https://doi.org/10.3390/wevj16080472 - 18 Aug 2025
Viewed by 635
Abstract
Autonomous vehicle (AV) navigation in dynamic urban environments faces challenges such as unpredictable traffic conditions, varying road user behaviors, and complex road networks. This study proposes a novel reinforcement learning-based framework that enhances AV decision making through spatial-temporal context awareness. The framework integrates Proximal Policy Optimization (PPO) and Graph Neural Networks (GNNs) to effectively model urban features like intersections, traffic density, and pedestrian zones. A key innovation is the urban context-aware reward mechanism (UCARM), which dynamically adapts the reward structure based on traffic rules, congestion levels, and safety considerations. Additionally, the framework incorporates a Dynamic Risk Assessment Module (DRAM), which uses Bayesian inference combined with Markov Decision Processes (MDPs) to proactively evaluate collision risks and guide safer navigation. The framework's performance was validated across three datasets: Argoverse, nuScenes, and CARLA. Results demonstrate significant improvements: an average travel time of 420 ± 20 s, a collision rate of 3.1%, and energy consumption of 11,833 ± 550 J in Argoverse; 410 ± 20 s, 2.5%, and 11,933 ± 450 J in nuScenes; and 450 ± 25 s, 3.6%, and 13,000 ± 600 J in CARLA. The proposed method achieved an average navigation success rate of 92.5%, consistently outperforming baseline models in safety, efficiency, and adaptability. These findings indicate the framework's robustness and practical applicability for scalable AV deployment in real-world urban traffic conditions.
(This article belongs to the Special Issue Modeling for Intelligent Vehicles)
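
An illustrative sketch of an urban context-aware reward in the spirit of UCARM: progress toward the goal, discounted by congestion, rule violations, and assessed collision risk. Every weight and feature name here is an assumption, not the paper's specification.

```python
def ucarm_reward(progress_m, congestion_level, rule_violation, collision_risk,
                 w_prog=1.0, w_cong=0.3, w_rule=5.0, w_risk=2.0):
    """progress_m: meters advanced toward goal this step;
    congestion_level: 0..1 local traffic density;
    rule_violation: bool, e.g. red-light or speed breach;
    collision_risk: 0..1 risk estimate from a DRAM-like module."""
    return (w_prog * progress_m
            - w_cong * congestion_level
            - w_rule * float(rule_violation)
            - w_risk * collision_risk)
```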

20 pages, 2083 KB  
Article
Maritime Mobile Edge Computing for Sporadic Tasks: A PPO-Based Dynamic Offloading Strategy
by Yanglong Sun, Wenqian Luo, Zhiping Xu, Bo Lin, Weijian Xu and Weipeng Liu
Mathematics 2025, 13(16), 2643; https://doi.org/10.3390/math13162643 - 17 Aug 2025
Viewed by 325
Abstract
Maritime mobile edge computing (MMEC) technology enables the deployment of high-precision, computationally intensive object detection tasks on resource-constrained edge devices. However, dynamic network conditions and limited communication resources significantly degrade the performance of static offloading strategies, leading to increased task blocking probability and delays. This paper proposes a scheduling and offloading strategy tailored for MMEC scenarios driven by object detection tasks, which explicitly considers (1) the hierarchical structure of object detection models, and (2) the sporadic nature of maritime observation tasks. To minimize average task completion time under varying task arrival patterns, we formulate the average blocking delay minimization problem as a Markov Decision Process (MDP). Then, we propose an Orthogonalization-Normalization Proximal Policy Optimization (ON-PPO) algorithm, in which task category states are orthogonally encoded and system states are normalized. Experiments demonstrate that ON-PPO effectively learns policy parameters, mitigates interference between tasks of different categories during training, and adapts efficiently to sporadic task arrivals. Simulation results show that, compared to baseline algorithms, ON-PPO maintains stable task queues and achieves a 22.9% reduction in average task latency.
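
The two preprocessing steps that give ON-PPO its name can be sketched directly: task categories are orthogonally (one-hot) encoded, and continuous system states are normalized with running statistics. The composition of the maritime state vector is an assumption.

```python
import numpy as np

def encode_state(task_category, n_categories, system_state, mean, var):
    """Build the policy input: orthogonal task code + normalized state.
    mean/var would be maintained as running statistics during training."""
    onehot = np.zeros(n_categories, dtype=np.float32)
    onehot[task_category] = 1.0                           # orthogonal encoding
    normed = (system_state - mean) / np.sqrt(var + 1e-8)  # normalization
    return np.concatenate([onehot, normed.astype(np.float32)])
```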

28 pages, 2383 KB  
Article
CIM-LP: A Credibility-Aware Incentive Mechanism Based on Long Short-Term Memory and Proximal Policy Optimization for Mobile Crowdsensing
by Sijia Mu and Huahong Ma
Electronics 2025, 14(16), 3233; https://doi.org/10.3390/electronics14163233 - 14 Aug 2025
Viewed by 243
Abstract
In the field of mobile crowdsensing (MCS), a large number of tasks rely on the participation of ordinary mobile device users for data collection and processing. This model has shown great potential for applications in environmental monitoring, traffic management, public safety, and other areas. However, the enthusiasm of participants and the quality of uploaded data directly affect the reliability and practical value of the sensing results. Therefore, the design of incentive mechanisms has become a core issue in driving the healthy operation of MCS. The existing research, when optimizing long-term utility rewards for participants, has often failed to fully consider dynamic changes in trustworthiness. It has typically relied on historical data from a single point in time, overlooking the long-term dependencies in the time series, which results in suboptimal decision-making and limits the overall efficiency and fairness of sensing tasks. To address this issue, a credibility-aware incentive mechanism based on long short-term memory and proximal policy optimization (CIM-LP) is proposed. The mechanism employs a Markov decision process (MDP) model to describe the decision-making process of the participants. Without access to global information, an incentive model combining long short-term memory (LSTM) networks and proximal policy optimization (PPO), collectively referred to as LSTM-PPO, is utilized to formulate the most reasonable and effective sensing duration strategy for each participant, aiming to maximize the utility reward. After task completion, the participants' credibility is dynamically updated by evaluating the quality of the uploaded data, which then adjusts their utility rewards for the next phase. Simulation results based on real datasets show that compared with several existing incentive algorithms, the CIM-LP mechanism increases the average utility of the participants by 6.56% to 112.76% and the task completion rate by 16.25% to 128.71%, demonstrating its significant advantages in improving data quality and task completion efficiency.
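
A minimal sketch of an LSTM-PPO actor of the kind described: an LSTM summarizes the participant's observation history and a Gaussian head emits a sensing-duration action. Sizes and the action parameterization are assumptions for illustration.

```python
import torch
import torch.nn as nn

class LstmActor(nn.Module):
    def __init__(self, obs_dim=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden, batch_first=True)
        self.mu = nn.Linear(hidden, 1)            # mean sensing duration
        self.log_std = nn.Parameter(torch.zeros(1))

    def forward(self, obs_seq):                   # (batch, time, obs_dim)
        out, _ = self.lstm(obs_seq)
        mu = self.mu(out[:, -1])                  # act on the last timestep
        return torch.distributions.Normal(mu, self.log_std.exp())

dist = LstmActor()(torch.randn(4, 10, 8))
duration = dist.sample().clamp(min=0.0)           # durations are nonnegative
```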

30 pages, 3877 KB  
Article
Ship Voyage Route Waypoint Optimization Method Using Reinforcement Learning Considering Topographical Factors and Fuel Consumption
by Juhyang Lee, Youngseo Park, Jeongon Eom, Hungyu Hwang and Sewon Kim
J. Mar. Sci. Eng. 2025, 13(8), 1554; https://doi.org/10.3390/jmse13081554 - 13 Aug 2025
Viewed by 501
Abstract
As the IMO and the EU strengthen carbon emission regulations, eco-friendly voyage planning is increasingly recognized by ship owners as one of the most important performance factors of the vessel fleet. Eco-friendly voyage planning aims to reduce carbon emissions and fuel consumption while satisfying voyage constraints. In this study, a novel route waypoint optimization method is proposed, which combines a fuel consumption forecasting model based on the Transformer with a Proximal Policy Optimization (PPO) algorithm for adaptive waypoint planning. Unlike traditional approaches that pursue a single objective, the developed framework adopts a multi-objective methodology that balances fuel efficiency against navigational safety and operational simplicity. The methodology consists of three sequential phases. First, the Transformer model is employed to predict ship fuel consumption using navigational and environmental data. Next, the predicted consumption values are utilized as a reward function in a PPO-based reinforcement learning framework to generate fuel-efficient routes. Finally, the number and placement of waypoints are further optimized with respect to terrain and bathymetric constraints, improving the practicality and safety of the navigational plan. The results show that the proposed method could decrease average fuel consumption by up to 11.33% across three real-world case studies: Busan–Rotterdam, Busan–Los Angeles, and Mokpo–Houston, compared to AIS-based routes. The Transformer model outperformed Long Short-Term Memory (LSTM) and Random Forest baselines with the highest prediction accuracy, achieving an R² score of 86.75%. This study is the first to incorporate Transformer-based forecasting into reinforcement learning for maritime route planning and demonstrates how the method adaptively controls waypoint density in response to environmental and geographical conditions. These results support the practical application of the approach in smart ship navigation systems aligned with the IMO's decarbonization goals.
(This article belongs to the Special Issue Intelligent Solutions for Marine Operations)
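
The forecaster-in-the-loop reward coupling can be sketched in a few lines; `fuel_model` stands in for the trained Transformer, and the shallow-water penalty is an assumed proxy for the bathymetric constraints, not the paper's reward.

```python
def route_step_reward(fuel_model, leg_features, depth_ok, shallow_penalty=100.0):
    """PPO reward for one route leg: the forecaster's predicted fuel use
    enters with a negative sign, plus a penalty for unsafe water depth.
    fuel_model: callable mapping leg features to predicted fuel (tons)."""
    fuel_t = fuel_model(leg_features)
    penalty = 0.0 if depth_ok else shallow_penalty
    return -(fuel_t + penalty)        # less fuel => higher reward
```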

22 pages, 896 KB  
Article
Dynamic Jamming Policy Generation for Netted Radars Using Hybrid Policy Network
by Wanbing Hao, Wentao Ke, Xiaoyi Feng and Zhaoqiang Xia
Appl. Sci. 2025, 15(16), 8898; https://doi.org/10.3390/app15168898 - 12 Aug 2025
Viewed by 260
Abstract
Radar jamming resource allocation is crucial for maximizing jamming effectiveness and ensuring operational superiority in complex electromagnetic environments. However, existing approaches still suffer from inefficiency, instability, and suboptimal global solutions. To address these issues, this work tackles effective jamming resource allocation in dynamic radar countermeasures involving multiple jamming types. A deep reinforcement learning framework is designed to jointly optimize transceiver strategies for adaptive jamming under state-switching scenarios. In this framework, a hybrid policy network is proposed to coordinate beam selection and power allocation, while a dynamic fusion metric is integrated to evaluate jamming effectiveness. The non-convex optimization is then resolved via an iterative algorithm driven by proximal policy optimization version 2 (PPO2). Experiments demonstrate that the proposed method achieves faster convergence and lower power consumption than baseline methods, ensuring robust jamming performance against eavesdroppers under stringent resource constraints.
(This article belongs to the Section Applied Physics General)
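
A sketch of a hybrid policy head of the kind described, pairing a categorical beam-selection branch with a Gaussian power-allocation branch; layer sizes and the factorized action distribution are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class HybridPolicy(nn.Module):
    def __init__(self, obs_dim=16, n_beams=4):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.beam_logits = nn.Linear(64, n_beams)   # discrete branch
        self.power_mu = nn.Linear(64, n_beams)      # continuous branch
        self.power_log_std = nn.Parameter(torch.zeros(n_beams))

    def forward(self, obs):
        h = self.trunk(obs)
        beam = torch.distributions.Categorical(logits=self.beam_logits(h))
        power = torch.distributions.Normal(self.power_mu(h),
                                           self.power_log_std.exp())
        # joint log-prob for PPO = beam.log_prob(a_b) + power.log_prob(a_p).sum(-1)
        return beam, power
```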

24 pages, 1390 KB  
Article
Dependent Task Graph Offloading Model Based on Deep Reinforcement Learning in Mobile Edge Computing
by Ruxin Guo, Lunyu Zhou, Linzhi Li, Yuhui Song and Xiaolan Xie
Electronics 2025, 14(16), 3184; https://doi.org/10.3390/electronics14163184 - 10 Aug 2025
Viewed by 421
Abstract
Mobile edge computing (MEC) has emerged as a promising solution for enabling resource-constrained user devices to run large-scale and complex applications by offloading their computational tasks to the edge servers. One of the most critical challenges in MEC is designing efficient task offloading strategies. Traditional approaches either rely on non-intelligent algorithms that lack adaptability to the dynamic edge environment, or utilize learning-based methods that often ignore task dependencies within applications. To address this issue, this study investigates task offloading for mobile applications with interdependent tasks in an MEC system, employing a deep reinforcement learning framework. Specifically, we model task dependencies using a Directed Acyclic Graph (DAG), where nodes represent subtasks and directed edges indicate their dependency relationships. Based on task priorities, the DAG is transformed into a topological sequence of task vectors. We propose a novel graph-based offloading model, which combines an attention-based network and a Proximal Policy Optimization (PPO) algorithm to learn optimal offloading decisions. Our method leverages offline reinforcement learning through the attention network to capture intrinsic task dependencies within applications. Experimental results show that our proposed model exhibits strong decision-making capabilities and outperforms existing baseline algorithms.
(This article belongs to the Special Issue Advancements in Edge and Cloud Computing for Industrial IoT)
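
The DAG-to-sequence step described above can be sketched with Kahn's algorithm plus a priority heap, so that ready subtasks are emitted highest-priority first; the toy graph and priority values are illustrative.

```python
import heapq

def dag_to_sequence(deps, priority):
    """deps: {task: set of prerequisite tasks}; priority: {task: float},
    higher runs earlier among ready tasks. Returns a topological order."""
    indeg = {t: len(p) for t, p in deps.items()}
    succ = {t: [] for t in deps}
    for t, prereqs in deps.items():
        for p in prereqs:
            succ[p].append(t)
    ready = [(-priority[t], t) for t, d in indeg.items() if d == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, t = heapq.heappop(ready)
        order.append(t)
        for s in succ[t]:           # release tasks whose prerequisites are done
            indeg[s] -= 1
            if indeg[s] == 0:
                heapq.heappush(ready, (-priority[s], s))
    return order

print(dag_to_sequence({"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}},
                      {"a": 1, "b": 2, "c": 3, "d": 1}))  # ['a', 'c', 'b', 'd']
```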

12 pages, 2368 KB  
Article
Uncertainty-Aware Continual Reinforcement Learning via PPO with Graph Representation Learning
by Dongjae Kim
Mathematics 2025, 13(16), 2542; https://doi.org/10.3390/math13162542 - 8 Aug 2025
Viewed by 482
Abstract
Continual reinforcement learning (CRL) agents face significant challenges when encountering distributional shifts. This paper formalizes these shifts into two key scenarios, namely virtual drift (domain switches), where object semantics change (e.g., walls becoming lava), and concept drift (task switches), where the environment's structure is reconfigured (e.g., moving from object navigation to a door-key puzzle). This paper demonstrates that while conventional convolutional neural networks (CNNs) struggle to preserve relational knowledge during these transitions, graph convolutional networks (GCNs) can inherently mitigate catastrophic forgetting by encoding object interactions through explicit topological reasoning. A unified framework is proposed that integrates GCN-based state representation learning with a proximal policy optimization (PPO) agent. The GCN's message-passing mechanism preserves invariant relational structures, which diminishes performance degradation during abrupt domain switches. Experiments conducted in procedurally generated MiniGrid environments show that the method significantly reduces catastrophic forgetting in domain switch scenarios. While showing comparable mean performance in task switch scenarios, our method demonstrates substantially lower performance variance (Levene's test, p < 1.0 × 10⁻¹⁰), indicating superior learning stability compared to CNN-based methods. By bridging graph representation learning with robust policy optimization in CRL, this research advances the stability of decision-making in dynamic environments and establishes GCNs as a principled alternative to CNNs for applications requiring stable, continual learning.
(This article belongs to the Special Issue Decision Making under Uncertainty in Soft Computing)
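
A minimal sketch of a GCN-based state encoder feeding PPO: one dense-adjacency layer with mean aggregation, pooled into a single embedding. The grid-to-graph construction (cells as nodes, adjacency from the grid) is an assumption for illustration.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (nodes, in_dim); adj: (nodes, nodes) including self-loops
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        return torch.relu(self.lin(adj @ x / deg))   # mean aggregation

x = torch.randn(9, 4)       # e.g. 3x3 grid cells as object nodes
adj = torch.eye(9)          # self-loops; add grid edges as needed
state_embedding = GCNLayer(4, 32)(x, adj).mean(0)   # pooled input for PPO
```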
