Search Results (548)

Search Parameters:
Keywords = deep reinforcement learning (DRL) method

30 pages, 1499 KB  
Article
Environment-Aware Optimal Placement and Dynamic Reconfiguration of Underwater Robotic Sonar Networks Using Deep Reinforcement Learning
by Qiming Sang, Yu Tian, Jin Zhang, Yuyang Xiao, Zhiduo Tan, Jiancheng Yu and Fumin Zhang
J. Mar. Sci. Eng. 2026, 14(8), 733; https://doi.org/10.3390/jmse14080733 - 15 Apr 2026
Viewed by 134
Abstract
Underwater dynamic target detection, classification, localization, and tracking (DCLT) is central to maritime surveillance and monitoring and increasingly relies on distributed AUV-based robotic sonar networks operating in passive listening and, when required, cooperative multistatic modes. Achieving a robust performance in realistic oceans remains challenging, because sensor placement must adapt to time-varying acoustic conditions and target priors while preserving acoustic communication connectivity, and because frequent reconfiguration under dynamic currents makes classical large-scale planning computationally expensive. This paper presents an integrated deep reinforcement learning (DRL)-based framework for passive-stage sonar placement and dynamic reconfiguration in distributed AUV networks. First, we cast placement as a constructive finite-horizon Markov decision process (MDP) and train a Proximal Policy Optimization (PPO) agent to sequentially build a collision-free layout on a discretized surveillance grid. The terminal reward is formulated to jointly optimize the environment-aware detection performance, computed from BELLHOP-based transmission loss models, and global network connectivity, quantified using algebraic connectivity. Second, to enable time-critical reconfiguration, we estimate flow-aware motion costs for all AUV–destination pairs using a PPO with a Long Short-Term Memory (LSTM) trajectory policy trained for partial observability. The learned policy can be deployed onboard, allowing each AUV to refine its path online using locally sensed currents, improving robustness to ocean-model uncertainty. The resulting cost matrix is solved via an efficient zero-element assignment method to obtain the optimal one-to-one reassignment. In the reported simulation studies, the proposed Sequential PPO placement method achieves a final reward 16–21% higher than Particle Swarm Optimization (PSO) and 2–3.7% higher than the Genetic Algorithm (GA), while the proposed PPO + LSTM planner reduces average travel time by 30.44% compared with A*. The proposed closed-loop architecture supports frequent re-optimization, scalable fleet operation, and a seamless transition to communication-supported cooperative multistatic tracking after detection, enabling efficient, adaptive DCLT in dynamic marine environments. Full article
(This article belongs to the Section Ocean Engineering)
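As a concrete illustration of the terminal reward described above, the following minimal Python sketch scores a candidate layout by a toy coverage term plus the algebraic connectivity (Fiedler value) of the communication graph. The coverage model, weights, and ranges are illustrative assumptions standing in for the BELLHOP-based transmission-loss computation.

```python
# Hypothetical sketch of the terminal-reward idea: detection performance
# plus algebraic connectivity of the AUV communication graph. Weights,
# ranges, and the coverage model are assumptions, not the authors' values.
import numpy as np

def algebraic_connectivity(positions, comm_range):
    """Second-smallest eigenvalue of the graph Laplacian (Fiedler value)."""
    n = len(positions)
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    adj = ((dist <= comm_range) & ~np.eye(n, dtype=bool)).astype(float)
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.sort(np.linalg.eigvalsh(lap))[1]   # > 0 iff graph is connected

def detection_score(positions, targets, detect_range):
    """Toy stand-in for BELLHOP-derived detection: fraction of target-prior
    cells within detection range of at least one sonar node."""
    dist = np.linalg.norm(targets[:, None, :] - positions[None, :, :], axis=-1)
    return float(np.mean(dist.min(axis=1) <= detect_range))

def terminal_reward(positions, targets, w_det=1.0, w_conn=0.5,
                    detect_range=3.0, comm_range=5.0):
    return (w_det * detection_score(positions, targets, detect_range)
            + w_conn * algebraic_connectivity(positions, comm_range))

rng = np.random.default_rng(0)
layout = rng.uniform(0, 10, size=(6, 2))      # 6 AUVs on a 10x10 grid
cells = rng.uniform(0, 10, size=(50, 2))      # sampled target-prior cells
print(f"terminal reward: {terminal_reward(layout, cells):.3f}")
```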
29 pages, 46316 KB  
Article
Adaptive Traffic Signal Control Using Deep Reinforcement Learning with Noise Injection
by Raul Alejandro Velasquez Ortiz, María Elena Lárraga Ramírez, Luis Agustín Alvarez-Icaza and Héctor Alonso Guzmán Gutiérrez
Appl. Sci. 2026, 16(8), 3833; https://doi.org/10.3390/app16083833 - 15 Apr 2026
Viewed by 200
Abstract
Adaptive traffic signal control (ATSC) remains a critical challenge for urban mobility. In this direction, deep reinforcement learning (DRL) has been widely investigated for ATSC, showing promising improvements in simulated environments. However, a noticeable gap remains between simulation-based results and practical implementations, due to reward formulations that do not address phase instability. Stochastic variations may trigger premature phase changes (“flickers”), affecting signal behavior and potentially limiting deployment in real scenarios. Although several works have examined delay, queues, and decentralized coordination, stability-focused variables remain comparatively less explored, particularly in single yet complex intersections. This study proposes a decentralized DRL model for ATSC with noise injection (ATSC-DRLNI) applied to a single intersection, introducing a stability-oriented reward function that integrates flickers, queue length, and advantage actor-critic (A2C) learning feedback. The model is evaluated in the Simulation of Urban MObility (SUMO) platform and compared against seven baseline methods, using real traffic data from a Mexican city for calibration and validation. Results suggest that penalizing flickers may contribute to more stable phase transitions, while reductions of up to 40% in queue length were observed in heavy-traffic scenarios. These findings indicate that incorporating stability-related variables into reward functions may help in implementing DRL-based ATSC studies. Full article
(This article belongs to the Section Transportation and Future Mobility)
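A minimal sketch of the stability-oriented reward idea: queue length plus a penalty whenever a phase switch occurs before a minimum green time has elapsed (a flicker). The minimum green time and the weights are assumed values, not the paper's calibrated ones.

```python
# Hedged sketch of a flicker-penalized ATSC reward in the spirit of
# ATSC-DRLNI. min_green, w_queue, and w_flicker are assumptions.
import numpy as np

def atsc_reward(queue_lengths, phase_changed, time_in_phase,
                min_green=10.0, w_queue=1.0, w_flicker=5.0):
    """Negative cost: total queue plus a penalty when the phase switches
    before the minimum green time (a premature "flicker")."""
    flicker = 1.0 if (phase_changed and time_in_phase < min_green) else 0.0
    return -(w_queue * float(np.sum(queue_lengths)) + w_flicker * flicker)

# Example: switching after only 4 s of green with 12 queued vehicles.
print(atsc_reward(np.array([5, 3, 4]), phase_changed=True, time_in_phase=4.0))
```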
18 pages, 6676 KB  
Article
Joint Phase and Power Optimization in RIS-Aided Multi-User Systems Using Deep Reinforcement Learning
by Qian Guo, Anming Dong, Sufang Li, Jiguo Yu and You Zhou
Electronics 2026, 15(8), 1564; https://doi.org/10.3390/electronics15081564 - 8 Apr 2026
Viewed by 333
Abstract
Reconfigurable intelligent surfaces (RIS) have emerged as a promising technology for enhancing wireless communication by intelligently shaping the propagation environment. However, non-line-of-sight (NLoS) blockage between the access point (AP) and user equipment (UE) can still significantly degrade communication performance. This paper investigates the channel degradation caused by NLoS blockage in a single-antenna AP and multi-antenna UE system and proposes a joint power allocation and phase optimization scheme based on RIS and deep reinforcement learning (DRL). Under a composite channel model with direct and RIS-reflected links, the objective is to maximize the weighted sum rate subject to total power constraints, unit-modulus constraints on RIS elements, and quality of service (QoS) requirements. Due to the coupled variables and the non-convex unit-modulus constraint, conventional alternating optimization (AO) and convex approximation methods usually incur high complexity and yield suboptimal solutions. To address this issue, a DRL algorithm based on an Actor–Critic architecture is developed to learn adaptive power allocation and reflection coefficient adjustment policies through interaction with the environment, without requiring full global channel state information (CSI). Simulation results demonstrate that the proposed method achieves higher signal-to-interference-plus-noise ratio (SINR) and throughput while providing faster convergence and better generalization than existing methods. Full article
(This article belongs to the Special Issue AI-Driven Intelligent Systems in Energy, Healthcare, and Beyond)
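The unit-modulus and total-power constraints mentioned above can be enforced by construction in the actor's output mapping; the sketch below shows one common way to do this (a softmax for the power budget, and theta -> e^{j*theta} for the RIS phases). This particular parameterization is an assumption, not necessarily the authors' exact design.

```python
# Sketch (assumed details): map an unconstrained actor output to a
# feasible power allocation and unit-modulus RIS reflection coefficients.
import numpy as np

def decode_action(raw, n_users, n_ris, p_total=1.0):
    """raw: unconstrained actor output of length n_users + n_ris."""
    power_logits, phase_raw = raw[:n_users], raw[n_users:]
    p = p_total * np.exp(power_logits) / np.exp(power_logits).sum()  # sum(p) = p_total
    theta = np.pi * np.tanh(phase_raw)            # phases in (-pi, pi)
    phi = np.exp(1j * theta)                      # |phi_i| = 1 by construction
    return p, phi

rng = np.random.default_rng(1)
p, phi = decode_action(rng.standard_normal(4 + 16), n_users=4, n_ris=16)
print(p.sum(), np.abs(phi).max())                 # ~1.0 (budget) and 1.0 (modulus)
```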
25 pages, 3942 KB  
Article
Deep Reinforcement Learning-Based Scheduling for an Electric–Hydrogen Integrated Station Using a Data-Driven Electrolyzer Model
by Dongdong Li, Liang Liu and Haiyu Liao
Appl. Sci. 2026, 16(7), 3605; https://doi.org/10.3390/app16073605 - 7 Apr 2026
Viewed by 345
Abstract
To address the inaccurate scheduling of electric–hydrogen integrated stations (EHISs) caused by the limited accuracy of conventional mechanistic models for proton exchange membrane (PEM) electrolyzers, this study proposes a deep reinforcement learning (DRL)-based scheduling strategy incorporating a data-driven electrolyzer model. First, a deep XGBoost model is developed to characterize the hydrogen production behavior of the PEM electrolyzer, thereby replacing the traditional mechanistic model and reducing prediction errors. Second, the EHIS scheduling problem is formulated as a constrained Markov decision process (CMDP) that explicitly considers user demand and carbon emission constraints. Third, an improved deep Q-network (DQN) algorithm integrating Lagrangian relaxation and the template policy-based reinforcement learning (TPRL) method is designed to solve the scheduling problem, which enhances convergence speed and generalization performance under similar operating scenarios. The simulation results demonstrate that the proposed method can effectively alleviate the decision-making risks introduced by model inaccuracies and significantly improve the operational profitability of the station while satisfying user demand and carbon emission constraints. Full article
(This article belongs to the Section Electrical, Electronics and Communications Engineering)
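The Lagrangian relaxation of a CMDP has a compact generic form; the sketch below shows the scalarized training signal and the projected dual-ascent multiplier update it implies. The cost names, budgets, and learning rate are hypothetical.

```python
# Generic CMDP Lagrangian-relaxation sketch: maximize profit minus
# lambda-weighted constraint costs; multipliers rise on violated budgets.
import numpy as np

def lagrangian_reward(profit, costs, lambdas):
    """Scalarized training signal r - sum_i lambda_i * c_i."""
    return profit - float(np.dot(lambdas, costs))

def update_multipliers(lambdas, avg_costs, budgets, lr=0.01):
    """Projected dual ascent: lambda_i <- max(0, lambda_i + lr*(c_i - d_i))."""
    return np.maximum(0.0, lambdas + lr * (avg_costs - budgets))

lambdas = np.zeros(2)                        # [carbon, unmet-demand] multipliers
budgets = np.array([50.0, 0.0])              # e.g. kgCO2/step, MWh shortfall
for _ in range(100):                         # toy outer dual loop
    avg_costs = np.array([62.0, 0.4])        # stand-in for rollout averages
    lambdas = update_multipliers(lambdas, avg_costs, budgets)
print(lambdas)                                # grows on the violated constraints
print(lagrangian_reward(120.0, np.array([62.0, 0.4]), lambdas))
```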
24 pages, 2051 KB  
Article
Physics-Informed Neural Networks and Deep Reinforcement Learning for Optimal Anti-Icing Strategies of Circular Tube Components in Polar Vessels
by Jinhao Xi, Chenyang Liu, Haiming Wen, Yan Chen, Siyu Zhang, Yuqiao Xin, Yutong Zhong and Dayong Zhang
J. Mar. Sci. Eng. 2026, 14(7), 685; https://doi.org/10.3390/jmse14070685 - 7 Apr 2026
Viewed by 337
Abstract
In polar environments, icing on ship deck surfaces severely compromises navigation safety. Conventional electric trace heating systems operate in continuous heating mode, resulting in high energy consumption. This study proposes an intelligent periodic heating control method that integrates physics-informed neural networks (PINNs) and deep reinforcement learning (DRL) for energy-efficient anti-icing of circular pipe components on polar vessels. Using a polar coupled environment simulation platform, experiments were conducted on electric heating anti-icing for circular pipe components. Temperature data under various heating modes were collected, and a physically constrained PINN temperature prediction model was constructed, achieving high prediction accuracy with limited samples (test set R2 = 0.9091; 5-fold cross-validation R2 = 0.8877 ± 0.0312). The DRL agent trained in this virtual environment autonomously optimized the heating strategy, yielding optimal cycle parameters: heating ratio D = 0.722 and cycle duration τ = 88 s. While maintaining surface temperatures above 0 °C against a −10 °C ambient baseline, this strategy achieved a unit energy consumption of 0.27 kJ/°C, representing a 63% reduction compared to conventional continuous heating. This study provides a data-physics fusion control approach for polar vessel anti-icing systems, demonstrating strong potential for engineering applications. Full article
(This article belongs to the Section Ocean Engineering)
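To make the duty-cycle result tangible, here is a toy lumped-parameter simulation of periodic heating with the reported optimum (D = 0.722, tau = 88 s) against a -10 °C ambient. The thermal constants and heater power are invented for illustration; in the paper, a PINN learns the real surface-temperature dynamics from experiments.

```python
# Toy lumped-parameter check of duty-cycled heating: the heater is on for
# the first D*tau of each cycle. p_heat, k, c are invented constants.
import numpy as np

def simulate(D=0.722, tau=88.0, t_amb=-10.0, t0=5.0, hours=1.0,
             p_heat=120.0, k=4.0, c=600.0, dt=1.0):
    """Integrate dT/dt = (P(t) - k*(T - T_amb)) / C; return the minimum
    surface temperature reached over the horizon."""
    t, temp, t_min = 0.0, t0, t0
    while t < hours * 3600:
        on = (t % tau) < D * tau
        temp += dt * ((p_heat if on else 0.0) - k * (temp - t_amb)) / c
        t_min = min(t_min, temp)
        t += dt
    return t_min

print(f"min surface temperature: {simulate():.2f} C")  # stays above 0 C here
```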
16 pages, 1553 KB  
Article
Research on the Collaborative Optimization Method of Power Prediction and DRL Control
by Mengjie Li, Yongbao Liu and Xing He
Processes 2026, 14(7), 1150; https://doi.org/10.3390/pr14071150 - 3 Apr 2026
Viewed by 254
Abstract
This paper proposes a collaborative energy management strategy based on power prediction and deep reinforcement learning (DRL) to address the trade-offs among economic efficiency, durability, and dynamic performance in fuel cell hybrid power systems (FCHPS) under dynamic driving conditions. First, a hybrid prediction model termed LSTM-LSSVM with Cascade Correction (LSTM-LSSVM-CC) is developed. The cascade correction (CC) mechanism adopts a hierarchical structure to capture both low-frequency steady-state trends and high-frequency dynamic fluctuations, which are typically challenging for single models to represent. By integrating an online residual correction mechanism, this model generates accurate future power demand sequences. Second, a Dynamic Spatio-Temporal Fusion (DSTF) method is introduced to construct a high-dimensional DRL state space. This approach integrates predicted data, historical residuals, and real-time system states, enabling the agent to perform anticipatory decision-making. Third, a Dynamic Hierarchical Adaptive Multi-Objective Optimization Framework (DHAMOF) is designed. This framework dynamically adjusts objective weights and constraint boundaries based on real-time operating characteristics, enabling adaptive switching of optimization priorities across diverse scenarios. Furthermore, a closed-loop control architecture comprising “prediction–decision–execution–feedback” is established. By incorporating rolling horizon optimization and a proportional-integral (PI) residual compensation mechanism, the proposed architecture effectively suppresses prediction error accumulation and mitigates communication delays. Simulation results under combined CLTC-P and WLTP driving cycles demonstrate that, compared to conventional fixed-weight strategies, the proposed method achieves an 11.3% reduction in hydrogen consumption, a 30.9% decrease in SOC fluctuation range, and a 55.3% reduction in power tracking error. Moreover, under disturbance scenarios involving prediction errors, sensor noise, and a 200 ms communication delay, the system exhibits superior robustness: the increase in hydrogen consumption is limited to within 8.3 g/100 km, and the power tracking error is reduced by 65.6% relative to uncorrected baselines. This collaborative optimization approach overcomes the limitations of traditional open-loop prediction and fixed-weight control, offering a novel technical pathway for the high-efficiency and stable operation of fuel cell hybrid power systems. Full article
(This article belongs to the Special Issue Recent Advances in Fuel Cell Technology and Its Application Process)
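The online residual-correction mechanism can be sketched generically: a base forecaster is corrected by an exponentially weighted estimate of its own recent errors. The persistence base model below is a crude stand-in for the LSTM-LSSVM hierarchy.

```python
# Minimal residual-correction sketch: base forecast plus an EWMA of its
# own recent residuals. The persistence base is an assumption; the paper
# uses an LSTM-LSSVM cascade as the base model.
import numpy as np

def corrected_forecast(series, alpha=0.3):
    """One-step-ahead persistence forecast plus EWMA residual correction."""
    bias, preds = 0.0, []
    for t in range(1, len(series)):
        base = series[t - 1]                  # persistence base prediction
        preds.append(base + bias)             # corrected prediction for step t
        bias = (1 - alpha) * bias + alpha * (series[t] - base)  # new residual
    return np.array(preds)

rng = np.random.default_rng(2)
demand = 50 + 5 * np.sin(np.arange(200) / 10) + rng.normal(0, 0.5, 200)
err = np.abs(corrected_forecast(demand) - demand[1:]).mean()
print(f"mean abs error with correction: {err:.3f}")
```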
23 pages, 8076 KB  
Article
Task Offloading of Parked Vehicles Edge Computing Based on Differential Privacy Hotstuff
by Guoling Liang, Zhaoyu Su, Chunhai Li, Mingfeng Chen and Feng Zhao
Information 2026, 17(4), 339; https://doi.org/10.3390/info17040339 - 1 Apr 2026
Viewed by 292
Abstract
The integration of blockchain into parked vehicle edge computing (PVEC) has emerged as a promising approach to mitigate the inherent trust challenges in distributed and untrusted computing environments. However, during task offloading and consensus, vehicles are vulnerable to location information disclosure, leading to privacy leakage. To address this problem, we propose a location differential privacy-enabled blockchain PVEC (DBPVEC) framework to protect location information during offloading and consensus. Specifically, we design a location differential privacy mechanism based on the Laplace mechanism and theoretically prove that it satisfies ε-differential privacy. This mechanism perturbs vehicles’ locations, and a privacy-preserving offloading strategy is designed to enhance the Hotstuff consensus and protect location privacy in edge computing. Subsequently, we formulate a joint optimization problem, considering system energy consumption, latency, and privacy strength. To solve it, we design a two-layer deep reinforcement learning (DRL) algorithm, with a Deep Q-Network (DQN) as the upper layer and a Deep Deterministic Policy Gradient (DDPG) as the lower layer, to determine the optimal offloading strategy. The experimental results demonstrate that our scheme achieves significant reductions compared to the two baseline methods: the total cost decreases by 68.31% and 63.25%, energy consumption by 9.96% and 16.27%, and delay by 31.46% and 18.07%, respectively. Moreover, it effectively preserves vehicle location privacy during task offloading and consensus while maintaining favorable performance in energy consumption and latency. Full article
(This article belongs to the Section Information and Communications Technology)
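Structurally, the two-layer scheme pairs a discrete upper decision with a continuous lower one; the sketch below shows only that interface, with random linear maps standing in for the trained DQN and DDPG networks.

```python
# Structural sketch (not the authors' networks): an upper DQN-style layer
# picks a discrete offload target; a lower DDPG-style layer emits a
# continuous resource fraction conditioned on that choice.
import numpy as np

rng = np.random.default_rng(3)
W_q = rng.standard_normal((4, 8))            # upper: Q-values over 4 targets
W_mu = rng.standard_normal((1, 8 + 4))       # lower: policy on [state, target]

def two_layer_action(state, eps=0.1):
    q = W_q @ state                                        # upper-layer Q-values
    target = int(rng.integers(4)) if rng.random() < eps else int(np.argmax(q))
    onehot = np.eye(4)[target]
    frac = 1 / (1 + np.exp(-(W_mu @ np.concatenate([state, onehot]))[0]))
    return target, frac                                    # (where, how much)

print(two_layer_action(rng.standard_normal(8)))
```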
35 pages, 5726 KB  
Article
A Multi-Objective Collaborative Optimization Approach for Building Integrated Energy Systems Based on Deep Reinforcement Learning
by Limin Wang, Yongkai Wu, Jumin Zhao, Wei Gao and Dengao Li
Appl. Sci. 2026, 16(7), 3280; https://doi.org/10.3390/app16073280 - 28 Mar 2026
Viewed by 279
Abstract
To address the challenges of coordinated optimization in building integrated energy systems (IES) under the dual-carbon targets—characterized by strong multi-energy coupling, significant uncertainty in renewable generation, and stringent safety constraints—a novel safe deep reinforcement learning algorithm, Safe-DDPG, is proposed. Traditional deep reinforcement learning methods often suffer from high constraint-violation risk and limited policy reliability due to coupled objectives in building IES optimization. To overcome these limitations, a dual-channel critic architecture is designed to independently evaluate and decouple economic and safety objectives. In addition, a dynamic safety–penalty mechanism based on logarithmic barrier functions is introduced, together with an adaptive exploration strategy, enabling dynamic balancing between economic cost and constraint satisfaction according to system states during training. Experimental results demonstrate that, compared with mainstream algorithms, Safe-DDPG achieves substantial improvements across multiple key performance indicators: safety violations are reduced by up to 96.7%, average daily operating costs decrease by 18.5%, and cumulative rewards increase by more than 30%. Ablation studies further confirm the effectiveness and necessity of each core component. Two DRL methods from reference papers are reproduced, and their performance is compared with the proposed method in the existing experimental results, showing that the proposed method has significant advantages in reward value and economic cost. This work provides a safe, reliable, and efficient reinforcement-learning-based approach for optimization and scheduling of building energy systems under complex operational constraints. Full article
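The logarithmic-barrier penalty described above has a simple generic form, sketched below: mild deep inside the feasible region and unbounded as the constraint value approaches its limit. The coefficient and the fallback cost for already-violated states are assumed details.

```python
# Sketch of a log-barrier safety penalty: -mu * log(d - c) for a
# constraint value c with limit d. mu and violation_cost are assumptions.
import numpy as np

def barrier_penalty(c, d, mu=0.5, violation_cost=100.0):
    """Mild inside the feasible region, unbounded as c -> d, fixed outside."""
    slack = d - c
    return float(-mu * np.log(slack)) if slack > 0 else violation_cost

for c in [0.1, 0.8, 0.99]:                    # constraint usage vs. limit d = 1.0
    print(f"c={c:.2f} -> penalty {barrier_penalty(c, 1.0):+.3f}")
```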
26 pages, 6706 KB  
Article
Efficient Emergency Load Shedding to Mitigate Fault-Induced Delayed Voltage Recovery Using Cloud–Edge Collaborative Learning and Guided Evolutionary Strategy
by Dongyang Yang, Bing Cheng, Jisi Wu, Yunan Zhao, Xingao Tang and Renke Huang
Electronics 2026, 15(7), 1377; https://doi.org/10.3390/electronics15071377 - 26 Mar 2026
Viewed by 338
Abstract
Fault-induced delayed voltage recovery (FIDVR) poses a serious threat to modern power grid operation, where stalled induction motors following a fault can sustain dangerously low bus voltages and potentially trigger cascading failures. While deep reinforcement learning (DRL) has shown promise for emergency load shedding control, existing centralized DRL approaches require extensive communication infrastructure and large neural network models that are computationally prohibitive to train at scale. Fully decentralized approaches, on the other hand, lack inter-agent information sharing and coordination, often resulting in inadequate voltage recovery across area boundaries. To address these limitations, we propose a Cloud–Edge Collaborative DRL framework that combines lightweight, area-specific edge agents for local load shedding control with a supervisory cloud agent that coordinates their actions globally, achieving scalable training and system-wide voltage recovery simultaneously. Training is further accelerated through a modified Guided Surrogate-gradient-based Evolutionary Random Search (GSERS) algorithm. Validation on the IEEE 300-bus system demonstrates that the proposed framework reduces training time by approximately 90% compared to the fully centralized approach, while achieving comparable voltage recovery performance to the centralized method and approximately 80% better reward performance than the fully decentralized approach, confirming the critical benefit of the cloud-level coordination mechanism. Full article
(This article belongs to the Section Power Electronics)
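The evolutionary-search backbone that GSERS modifies can be sketched in a few lines: antithetic parameter perturbations yield a search-gradient estimate that is then ascended. The quadratic objective below stands in for the expensive grid simulation, and the guidance/surrogate components of GSERS are omitted.

```python
# Minimal antithetic evolutionary random search: estimate a search
# gradient from mirrored perturbations and ascend it. Toy objective only.
import numpy as np

rng = np.random.default_rng(4)
target = rng.standard_normal(10)
reward = lambda th: -np.sum((th - target) ** 2)   # stand-in for grid reward

theta, sigma, lr, n_dirs = np.zeros(10), 0.1, 0.05, 16
for step in range(200):
    eps = rng.standard_normal((n_dirs, 10))
    # antithetic evaluation: f(theta + s*e) - f(theta - s*e)
    diffs = np.array([reward(theta + sigma * e) - reward(theta - sigma * e)
                      for e in eps])
    grad = (diffs[:, None] * eps).sum(axis=0) / (2 * sigma * n_dirs)
    theta += lr * grad
print(f"final reward: {reward(theta):.4f}")       # approaches 0, the optimum
```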
31 pages, 5541 KB  
Article
Preference-Guided Reinforcement Learning for Dynamic Green Flexible Assembly Job Shop Scheduling with Learning–Forgetting Effects
by Ruyi Wang, Xiaojuan Liao, Guangzhu Chen, Yaxin Liu and Leyuan Liu
Sustainability 2026, 18(7), 3222; https://doi.org/10.3390/su18073222 - 25 Mar 2026
Viewed by 507
Abstract
With the evolution from Industry 4.0 to 5.0, flexible assembly scheduling must simultaneously address production efficiency, environmental sustainability, and human factors, while remaining adaptive to real-time disruptions. This study investigates the dynamic green scheduling problem in dual-resource Flexible Assembly Job Shops with worker learning and forgetting, aiming to minimize makespan and total energy consumption. To tackle this problem, a Hierarchical Dual-Agent Deep Reinforcement Learning algorithm (HAD-DRL) is proposed. The framework integrates a Heterogeneous Graph Neural Network to extract real-time workshop states and employs two collaborative agents, i.e., a high-level preference decision agent and a low-level scheduling execution agent. The upper agent dynamically adjusts the preference weights between economic and environmental objectives, while the lower agent generates corresponding scheduling actions. Unlike existing multi-agent methods that optimize a single objective at each step, HAD-DRL achieves adaptive coordination and balanced trade-offs among conflicting goals. Experimental results demonstrate that the proposed method outperforms heuristic and baseline DRL approaches in both objectives, validating its effectiveness and practical applicability for intelligent and sustainable manufacturing. Full article
(This article belongs to the Special Issue Sustainable Manufacturing Systems in the Context of Industry 4.0)
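The upper/lower split can be made concrete with a preference-scalarized reward; the sketch below also includes the standard Wright learning-curve model commonly used for worker learning effects. The normalizers, weights, and exponent are assumptions, not the paper's values.

```python
# Sketch: upper agent outputs preference weight w in [0, 1]; lower agent
# trains on the scalarized reward. Normalizers c_max, e_max are assumed.
import numpy as np

def scalarized_reward(d_makespan, d_energy, w, c_max=100.0, e_max=50.0):
    """w trades off makespan increase vs. energy increase (both minimized)."""
    return -(w * d_makespan / c_max + (1 - w) * d_energy / e_max)

def processing_time(p0, n, b=0.15):
    """Standard Wright learning curve: the n-th repetition takes p0 * n^(-b)."""
    return p0 * n ** (-b)

print(scalarized_reward(d_makespan=12.0, d_energy=4.0, w=0.7))
print([round(processing_time(10.0, n), 2) for n in (1, 5, 20)])
```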
24 pages, 4424 KB  
Article
Hybrid Attribution-Based Interpretable Deep Reinforcement Learning for Autonomous Driving Behavior Decision-Making
by Yaxuan Liu, Jiakun Huang, Mingjun Li, Qing Ye and Xiaolin Song
Appl. Sci. 2026, 16(6), 3096; https://doi.org/10.3390/app16063096 - 23 Mar 2026
Viewed by 303
Abstract
With the increasing deployment of autonomous driving systems, the opaque nature of deep reinforcement learning (DRL) decision models hinders understanding and validation of driving decisions. To address this challenge, we propose a Hybrid Attribution-based Interpretable Deep Reinforcement Learning framework (HA-IDRL) for autonomous driving behavior decision-making. The framework introduces a Hybrid Gradient–LRP (HGL) attribution mechanism that integrates gradient-based attribution and Layer-wise Relevance Propagation (LRP) to capture complementary sensitivity and contribution information, producing more consistent and comprehensive post hoc explanations. In addition to post hoc interpretability, we enhance structural interpretability by replacing the conventional multilayer perceptron (MLP) in the Dueling Deep Q-Network (Dueling DQN) architecture with Kolmogorov–Arnold Networks (KAN). By representing nonlinear interactions through learnable univariate functions and explicit summation structures, KAN provides inherently interpretable functional decompositions. The proposed framework is evaluated on a highway lane-changing task using the highway-env simulator. Experimental results show that HA-IDRL achieves decision-making performance comparable to representative DRL baselines, including Dueling DQN and Soft Actor-Critic (SAC), while providing explanations that are more stable and better aligned with human driving semantics. Moreover, the proposed method produces explanations with low computational overhead, enabling efficient and real-time interpretability in practical autonomous driving applications. Overall, HA-IDRL advances trustworthy autonomous driving by enabling high-performance decision-making and rigorous, multi-level interpretability, thereby improving the transparency and operational reliability of DRL-based driving policies. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
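A hedged toy version of the hybrid attribution idea: gradient-times-input and LRP-epsilon relevance are computed for a small two-layer ReLU network and fused after normalization. The tiny random network and the fusion weight alpha are illustrative assumptions; the paper applies the mechanism to a Dueling DQN with KAN layers.

```python
# Toy hybrid attribution: gradient x input and LRP-epsilon on a random
# two-layer ReLU net, fused after per-map normalization. All assumptions.
import numpy as np

rng = np.random.default_rng(5)
W1, b1 = rng.standard_normal((16, 8)), rng.standard_normal(16)
w2, b2 = rng.standard_normal(16), 0.1

def forward(x):
    pre = W1 @ x + b1
    h = np.maximum(pre, 0.0)
    return pre, h, float(w2 @ h + b2)

def grad_x_input(x):
    pre, _, _ = forward(x)
    g = W1.T @ (w2 * (pre > 0))               # dy/dx through the ReLU mask
    return g * x                               # gradient x input attribution

def lrp_eps(x, eps=1e-6):
    pre, h, y = forward(x)
    z2 = w2 * h                                # hidden-unit contributions
    R_h = z2 / (z2.sum() + eps * np.sign(z2.sum() + 1e-12)) * y
    z1 = W1 * x                                # (16, 8) input contributions
    denom = pre + eps * np.sign(pre + 1e-12)   # z1.sum(axis=1) + b1 == pre
    return (z1 / denom[:, None] * R_h[:, None]).sum(axis=0)

def hybrid(x, alpha=0.5):
    norm = lambda v: v / (np.abs(v).max() + 1e-12)
    return alpha * norm(grad_x_input(x)) + (1 - alpha) * norm(lrp_eps(x))

print(np.round(hybrid(rng.standard_normal(8)), 3))
```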
25 pages, 4865 KB  
Article
Hybrid Attention-Augmented Deep Reinforcement Learning for Intelligent Machining Process Route Planning
by Ruizhe Wang, Minrui Wang, Ziyan Du, Xiaochuan Dong and Yibing Peng
Machines 2026, 14(3), 343; https://doi.org/10.3390/machines14030343 - 18 Mar 2026
Viewed by 349
Abstract
Machining process route planning (MPRP) is vital for autonomous manufacturing yet remains challenging under complex, multi-dimensional engineering constraints. This paper proposes an attention-augmented deep reinforcement learning (DRL) framework to achieve intelligent process orchestration. First, an Optional Process Attribute Adjacency Graph (OPAAG) is established to formally model the “feature–process–resource–constraint” coupling, enhancing the agent’s perception of manufacturing semantics. The architecture synergistically integrates Graph Attention Networks (GAT) to perceive spatial benchmark dependencies and a Transformer-based encoder to capture sequential resource correlations within variable-length machining chains. Furthermore, a dynamic action masking mechanism is integrated to guarantee a 100% constraint satisfaction rate during both training and inference stages. Experimental evaluations across diverse part geometries demonstrate that the proposed method offers significant advantages in cost optimization, inference efficiency, and topological stability compared to traditional heuristic algorithms and standard DRL models. By effectively distilling the search space and maintaining action feasibility, the framework provides an efficient and robust solution for autonomous process planning in complex industrial scenarios. Full article
(This article belongs to the Section Advanced Manufacturing)
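Dynamic action masking, which the paper credits for its 100% constraint-satisfaction rate, has a standard implementation: infeasible actions receive -inf logits before the softmax, so they can never be sampled. In the sketch below, the OPAAG feasibility check is abstracted into a boolean mask.

```python
# Standard dynamic action-masking trick: infeasible actions get -inf
# logits, so the policy assigns them exactly zero probability.
import numpy as np

def masked_policy(logits, valid_mask):
    """Softmax restricted to feasible actions; infeasible ones get 0 prob."""
    masked = np.where(valid_mask, logits, -np.inf)
    z = masked - masked.max()                 # numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = np.array([1.2, 0.3, -0.5, 2.0])
mask = np.array([True, False, True, False])   # e.g. datum/precedence violations
print(masked_policy(logits, mask))            # probability only on actions 0, 2
```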
19 pages, 1361 KB  
Article
A New Method for Optimizing Low-Earth-Orbit Satellite Communication Links Based on Deep Reinforcement Learning
by He Yu, Shengli Li, Junchao Wu, Yanhong Sun and Limin Wang
Aerospace 2026, 13(3), 285; https://doi.org/10.3390/aerospace13030285 - 18 Mar 2026
Viewed by 342
Abstract
In low-Earth-orbit (LEO) satellite networks, the need for intelligent parameter-adjustment strategies has become increasingly critical due to the presence of highly dynamic channel conditions, limited spectrum resources, and complex interference environments. In this paper, a method for optimizing LEO satellite communication links based on deep reinforcement learning (DRL) is proposed. Through the optimization of the transmit power, the modulation and coding scheme (MCS), the beamforming parameters, and the retransmission mechanisms, adaptive link control is achieved in dynamic operational scenarios. A multidimensional state space is constructed, within which the channel state information, the interference environment, and the historical performance metrics are integrated. The spatio-temporal characteristics of the channel are extracted by means of a hybrid neural architecture that incorporates a convolutional neural network (CNN) and a long short-term memory (LSTM) network. To effectively accommodate both continuous and discrete action spaces, a hybrid DRL framework that combines proximal policy optimization (PPO) with a deep Q-network (DQN) is employed, thereby enabling cross-layer optimization of the physical-layer and link-layer parameters. The results demonstrate that substantial improvements in throughput, bit error rate (BER), and transmit-power efficiency are achieved under severely time-varying channel conditions, which provides a new idea for resource management and dynamic-environment adaptation in satellite communication systems. Full article
(This article belongs to the Special Issue Advanced Spacecraft/Satellite Technologies (2nd Edition))
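Acting in a hybrid discrete/continuous space typically combines a value head for the discrete choice with a Gaussian head for the continuous one; the sketch below shows only that construction, with random stand-ins for both heads. The MCS table and power range are hypothetical.

```python
# Sketch of a hybrid action: DQN-like head picks the discrete MCS index,
# PPO-like Gaussian head emits continuous transmit power. Both heads are
# random stand-ins that ignore the state; only the pattern is the point.
import numpy as np

rng = np.random.default_rng(6)
mcs_table = ["QPSK-1/2", "QPSK-3/4", "16QAM-1/2", "16QAM-3/4", "64QAM-2/3"]

def hybrid_action(state, p_min=0.0, p_max=10.0):
    q_mcs = rng.standard_normal(len(mcs_table))      # stand-in Q-values per MCS
    mcs = int(np.argmax(q_mcs))                      # discrete head (DQN-like)
    mu, sigma = 5.0, 1.0                             # stand-in Gaussian head
    power = float(np.clip(rng.normal(mu, sigma), p_min, p_max))  # dBW, say
    return mcs_table[mcs], power

print(hybrid_action(rng.standard_normal(12)))
```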
30 pages, 15769 KB  
Article
A Feature-Fusion Deep Reinforcement Learning Framework for Multi-Configuration Engineering Drawing Layout
by Yunlei Sun, Peng Dai, Yangxingyue Liu and Chao Liu
Algorithms 2026, 19(3), 226; https://doi.org/10.3390/a19030226 - 17 Mar 2026
Viewed by 344
Abstract
Engineering drawings are fundamental to industries such as oil and gas, construction, and manufacturing. However, current practices relying on manual design or rigid parametric templates often suffer from inefficiency and layout inconsistencies. To address these issues, the layout task is formulated as the Orthogonal Rectangle Packing Problem with Multiple Configurations and Complex Constraints (ORPPMC). The Deep Reinforcement Learning for Multi-Configuration Drawing Layout (DRL-MCDL) framework is proposed, which integrates the Pointer Network for Drawing Element Sequencing (PN-DES) with the Target-Type-Matching-based Multi-Pattern Positioning Strategy (TTM-MPPS). Within this framework, PN-DES employs deep reinforcement learning and feature fusion to combine element attributes with layout configurations for optimal sequence inference, while TTM-MPPS performs precise positioning in accordance with industrial rules to ensure strict adherence to aesthetic requirements. Ablation experiments validate the contribution of each module. Experimental results on real-world engineering drawings demonstrate that DRL-MCDL achieves a Feasibility Rate (FR) exceeding 98.5% on standard instances (12–40 elements), significantly outperforming traditional methods. Furthermore, it maintains a high inference efficiency with an Average Time (AT) of less than 0.3 s, striking an optimal balance between layout quality and computational speed. Full article
(This article belongs to the Section Combinatorial Optimization, Graph, and Network Algorithms)
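One pointer-attention decoding step of the kind PN-DES performs can be sketched as additive attention over the not-yet-placed elements, with already-placed ones masked out. The embeddings, context, and weights below are random stand-ins.

```python
# Pointer-style sequencing sketch: score each remaining drawing element
# against a decoder context, mask placed ones, point to the best.
import numpy as np

rng = np.random.default_rng(7)
W_e, W_c = rng.standard_normal((16, 16)), rng.standard_normal((16, 16))
v = rng.standard_normal(16)

def pointer_step(element_emb, context, placed):
    """Additive attention scores; placed elements can never be re-chosen."""
    scores = np.tanh(element_emb @ W_e.T + context @ W_c.T) @ v
    return int(np.argmax(np.where(placed, -np.inf, scores)))

emb = rng.standard_normal((12, 16))           # 12 drawing elements to sequence
context = rng.standard_normal(16)             # decoder state (static here)
placed = np.zeros(12, dtype=bool)
order = []
for _ in range(12):
    nxt = pointer_step(emb, context, placed)
    order.append(nxt)
    placed[nxt] = True
print(order)                                   # a complete, repetition-free sequence
```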
17 pages, 1288 KB  
Article
An Energy Management Optimization Method for Arctic Space Environment Monitoring Buoys Based on Deep Reinforcement Learning
by Hui Zhu, Bingrui Li, Yan Chen, Yinke Dou, Yi Tian, Yahao Li, Huiguang Li and Zepeng Gao
Energies 2026, 19(6), 1487; https://doi.org/10.3390/en19061487 - 17 Mar 2026
Viewed by 281
Abstract
To address the long-term operational challenges of space environment monitoring buoys under extreme Arctic conditions, this paper proposes an energy management optimization method based on deep reinforcement learning (DRL). By constructing a buoy system model that integrates renewable energy sources, a primary lithium battery power supply, and a battery energy storage unit, combined with an Arctic environmental model incorporating low-temperature efficiency degradation, a reward function was designed to minimize power supply deficits while ensuring system reliability. The Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm was employed to optimize energy scheduling strategies. Simulation results based on real Arctic data (August 2024–January 2025) demonstrate that integrating wind turbines significantly reduces reliance on primary lithium batteries. Specifically, the required lithium battery capacity was reduced by 87.5% (from 61.44 kWh to 7.685 kWh), and procurement costs were lowered by approximately $68,830 compared to non-rechargeable schemes. This method significantly enhances the buoy's endurance and scheduling intelligence, offering valuable insights into energy management in intelligent polar observation equipment. Full article
(This article belongs to the Section F5: Artificial Intelligence and Smart Energy)
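TD3's defining update, used here for the scheduling policy, builds the target value from clipped Gaussian smoothing noise on the target action and the minimum of two target critics to curb overestimation. The sketch below shows exactly that computation with toy stand-in networks; in the buoy model, the state could encode solar, wind, battery SOC, and load.

```python
# Core TD3 target: clipped noise on the target action, minimum of twin
# target critics. Actor/critics are toy closures so the snippet runs.
import numpy as np

rng = np.random.default_rng(8)

def td3_target(r, s_next, gamma=0.99, noise_std=0.2, noise_clip=0.5):
    a_next = target_actor(s_next)
    noise = np.clip(rng.normal(0.0, noise_std), -noise_clip, noise_clip)
    a_smooth = np.clip(a_next + noise, -1.0, 1.0)          # smoothed target action
    return r + gamma * min(q1_target(s_next, a_smooth),    # twin-critic minimum
                           q2_target(s_next, a_smooth))

# Toy stand-ins (invented), just so the target computation is executable.
target_actor = lambda s: float(np.tanh(s.sum()))
q1_target = lambda s, a: float(s.mean() + 0.5 * a)
q2_target = lambda s, a: float(s.mean() + 0.4 * a)

print(td3_target(r=1.0, s_next=rng.standard_normal(4)))
```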