Search Results (1,172)

Search Parameters:
Keywords = reinforcement learning (RL)

57 pages, 2224 KB  
Article
Quantum-Inspired Hybrid Bald Eagle-Ukari Algorithm with Reinforcement Learning for Performance Optimization of Conical Solar Distillers with Sand-Filled Copper Fins: A Novel Bio-Inspired Approach
by Mohamed Loey, Mostafa Elbaz, Hanaa Salem Marie and Heba M. Khalil
AI 2026, 7(4), 145; https://doi.org/10.3390/ai7040145 - 17 Apr 2026
Viewed by 85
Abstract
This study introduces a novel Quantum-Inspired Hybrid Bald Eagle-Ukari Algorithm with Reinforcement Learning (QI-HBEUA-RL) for comprehensive optimization of conical solar distillers equipped with sand-filled copper conical fins. The proposed algorithm synergistically combines quantum computing principles (superposition and entanglement), bio-inspired metaheuristics (Bald Eagle Search and Ukari Algorithm), and reinforcement learning mechanisms to achieve unprecedented optimization performance in complex thermal-hydraulic systems. The QI-HBEUA-RL framework employs quantum-encoded population representation, enabling simultaneous exploration of multiple solution states, while reinforcement learning dynamically adjusts algorithmic parameters based on search landscape characteristics and historical performance data. Experimental validation tested seven distiller configurations in El-Oued, Algeria, under controlled conditions (7.85 kWh/m²/day solar radiation, 42.2 °C ambient temperature). The optimal configuration of copper conical fins with 14 g sand at 0 cm spacing achieved: daily productivity of 7.75 L/m²/day (+61.46% improvement over conventional design), thermal efficiency of 61.9%, exergy efficiency of 4.02%, and economic payback period of 5.8 days. Comprehensive algorithm comparison against six state-of-the-art multi-objective optimizers (NSGA-II, MOEA/D, MOPSO, MOGWO, MOHHO) across 30 independent runs demonstrated statistically significant superiority (p < 0.001, Wilcoxon test). QI-HBEUA-RL achieved 7.42% improvement in hypervolume indicator, 29.35% reduction in inverted generational distance, and 19.49% better solution spacing. Generalization validation on seven benchmark problems (ZDT1-6, DTLZ2, DTLZ7) and three renewable energy applications confirmed algorithm robustness across diverse problem types. 
Three real-world case studies, remote village water supply (238:1 benefit–cost), industrial facility (100% energy reduction), and emergency relief (740× cost savings) validate practical implementation viability. This research advances solar thermal desalination technology and multi-objective optimization methodologies, providing validated solutions for sustainable freshwater production in water-scarce regions. Full article
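The significance claim in this abstract rests on a Wilcoxon signed-rank test over 30 paired runs. A minimal sketch of such a comparison, using hypothetical per-run hypervolume scores rather than the study's data:

```python
import numpy as np
from scipy.stats import wilcoxon

# Hypothetical per-run hypervolume scores for two optimizers over 30
# paired runs; these numbers are stand-ins, not the study's results.
rng = np.random.default_rng(0)
hv_proposed = rng.normal(0.78, 0.01, 30)   # stand-in for QI-HBEUA-RL
hv_baseline = rng.normal(0.72, 0.01, 30)   # stand-in for NSGA-II

stat, p = wilcoxon(hv_proposed, hv_baseline)  # paired, non-parametric
print(f"W = {stat:.1f}, p = {p:.2e}")         # small p -> significant gap
```

The signed-rank test is paired, so both optimizers should be run on the same problem instances and seeds for the pairing to be meaningful.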
26 pages, 1456 KB  
Article
Artificial Intelligence-Based Decision Support System for UAV Control in a Simulated Environment
by Przemysław Sujecki and Damian Frąszczak
Sensors 2026, 26(8), 2436; https://doi.org/10.3390/s26082436 - 15 Apr 2026
Viewed by 204
Abstract
Unmanned aerial vehicles (UAVs) are increasingly deployed in missions that require high autonomy and reliable decision-making; however, many operational concepts still assume access to GNSS and stable communication with a human operator. In contested environments, this assumption may no longer hold because GNSS degradation, radio-frequency interference, and intentional jamming can disrupt positioning and communication, thereby reducing mission effectiveness and safety. Recent surveys show that operation in GNSS-denied environments remains a major challenge and often requires alternative perception, localization, and control strategies. In response, this article investigates a reinforcement learning (RL)-based decision-support system for the autonomous control of a quadrotor UAV in a three-dimensional simulated environment. Rather than following pre-programmed waypoints, the UAV learns a control policy through interaction with the environment and reward-driven adaptation. The proposed system is designed for mission execution under uncertainty, limited external guidance, and partial observability. Two policy-gradient approaches are implemented and compared: classical REINFORCE and Proximal Policy Optimization (PPO) with an Actor–Critic architecture. The study presents the simulation environment, state and action representation, reward formulation, staged training procedure, and comparative evaluation. The results indicate that the PPO-based configuration achieved higher mission effectiveness than REINFORCE in the final unseen test scenario, supporting the practical relevance of structured deep reinforcement learning for UAV operation in GNSS-denied and communication-constrained environments. Full article
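The REINFORCE baseline compared above follows the classic Monte Carlo policy-gradient update. A toy sketch with an assumed linear-softmax policy; the state size, action count, and rewards are illustrative, not the paper's setup:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_update(episode, theta, lr=0.01, gamma=0.99):
    """One REINFORCE update from a finished episode of (state, action, reward)."""
    G, returns = 0.0, []
    for _, _, r in reversed(episode):      # discounted return-to-go
        G = r + gamma * G
        returns.append(G)
    returns.reverse()
    for (s, a, _), G in zip(episode, returns):
        probs = softmax(theta @ s)
        grad_log = -np.outer(probs, s)     # d log pi(a|s) / d theta
        grad_log[a] += s
        theta += lr * G * grad_log         # ascend the policy gradient
    return theta

# Illustrative sizes: 4 discrete actions, 6 state features, 5 steps.
rng = np.random.default_rng(1)
theta = np.zeros((4, 6))
episode = [(rng.normal(size=6), int(rng.integers(4)), 1.0) for _ in range(5)]
theta = reinforce_update(episode, theta)
print(np.abs(theta).sum() > 0)  # parameters moved after the update
```

PPO differs mainly in replacing the raw return-weighted gradient with a clipped surrogate objective on an advantage estimate, which is what typically stabilizes training.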
34 pages, 3125 KB  
Article
Optimized Signal Acquisition and Advanced AI for Robust 1D EMG Classification: A Comparative Study of Machine Learning, Deep Learning, and Reinforcement Learning
by Anagha Shinde, Virendra Shete and Ninad Mehendale
Bioengineering 2026, 13(4), 463; https://doi.org/10.3390/bioengineering13040463 - 15 Apr 2026
Viewed by 231
Abstract
Electromyography (EMG) signals are critical for prosthetic control, rehabilitation, and human–machine interaction, yet their classification remains challenging due to noise, non-stationarity, and inter-subject variability. This study presents a comprehensive comparative analysis of machine learning (ML), deep learning (DL), and reinforcement learning (RL) approaches for 1D EMG signal classification, with a systematic evaluation of signal acquisition parameters. Using both synthetic and real-world EMG datasets, we demonstrate that 8–10 bit quantization and a 2000 Hz sampling rate provide optimal signal fidelity while maintaining data efficiency. Among the evaluated models, ensemble methods (Gradient Boosting, Voting Ensemble) and advanced DL architectures (LSTM, Transformer) achieved superior performance on real EMG data, with accuracies reaching 100% and 96.3%, respectively. Notably, reinforcement learning agents (Deep Q-Networks) demonstrated 100% accuracy on multiclass synthetic data, revealing their potential for learning complex bio-signal representations. Our findings establish that meticulous optimization of preprocessing pipelines, combined with robust AI models, significantly enhances EMG classification accuracy. This work provides empirical guidance for selecting optimal acquisition parameters and AI architectures for practical EMG analysis systems, with direct implications for prosthetic control and rehabilitation technologies. Full article
(This article belongs to the Section Biosignal Processing)
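The reported 8–10 bit quantization at a 2000 Hz sampling rate can be illustrated with a uniform quantizer applied to a synthetic EMG-like burst. The signal model here is an assumption for illustration, not the study's dataset:

```python
import numpy as np

def quantize(signal, bits=8):
    """Uniform quantization to 2**bits levels over the signal's own range."""
    levels = 2 ** bits
    lo, hi = signal.min(), signal.max()
    q = np.round((signal - lo) / (hi - lo) * (levels - 1))
    return q / (levels - 1) * (hi - lo) + lo   # back to the original scale

# Synthetic EMG-like burst sampled at 2000 Hz: Gaussian noise under an
# amplitude envelope (illustrative stand-in for a real recording).
fs = 2000
t = np.arange(0, 1.0, 1 / fs)
rng = np.random.default_rng(0)
emg = rng.normal(size=t.size) * np.exp(-((t - 0.5) ** 2) / 0.02)

emg_q = quantize(emg, bits=8)
err = np.abs(emg - emg_q).max()   # bounded by half a quantization step
print(f"max error at 8 bits: {err:.5f}")
```

Raising `bits` from 8 to 10 shrinks the worst-case error by a factor of four while quadrupling the number of amplitude levels stored.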

33 pages, 85096 KB  
Article
Modeling Seismic Resilience and Hospital Evacuation: A Comparative Analysis of Multi-Agent Reinforcement Learning and Classical Evacuation Models
by Chunlin Bian, Yonghao Guo, Gang Meng, Liuyang Li, Hua Chen, Fuhong Lv and Xiaofeng Chai
Buildings 2026, 16(8), 1538; https://doi.org/10.3390/buildings16081538 - 14 Apr 2026
Viewed by 159
Abstract
Hospitals in earthquake-prone regions must evacuate heterogeneous occupants rapidly while preserving operational continuity under disrupted conditions. However, many hospital-evacuation studies still rely on static routing assumptions or narrowly defined behavioral rules, which limits their value for building-level resilience planning. This paper develops a comparative hospital-campus evacuation framework that combines GIS-based geodesic routing, heterogeneous agent-based modeling, and reinforcement-learning-based decision policies. Puge County People’s Hospital in Sichuan, China, is used as the case study. Six algorithms are evaluated: three rule-based baselines—Shortest Path (SP), Random Walk (RW), and the Social Force Model (SFM)—together with a training-free density-aware heuristic, Density-Aware Gradient Routing (DAGR), and two reinforcement-learning approaches, Density-Aware Q-Learning (DAQL) and SARSA. Experiments cover three population scales (N ∈ {50, 100, 200}), normal daytime conditions, staffing-variation scenarios, and a blocked-exit disruption scenario, with 30 independent runs for each main condition. The results show that the rule-based and training-free methods remain the most reliable under full multi-agent evaluation: the SFM and RW achieve the highest completion ratios (approximately 100% and 93.5%, respectively), while DAGR provides the strongest balance between completion and evacuation efficiency among the non-trained methods. In contrast, the trained RL agents perform substantially worse in direct multi-agent deployment, with DAQL reaching approximately 37% completion and SARSA approximately 17%, highlighting a train–evaluation distribution shift associated with independent Q-learning. The ablation analysis further shows that collision avoidance is the most critical reward component, whereas density-avoidance shaping can unintentionally induce collective deadlock when all agents execute the learned policy simultaneously. 
Among the enhanced variants, DAQL_RoleAware yields the best overall improvement, increasing the completion ratio to approximately 52% and reducing the 90th-percentile evacuation time to approximately 363 s. Overall, this paper clarifies both the promise and the present limitations of density-aware reinforcement learning for hospital evacuation while providing a more building-centred and reproducible basis for future coordination-aware evacuation design and emergency-planning research. Full article
(This article belongs to the Special Issue Innovative Solutions for Enhancing Seismic Resilience of Buildings)
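The two trained methods compared in this study, Q-learning and SARSA, differ only in the bootstrap term of their tabular updates. A minimal sketch; the grid size, learning rate, and discount are illustrative choices:

```python
import numpy as np

# Tabular updates for the two trained agents named in the abstract.
n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.95

def q_learning_update(Q, s, a, r, s_next):
    # Off-policy: bootstrap from the greedy action in the next state.
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])

def sarsa_update(Q, s, a, r, s_next, a_next):
    # On-policy: bootstrap from the action actually taken next.
    Q[s, a] += alpha * (r + gamma * Q[s_next, a_next] - Q[s, a])

Q = np.zeros((n_states, n_actions))
q_learning_update(Q, s=0, a=1, r=1.0, s_next=2)   # first update: alpha * r
sarsa_update(Q, s=2, a=0, r=1.0, s_next=0, a_next=1)
print(Q[0, 1], Q[2, 0])
```

Because SARSA bootstraps from the behavior policy's own next action, it tends to learn more conservative policies than Q-learning under exploration, which is one lens on the deployment gap the abstract reports.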
32 pages, 2407 KB  
Article
Continuous-Time Scheduling of Berths and Onshore Power Supply in Cold-Chain Logistics: A Chance-Constrained Stochastic Programming Model and RL-ALNS Algorithm
by Zheyin Zhao and Jin Zhu
Mathematics 2026, 14(8), 1292; https://doi.org/10.3390/math14081292 - 13 Apr 2026
Viewed by 162
Abstract
Amid tightening emission rules and growing cold-chain demand, ports face complex multi-objective scheduling under dual uncertainties in vessel arrivals and operations. This work develops a multi-objective chance-constrained stochastic MILP model for joint berth, QC, and OPS scheduling. Heavy-tailed operational delays are managed via chance constraints, converting Weibull distributions to time buffers, while convex formulations allow piecewise cargo damage penalties to be computed linearly. A reinforcement learning-based adaptive large neighborhood search (RL-ALNS) algorithm is proposed to solve this NP-hard continuous-time problem, integrating a spatiotemporal decoder and an MDP-based selector to ensure microgrid limits and efficiency. Simulations demonstrate RL-ALNS’s superior Pareto convergence versus conventional heuristics. The model cuts the 95th-percentile tail risk by 46.59% and actual costs by 24.44% under mild delays, compared to deterministic scheduling. Overall, it quantifies the non-linear cost–emission–reliability trade-off, providing a robust tool for port decision-making. Full article
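One common way to convert a Weibull delay distribution into a schedule buffer, as the chance-constrained model above describes, is to schedule the (1 − ε)-quantile of the delay as slack. The shape and scale values here are illustrative, not from the paper:

```python
import math

# Chance-constraint buffer: cover a Weibull-distributed operational delay
# with probability 1 - eps by adding its (1 - eps)-quantile as slack.
def weibull_quantile(p, shape, scale):
    """Inverse CDF of Weibull(shape, scale): scale * (-ln(1 - p))**(1/shape)."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)

eps = 0.05                       # allowed violation probability
buffer_hours = weibull_quantile(1 - eps, shape=1.5, scale=2.0)
print(f"buffer for 95% coverage: {buffer_hours:.2f} h")  # about 4.16 h
```

Turning the probabilistic constraint into a fixed quantile buffer is what keeps the scheduling model a deterministic MILP despite the stochastic delays.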
16 pages, 2714 KB  
Article
Mitigating Distribution Shift in Offline RL-Based Recommender Systems with a Q-Learning Regularization Decision Transformer
by Yu Zhou, Xinyu Guo, Yuanbo Jiang, Jiaxuan Fang, Jin-Qiang Wang, Peng Zhi, Gang Liu, Rui Zhou, Ling-Huey Li, Chuanyi Liu, Qingguo Zhou and Kuan-Ching Li
Information 2026, 17(4), 364; https://doi.org/10.3390/info17040364 - 13 Apr 2026
Viewed by 212
Abstract
Optimizing long-term user satisfaction in sequential recommender systems is a critical challenge. Offline reinforcement learning (RL) offers a promising solution by learning recommendation policies from historical interaction logs without incurring the high costs of online exploration. However, offline RL suffers from severe distribution shift: the learned policy often overestimates the value of out-of-distribution (OOD) items, leading to unreliable recommendations and compromising user satisfaction. To address this issue, we propose a novel framework known as the Q-Learning Regularized Decision Transformer (QRDT). Built upon the Decision Transformer architecture, QRDT models recommendations as a sequence prediction task to capture complex user interest dynamics. To mitigate distribution shift, the QRDT integrates Kullback–Leibler (KL) divergence and maximum entropy regularization into the Q-value function, enabling conservative long-term value estimation while encouraging diverse exploration within the logged data distribution. Extensive experiments on four real-world Amazon e-commerce datasets (CDs, Clothing, Cellphones, and Beauty) demonstrate that the QRDT achieves competitive performance and outperforms the PGPR baseline in most scenarios. Specifically, the proposed method yields improvements of 2.99% in Hit Rate (HR), 2.19% in Normalized Discounted Cumulative Gain (NDCG), 0.94% in Recall, and 0.84% in Precision, verifying the effectiveness of our regularization approach. Full article
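The KL-plus-entropy regularization idea can be sketched as a penalized value estimate. This is a generic illustration of the principle, not the QRDT objective itself; the weights, temperature, and distributions are assumptions:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def regularized_value(q_values, behavior_probs, beta=1.0, tau=0.1):
    """Expected value under a softmax policy, penalized by KL divergence
    from the logged (behavior) policy and rewarded for entropy."""
    pi = softmax(q_values / tau)
    kl = np.sum(pi * np.log(pi / behavior_probs))     # KL(pi || behavior)
    entropy = -np.sum(pi * np.log(pi))
    return pi @ q_values - beta * kl + tau * entropy

# Illustrative 3-item slate: Q-values and a logged item distribution.
q = np.array([1.0, 0.5, 0.2])
mu = np.array([0.5, 0.3, 0.2])
print(round(float(regularized_value(q, mu)), 3))
```

The KL term discounts value for policies that drift far from the logged data, which is the conservatism mechanism against out-of-distribution item overestimation.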

30 pages, 939 KB  
Article
AI-Driven Financial Solutions for Climate Resilience and Geopolitical Risk Mitigation in Low- and Middle-Income Countries
by Abdelrahman Mohamed Mohamed Saeed and Muhammad Ali
Economies 2026, 14(4), 134; https://doi.org/10.3390/economies14040134 - 10 Apr 2026
Viewed by 395
Abstract
Climate change disproportionately threatens low- and middle-income countries, yet integrated assessments combining socio-economic fragility with physical hazards remain limited. This study quantifies multi-dimensional climate vulnerability and derives optimized adaptation policies for six representative nations (Bangladesh, Colombia, Kenya, Morocco, Pakistan, Vietnam) by fusing socio-economic indicators with climate risk data (2000–2024). A computational framework integrating unsupervised learning, dimensionality reduction, and predictive modeling was employed. Principal Component Analysis synthesized eight indicators into a Compound Vulnerability Score (CVS), while K-Means and DBSCAN identified distinct vulnerability regimes. XGBoost quantified driver importance, and Graph Neural Networks captured systemic interconnections. XGBoost identified projected drought risk (31.2%), precipitation change (18.1%), and poverty headcount (14.3%) as primary drivers. Graph networks demonstrated significant risk amplification in African nations (Morocco SRS: 0.728–0.874; Kenya SRS: 0.504–0.641) versus damping in Asian countries. A Reinforcement Learning (RL) agent was trained using Deep Q-Networks with experience replay to optimize intervention portfolios under budget constraints. The RL policy achieved a 23% reduction in systemic risk compared to uniform allocation baselines, generating context-specific priorities: drought management for Morocco (score 50) and Pakistan (40); poverty alleviation for Kenya (40); coastal protection for Bangladesh (40); agricultural resilience for Vietnam (35); and institutional capacity building for Colombia (50). In conclusion, socio-economic fragility non-linearly amplifies climate hazards, with poverty and drought risk constituting critical vulnerability multipliers. The AI-driven framework demonstrates that targeted interventions in high-sensitivity systems maximize systemic risk reduction. 
This integrated approach provides a replicable, evidence-based foundation for strategic adaptation finance allocation in an increasingly uncertain climate future. Full article
(This article belongs to the Special Issue Energy Consumption, Financial Development and Economic Growth)

52 pages, 3234 KB  
Perspective
Edge-Intelligent and Cyber-Resilient Coordination of Electric Vehicles and Distributed Energy Resources in Modern Distribution Grids
by Mahmoud Ghofrani
Energies 2026, 19(8), 1867; https://doi.org/10.3390/en19081867 - 10 Apr 2026
Viewed by 415
Abstract
The rapid electrification of transportation and proliferation of distributed energy resources (DERs) are transforming distribution grids into highly dynamic, data-intensive, and cyber-physical systems. While reinforcement learning (RL), multi-agent coordination, and edge computing offer powerful tools for adaptive control, their deployment in safety-critical utility environments raises concerns regarding stability, certification compatibility, cyber-resilience, and regulatory acceptance. This paper presents an architecture-centric framework for edge-intelligent and cyber-resilient coordination of electric vehicles (EVs) and DERs that reconciles adaptive learning with deterministic safety guarantees. The proposed hierarchical edge–cloud architecture integrates multi-agent system (MAS) coordination, constraint-invariant reinforcement learning, and embedded cybersecurity mechanisms within a structured control hierarchy. Learning-enabled edge agents operate exclusively within standards-compliant safety envelopes enforced through supervisory constraint projection, control barrier functions, and Lyapunov-consistent stability safeguards. Protection-critical functions remain deterministic and isolated from adaptive layers, preserving compatibility with IEEE 1547 and existing utility protection schemes. The framework further incorporates anomaly triggered policy freezing, fail-safe fallback modes, and communication-aware resilience mechanisms to prevent unsafe transient behavior in non-stationary, distributed environments. Unlike simulation-only learning approaches, the architecture embeds progressive validation through software-in-the-loop (SIL), hardware-in-the-loop (HIL), and power hardware-in-the-loop (PHIL) testing to empirically verify transient stability, constraint compliance, and cyber-resilience under realistic timing and disturbance conditions. 
Beyond technical performance, the paper situates edge intelligence within standards evolution, governance structures, workforce transformation, techno-economic assessment, and equitable deployment pathways. By framing adaptive control as a bounded, auditable augmentation layer rather than a disruptive replacement for certified infrastructure, the proposed architecture provides a pragmatic roadmap for evolutionary modernization of distribution systems. Full article
(This article belongs to the Section E: Electric Vehicles)

32 pages, 9226 KB  
Article
Regenerative–Frictional Brake Blending in Electric Vehicles Considering Energy Recovery and Dynamic Battery Charging Limit: A Reinforcement Learning-Based Approach
by Farshid Naseri, Bjartur Ragnarsson a Nordi, Konstantinos Spiliotopoulos and Erik Schaltz
Machines 2026, 14(4), 416; https://doi.org/10.3390/machines14040416 - 9 Apr 2026
Viewed by 392
Abstract
This paper presents the design, development, and evaluation of a Reinforcement Learning (RL)–based torque-split controller for the regenerative braking system (RBS) in battery electric vehicles (BEVs). The controller employs a Deep Deterministic Policy Gradient (DDPG) agent to distribute the braking demand between regenerative and frictional braking systems with the aim of maximizing energy recovery while adhering to the physical and operational constraints. To capture the charging limitation of the battery, a State-of-Power (SoP) calculation mechanism is incorporated, providing a time-varying bound on the regenerative charge power. The agent is trained in a MATLAB/Simulink environment representing the digital twin of a BEV drivetrain, and considers a mix of different braking scenarios, i.e., light braking, medium braking, hard braking, and emergency braking. The RL’s reward shaping promotes efficient utilization of the SoP-limited regenerative capability while discouraging constraint violations and aggressive control behavior. Across a range of State-of-Charge (SoC) conditions and driving cycles, including the Worldwide Harmonized Light–Vehicle Test Procedure (WLTP) and synthetic random-rich driving cycle, the RL controller consistently delivers promising performance, yielding energy recovery of up to ~98% of the total braking energy available on WLTP type 3 driving cycle while being able to operate closely to the battery SoP limit. The results demonstrate the proposed controller’s capability for adaptive, constraint-aware energy management in BEVs and underline its potential for future intelligent braking strategies. Full article
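The core constraint this controller handles, capping regenerative power at the battery's State-of-Power limit, can be sketched with a simple torque-split rule. The numbers and the efficiency model are illustrative, not the paper's DDPG policy:

```python
def split_brake_torque(demand_nm, wheel_speed_rad_s, sop_limit_w, eta=0.9):
    """Cap regenerative torque by the battery State-of-Power (SoP) charge
    limit; friction braking covers the remainder of the demand."""
    if wheel_speed_rad_s <= 0:
        return 0.0, demand_nm                      # no recovery at standstill
    # Recovered electric power P = T * omega * eta must stay <= SoP limit.
    t_regen_max = sop_limit_w / (wheel_speed_rad_s * eta)
    t_regen = min(demand_nm, t_regen_max)
    return t_regen, demand_nm - t_regen

t_regen, t_friction = split_brake_torque(500.0, 80.0, 30_000.0)
print(f"regen {t_regen:.1f} Nm, friction {t_friction:.1f} Nm")
```

What the learned agent adds over this static rule is anticipating the time-varying SoP bound across braking scenarios rather than clipping reactively.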

31 pages, 4265 KB  
Article
Sustainable Grid-Compliant Rooftop PV Curtailment via LQR-Based Active Power Regulation and QPSO–RL MPPT in a Three-Switch Micro-Inverter
by Ganesh Moorthy Jagadeesan, Kanagaraj Nallaiyagounder, Vijayakumar Madhaiyan and Qutubuddin Mohammed
Sustainability 2026, 18(8), 3674; https://doi.org/10.3390/su18083674 - 8 Apr 2026
Viewed by 188
Abstract
The increasing penetration of rooftop photovoltaic (RTPV) systems in low-voltage (LV) distribution networks introduces challenges such as voltage rises, reverse power flow, and reduced hosting capacity, thereby necessitating effective active power regulation (APR) in module-level micro-inverters. This paper proposes a dual-layer control framework for a 250 watt-peak (Wp) three-switch rooftop PV micro-inverter, integrating quantum-behaved particle swarm optimization with reinforcement learning (QPSO-RL) for accurate maximum power point tracking (MPPT) and a linear quadratic regulator (LQR) for reserve-aware APR. The QPSO-RL algorithm improves available-power estimation under varying irradiance, temperature, and partial-shading conditions, while the LQR-based controller ensures fast, well-damped, and grid-compliant power regulation. The proposed framework was developed and validated using MATLAB/Simulink 2024 for simulation studies and LabVIEW with NI myRIO 2022 for real-time hardware implementation. Both simulation and experimental results confirm that the proposed method achieves 99.5% MPPT accuracy, convergence within 20 ms, grid-injected current total harmonic distortion (THD) below 3%, and a near-unity power factor. In addition, the reserve-based regulation strategy improves feeder compliance and reduces converter stress, thereby supporting reliable rooftop PV integration. These results demonstrate that the proposed QPSO-RL + LQR framework offers a practical and intelligent solution for high-performance, grid-supportive rooftop PV micro-inverter applications. Full article
(This article belongs to the Section Energy Sustainability)
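The LQR layer of such a framework solves a continuous-time algebraic Riccati equation for the state-feedback gain. A generic sketch with stand-in plant matrices, not the micro-inverter model from the paper:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Continuous-time LQR: the 2-state plant below is an illustrative stand-in.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
Q = np.diag([10.0, 1.0])        # state deviation weight
R = np.array([[1.0]])           # control effort weight

P = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P  # optimal state feedback u = -K x
closed = A - B @ K
print(np.linalg.eigvals(closed))  # negative real parts -> stable loop
```

The weighting matrices Q and R are the tuning handles: heavier Q penalizes tracking error (faster, better-damped regulation), heavier R penalizes actuator effort.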

17 pages, 4078 KB  
Article
Simulation-Driven Approach to Evaluate a Reinforcement Learning-Based Navigation System for Last-Mile Drone Logistics
by Zakaria Benali and Amina Hamoud
Vehicles 2026, 8(4), 85; https://doi.org/10.3390/vehicles8040085 - 8 Apr 2026
Viewed by 294
Abstract
Unmanned Aerial Systems (UAS) offer sustainable solutions for urban last-mile logistics, yet existing navigation algorithms struggle with the complexity of dynamic metropolitan environments. This study optimises a reinforcement learning (RL)-based guidance, navigation, and control (GNC) algorithm using a Proximal Policy Optimisation (PPO) model within a high-fidelity simulation of Bristol City Centre. The primary contribution is training the RL model to autonomously detect and avoid dynamic obstacles, specifically manned aircraft, to ensure safe and legal drone operations. Additionally, flight operations are continuously monitored via a Structured Query Language (SQL) database to verify compliance with low airspace regulations. Simulation results demonstrate that the proposed framework achieves high obstacle detection accuracy under nominal conditions, while the implementation of curriculum learning significantly enhances the system’s adaptability and recovery capabilities during high-speed, dynamic encounters. Full article

28 pages, 3267 KB  
Article
A Hierarchical Dynamic Path Planning Framework for Autonomous Vehicles Based on Physics-Informed Potential Field and TD3 Reinforcement Learning
by Yan Pan, Yu Wang and Bin Ran
Appl. Sci. 2026, 16(7), 3610; https://doi.org/10.3390/app16073610 - 7 Apr 2026
Viewed by 303
Abstract
Autonomous driving in dense traffic demands policies that ensure safety, accurate path tracking, and ride comfort, yet reinforcement learning (RL) alone suffers from low sample efficiency and weak safety guarantees, while classical artificial potential field (APF) methods lack adaptability to dynamic scenarios. This paper proposes PIPF-TD3, which integrates APF theory with the Twin Delayed Deep Deterministic Policy Gradient (TD3) by embedding composite potential values and Doppler-weighted gradients as physics-informed features into the state vector. A Hybrid A* planner generates a reference path encoded as an attractive field; repulsive fields model nearby obstacles using real-time perception data; and a multi-objective reward function jointly optimizes path tracking, collision avoidance, and ride comfort. Experiments in CARLA 0.9.14 across two scenarios—a highway segment with mixed obstacles and a signalized intersection with conflicting turning movements—show that PIPF-TD3 achieves 100% task completion with zero collisions, whereas TD3 without potential field guidance suffers a 90% collision rate. PIPF-TD3 reduces mean cross-track error to 0.12 m (72.1% reduction over the rule-based FSM baseline), maintains 67.0% larger safety clearance, and yields RMS longitudinal and lateral accelerations of 1.12 and 0.75 m/s², outperforming the FSM by 37.1% and 42.7%. These results confirm that Doppler-weighted physical priors substantially enhance RL-based driving safety and quality in complex traffic conditions. Full article
(This article belongs to the Section Transportation and Future Mobility)
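The attractive/repulsive construction underlying PIPF-TD3's physics-informed features is the standard artificial potential field. A minimal sketch with illustrative gains; the paper's composite and Doppler-weighted fields are not reproduced here:

```python
import numpy as np

def apf_force(pos, goal, obstacles, k_att=1.0, k_rep=100.0, rho0=5.0):
    """Negative gradient of an artificial potential field: linear pull
    toward the goal plus a repulsive push from each obstacle inside the
    influence radius rho0."""
    force = k_att * (goal - pos)                   # attractive gradient
    for obs in obstacles:
        diff = pos - obs
        d = np.linalg.norm(diff)
        if 0 < d < rho0:
            # negative gradient of U_rep = 0.5 * k_rep * (1/d - 1/rho0)**2
            force += k_rep * (1 / d - 1 / rho0) / d**2 * (diff / d)
    return force

pos = np.array([0.0, 0.0])
goal = np.array([10.0, 0.0])
obstacles = [np.array([2.0, 0.5])]
print(apf_force(pos, goal, obstacles))  # pushed off the straight line
```

Feeding such field values and gradients into the RL state vector, rather than using them directly as control, is what lets the learned policy retain the field's safety bias while adapting to dynamics the static field cannot capture.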

18 pages, 2716 KB  
Article
Reducing Port Container Congestion with Reinforcement Learning: The Serial Mediation Role of Operational Learning Stability and Logistics Efficiency
by Md. Mizanur Rahman, Jianqiang Fan, Edvard Tijan and Umma Al Fateha
J. Mar. Sci. Eng. 2026, 14(7), 687; https://doi.org/10.3390/jmse14070687 - 7 Apr 2026
Viewed by 399
Abstract
Container congestion remains a persistent operational challenge in seaports because berth, yard, and gate processes are tightly coupled, demand is volatile, and control actions often operate under delayed feedback. Reinforcement learning (RL) is increasingly proposed for adaptive terminal decision support, yet the literature still says little about the mechanism through which RL may reduce congestion in practice. This study therefore develops a simulation-based mechanism framework in which RL improves congestion outcomes primarily by increasing Operational Learning Stability (OLStab), defined here as the consistency and governability of learning-enabled operational decisions under variability and disruption. A queueing-based, gate-focused terminal simulator is used as the data-generating process, with gate congestion treated as a reduced-form proxy for broader terminal congestion pressure. The statistical layer is interpreted cautiously as an internal mechanism consistency check within synthetic data rather than as empirical causal identification. Results show that RL is strongly associated with higher OLStab and that OLStab is the dominant pathway linking RL to lower congestion pressure in the simulated environment. Logistics Efficiency (LE) is directionally consistent with congestion reduction in bivariate analysis but adds limited incremental mediation once OLStab is jointly modeled. The theorized moderation by Decision Latency Sensitivity (DLS) is not robustly recovered within the examined latency range. Overall, the study contributes a more bounded explanation of how RL may reduce congestion in a designed gate-focused terminal control environment and highlights learning stability as a practical screening criterion for future digital twin and pilot deployment studies. Full article
(This article belongs to the Special Issue Maritime Ports Energy Infrastructure)
32 pages, 1364 KB  
Article
XRL-LLM: Explainable Reinforcement Learning Framework for Voltage Control
by Shrenik Jadhav, Birva Sevak and Van-Hai Bui
Energies 2026, 19(7), 1789; https://doi.org/10.3390/en19071789 - 6 Apr 2026
Abstract
Reinforcement learning (RL) agents are increasingly deployed for voltage control in power distribution networks. However, their opaque decision-making creates a significant trust barrier, limiting their adoption in safety-sensitive operational settings. This paper presents XRL-LLM, a novel framework that generates natural language explanations for RL control decisions by combining game-theoretic feature attribution (KernelSHAP) with large language model (LLM) reasoning grounded in power systems domain knowledge. We deployed a Proximal Policy Optimization (PPO) agent on an IEEE 33-bus network to coordinate capacitor banks and on-load tap changers, successfully reducing voltage violations by 90.5% across diverse loading conditions. To make these decisions interpretable, KernelSHAP identifies the most influential state features. These features are then processed by a domain-context-engineered LLM prompt that explicitly encodes network topology, device specifications, and ANSI C84.1 voltage limits. Evaluated via G-Eval across 30 scenarios, XRL-LLM achieves an explanation quality score of 4.13/5. This represents a 33.7% improvement over template-based generation and a 67.9% improvement over raw SHAP outputs, delivering statistically significant gains in accuracy, actionability, and completeness (p<0.001, Cohen’s d values up to 4.07). Additionally, a physics-grounded counterfactual verification procedure, which perturbs the underlying power flow model, confirms a causal faithfulness of 0.81 under critical loading. Finally, five ablation studies yield three broader insights. First, structured domain context engineering produces synergistic quality gains that exceed any single knowledge component, demonstrating that prompt composition matters more than the choice of foundational model. Second, even an open-source 8B-parameter model outperforms templates given the same prompt, confirming the framework’s backbone-agnostic value.
Most importantly, counterfactual faithfulness increases alongside load severity, indicating that post hoc attributions are most reliable in the high-stakes regimes where trustworthy explanations matter most. Full article
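KernelSHAP, the attribution method this abstract builds on, approximates Shapley values from cooperative game theory. For a handful of features they can be computed exactly, which the sketch below does for a hypothetical linear surrogate of a voltage-control critic. This is not the paper's model or code: the feature names, coefficients, and baseline are invented for illustration, and absent features are replaced by baseline values, which is the set-function convention KernelSHAP approximates.

```python
from itertools import combinations
from math import factorial

def exact_shapley(f, x, baseline):
    """Exact Shapley values for model f at point x; features outside a
    coalition S are replaced by baseline values."""
    n = len(x)
    idx = list(range(n))
    phi = [0.0] * n
    for i in idx:
        others = [j for j in idx if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # Shapley weight |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                with_i = [x[j] if (j in S or j == i) else baseline[j] for j in idx]
                without_i = [x[j] if j in S else baseline[j] for j in idx]
                phi[i] += w * (f(with_i) - f(without_i))
    return phi

# Hypothetical 3-feature linear surrogate:
# features = [bus voltage (p.u.), tap position, capacitor step]
f = lambda z: 2.0 * z[0] - 0.5 * z[1] + 1.0 * z[2]
x = [1.05, 3.0, 2.0]        # current state
base = [1.00, 0.0, 0.0]     # reference/baseline state
phi = exact_shapley(f, x, base)
```

For a linear model each Shapley value reduces to coefficient times feature deviation from baseline, and the values sum to `f(x) - f(base)` (the efficiency property), which is what makes the attributions a principled input for the LLM explanation layer.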
34 pages, 5130 KB  
Article
Sampling as a Structural Constraint for Stable Multitask Offline Reinforcement Learning
by Chayoung Kim
Appl. Sci. 2026, 16(7), 3511; https://doi.org/10.3390/app16073511 - 3 Apr 2026
Abstract
Multitask offline reinforcement learning (RL) faces severe instabilities due to heterogeneous data distributions and interference in shared function approximators. Although previous studies address these issues through network architecture modifications, we reinterpret sampling as a structural constraint instead of a performance optimization technique. We propose a two-stage sampling framework. Task-balanced sampling ensures equal task representation in each batch, whereas within-task pairwise ranking maintains relative quality ordering without cross-task value-scale interference. This design promotes stable gradient contributions from the shared function approximators. Through ablation studies on continuous control benchmarks, we demonstrate that removing the pairwise ranking at 20K steps leads to systematic performance degradation across all tasks. Notably, the Hopper task collapses immediately after constraint removal, losing 85% of performance within 1K steps. This demonstrates that pairwise ranking is not a temporary warm-up but a persistent constraint essential throughout training. Our findings establish sampling as a fundamental structural element in multitask offline RL, achieving stability without network architecture modifications. Full article
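The two-stage sampling scheme this abstract describes, equal task quotas per batch plus within-task pairwise ranking, can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: the `(obs, action, return)` tuple schema, `sample_batch`, and the toy datasets are assumptions made here for clarity.

```python
import random

def sample_batch(datasets, pairs_per_task, rng):
    """Two-stage sampling sketch.

    Stage 1 (task balance): every task contributes the same number of pairs.
    Stage 2 (within-task ranking): each pair is ordered by return inside its
    own task, so value scales are never compared across tasks.

    datasets: {task_name: list of (obs, action, return)} -- illustrative schema.
    Returns a list of (task, higher_return, lower_return) ranking pairs.
    """
    batch = []
    for task, data in datasets.items():          # stage 1: equal per-task quota
        for _ in range(pairs_per_task):
            a, b = rng.sample(data, 2)           # stage 2: within-task pair
            better, worse = (a, b) if a[2] >= b[2] else (b, a)
            batch.append((task, better, worse))
    return batch

rng = random.Random(0)
datasets = {
    "hopper": [("s%d" % i, "a%d" % i, float(i)) for i in range(10)],
    "walker": [("s%d" % i, "a%d" % i, float(10 - i)) for i in range(10)],
}
batch = sample_batch(datasets, pairs_per_task=4, rng=rng)
```

Because ranking pairs are formed only inside a task, a high-return transition from one environment never dominates a batch at another task's expense, which is the structural constraint the paper argues keeps gradient contributions stable.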