Search Results (74)

Search Parameters:
Keywords = double DQN

31 pages, 6262 KB  
Article
Profit-Oriented Multi-Objective Dynamic Flexible Job Shop Scheduling with Multi-Agent Framework Under Uncertain Production Orders
by Qingyao Ma, Yao Lu and Huawei Chen
Machines 2025, 13(10), 932; https://doi.org/10.3390/machines13100932 - 9 Oct 2025
Viewed by 413
Abstract
In the highly competitive manufacturing environment, customers are increasingly demanding punctual, flexible, and customized deliveries, compelling enterprises to balance profit, energy efficiency, and production performance while seeking new scheduling methods to enhance dynamic responsiveness. Although deep reinforcement learning (DRL) has made progress in dynamic flexible job shop scheduling, existing research has rarely addressed profit-oriented optimization. To tackle this challenge, this paper proposes a novel multi-objective dynamic flexible job shop scheduling (MODFJSP) model that aims to maximize net profit and minimize makespan on the basis of traditional FJSP. The model incorporates uncertainties such as new job insertions, fluctuating due dates, and high-profit urgent jobs, and establishes a multi-agent collaborative framework consisting of “job selection–machine assignment.” For the two types of agents, this paper proposes adaptive state representations, reward functions, and variable action spaces to achieve the dual optimization objectives. The experimental results show that the double deep Q-network (DDQN), within the multi-agent cooperative framework, outperforms PPO, DQN, and classical scheduling rules in terms of solution quality and robustness. It achieves superior performance on multiple metrics such as IGD, HV, and SC, and generates bi-objective Pareto frontiers that are closer to the ideal point. The results demonstrate the effectiveness and practical value of the proposed collaborative framework for solving MODFJSP. Full article
(This article belongs to the Section Industrial Systems)
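A note on the core mechanism behind several results in this list: double DQN differs from vanilla DQN only in how the bootstrap target is formed. The online network selects the greedy next action and the target network evaluates it, which curbs Q-value overestimation. Below is a minimal PyTorch sketch of that target computation; the network interfaces, batch shapes, and discount factor are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

def double_dqn_targets(online_net: nn.Module,
                       target_net: nn.Module,
                       rewards: torch.Tensor,      # shape [B]
                       next_states: torch.Tensor,  # shape [B, state_dim]
                       dones: torch.Tensor,        # shape [B], 1.0 if terminal
                       gamma: float = 0.99) -> torch.Tensor:
    """Double DQN targets: y = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    with torch.no_grad():
        # Online network picks the greedy next action...
        next_actions = online_net(next_states).argmax(dim=1, keepdim=True)
        # ...but the target network evaluates its value (selection decoupled from evaluation).
        next_q = target_net(next_states).gather(1, next_actions).squeeze(1)
        return rewards + gamma * (1.0 - dones) * next_q
```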

27 pages, 4238 KB  
Article
A Scalable Reinforcement Learning Framework for Ultra-Reliable Low-Latency Spectrum Management in Healthcare Internet of Things
by Adeel Iqbal, Ali Nauman, Tahir Khurshaid and Sang-Bong Rhee
Mathematics 2025, 13(18), 2941; https://doi.org/10.3390/math13182941 - 11 Sep 2025
Viewed by 502
Abstract
Healthcare Internet of Things (H-IoT) systems demand ultra-reliable and low-latency communication (URLLC) to support critical functions such as remote monitoring, emergency response, and real-time diagnostics. However, spectrum scarcity and heterogeneous traffic patterns pose major challenges for centralized scheduling in dense H-IoT deployments. This paper proposes a multi-agent reinforcement learning (MARL) framework for dynamic, priority-aware spectrum management (PASM), where cooperative MARL agents jointly optimize throughput, latency, energy efficiency, fairness, and blocking probability under varying traffic and channel conditions. Six learning strategies are developed and compared, including Q-Learning, Double Q-Learning, Deep Q-Network (DQN), Actor–Critic, Dueling DQN, and Proximal Policy Optimization (PPO), within a simulated H-IoT environment that captures heterogeneous traffic, device priorities, and realistic URLLC constraints. A comprehensive simulation study across scalable scenarios ranging from 3 to 50 devices demonstrates that PPO consistently outperforms all baselines, improving mean throughput by 6.2%, reducing 95th-percentile delay by 11.5%, increasing energy efficiency by 11.9%, lowering blocking probability by 33.3%, and accelerating convergence by 75.8% compared to the strongest non-PPO baseline. These findings establish PPO as a robust and scalable solution for QoS-compliant spectrum management in dense H-IoT environments, while Dueling DQN emerges as a competitive deep RL alternative. Full article
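Among the baselines compared here, Double Q-Learning is the tabular ancestor of Double DQN: two tables are maintained and, on each update, one table chooses the greedy next action while the other supplies its value. The following is a minimal sketch under assumed state/action encodings; the table sizes, learning rate, and variable names are illustrative, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 50, 4          # assumed sizes for a toy spectrum-slot problem
Q_a = np.zeros((n_states, n_actions))
Q_b = np.zeros((n_states, n_actions))

def double_q_update(s, a, r, s_next, alpha=0.1, gamma=0.95):
    """One Double Q-Learning step: randomly update Q_a or Q_b, using the other for evaluation."""
    if rng.random() < 0.5:
        a_star = int(np.argmax(Q_a[s_next]))                                   # Q_a selects
        Q_a[s, a] += alpha * (r + gamma * Q_b[s_next, a_star] - Q_a[s, a])     # Q_b evaluates
    else:
        b_star = int(np.argmax(Q_b[s_next]))                                   # Q_b selects
        Q_b[s, a] += alpha * (r + gamma * Q_a[s_next, b_star] - Q_b[s, a])     # Q_a evaluates
```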

26 pages, 4054 KB  
Article
Multi-Time-Scale Demand Response Optimization in Active Distribution Networks Using Double Deep Q-Networks
by Wei Niu, Jifeng Li, Zongle Ma, Wenliang Yin and Liang Feng
Energies 2025, 18(18), 4795; https://doi.org/10.3390/en18184795 - 9 Sep 2025
Viewed by 588
Abstract
This paper presents a deep reinforcement learning-based demand response (DR) optimization framework for active distribution networks under uncertainty and user heterogeneity. The proposed model utilizes a Double Deep Q-Network (Double DQN) to learn adaptive, multi-period DR strategies across residential, commercial, and electric vehicle (EV) participants in a 24 h rolling horizon. By incorporating a structured state representation—including forecasted load, photovoltaic (PV) output, dynamic pricing, historical DR actions, and voltage states—the agent autonomously learns control policies that minimize total operational costs while maintaining grid feasibility and voltage stability. The physical system is modeled via detailed constraints, including power flow balance, voltage magnitude bounds, PV curtailment caps, deferrable load recovery windows, and user-specific availability envelopes. A case study based on a modified IEEE 33-bus distribution network with embedded PV and DR nodes demonstrates the framework’s effectiveness. Simulation results show that the proposed method achieves significant cost savings (up to 35% over baseline), enhances PV absorption, reduces load variance by 42%, and maintains voltage profiles within safe operational thresholds. Training curves confirm smooth Q-value convergence and stable policy performance, while spatiotemporal visualizations reveal interpretable DR behavior aligned with both economic and physical system constraints. This work contributes a scalable, model-free approach for intelligent DR coordination in smart grids, integrating learning-based control with physical grid realism. The modular design allows for future extension to multi-agent systems, storage coordination, and market-integrated DR scheduling. The results position Double DQN as a promising architecture for operational decision-making in AI-enabled distribution networks. Full article
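The abstract lists the ingredients of the agent's observation (forecast load, PV output, price, past DR actions, voltages) and a cost-minimizing objective. A compact way to picture this is a flat state vector paired with a negative-cost reward; the feature names, scaling, and penalty weight below are assumptions made purely for illustration.

```python
import numpy as np

def build_state(load_forecast, pv_forecast, price, past_dr_actions, bus_voltages):
    """Concatenate normalized grid features into one observation vector (illustrative layout)."""
    return np.concatenate([
        np.asarray(load_forecast) / 1e3,            # MW, roughly normalized
        np.asarray(pv_forecast) / 1e3,
        np.asarray(price) / np.max(price),          # dynamic tariff scaled to [0, 1]
        np.asarray(past_dr_actions),                # recent DR decisions
        (np.asarray(bus_voltages) - 1.0) * 10.0,    # deviation from 1.0 p.u.
    ]).astype(np.float32)

def reward(operating_cost, voltage_violation, penalty=50.0):
    """Negative operating cost plus a penalty for leaving the safe voltage band."""
    return -operating_cost - penalty * voltage_violation
```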

22 pages, 763 KB  
Article
Optimizing TSCH Scheduling for IIoT Networks Using Reinforcement Learning
by Sahar Ben Yaala, Sirine Ben Yaala and Ridha Bouallegue
Technologies 2025, 13(9), 400; https://doi.org/10.3390/technologies13090400 - 3 Sep 2025
Viewed by 665
Abstract
In the context of industrial applications, ensuring medium access control is a fundamental challenge. Industrial IoT devices are resource-constrained and must guarantee reliable communication while reducing energy consumption. The IEEE 802.15.4e standard proposed time-slotted channel hopping (TSCH) to meet the requirements of the industrial Internet of Things. TSCH relies on time synchronization and channel hopping to improve performance and reduce energy consumption. Despite these characteristics, configuring an efficient schedule under varying traffic conditions and interference scenarios remains a challenging problem. Reinforcement learning (RL) offers a promising approach to this challenge: it enables TSCH to adapt its scheduling dynamically to real-time network conditions, making decisions that optimize key performance criteria such as energy efficiency, reliability, and latency. By learning from the environment, an RL agent can reconfigure schedules to mitigate interference and meet traffic demands. In this work, we compare several RL algorithms in the TSCH environment. In particular, we evaluate the deep Q-network (DQN), double deep Q-network (DDQN), and prioritized DQN (PER-DQN). We focus on the convergence speed of these algorithms and their capacity to adapt the schedule. Our results show that the PER-DQN algorithm improves the packet delivery ratio and achieves faster convergence compared to DQN and DDQN, demonstrating its effectiveness for dynamic TSCH scheduling in Industrial IoT environments. These quantifiable improvements highlight the potential of prioritized experience replay to enhance reliability and efficiency under varying network conditions. Full article
(This article belongs to the Section Information and Communication Technologies)
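The gain reported for PER-DQN comes from sampling transitions in proportion to their TD error rather than uniformly, and correcting the induced bias with importance-sampling weights. Below is a minimal proportional-prioritization buffer; the hyperparameters (alpha, beta, capacity) and the linear-scan implementation are chosen only for illustration, not taken from the paper.

```python
import numpy as np

class PrioritizedReplay:
    """Toy proportional prioritized replay (no sum-tree; O(n) sampling for clarity)."""
    def __init__(self, capacity=10_000, alpha=0.6):
        self.capacity, self.alpha = capacity, alpha
        self.data, self.prios = [], []

    def add(self, transition, td_error=1.0):
        if len(self.data) >= self.capacity:
            self.data.pop(0); self.prios.pop(0)
        self.data.append(transition)
        self.prios.append((abs(td_error) + 1e-6) ** self.alpha)   # priority from TD error

    def sample(self, batch_size=32, beta=0.4):
        p = np.asarray(self.prios) / sum(self.prios)
        idx = np.random.choice(len(self.data), batch_size, p=p)
        weights = (len(self.data) * p[idx]) ** (-beta)             # importance-sampling correction
        weights /= weights.max()
        return [self.data[i] for i in idx], idx, weights
```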

24 pages, 6077 KB  
Article
Trajectory Tracking Control of Intelligent Vehicles with Adaptive Model Predictive Control and Reinforcement Learning Under Variable Curvature Roads
by Yuying Fang, Pengwei Wang, Song Gao, Binbin Sun, Qing Zhang and Yuhua Zhang
Technologies 2025, 13(9), 394; https://doi.org/10.3390/technologies13090394 - 1 Sep 2025
Viewed by 719
Abstract
To improve the tracking accuracy and adaptability of intelligent vehicles under various road conditions, this paper proposes an adaptive model predictive controller combined with reinforcement learning. First, to counter the loss of control accuracy caused by a fixed prediction horizon, a low-computational-cost adaptive prediction horizon strategy based on a two-dimensional Gaussian function is designed, adjusting the horizon in real time with vehicle speed and road curvature. Second, to address reduced tracking stability under complex road conditions, the Deep Q-Network (DQN) algorithm is used to adjust the weight matrices of the Model Predictive Control (MPC) algorithm, improving the convergence speed and control effectiveness of the tracking controller. Finally, hardware-in-the-loop tests and real-vehicle tests are conducted. The results show that the proposed adaptive prediction horizon controller (DQN-AP-MPC) overcomes the poor control performance caused by a fixed prediction horizon and fixed weight matrices, significantly improving the tracking accuracy of intelligent vehicles under different road conditions. Under variable-curvature and high-speed conditions in particular, the proposed controller reduces the maximum lateral error by 76.81% and the average absolute error by 64.44% compared to the unimproved MPC controller, and it converges faster with better trajectory tracking on variable-curvature roads and double-lane roads. Full article
(This article belongs to the Section Manufacturing Technology)
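The adaptive-horizon idea can be pictured as a smooth surface over (speed, curvature) that shortens the MPC prediction window when curvature is high and lengthens it otherwise. The Gaussian center, widths, and horizon bounds in this sketch are illustrative guesses, not the paper's calibrated values.

```python
import numpy as np

def adaptive_horizon(speed_mps, curvature, n_min=10, n_max=40,
                     mu=(25.0, 0.0), sigma=(12.0, 0.02)):
    """Map (speed, road curvature) to a prediction-horizon length via a 2D Gaussian surface."""
    g = np.exp(-((speed_mps - mu[0]) ** 2) / (2 * sigma[0] ** 2)
               - ((curvature - mu[1]) ** 2) / (2 * sigma[1] ** 2))
    return int(round(n_min + (n_max - n_min) * g))

# Example: a gentle curve at 20 m/s yields a longer horizon than a tight curve at the same speed.
print(adaptive_horizon(20.0, 0.001), adaptive_horizon(20.0, 0.05))
```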

30 pages, 3950 KB  
Article
A Modular Hybrid SOC-Estimation Framework with a Supervisor for Battery Management Systems Supporting Renewable Energy Integration in Smart Buildings
by Mehmet Kurucan, Panagiotis Michailidis, Iakovos Michailidis and Federico Minelli
Energies 2025, 18(17), 4537; https://doi.org/10.3390/en18174537 - 27 Aug 2025
Cited by 2 | Viewed by 719
Abstract
Accurate state-of-charge (SOC) estimation is crucial in smart-building energy management systems, where rooftop photovoltaics and lithium-ion energy storage systems must be coordinated to align renewable generation with real-time demand. This paper introduces a novel, modular hybrid framework for SOC estimation, which synergistically combines the predictive power of artificial neural networks (ANNs), the logical consistency of finite state automata (FSA), and an adaptive dynamic supervisor layer. Three distinct ANN architectures—feedforward neural network (FFNN), long short-term memory (LSTM), and 1D convolutional neural network (1D-CNN)—are employed to extract comprehensive temporal and spatial features from raw data. The inherent challenge of ANNs producing physically irrational SOC values is handled by processing their raw predictions through an FSA module, which enforces physical validity by applying feasible transitions and domain constraints based on battery operational states. To further enhance the adaptability and robustness of the framework, two advanced supervisor mechanisms are developed for model selection during estimation. A lightweight rule-based supervisor picks a model transparently using recent performance scores and quick signal heuristics, whereas a more advanced double deep Q-network (DDQN) reinforcement-learning supervisor continuously learns from reward feedback to adaptively choose the model that minimizes SOC error under changing conditions. This RL agent dynamically selects the most suitable ANN+FSA model, significantly improving performance under varying and unpredictable operational conditions. Comprehensive experimental validation demonstrates that the hybrid approach consistently outperforms raw ANN predictions and conventional extended Kalman filter (EKF)-based methods. Notably, the RL-based supervisor exhibits good adaptability and achieves lower errors in challenging high-variance scenarios. Full article
(This article belongs to the Section G: Energy and Buildings)
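The lighter of the two supervisors can be pictured as a scoreboard: keep a rolling error for each ANN+FSA estimator and hand control to whichever has been most accurate recently. The window length, model names, and tie-breaking in this sketch are assumptions for illustration, not the paper's rules or heuristics.

```python
from collections import deque

class RuleBasedSupervisor:
    """Pick the SOC estimator with the lowest recent mean absolute error (illustrative)."""
    def __init__(self, model_names=("ffnn_fsa", "lstm_fsa", "cnn_fsa"), window=50):
        self.errors = {m: deque(maxlen=window) for m in model_names}

    def record(self, model, abs_error):
        """Log the absolute SOC error observed for one model on the latest sample."""
        self.errors[model].append(abs_error)

    def select(self):
        """Return the model with the smallest rolling mean error (untried models rank last)."""
        scores = {m: (sum(e) / len(e) if e else float("inf")) for m, e in self.errors.items()}
        return min(scores, key=scores.get)
```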

24 pages, 11770 KB  
Article
Secure Communication and Resource Allocation in Double-RIS Cooperative-Aided UAV-MEC Networks
by Xi Hu, Hongchao Zhao, Dongyang He and Wujie Zhang
Drones 2025, 9(8), 587; https://doi.org/10.3390/drones9080587 - 19 Aug 2025
Viewed by 690
Abstract
In complex urban wireless environments, unmanned aerial vehicle–mobile edge computing (UAV-MEC) systems face challenges like link blockage and single-antenna eavesdropping threats. The traditional single reconfigurable intelligent surface (RIS), limited in collaboration, struggles to address these issues. This paper proposes a double-RIS cooperative UAV-MEC optimization scheme, leveraging their joint reflection to build multi-dimensional signal paths, boosting legitimate link gains while suppressing eavesdropping channels. It considers double-RIS phase shifts, ground user (GU) transmission power, UAV trajectories, resource allocation, and receiving beamforming, aiming to maximize secure energy efficiency (EE) while ensuring long-term stability of GU and UAV task queues. Given random task arrivals and high-dimensional variable coupling, a dynamic model integrating queue stability and secure transmission constraints is built using Lyapunov optimization, transforming long-term stochastic optimization into slot-by-slot deterministic decisions via the drift-plus-penalty method. To handle high-dimensional continuous spaces, an end-to-end proximal policy optimization (PPO) framework is designed for online learning of multi-dimensional resource allocation and direct acquisition of joint optimization strategies. Simulation results show that compared with benchmark schemes (e.g., single RIS, non-cooperative double RIS) and reinforcement learning algorithms (e.g., advantage actor–critic (A2C), deep deterministic policy gradient (DDPG), deep Q-network (DQN)), the proposed scheme achieves significant improvements in secure EE and queue stability, with faster convergence and better optimization effects, fully verifying its superiority and robustness in complex scenarios. Full article
(This article belongs to the Section Drone Communications)
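The Lyapunov step described here turns the long-term stochastic problem into a per-slot rule: among feasible control choices, pick the one minimizing a weighted penalty plus the queue-weighted backlog change. The toy selection below illustrates that slot-by-slot decision; the trade-off weight V and the candidate-action structure are assumed interfaces, not the paper's formulation.

```python
def drift_plus_penalty_choice(queues, candidates, V=10.0):
    """Pick the action minimizing V*penalty + sum_i Q_i*(arrival_i - service_i) for this slot.

    `candidates` is a list of dicts with keys 'penalty' (e.g., negative secure EE),
    'arrivals' and 'services' (per-queue rates) -- an assumed interface for illustration.
    """
    def objective(c):
        drift = sum(q * (a - s) for q, a, s in zip(queues, c["arrivals"], c["services"]))
        return V * c["penalty"] + drift
    return min(candidates, key=objective)
```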

28 pages, 4548 KB  
Article
A Deep Reinforcement Learning Framework for Strategic Indian NIFTY 50 Index Trading
by Raj Gaurav Mishra, Dharmendra Sharma, Mahipal Gadhavi, Sangeeta Pant and Anuj Kumar
AI 2025, 6(8), 183; https://doi.org/10.3390/ai6080183 - 11 Aug 2025
Viewed by 1899
Abstract
This paper presents a comprehensive deep reinforcement learning (DRL) framework for developing strategic trading models tailored to the Indian NIFTY 50 index, leveraging the temporal and nonlinear nature of financial markets. Three advanced DRL architectures, the deep Q-network (DQN), double deep Q-network (DDQN), and dueling double deep Q-network (Dueling DDQN), were implemented and empirically evaluated. Using a decade-long dataset of 15-min interval OHLC data enriched with technical indicators such as the exponential moving average (EMA), pivot points, and multiple supertrend configurations, the models were trained using prioritized experience replay, epsilon-greedy exploration strategies, and softmax sampling mechanisms. A test set comprising one year of unseen data (May 2024–April 2025) was used to assess generalization performance across key financial metrics, including Sharpe ratio, profit factor, win rate, and trade frequency. Each architecture was analyzed in three progressively sophisticated variants incorporating enhancements in reward shaping, exploration–exploitation balancing, and penalty-based trade constraints. DDQN V3 achieved a Sharpe ratio of 0.7394, a 73.33% win rate, and a 16.58 profit factor across 15 trades, indicating strong volatility-adjusted suitability for real-world deployment. In contrast, the Dueling DDQN V3 achieved a high Sharpe ratio of 1.2278 and a 100% win rate but with only three trades, indicating excessive conservatism. The DQN V1 model served as a strong baseline, outperforming passive strategies but exhibiting limitations due to Q-value overestimation. The novelty of this work lies in its systematic exploration of DRL variants integrated with enhanced exploration mechanisms and reward–penalty structures, rigorously applied to high-frequency trading on the NIFTY 50 index within an emerging market context. Our findings underscore the critical importance of architectural refinements, dynamic exploration strategies, and trade regularization in stabilizing learning and enhancing profitability in DRL-based intelligent trading systems. Full article
(This article belongs to the Special Issue AI in Finance: Leveraging AI to Transform Financial Services)
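Two of the exploration mechanisms named here, epsilon-greedy and softmax (Boltzmann) sampling, can be contrasted in a few lines: the first explores uniformly at random with probability epsilon, while the second explores in proportion to exponentiated Q-values. The temperature and epsilon values below are placeholders, not the schedules used in the study.

```python
import numpy as np

rng = np.random.default_rng(42)

def epsilon_greedy(q_values, epsilon=0.1):
    """Uniform random action with probability epsilon, otherwise the greedy action."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))
    return int(np.argmax(q_values))

def softmax_sample(q_values, temperature=0.5):
    """Boltzmann exploration: sample actions with probability proportional to exp(Q/T)."""
    z = np.asarray(q_values) / temperature
    probs = np.exp(z - z.max())        # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(len(q_values), p=probs))
```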

17 pages, 3062 KB  
Article
Spatiotemporal Risk-Aware Patrol Planning Using Value-Based Policy Optimization and Sensor-Integrated Graph Navigation in Urban Environments
by Swarnamouli Majumdar, Anjali Awasthi and Lorant Andras Szolga
Appl. Sci. 2025, 15(15), 8565; https://doi.org/10.3390/app15158565 - 1 Aug 2025
Viewed by 754
Abstract
This study proposes an intelligent patrol planning framework that leverages reinforcement learning, spatiotemporal crime forecasting, and simulated sensor telemetry to optimize autonomous vehicle (AV) navigation in urban environments. Crime incidents from Washington DC (2024–2025) and Seattle (2008–2024) are modeled as a dynamic spatiotemporal graph, capturing the evolving intensity and distribution of criminal activity across neighborhoods and time windows. The agent’s state space incorporates synthetic AV sensor inputs—including fuel level, visual anomaly detection, and threat signals—to reflect real-world operational constraints. We evaluate and compare three learning strategies: Deep Q-Network (DQN), Double Deep Q-Network (DDQN), and Proximal Policy Optimization (PPO). Experimental results show that DDQN outperforms DQN in convergence speed and reward accumulation, while PPO demonstrates greater adaptability in sensor-rich, high-noise conditions. Real-map simulations and hourly risk heatmaps validate the effectiveness of our approach, highlighting its potential to inform scalable, data-driven patrol strategies in next-generation smart cities. Full article
(This article belongs to the Special Issue AI-Aided Intelligent Vehicle Positioning in Urban Areas)

27 pages, 3211 KB  
Article
Hybrid Deep Learning-Reinforcement Learning for Adaptive Human-Robot Task Allocation in Industry 5.0
by Claudio Urrea
Systems 2025, 13(8), 631; https://doi.org/10.3390/systems13080631 - 26 Jul 2025
Cited by 1 | Viewed by 1818
Abstract
Human-Robot Collaboration (HRC) is pivotal for flexible, worker-centric manufacturing in Industry 5.0, yet dynamic task allocation remains difficult because operator states—fatigue and skill—fluctuate abruptly. I address this gap with a hybrid framework that couples real-time perception and double-estimating reinforcement learning. A Convolutional Neural Network (CNN) classifies nine fatigue–skill combinations from synthetic physiological cues (heart-rate, blink rate, posture, wrist acceleration); its outputs feed a Double Deep Q-Network (DDQN) whose state vector also includes task-queue and robot-status features. The DDQN optimises a multi-objective reward balancing throughput, workload and safety and executes at 10 Hz within a closed-loop pipeline implemented in MATLAB R2025a and RoboDK v5.9. Benchmarking on a 1000-episode HRC dataset (2500 allocations·episode⁻¹) shows the hybrid CNN+DDQN controller raises throughput to 60.48 ± 0.08 tasks·min⁻¹ (+21% vs. rule-based, +12% vs. SARSA, +8% vs. Dueling DQN, +5% vs. PPO), trims operator fatigue by 7% and sustains 99.9% collision-free operation (one-way ANOVA, p < 0.05; post-hoc power 1 − β = 0.87). Visual analyses confirm responsive task reallocation as fatigue rises or skill varies. The approach outperforms strong baselines (PPO, A3C, Dueling DQN) by mitigating Q-value over-estimation through double learning, providing robust policies under stochastic human states and offering a reproducible blueprint for multi-robot, Industry 5.0 factories. Future work will validate the controller on a physical Doosan H2017 cell and incorporate fairness constraints to avoid workload bias across multiple operators. Full article
(This article belongs to the Section Systems Engineering)
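A multi-objective reward of the kind described here (throughput versus workload versus safety) is commonly realized as a weighted sum with a hard penalty on safety violations. The weights and signal names below are illustrative placeholders, not the study's tuned values.

```python
def hrc_reward(tasks_completed, operator_fatigue, collision,
               w_throughput=1.0, w_fatigue=0.5, collision_penalty=100.0):
    """Weighted multi-objective reward for one human-robot task-allocation step (illustrative)."""
    r = w_throughput * tasks_completed - w_fatigue * operator_fatigue
    if collision:
        r -= collision_penalty   # safety dominates whenever it is violated
    return r
```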

19 pages, 5417 KB  
Article
SE-TFF: Adaptive Tourism-Flow Forecasting Under Sparse and Heterogeneous Data via Multi-Scale SE-Net
by Jinyuan Zhang, Tao Cui and Peng He
Appl. Sci. 2025, 15(15), 8189; https://doi.org/10.3390/app15158189 - 23 Jul 2025
Viewed by 568
Abstract
Accurate and timely forecasting of cross-regional tourist flows is essential for sustainable destination management, yet existing models struggle with sparse data, complex spatiotemporal interactions, and limited interpretability. This paper presents SE-TFF, a multi-scale tourism-flow forecasting framework that couples a Squeeze-and-Excitation (SE) network with reinforcement-driven optimization to adaptively re-weight environmental, economic, and social features. A benchmark dataset of 17.8 million records from 64 countries and 743 cities (2016–2024) is compiled from the Open Travel Data (OPTD) repository on GitHub for training and validation. SE-TFF introduces (i) a multi-channel SE module for fine-grained feature selection under heterogeneous conditions, (ii) a Top-K attention filter to preserve salient context in highly sparse matrices, and (iii) a Double-DQN layer that dynamically balances prediction objectives. Experimental results show that SE-TFF attains 56.5% and 65.6% reductions in MAE and RMSE, respectively, over the best baseline (ARIMAX) at 20% sparsity, with an average MAE of 0.92 × 10³ across multi-task outputs. SHAP analysis ranks climate anomalies, tourism revenue, and employment as dominant predictors. These gains demonstrate SE-TFF’s ability to deliver real-time, interpretable forecasts for data-limited destinations. Future work will incorporate real-time social media signals and larger multimodal datasets to enhance generalizability. Full article
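The Squeeze-and-Excitation module at the heart of SE-TFF is a small gating network: global-average-pool each channel ("squeeze"), pass the result through a two-layer bottleneck, and rescale the channels with the resulting sigmoid weights ("excitation"). A generic 1D PyTorch version follows; the reduction ratio and tensor layout are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class SEBlock1d(nn.Module):
    """Squeeze-and-Excitation gating for feature maps of shape [batch, channels, length]."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        s = x.mean(dim=-1)              # squeeze: global average pool over the length axis
        w = self.fc(s).unsqueeze(-1)    # excitation: per-channel gates in (0, 1)
        return x * w                    # re-weight the original features
```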

18 pages, 1138 KB  
Article
Intelligent Priority-Aware Spectrum Access in 5G Vehicular IoT: A Reinforcement Learning Approach
by Adeel Iqbal, Tahir Khurshaid and Yazdan Ahmad Qadri
Sensors 2025, 25(15), 4554; https://doi.org/10.3390/s25154554 - 23 Jul 2025
Viewed by 637
Abstract
Efficient and intelligent spectrum access is crucial for meeting the diverse Quality of Service (QoS) demands of Vehicular Internet of Things (V-IoT) systems in next-generation cellular networks. This work proposes RL-PASM, a novel reinforcement learning (RL)-based priority-aware spectrum management framework that operates as a centralized, self-learning controller through Roadside Units (RSUs). RL-PASM dynamically allocates spectrum resources across three traffic classes: high-priority (HP), low-priority (LP), and best-effort (BE). Four RL algorithms are compared: Q-Learning (QL), Double Q-Learning (DQL), Deep Q-Network (DQN), and Actor-Critic (AC). The environment is modeled as a discrete-time Markov Decision Process (MDP), and a context-sensitive reward function guides fairness-preserving decisions for access, preemption, coexistence, and hand-off. Extensive simulations conducted under realistic vehicular load conditions evaluate the performance across key metrics, including throughput, delay, energy efficiency, fairness, blocking, and interruption probability. Unlike prior approaches, RL-PASM introduces a unified multi-objective reward formulation and centralized RSU-based control to support adaptive priority-aware access for dynamic vehicular environments. Simulation results confirm that RL-PASM balances throughput, latency, fairness, and energy efficiency, demonstrating its suitability for scalable and resource-constrained deployments. The results also show that DQN achieves the highest average throughput, followed by vanilla QL, while DQL and AC maintain high fairness and low average interruption probability. QL demonstrates the lowest average delay and the highest energy efficiency, making it a suitable candidate for edge-constrained vehicular deployments. With the appropriate RL method selected, RL-PASM offers a robust and adaptable solution for scalable, intelligent, and priority-aware spectrum access in vehicular communication infrastructures. Full article
(This article belongs to the Special Issue Emerging Trends in Next-Generation mmWave Cognitive Radio Networks)

24 pages, 8227 KB  
Article
Application of Dueling Double Deep Q-Network for Dynamic Traffic Signal Optimization: A Case Study in Danang City, Vietnam
by Tho Cao Phan, Viet Dinh Le and Teron Nguyen
Mach. Learn. Knowl. Extr. 2025, 7(3), 65; https://doi.org/10.3390/make7030065 - 14 Jul 2025
Viewed by 1264
Abstract
This study investigates the application of the Dueling Double Deep Q-Network (3DQN) algorithm to optimize traffic signal control at a major urban intersection in Danang City, Vietnam. The objective is to enhance signal timing efficiency in response to mixed traffic flow and real-world traffic dynamics. A simulation environment was developed using the Simulation of Urban Mobility (SUMO) software version 1.11, incorporating both a fixed-time signal controller and two 3DQN models trained with 1 million (1M-Step) and 5 million (5M-Step) iterations. The models were evaluated using randomized traffic demand scenarios ranging from 50% to 150% of baseline traffic volumes. The results demonstrate that the 3DQN models outperform the fixed-time controller, significantly reducing vehicle delays, with the 5M-Step model achieving average waiting times of under five minutes. To further assess the model’s responsiveness to real-time conditions, traffic flow data were collected using YOLOv8 for object detection and SORT for vehicle tracking from live camera feeds, and integrated into the SUMO-3DQN simulation. The findings highlight the robustness and adaptability of the 3DQN approach, particularly under peak traffic conditions, underscoring its potential for deployment in intelligent urban traffic management systems. Full article
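The "dueling" part of 3DQN splits the Q-head into a state-value stream and an advantage stream, recombined as Q(s, a) = V(s) + A(s, a) - mean_a A(s, a). The sketch below shows that head in PyTorch; the layer sizes and the traffic-signal state/action dimensions are placeholders, not the study's architecture.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling Q-network head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim: int = 64, n_actions: int = 8, hidden: int = 128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)               # state-value stream
        self.advantage = nn.Linear(hidden, n_actions)   # advantage stream

    def forward(self, s: torch.Tensor) -> torch.Tensor:
        h = self.trunk(s)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=1, keepdim=True)      # recombine with mean-advantage baseline
```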

23 pages, 20322 KB  
Article
An Intelligent Path Planning System for Urban Airspace Monitoring: From Infrastructure Assessment to Strategic Optimization
by Qianyu Liu, Wei Dai, Zichun Yan and Claudio J. Tessone
Smart Cities 2025, 8(3), 100; https://doi.org/10.3390/smartcities8030100 - 19 Jun 2025
Viewed by 838
Abstract
Urban Air Mobility (UAM) requires reliable communication and surveillance infrastructures to ensure safe Unmanned Aerial Vehicle (UAV) operations in dense metropolitan environments. However, urban infrastructure is inherently heterogeneous, leading to significant spatial variations in monitoring performance. This study proposes a unified framework that integrates infrastructure readiness assessment with Deep Reinforcement Learning (DRL)-based UAV path planning. Using Singapore as a representative case, we employ a data-driven methodology combining clustering analysis and in situ measurements to estimate the citywide distribution of surveillance quality. We then introduce an infrastructure-aware path planning algorithm based on a Double Deep Q-Network (DDQN) with a convolutional architecture, which enables UAVs to learn efficient trajectories while avoiding surveillance blind zones. Extensive simulations demonstrate that the proposed approach significantly improves path success rates, reduces traversal through poorly monitored regions, and maintains high navigation efficiency. These results highlight the potential of combining infrastructure modeling with DRL to support performance-aware airspace operations and inform future UAM governance systems. Full article

19 pages, 3650 KB  
Article
Enhanced-Dueling Deep Q-Network for Trustworthy Physical Security of Electric Power Substations
by Nawaraj Kumar Mahato, Junfeng Yang, Jiaxuan Yang, Gangjun Gong, Jianhong Hao, Jing Sun and Jinlu Liu
Energies 2025, 18(12), 3194; https://doi.org/10.3390/en18123194 - 18 Jun 2025
Viewed by 587
Abstract
This paper introduces an Enhanced-Dueling Deep Q-Network (EDDQN) specifically designed to bolster the physical security of electric power substations. We model the intricate substation security challenge as a Markov Decision Process (MDP), segmenting the facility into three zones, each with potential normal, suspicious, or attacked states. The EDDQN agent learns to strategically select security actions, aiming for optimal threat prevention while minimizing disruptive errors and false alarms. This methodology integrates Double DQN for stable learning, Prioritized Experience Replay (PER) to accelerate the learning process, and a sophisticated neural network architecture tailored to the complexities of multi-zone substation environments. Empirical evaluation using synthetic data derived from historical incident patterns demonstrates the significant advantages of EDDQN over other standard DQN variations, yielding an average reward of 7.5, a threat prevention success rate of 91.1%, and a notably low false alarm rate of 0.5%. The learned action policy exhibits a proactive security posture, establishing EDDQN as a promising and reliable intelligent solution for enhancing the physical resilience of power substations against evolving threats. This research directly addresses the critical need for adaptable and intelligent security mechanisms within the electric power infrastructure. Full article
(This article belongs to the Special Issue Energy, Electrical and Power Engineering: 3rd Edition)
