Search Results (855)

Search Parameters:
Keywords = Deep Q-network

18 pages, 1338 KB  
Article
EVMC: An Energy-Efficient Virtual Machine Consolidation Approach Based on Deep Q-Networks for Cloud Data Centers
by Peiying Zhang, Jingfei Gao, Jing Liu and Lizhuang Tan
Electronics 2025, 14(19), 3813; https://doi.org/10.3390/electronics14193813 - 26 Sep 2025
Abstract
As the mainstream computing paradigm, cloud computing breaks the physical rigidity of traditional resource models and provides heterogeneous computing resources, better meeting the diverse needs of users. However, the frequent creation and termination of virtual machines (VMs) tends to induce resource fragmentation, resulting in resource wastage in cloud data centers. Virtual machine consolidation (VMC) technology effectively improves resource utilization by intelligently migrating virtual machines onto fewer physical hosts. However, most existing approaches lack rational host detection mechanisms and efficient migration strategies, often neglecting quality of service (QoS) guarantees while optimizing energy consumption, which can easily lead to Service Level Agreement Violations (SLAVs). To address these challenges, this paper proposes an energy-efficient virtual machine consolidation method (EVMC). First, a co-location coefficient model is constructed to detect the fewest suitable VMs on hosts. Then, leveraging the environment-aware decision-making capability of the DQN agent, dynamic VM migration strategies are implemented. Experimental results demonstrate that EVMC outperforms existing state-of-the-art approaches in terms of energy consumption and SLAV rate, showcasing its effectiveness and potential for practical application.
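EVMC's migration policy, like most of the DQN-based methods in these results, rests on the same bootstrapped update rule: move Q(s, a) toward the target y = r + γ·max over a' of Q_target(s', a'), with a periodically re-synced target network. A minimal sketch of that rule, using a linear Q-function in place of a neural network and arbitrary illustrative dimensions (none of this is from the paper itself):

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, N_ACTIONS, GAMMA, LR = 4, 3, 0.95, 0.01

# Linear stand-in for the Q-network: Q(s) = W @ s, one row of W per action.
W = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))
W_target = W.copy()  # frozen target network, re-synced periodically in a full DQN

def q_values(weights, s):
    return weights @ s

def dqn_update(weights, target_weights, s, a, r, s_next, done):
    """One TD step toward the DQN target y = r + gamma * max_a' Q_target(s', a')."""
    y = r if done else r + GAMMA * q_values(target_weights, s_next).max()
    td_error = y - q_values(weights, s)[a]
    weights[a] += LR * td_error * s  # semi-gradient step for the taken action only
    return td_error

# One update on a toy transition (s, action 1, reward 1.0, s_next)
s, s_next = rng.normal(size=STATE_DIM), rng.normal(size=STATE_DIM)
err = dqn_update(W, W_target, s, 1, 1.0, s_next, False)
```

Because the target network is held fixed between syncs, repeated updates on the same transition shrink the TD error geometrically, which is the stabilizing idea behind the target-network trick.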

15 pages, 1698 KB  
Article
AI-Driven Energy-Efficient Data Aggregation and Routing Protocol Modeling to Maximize Network Lifetime in Wireless Sensor Networks
by R. Arun Chakravarthy, C. Sureshkumar, M. Arun and M. Bhuvaneswari
NDT 2025, 3(4), 22; https://doi.org/10.3390/ndt3040022 - 25 Sep 2025
Abstract
The research work presents an artificial intelligence-driven, energy-aware data aggregation and routing protocol for wireless sensor networks (WSNs) with the primary objective of extending overall network lifetime. The proposed scheme leverages reinforcement learning in conjunction with deep Q-networks (DQNs) to adaptively optimize both Cluster Head (CH) selection and routing decisions. An adaptive clustering mechanism is introduced wherein factors such as residual node energy, spatial proximity, and traffic load are jointly considered to elect suitable CHs. This approach mitigates premature energy depletion at individual nodes and promotes balanced energy consumption across the network, thereby enhancing node sustainability. For data forwarding, the routing component employs a DQN-based strategy to dynamically identify energy-efficient transmission paths, ensuring reduced communication overhead and reliable sink connectivity. Performance evaluation, conducted through extensive simulations, utilizes key metrics including network lifetime, total energy consumption, packet delivery ratio (PDR), latency, and load distribution. Comparative analysis with baseline protocols such as LEACH, PEGASIS, and HEED demonstrates that the proposed protocol achieves superior energy efficiency, higher packet delivery reliability, and lower packet losses, while adapting effectively to varying network dynamics. The experimental outcomes highlight the scalability and robustness of the protocol, underscoring its suitability for diverse WSN applications including environmental monitoring, surveillance, and Internet of Things (IoT)-oriented deployments.

16 pages, 3480 KB  
Article
Reinforcement Learning for Robot Assisted Live Ultrasound Examination
by Chenyang Li, Tao Zhang, Ziqi Zhou, Baoliang Zhao, Peng Zhang and Xiaozhi Qi
Electronics 2025, 14(18), 3709; https://doi.org/10.3390/electronics14183709 - 19 Sep 2025
Abstract
Due to its portability, non-invasiveness, and real-time capabilities, ultrasound imaging has been widely adopted for liver disease detection. However, conventional ultrasound examinations heavily rely on operator expertise, leading to high workload and inconsistent imaging quality. To address these challenges, we propose a Robotic Ultrasound Scanning System (RUSS) based on reinforcement learning to automate the localization of standard liver planes. It can help reduce physician burden while improving scanning efficiency and accuracy. The reinforcement learning agent employs a Deep Q-Network (DQN) integrated with LSTM to control probe movements within a discrete action space, utilizing the cross-sectional area of the abdominal aorta region as the criterion for standard plane determination. System performance was comprehensively evaluated against a target standard plane, achieving an average Peak Signal-to-Noise Ratio (PSNR) of 24.51 dB and a Structural Similarity Index (SSIM) of 0.70, indicating high fidelity in the acquired images. Furthermore, a mean Dice coefficient of 0.80 for the abdominal aorta segmentation confirmed high anatomical localization accuracy. These preliminary results demonstrate the potential of our method for achieving consistent and autonomous ultrasound scanning.
(This article belongs to the Topic Robot Manipulation Learning and Interaction Control)

29 pages, 3320 KB  
Article
Risk-Aware Crypto Price Prediction Using DQN with Volatility-Adjusted Rewards Across Multi-Period State Representations
by Otabek Sattarov and Fazliddin Makhmudov
Mathematics 2025, 13(18), 3012; https://doi.org/10.3390/math13183012 - 18 Sep 2025
Viewed by 600
Abstract
Forecasting Bitcoin prices remains a complex task due to the asset’s inherent and significant volatility. Traditional reinforcement learning (RL) models often rely on a single observation from the time series, potentially missing out on short-term patterns that could enhance prediction performance. This study [...] Read more.
Forecasting Bitcoin prices remains a complex task due to the asset’s inherent and significant volatility. Traditional reinforcement learning (RL) models often rely on a single observation from the time series, potentially missing out on short-term patterns that could enhance prediction performance. This study presents a Deep Q-Network (DQN) model that utilizes a multi-step state representation, incorporating consecutive historical timesteps to reflect recent market behavior more accurately. By doing so, the model can more effectively identify short-term trends under volatile conditions. Additionally, we propose a novel reward mechanism that adjusts for volatility by penalizing large prediction errors more heavily during periods of high market volatility, thereby encouraging more risk-aware forecasting behavior. We validate the effectiveness of our approach through extensive experiments on Bitcoin data across minutely, hourly, and daily timeframes. The proposed model achieves notable results, including a Mean Absolute Percentage Error (MAPE) of 10.12%, Root Mean Squared Error (RMSE) of 815.33, and Value-at-Risk (VaR) of 0.04. These outcomes demonstrate the advantages of integrating short-term temporal features and volatility sensitivity into RL frameworks for more reliable cryptocurrency price prediction. Full article
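The abstract does not give the exact form of the volatility-adjusted reward, but one plausible sketch is to scale the absolute prediction-error penalty by a rolling-volatility estimate, so that the same error costs the agent more in turbulent markets. Both the `lam` coefficient and the volatility proxy below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def rolling_volatility(prices, window=10):
    """Illustrative volatility proxy: std of log-returns over a trailing window."""
    rets = np.diff(np.log(np.asarray(prices, dtype=float)))
    return float(rets[-window:].std())

def risk_aware_reward(pred, actual, volatility, lam=5.0):
    """Hypothetical volatility-adjusted reward: the absolute prediction error
    is penalized more heavily when recent market volatility is high."""
    return -abs(pred - actual) * (1.0 + lam * volatility)
```

With this shape, a one-unit error in a calm regime (volatility 0.01) costs −1.05, while the same error at volatility 0.10 costs −1.50, which is the "risk-aware" pressure the authors describe.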

18 pages, 1430 KB  
Article
Microgrid Operation Optimization Strategy Based on CMDP-D3QN-MSRM Algorithm
by Jiayu Kang, Yushun Zeng and Qian Wei
Electronics 2025, 14(18), 3654; https://doi.org/10.3390/electronics14183654 - 15 Sep 2025
Abstract
This paper addresses the microgrid operation optimization challenges arising from the variability and uncertainty of distributed power sources and from complex power flow constraints. A novel method is proposed, based on an improved Dual-Competitive Deep Q-Network (D3QN) algorithm, which is enhanced by a multi-stage reward mechanism (MSRM) and formulated within a Constrained Markov Decision Process (CMDP) framework. First, the reward mechanism of the D3QN algorithm is optimized by introducing a redesigned MSRM, enhancing the training efficiency and the optimality of trained agents. Second, the microgrid operation optimization problem is modeled as a CMDP, thereby enhancing the algorithm’s capacity for handling complex constraints. Finally, numerical experiments demonstrate that our method reduces operating costs by 16.5%, achieves a better convergence performance, and curtails bus voltage fluctuations by over 40%, significantly improving the economic efficiency and operational stability of microgrids.
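The D3QN named here is a dueling double DQN; the dueling part replaces the single Q head with separate value and advantage streams, recombined as Q(s, a) = V(s) + A(s, a) − mean over a of A(s, a). A minimal sketch of that aggregation step (toy numbers, not from the paper):

```python
import numpy as np

def dueling_q(value, advantages):
    """Dueling-head aggregation: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
    Subtracting the mean advantage makes the V/A decomposition identifiable,
    since adding a constant to A and subtracting it from V would otherwise
    leave Q unchanged."""
    a = np.asarray(advantages, dtype=float)
    return float(value) + a - a.mean()
```

The subtraction fixes the mean of the Q-values to V(s) while leaving the action ranking (and hence the greedy policy) determined entirely by the advantage stream.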

25 pages, 2304 KB  
Article
From Anatomy to Genomics Using a Multi-Task Deep Learning Approach for Comprehensive Glioma Profiling
by Akmalbek Abdusalomov, Sabina Umirzakova, Obidjon Bekmirzaev, Adilbek Dauletov, Abror Buriboev, Alpamis Kutlimuratov, Akhram Nishanov, Rashid Nasimov and Ryumduck Oh
Bioengineering 2025, 12(9), 979; https://doi.org/10.3390/bioengineering12090979 - 15 Sep 2025
Abstract
Background: Gliomas are among the most complex and lethal primary brain tumors, necessitating precise evaluation of both anatomical subregions and molecular alterations for effective clinical management. Methods: To address the disconnected nature of current bioimage analysis pipelines, in which MRI-based anatomical segmentation and molecular biomarker prediction are performed as separate tasks, we introduce the Molecular-Genomic and Multi-Task network (MGMT-Net), a single deep learning framework that performs both tasks directly on multi-modal MRI data. MGMT-Net incorporates a novel Cross-Modality Attention Fusion (CMAF) module that dynamically integrates diverse imaging sequences and pairs them with a hybrid Transformer–Convolutional Neural Network (CNN) encoder to capture both global context and local anatomical detail. This architecture supports dual-task decoders, enabling concurrent voxel-wise tumor delineation and subject-level classification of key genomic markers, including the IDH gene mutation, the 1p/19q co-deletion, and the TERT gene promoter mutation. Results: Extensive validation on the Brain Tumor Segmentation (BraTS 2024) dataset and the combined Cancer Genome Atlas/Erasmus Glioma Database (TCGA/EGD) datasets demonstrated high segmentation accuracy and robust biomarker classification performance, with strong generalizability across external institutional cohorts. Ablation studies further confirmed the importance of each architectural component in achieving overall robustness. Conclusions: MGMT-Net presents a scalable and clinically relevant solution that bridges radiological imaging and genomic insights, potentially reducing diagnostic latency and enhancing precision in neuro-oncology decision-making. By integrating spatial and genetic analysis within a single model, this work represents a significant step toward comprehensive, AI-driven glioma assessment.
(This article belongs to the Special Issue Mathematical Models for Medical Diagnosis and Testing)

15 pages, 806 KB  
Article
On Rate Fairness Maximization for the Downlink NOMA with Improper Signaling and Imperfect SIC
by Hao Cheng, Min Zhang and Ruoyu Su
Appl. Sci. 2025, 15(18), 9970; https://doi.org/10.3390/app15189970 - 11 Sep 2025
Abstract
Non-orthogonal multiple access (NOMA) is a key enabler for 6G networks due to its efficient spectrum utilization, which is garnering significant attention among the Internet of Things (IoT) community. This paper investigates the benefits of the improper Gaussian signaling (IGS) technique on the max–min fairness of the downlink NOMA system under imperfect successive interference cancellation (SIC), where both users have the potential to adopt IGS. We first investigate fairness optimization under perfect SIC. In this case, the max–min optimization is solved by the alternate optimization algorithm, where the impropriety degree and power level are iteratively optimized. The closed-form solution for conventional proper Gaussian signaling is also obtained. Then, a deep Q network-based solution is considered for the rate fairness maximization of the downlink NOMA system under IGS and imperfect SIC. The simulations presented for the IGS-aided NOMA system support the analysis, illustrating that IGS can efficiently improve the achievable fairness rate compared to conventional proper Gaussian signaling.
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

26 pages, 6694 KB  
Article
AI Control for Pasteurized Soft-Boiled Eggs
by Primož Podržaj, Dominik Kozjek, Gašper Škulj, Tomaž Požrl and Marjan Jenko
Foods 2025, 14(18), 3171; https://doi.org/10.3390/foods14183171 - 11 Sep 2025
Abstract
This paper presents a novel approach to thermal process control in the food industry, specifically targeting the pasteurization and cooking of soft-boiled eggs. The unique challenge of this process lies in the precise temperature control required, as pasteurization and cooking must occur within a narrow temperature range. Traditional control methods, such as fuzzy logic controllers, have proven insufficient due to their limitations in handling varying loads and environmental conditions. To address these challenges, we propose the integration of robust reinforcement learning (RL) techniques, particularly the utilization of the Deep Q-Network (DQN) algorithm. Our approach involves training an RL agent in a simulated environment to manage the thermal process with high accuracy. The RL-based system adapts to different heat capacities, initial conditions, and environmental variations, demonstrating superior performance over traditional methods. Experimental results indicate that the RL-based controller significantly improves temperature regulation accuracy, ensuring consistent pasteurization and cooking quality. This study opens new avenues for the application of artificial intelligence in industrial food processing, highlighting the potential for RL algorithms to enhance process control and efficiency.
(This article belongs to the Special Issue Artificial Intelligence (AI) and Machine Learning for Foods)

27 pages, 4238 KB  
Article
A Scalable Reinforcement Learning Framework for Ultra-Reliable Low-Latency Spectrum Management in Healthcare Internet of Things
by Adeel Iqbal, Ali Nauman, Tahir Khurshaid and Sang-Bong Rhee
Mathematics 2025, 13(18), 2941; https://doi.org/10.3390/math13182941 - 11 Sep 2025
Abstract
Healthcare Internet of Things (H-IoT) systems demand ultra-reliable and low-latency communication (URLLC) to support critical functions such as remote monitoring, emergency response, and real-time diagnostics. However, spectrum scarcity and heterogeneous traffic patterns pose major challenges for centralized scheduling in dense H-IoT deployments. This paper proposes a multi-agent reinforcement learning (MARL) framework for dynamic, priority-aware spectrum management (PASM), where cooperative MARL agents jointly optimize throughput, latency, energy efficiency, fairness, and blocking probability under varying traffic and channel conditions. Six learning strategies are developed and compared, including Q-Learning, Double Q-Learning, Deep Q-Network (DQN), Actor–Critic, Dueling DQN, and Proximal Policy Optimization (PPO), within a simulated H-IoT environment that captures heterogeneous traffic, device priorities, and realistic URLLC constraints. A comprehensive simulation study across scalable scenarios ranging from 3 to 50 devices demonstrates that PPO consistently outperforms all baselines, improving mean throughput by 6.2%, reducing 95th-percentile delay by 11.5%, increasing energy efficiency by 11.9%, lowering blocking probability by 33.3%, and accelerating convergence by 75.8% compared to the strongest non-PPO baseline. These findings establish PPO as a robust and scalable solution for QoS-compliant spectrum management in dense H-IoT environments, while Dueling DQN emerges as a competitive deep RL alternative.

24 pages, 5448 KB  
Article
GlioSurvQNet: A DuelContextAttn DQN Framework for Brain Tumor Prognosis with Metaheuristic Optimization
by M. Renugadevi, Venkateswarlu Gonuguntla, Ihssan S. Masad, G. Venkat Babu and K. Narasimhan
Diagnostics 2025, 15(18), 2304; https://doi.org/10.3390/diagnostics15182304 - 11 Sep 2025
Abstract
Background/Objectives: Accurate classification of brain tumors and reliable prediction of patient survival are essential in neuro-oncology, guiding clinical decisions and enabling precision treatment planning. However, conventional machine learning and deep learning methods often struggle with challenges such as data scarcity, class imbalance, limited model interpretability, and poor generalization across diverse clinical settings. This study presents GlioSurvQNet, a novel reinforcement learning-based framework designed to address these limitations for both glioma grading and survival prediction. Methods: GlioSurvQNet is built upon a DuelContextAttn Deep Q-Network (DQN) architecture, tailored for binary classification of low-grade vs. high-grade gliomas and multi-class survival prediction (short-, medium-, and long-term categories). Radiomics features were extracted from multimodal MRI scans, including FLAIR, T1CE, and T2 sequences. Feature optimization was performed using a hybrid ensemble of metaheuristic algorithms, including Harris Hawks Optimization (HHO), Modified Gorilla Troops Optimization (mGTO), and Zebra Optimization Algorithm (ZOA). Subsequently, SHAP-based feature selection was applied to enhance model interpretability and robustness. Results: The classification module achieved the highest accuracy of 99.27% using the FLAIR + T1CE modality pair, while the survival prediction model attained an accuracy of 93.82% with the FLAIR + T2 + T1CE fusion. Comparative evaluations against established machine learning and deep learning models demonstrated that GlioSurvQNet consistently outperformed existing approaches in both tasks. Conclusions: GlioSurvQNet offers a powerful and interpretable AI-driven solution for brain tumor analysis. Its high accuracy and robustness make it a promising tool for clinical decision support in glioma diagnosis and prognosis.
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

26 pages, 4054 KB  
Article
Multi-Time-Scale Demand Response Optimization in Active Distribution Networks Using Double Deep Q-Networks
by Wei Niu, Jifeng Li, Zongle Ma, Wenliang Yin and Liang Feng
Energies 2025, 18(18), 4795; https://doi.org/10.3390/en18184795 - 9 Sep 2025
Abstract
This paper presents a deep reinforcement learning-based demand response (DR) optimization framework for active distribution networks under uncertainty and user heterogeneity. The proposed model utilizes a Double Deep Q-Network (Double DQN) to learn adaptive, multi-period DR strategies across residential, commercial, and electric vehicle (EV) participants in a 24 h rolling horizon. By incorporating a structured state representation—including forecasted load, photovoltaic (PV) output, dynamic pricing, historical DR actions, and voltage states—the agent autonomously learns control policies that minimize total operational costs while maintaining grid feasibility and voltage stability. The physical system is modeled via detailed constraints, including power flow balance, voltage magnitude bounds, PV curtailment caps, deferrable load recovery windows, and user-specific availability envelopes. A case study based on a modified IEEE 33-bus distribution network with embedded PV and DR nodes demonstrates the framework’s effectiveness. Simulation results show that the proposed method achieves significant cost savings (up to 35% over baseline), enhances PV absorption, reduces load variance by 42%, and maintains voltage profiles within safe operational thresholds. Training curves confirm smooth Q-value convergence and stable policy performance, while spatiotemporal visualizations reveal interpretable DR behavior aligned with both economic and physical system constraints. This work contributes a scalable, model-free approach for intelligent DR coordination in smart grids, integrating learning-based control with physical grid realism. The modular design allows for future extension to multi-agent systems, storage coordination, and market-integrated DR scheduling. The results position Double DQN as a promising architecture for operational decision-making in AI-enabled distribution networks.
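The Double DQN used in this framework differs from vanilla DQN in one line of the target computation: the online network selects the next action and the target network evaluates it, which curbs the max-operator's overestimation bias. A sketch of that one line, with toy Q-vectors (the values are illustrative, not from the paper):

```python
import numpy as np

def double_dqn_target(q_online_next, q_target_next, r, gamma, done=False):
    """Double DQN target: the online network picks argmax_a' Q_online(s', a'),
    and the target network supplies the value of that action. Vanilla DQN
    would instead use max_a' Q_target(s', a') for both selection and evaluation."""
    if done:
        return float(r)
    a_star = int(np.argmax(q_online_next))
    return float(r + gamma * np.asarray(q_target_next, dtype=float)[a_star])
```

For example, with online Q-values [1.0, 3.0, 2.0] and target Q-values [0.5, 0.2, 0.9], the online net selects action 1, so the target is r + γ·0.2 rather than the vanilla r + γ·0.9, illustrating how decoupling selection from evaluation can only lower an inflated estimate.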

31 pages, 6584 KB  
Review
Advancements in Active Journal Bearings: A Critical Review of Performance, Control, and Emerging Prospects
by Navaneeth Krishna Vernekar, Raghuvir Pai, Ganesha Aroor, Nitesh Kumar and Girish Hariharan
Modelling 2025, 6(3), 97; https://doi.org/10.3390/modelling6030097 - 5 Sep 2025
Abstract
Active or adjustable journal bearings are designed with unique mechanisms to reduce the lateral vibrations of rotor-bearing systems by adjusting their damping and stiffness. The article provides a comprehensive review of the literature, outlining the structure and findings of studies on active bearings. Over the years, various kinds of adjustable bearing designs have been developed with unique operational mechanisms. Such bearing designs include adjustable pad sectors, externally adjustable pads, active oil injection through pad openings, and flexible deformable sleeves. These modifications enhance the turbine shaft line’s performance by increasing the system’s overall stability. The detailed review in this paper highlights the characteristics of bearings, along with the key advantages, limitations, and potential offered by active control across different bearing types. The efficiency of any rotor system can be greatly enhanced by optimally selecting the adjustable bearing parameters. These adjustable bearings have demonstrated a unique capability to modify the hydrodynamic operation within the bearing clearances. Experimental studies and simulation approaches were also utilized to optimize bearing geometries, lubrication regimes, and control mechanisms. The use of advanced controllers such as PID, LQG, and deep Q-networks further refines stability. The concluding section of the article explores potential avenues for the future development of active bearings.

23 pages, 3818 KB  
Article
Energy Regulation-Aware Layered Control Architecture for Building Energy Systems Using Constraint-Aware Deep Reinforcement Learning and Virtual Energy Storage Modeling
by Siwei Li, Congxiang Tian and Ahmed N. Abdalla
Energies 2025, 18(17), 4698; https://doi.org/10.3390/en18174698 - 4 Sep 2025
Abstract
In modern intelligent buildings, the control of Building Energy Systems (BES) faces increasing complexity in balancing energy costs, thermal comfort, and operational flexibility. Traditional centralized or flat deep reinforcement learning (DRL) methods often fail to effectively handle the multi-timescale dynamics, large state–action spaces, and strict constraint satisfaction required for real-world energy systems. To address these challenges, this paper proposes an energy policy-aware layered control architecture that combines Virtual Energy Storage System (VESS) modeling with a novel Dynamic Constraint-Aware Policy Optimization (DCPO) algorithm. The VESS is modeled based on the thermal inertia of building envelope components, quantifying flexibility in terms of virtual power, capacity, and state of charge, thus enabling BES to behave as if it had embedded, non-physical energy storage. Building on this, the BES control problem is structured using a hierarchical Markov Decision Process, in which the upper level handles strategic decisions (e.g., VESS dispatch, HVAC modes), while the lower level manages real-time control (e.g., temperature adjustments, load balancing). The proposed DCPO algorithm extends actor–critic learning by incorporating dynamic policy constraints, entropy regularization, and adaptive clipping to ensure feasible and efficient policy learning under both operational and comfort-related constraints. Simulation experiments demonstrate that the proposed approach outperforms established algorithms like Deep Q-Networks (DQN), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed DDPG (TD3). Specifically, it achieves a 32.6% reduction in operational costs and over a 51% decrease in thermal comfort violations compared to DQN, while ensuring millisecond-level policy generation suitable for real-time BES deployment.
(This article belongs to the Section C: Energy Economics and Policy)

22 pages, 763 KB  
Article
Optimizing TSCH Scheduling for IIoT Networks Using Reinforcement Learning
by Sahar Ben Yaala, Sirine Ben Yaala and Ridha Bouallegue
Technologies 2025, 13(9), 400; https://doi.org/10.3390/technologies13090400 - 3 Sep 2025
Abstract
In the context of industrial applications, ensuring medium access control is a fundamental challenge. Industrial IoT devices are resource-constrained and must guarantee reliable communication while reducing energy consumption. The IEEE 802.15.4e standard proposed time-slotted channel hopping (TSCH) to meet the requirements of the industrial Internet of Things. TSCH relies on time synchronization and channel hopping to improve performance and reduce energy consumption. Despite these characteristics, configuring an efficient schedule under varying traffic conditions and interference scenarios remains a challenging problem. The exploitation of reinforcement learning (RL) techniques offers a promising approach to address this challenge. AI enables TSCH to dynamically adapt its scheduling based on real-time network conditions, making decisions that optimize key performance criteria such as energy efficiency, reliability, and latency. By learning from the environment, reinforcement learning can reconfigure schedules to mitigate interference scenarios and meet traffic demands. In this work, we compare various reinforcement learning (RL) algorithms in the context of the TSCH environment. In particular, we evaluate the deep Q-network (DQN), double deep Q-network (DDQN), and prioritized DQN (PER-DQN). We focus on the convergence speed of these algorithms and their capacity to adapt the schedule. Our results show that the PER-DQN algorithm improves the packet delivery ratio and achieves faster convergence compared to DQN and DDQN, demonstrating its effectiveness for dynamic TSCH scheduling in Industrial IoT environments. These quantifiable improvements highlight the potential of prioritized experience replay to enhance reliability and efficiency under varying network conditions.
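PER-DQN's advantage over uniform replay comes from sampling transitions in proportion to their TD-error priority and correcting the induced bias with importance-sampling weights. A sketch of the standard proportional scheme (the `alpha` and `beta` values are conventional illustrative defaults, not taken from this paper):

```python
import numpy as np

def per_sample(priorities, batch_size, alpha=0.6, beta=0.4, rng=None):
    """Proportional prioritized replay: sample index i with probability
    p_i^alpha / sum_j p_j^alpha, and return importance-sampling weights
    w_i = (N * P(i))^-beta, normalized by the batch max for stability."""
    rng = rng or np.random.default_rng(0)
    p = np.asarray(priorities, dtype=float) ** alpha
    probs = p / p.sum()
    idx = rng.choice(len(probs), size=batch_size, p=probs)
    weights = (len(probs) * probs[idx]) ** (-beta)
    return idx, weights / weights.max()
```

High-priority (large TD-error) transitions dominate the batch, while their down-weighted importance factors keep the expected gradient close to the uniform-replay one.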
(This article belongs to the Section Information and Communication Technologies)

24 pages, 6077 KB  
Article
Trajectory Tracking Control of Intelligent Vehicles with Adaptive Model Predictive Control and Reinforcement Learning Under Variable Curvature Roads
by Yuying Fang, Pengwei Wang, Song Gao, Binbin Sun, Qing Zhang and Yuhua Zhang
Technologies 2025, 13(9), 394; https://doi.org/10.3390/technologies13090394 - 1 Sep 2025
Abstract
To improve the tracking accuracy and the adaptability of intelligent vehicles in various road conditions, an adaptive model predictive controller combining reinforcement learning is proposed in this paper. Firstly, to solve the problem of control accuracy decline caused by a fixed prediction time domain, a low-computational-cost adaptive prediction horizon strategy based on a two-dimensional Gaussian function is designed to realize the real-time adjustment of the prediction time domain with vehicle speed and road curvature. Secondly, to address the problem of tracking stability reduction under complex road conditions, the Deep Q-Network (DQN) algorithm is used to adjust the weight matrix of the Model Predictive Control (MPC) algorithm; then, the convergence speed and control effectiveness of the tracking controller are improved. Finally, hardware-in-the-loop tests and real vehicle tests are conducted. The results show that the proposed adaptive predictive horizon controller (DQN-AP-MPC) solves the problem of poor control performance caused by fixed predictive time domain and fixed weight matrix values, significantly improving the tracking accuracy of intelligent vehicles under different road conditions. Especially under variable curvature and high-speed conditions, the proposed controller reduces the maximum lateral error by 76.81% compared to the baseline MPC controller, and reduces the average absolute error by 64.44%. The proposed controller has a faster convergence speed and better trajectory tracking performance when tested on variable curvature road conditions and double lane roads.
(This article belongs to the Section Manufacturing Technology)
