Search Results (977)

Search Parameters:
Keywords = Deep Q-network

26 pages, 1136 KB  
Article
A Hybrid Framework for Multi-Stock Trading: Deep Q-Networks with Portfolio Optimization
by Soroush Shahsafi and Farnoosh Naderkhani
J. Risk Financial Manag. 2026, 19(2), 132; https://doi.org/10.3390/jrfm19020132 - 9 Feb 2026
Abstract
This paper presents a hybrid framework for multi-stock trading that combines the decision-making ability of Deep Q-Networks (DQN) with the allocation precision of portfolio optimization models. Realistic markets are noisy and non-stationary, and complex action spaces can hinder reinforcement learning (RL) performance. The DQN generates buy/sell signals based on market conditions. The framework passes buy-listed assets to an optimizer, which computes portfolio weights. Five allocation strategies are examined: naïve 1/N, Markowitz Mean–Variance, Global Minimum Variance, Risk Parity, and Sharpe Ratio Maximization. Empirical evaluations on emerging-market exchange-traded funds (ETFs), as well as additional tests on U.S. equities, show that even the baseline DQN outperforms traditional technical indicators. Furthermore, integrating any of the optimization approaches with DQN yields measurable improvements in return-risk performance metrics. Among the hybrid frameworks, DQN combined with Sharpe Ratio Maximization delivers the most consistent gains. The findings highlight the value of decomposing stock selection from capital allocation and demonstrate the effectiveness of the proposed DQN-optimization framework on our testbed. Full article
(This article belongs to the Special Issue AI Applications in Financial Markets and Computational Finance)
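The selection/allocation split this abstract describes can be illustrated with a minimal sketch: a stub momentum heuristic stands in for the trained DQN, and the naïve 1/N rule stands in for the five optimizers. All tickers, prices, and the signal rule are hypothetical, not the paper's data or network.

```python
# Hypothetical sketch of the two-stage pipeline: (1) a DQN-like policy emits
# buy/sell signals, (2) only buy-listed assets are handed to an allocator.
# The momentum heuristic below is a placeholder for the trained network.

def dqn_signals(prices):
    """Stub policy: 'buy' if the last one-period return is positive."""
    signals = {}
    for ticker, series in prices.items():
        ret = series[-1] / series[-2] - 1.0
        signals[ticker] = "buy" if ret > 0 else "sell"
    return signals

def allocate_equal_weight(signals):
    """Naive 1/N allocation over the buy list (simplest of the five rules)."""
    buy_list = [t for t, s in signals.items() if s == "buy"]
    return {t: 1.0 / len(buy_list) for t in buy_list} if buy_list else {}

prices = {"AAA": [10.0, 10.5], "BBB": [20.0, 19.0], "CCC": [5.0, 5.2]}
weights = allocate_equal_weight(dqn_signals(prices))
print(weights)  # AAA and CCC are buy-listed, each receives weight 0.5
```

Swapping `allocate_equal_weight` for a Markowitz or Sharpe-maximizing optimizer changes only the second stage, which is exactly the decoupling of selection from allocation the paper argues for.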

16 pages, 3489 KB  
Article
A Deployment Strategy for Reconfigurable Intelligent Surfaces with Joint Phase and Position Optimization
by Guangsong Yang, Hongbo Huang, Chuwei Sun, Yiliang Wu, Xinjie Xu and Shan Huang
Electronics 2026, 15(3), 718; https://doi.org/10.3390/electronics15030718 - 6 Feb 2026
Abstract
The actual implementation of fifth-generation (5G) and beyond networks faces persistent challenges, including environmental interference and limited coverage, which compromise transmission stability and network feasibility. Reconfigurable Intelligent Surfaces (RISs) have emerged as a promising technology to dynamically reconfigure wireless propagation environments and enhance communication quality. To fully unlock the potential of RIS, this paper proposes a novel deployment strategy based on Double Deep Q-Networks (DDQNs) that jointly optimizes the RIS placement and phase shift configuration to maximize the system sum-rate. Specifically, the coverage area is discretized into a grid, and at each candidate location, a DDQN-based method is developed to solve the corresponding non-convex phase optimization problem. Simulation results reveal that our proposed strategy significantly surpasses conventional benchmark schemes, resulting in a sum-rate improvement of up to 38.41%. The study provides a practical and efficient pre-deployment framework for RIS-enhanced wireless networks. Full article
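The Double DQN idea this entry builds on fits in a few lines: the online network selects the next action and the target network evaluates it, curbing the overestimation bias of plain DQN. A sketch with Q-value lists standing in for the two networks (all numbers illustrative):

```python
# Double DQN target: y = r + gamma * Q_target(s', argmax_a Q_online(s', a)).
# Plain lists stand in for the online and target network outputs.

GAMMA = 0.9

def ddqn_target(reward, q_online_next, q_target_next, done=False):
    if done:
        return reward
    # argmax comes from the ONLINE estimate...
    a_star = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ...but the value is read from the TARGET estimate.
    return reward + GAMMA * q_target_next[a_star]

q_online_next = [1.0, 3.0, 2.0]   # online net picks action 1
q_target_next = [0.5, 2.0, 4.0]   # which the target net scores as 2.0
print(ddqn_target(1.0, q_online_next, q_target_next))  # 1.0 + 0.9 * 2.0 = 2.8
```

A vanilla DQN target would instead take `max(q_target_next)` (here 4.0), which is how the overestimation creeps in.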

8 pages, 1055 KB  
Proceeding Paper
Subchannel Allocation in Massive Multiple-Input Multiple-Output Orthogonal Frequency-Division Multiple Access and Hybrid Beamforming Systems with Deep Reinforcement Learning
by Jih-Wei Lee and Yung-Fang Chen
Eng. Proc. 2025, 120(1), 55; https://doi.org/10.3390/engproc2025120055 - 6 Feb 2026
Abstract
In this study, we emphasize that the maximum sum rate can be achieved through AI-based subchannel allocation, while taking into account all users’ quality of service (QoS) requirements in data rates for hybrid beamforming systems. We assume a limited number of radio frequency (RF) chains in practical hybrid beamforming architectures. This constraint makes subchannel allocation a critical aspect of hybrid beamforming in massive multiple-input multiple-output (MIMO) systems with orthogonal frequency division multiple access (MIMO-OFDMA), as it enables the system to serve more users within a single time slot. Unlike conventional subcarrier allocation methods, we employ a deep reinforcement learning (DRL)-based algorithm to address real-time decision-making challenges. Specifically, we propose a dueling double deep Q-network (Dueling-DDQN) to implement dynamic subchannel allocation. Simulation results demonstrate that the performance of the proposed algorithm gradually approaches that of the greedy method. Furthermore, both the average sum rate and the average spectral efficiency per user improve with a reasonable variation in outage probability. Full article
(This article belongs to the Proceedings of 8th International Conference on Knowledge Innovation and Invention)
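The dueling head used in the Dueling-DDQN above combines a state value V(s) with per-action advantages A(s, a); subtracting the mean advantage keeps the value/advantage split identifiable. A sketch with plain lists standing in for network outputs (numbers illustrative):

```python
# Dueling aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

def dueling_q(value, advantages):
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]

v = 2.0                  # scalar state value from the value stream
adv = [1.0, -1.0, 0.0]   # one advantage per candidate subchannel assignment
q = dueling_q(v, adv)
print(q)  # [3.0, 1.0, 2.0]
```

Because the mean advantage is subtracted, adding a constant to every advantage leaves the resulting Q-values unchanged, which stabilizes learning.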

27 pages, 1144 KB  
Article
Preference-Aligned Ride-Sharing Repositioning via a Two-Stage Bilevel RLHF Framework
by Ruihan Li and Vaneet Aggarwal
Electronics 2026, 15(3), 669; https://doi.org/10.3390/electronics15030669 - 3 Feb 2026
Abstract
Vehicle repositioning is essential for improving efficiency and service quality in ride-sharing platforms, yet existing approaches typically optimize proxy rewards that fail to reflect human-centered preferences such as wait time, service coverage, and unnecessary empty travel. We propose the first two-stage Bilevel Reinforcement Learning (RL) from Human Feedback (RLHF) framework for preference-aligned vehicle repositioning. In Stage 1, a value-based Deep Q-Network (DQN)-RLHF warm start learns an initial preference-aligned reward model and stable reference policy, mitigating the reward-model drift and cold-start instability that arise when applying on-policy RLHF directly. In Stage 2, a Kullback–Leibler (KL)-regularized Proximal Policy Optimization (PPO)-RLHF algorithm, equipped with action masking, behavioral-cloning anchoring, and alternating forward–reverse KL, fine-tunes the repositioning policy using either Large Language Model (LLM)-generated or rubric-based preference labels. We develop and compare two coordination schemes, pure alternating (PPO-Alternating) and k-step alternating (PPO-k-step), demonstrating that both yield consistent improvements across all tested arrival scales. Empirically, our framework reduces wait time and empty-mile ratio while improving served rate, without inducing trade-offs or reducing platform profit. These results show that human preference alignment can be stably and effectively incorporated into large-scale ride-sharing repositioning. Full article
29 pages, 3087 KB  
Review
Reinforcement Learning-Enabled Control and Design of Rigid-Link Robotic Fish: A Comprehensive Review
by Nhat Dinh, Darion Vosbein, Yuehua Wang and Qingsong Cui
Sensors 2026, 26(3), 996; https://doi.org/10.3390/s26030996 - 3 Feb 2026
Abstract
With the rising demand for maritime surveys of infrastructure, energy resources, and environmental conditions, autonomous robotic fish have emerged as a promising solution with their biomimetic propulsion, agile motion, efficiency, and capacity for underwater inspection, monitoring, data collection, and exploration tasks in complex aquatic environments. Inspired by fish spines, rigid-link fish robots (RLFRs), a category of robotic fish, are widely utilized in robotics research and applications. Their rigid, actuated joints enable them to reproduce the undulatory locomotion and high maneuverability of biological fishes, while the modular nature of rigid links between joints makes them cost-effective and easy to assemble. This review examines recent approaches and advancements in structural design, as well as reinforcement learning (RL)-enabled control with sensors and actuators. Existing designs are classified by joint configuration, with key structural, material, fabrication, and propulsion considerations summarized. The review highlights the use of Q-learning, Deep Q-Network (DQN), and Deep Deterministic Policy Gradient (DDPG) algorithms for RLFR controllers, showing their impact on adaptability, motion control, and learning in dynamic hydrodynamic conditions. Technical challenges—including unstructured environments and complex fluid–body interactions—are discussed, along with future directions. This review aims to clarify current progress and identify technological gaps for advancing rigid-link robotic fish. Full article
(This article belongs to the Section Sensors and Robotics)

23 pages, 776 KB  
Article
Deep Reinforcement Learning-Driven Adaptive Prompting for Robust Medical LLM Evaluation
by Dong Ding, Wang Xi, Zenghui Ding and Jianqing Gao
Appl. Sci. 2026, 16(3), 1514; https://doi.org/10.3390/app16031514 - 2 Feb 2026
Abstract
The accurate and reliable evaluation of large language models (LLMs) in medical domains is critical for real-world clinical deployment, automated medical reasoning, and patient safety. However, the evaluation process is highly sensitive to prompt design, and prevalent reliance on fixed or randomly sampled prompt policies often fails to dynamically adapt to clinical context, question complexity, or evolving safety requirements. This article presents a novel reinforcement learning-based framework for multi-prompt selection, which dynamically optimizes prompt policy per input for medical LLM evaluation across the Medical Knowledge Question-Answering dataset (MKQA), the Medical Multiple-Choice Question dataset (MCQ), and the Doctor-Patient Dialogue dataset. We formulate prompt selection as a Markov Decision Process (MDP) and employ a deep Q-Network (DQN) agent to maximize a reward signal incorporating textual accuracy, domain terminology coverage, safety, and dialogue relevance. Experiments on three medical LLM benchmarks demonstrate consistent improvements in composite reward (e.g., a 6.66% increase in MKQA vs. Random Baseline, and a 2.41% increase in Dialogue vs. Fixed Baseline). This was accompanied by robust enhancements in Safety (e.g., achieving 1.0000 in MKQA, a 5.26% increase vs. Fixed Baseline; and a 5.03% increase in Dialogue vs. Fixed Baseline) and substantial gains in Medical Terminology Coverage (e.g., a 74.61% increase in MKQA vs. Fixed Baseline, and a 9.13% increase in MCQ vs. Fixed Baseline). While varying across tasks, an improvement in accuracy was observed in the MKQA task, and the framework effectively optimizes the multi-objective reward function, even when minor trade-offs in other metrics like Accuracy and Contextual Relevance were observed in some contexts. Our framework enables robust, context-aware, and adaptive evaluation, laying a foundation for safer and more reliable LLM application in healthcare. Full article
(This article belongs to the Special Issue Artificial Intelligence in Healthcare: Status, Prospects and Future)
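Prompt selection as an MDP action can be sketched as a greedy lookup over per-template value estimates; epsilon-greedy stands in for the trained DQN policy, and the template names and reward estimates below are hypothetical.

```python
# Hypothetical sketch: the agent's action is "which prompt template to use",
# rewarded by a composite score (accuracy, safety, terminology coverage).
import random

PROMPTS = ["terse", "chain-of-thought", "safety-first"]

def select_prompt(q_values, epsilon, rng):
    """Epsilon-greedy stand-in for the DQN policy over prompt templates."""
    if rng.random() < epsilon:
        return rng.randrange(len(PROMPTS))                          # explore
    return max(range(len(PROMPTS)), key=lambda a: q_values[a])      # exploit

rng = random.Random(42)
q = [0.61, 0.74, 0.70]  # running composite-reward estimates per template
action = select_prompt(q, epsilon=0.0, rng=rng)
print(PROMPTS[action])  # greedy choice: "chain-of-thought"
```

The point of learning this mapping per input, rather than fixing one template, is that the best-scoring prompt differs across MKQA, MCQ, and dialogue items.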
24 pages, 1202 KB  
Article
Coordinated Multi-Intersection Traffic Signal Control Using a Policy-Regulated Deep Q-Network
by Lin Ma, Yan Liu, Yang Liu, Changxi Ma and Shanpu Wang
Sustainability 2026, 18(3), 1510; https://doi.org/10.3390/su18031510 - 2 Feb 2026
Abstract
Coordinated control across multiple signalized intersections is essential for mitigating congestion propagation in urban road networks. However, existing DQN-based approaches often suffer from unstable action switching, limited interpretability, and insufficient capability to model spatial spillback between adjacent intersections. To address these limitations, this study proposes a Policy-Regulated and Aligned Deep Q-Network (PRA-DQN) for cooperative multi-intersection signal control. A differentiable policy function is introduced and explicitly trained to align with the optimal Q-value-derived target distribution, yielding more stable and interpretable policy behavior. In addition, a cooperative reward structure integrating local delay, movement pressure, and upstream–downstream interactions enables agents to simultaneously optimize local efficiency and regional coordination. A parameter-sharing multi-agent framework further enhances scalability and learning consistency across intersections. Simulation experiments conducted on a 2 × 2 SUMO grid show that PRA-DQN consistently outperforms fixed-time, classical DQN, distributed DQN, and pressure/wave-based baselines. Compared with fixed-time control, PRA-DQN reduces maximum queue length by 21.17%, average queue length by 18.75%, and average waiting time by 17.71%. Moreover, relative to classical DQN coordination, PRA-DQN achieves an additional 7.53% reduction in average waiting time. These results confirm the effectiveness and superiority of the proposed method in suppressing congestion propagation and improving network-level traffic performance. The proposed PRA-DQN provides a practical and scalable basis for real-time deployment of coordinated signal control and can be readily extended to larger networks and time-varying demand conditions. Full article
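A cooperative reward of the kind described above, combining local delay with a max-pressure-style term over upstream/downstream queues, can be sketched as follows; the weights and queue values are assumptions, not the paper's calibration.

```python
# Hypothetical cooperative reward: penalize local delay and the queue
# imbalance (pressure) between upstream and downstream links.

def movement_pressure(upstream_queues, downstream_queues):
    return sum(upstream_queues) - sum(downstream_queues)

def cooperative_reward(local_delay, up_q, down_q, w_delay=1.0, w_press=0.5):
    # Lower delay and lower pressure are both better, hence the negation.
    return -(w_delay * local_delay + w_press * movement_pressure(up_q, down_q))

r = cooperative_reward(local_delay=12.0, up_q=[4, 6], down_q=[2, 2])
print(r)  # -(12.0 + 0.5 * 6) = -15.0
```

Because the pressure term references neighboring links, each agent's reward rises when it relieves spillback toward adjacent intersections, which is what couples the otherwise local controllers.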

38 pages, 3226 KB  
Article
Optimization of High-Frequency Transmission Line Reflection Wave Compensation and Impedance Matching Based on a DQN-GA Hybrid Algorithm
by Tieli Liu, Jie Li, Xi Zhang, Debiao Zhang, Chenjun Hu, Kaiqiang Feng, Shuangchao Ge and Junlong Li
Electronics 2026, 15(3), 645; https://doi.org/10.3390/electronics15030645 - 2 Feb 2026
Abstract
In high-frequency circuit design, parameters such as the characteristic impedance and propagation constant of transmission lines directly affect key performance metrics, including signal integrity and power transmission efficiency. To address the challenge of optimizing impedance matching for high-frequency PCB transmission lines, this study applies a hybrid deep Q-network—genetic algorithm (DQN-GA) that integrates deep reinforcement learning with a genetic algorithm (GA). Unlike existing methods that primarily focus on predictive modeling or single-algorithm optimization, the proposed approach introduces a bidirectional interaction mechanism for algorithm fusion: transmission line structures learned by the deep Q-network (DQN) are encoded as chromosomes to enhance the diversity of the genetic algorithm population; simultaneously, high-fitness individuals from the genetic algorithm are decoded and stored in the experience replay pool of the DQN to accelerate its convergence. Simulation results demonstrate that the DQN-GA algorithm significantly outperforms both unoptimized structures and standalone GA methods, achieving substantial improvements in fitness scores and S11 transmission coefficients. This algorithm effectively overcomes the limitations of conventional approaches in addressing complex reflected wave compensation problems in high-frequency applications, providing a robust solution for signal integrity optimization in high-speed circuit design. This study not only advances the field of intelligent circuit optimization but also establishes a valuable framework for the application of hybrid algorithms to complex engineering challenges. Full article
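The bidirectional DQN–GA exchange can be sketched with plain lists: a DQN-learned structure is encoded into the GA population, and the fittest individual is decoded back into the replay pool. The chromosome encoding and fitness function below are hypothetical placeholders for the transmission-line parameters and S11-based fitness.

```python
# Hypothetical sketch of the bidirectional exchange. A chromosome is a list of
# normalized geometry choices; fitness prefers values near 0.5 as a stand-in
# for matching the target characteristic impedance.

def fitness(chromosome):
    return -sum((g - 0.5) ** 2 for g in chromosome)

population = [[0.1, 0.9, 0.2, 0.8], [0.3, 0.7, 0.4, 0.6]]  # GA individuals
replay_buffer = []

# DQN -> GA: encode a DQN-learned structure as a chromosome.
dqn_structure = [0.5, 0.5, 0.4, 0.6]
population.append(dqn_structure)

# GA -> DQN: decode the fittest individual into the experience replay pool.
best = max(population, key=fitness)
replay_buffer.append(best)
print(best is dqn_structure)  # the injected structure wins on this toy fitness
```

Each direction of the exchange addresses the other algorithm's weakness: DQN structures diversify the GA population, while high-fitness GA individuals give the DQN informative replay samples.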

30 pages, 1988 KB  
Systematic Review
MRI-Based Radiomics for Non-Invasive Prediction of Molecular Biomarkers in Gliomas
by Edoardo Agosti, Karen Mapelli, Gianluca Grimod, Amedeo Piazza, Marco Maria Fontanella and Pier Paolo Panciani
Cancers 2026, 18(3), 491; https://doi.org/10.3390/cancers18030491 - 2 Feb 2026
Abstract
Background: Radiomics has emerged as a promising approach to non-invasively characterize the molecular landscape of gliomas, providing quantitative, high-dimensional data derived from routine MRI. Given the recent shift toward molecularly driven classification, radiomics may support precision oncology by predicting key genomic, epigenetic, and phenotypic alterations without the need for invasive tissue sampling. This systematic review aimed to synthesize current radiomics applications for the non-invasive prediction of molecular biomarkers in gliomas, evaluating methodological trends, performance metrics, and translational readiness. Methods: This review followed the PRISMA 2020 guidelines. A systematic search was conducted in PubMed, Ovid MEDLINE, and Scopus on 10 January 2025, and updated on 1 February 2025, using predefined MeSH terms and keywords related to glioma, radiomics, machine learning, deep learning, and molecular biomarkers. Eligible studies included original research using MRI-based radiomics to predict molecular alterations in human gliomas, with reported performance metrics. Data extraction covered study design, cohort size, MRI sequences, segmentation approaches, feature extraction software, computational methods, biomarkers assessed, and diagnostic performance. Methodological quality was evaluated using the Radiomics Quality Score (RQS), Image Biomarker Standardization Initiative (IBSI) criteria, and Newcastle–Ottawa Scale (NOS). Due to heterogeneity, no meta-analysis was performed. Results: Of 744 screened records, 70 studies met the inclusion criteria. A total of 10,324 patients were included across all studies (mean 140 patients/study, range 23–628). The most frequently employed MRI sequences were T2-weighted (59 studies, 84.3%), contrast-enhanced T1WI (53 studies, 75.7%), T1WI (50 studies, 71.4%), and FLAIR (48 studies, 68.6%); diffusion-weighted imaging was used in only 7 studies (12.8%). Manual segmentation predominated (52 studies, 74.3%), whereas automated approaches were used in 13 studies (18.6%). Common feature extraction platforms included 3D Slicer (20 studies, 28.6%) and MATLAB-based tools (17 studies, 24.3%). Machine learning methods were applied in 47 studies (67.1%), with support vector machines used in 29 studies (41.4%); deep learning models were implemented in 27 studies (38.6%), primarily convolutional neural networks (20 studies, 28.6%). IDH mutation was the most frequently predicted biomarker (49 studies, 70%), followed by ATRX (27 studies, 38.6%), MGMT methylation (8 studies, 11.4%), and 1p/19q codeletion (7 studies, 10%). Reported AUC values ranged from 0.80 to 0.99 for IDH, approximately 0.71–0.953 for 1p/19q, 0.72–0.93 for MGMT, and 0.76–0.97 for ATRX, with deep learning or hybrid pipelines generally achieving the highest performance. RQS values highlighted substantial methodological variability, and IBSI adherence was inconsistent. NOS scores indicated high-quality methodology in a limited subset of studies. Conclusions: Radiomics demonstrates strong potential for the non-invasive prediction of key glioma molecular biomarkers, achieving high diagnostic performance across diverse computational approaches. However, widespread clinical translation remains hindered by heterogeneous imaging protocols, limited standardization, insufficient external validation, and variable methodological rigor. Full article
(This article belongs to the Special Issue Radiomics and Molecular Biology in Glioma: A Synergistic Approach)

17 pages, 1153 KB  
Article
A Federated Deep Q-Network Approach for Distributed Cloud Testing: Methodology and Case Study
by Aicha Oualla, Oussama Maakoul, Salma Azzouzi and My El Hassan Charaf
AI 2026, 7(2), 46; https://doi.org/10.3390/ai7020046 - 1 Feb 2026
Abstract
The rapid expansion of the Internet of Things (IoT) has brought forth numerous challenges in testing distributed applications within cloud environments. A significant issue is the latency associated with hosting these applications on cloud computing platforms, despite their potential to improve productivity and reduce costs. This necessitates a reevaluation of existing conformance testing frameworks for cloud environments, with a focus on addressing coordination and observability challenges during data processing. To tackle these challenges, this study proposes a novel approach based on Deep Q-Networks (DQN) and federated learning (FL). In this model, fog nodes train their local models independently and transmit only parameter updates to a central server, where these updates are aggregated into a global model. The DQN agents replace explicit coordination messages with learned decision functions, dynamically determining when and how testers should coordinate. This approach not only preserves the privacy of IoT devices but also enhances the efficiency of the testing process. We provide a comprehensive mathematical formulation of our approach, along with a detailed case study of a Smart City Traffic Management System. Our experimental results demonstrate significant improvements over traditional testing approaches, including a ~58% reduction in coordination messages. These findings confirm the effectiveness of our approach for distributed testing in dynamic environments with varying network conditions. Full article
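The aggregation step above, where fog nodes send only parameter updates and the server combines them into a global model, can be sketched FedAvg-style. The size-weighted average shown here is a common choice and an assumption about the paper's exact scheme; the parameter vectors are illustrative.

```python
# FedAvg-style aggregation sketch: the server never sees raw IoT data,
# only each fog node's locally trained DQN parameters.

def fed_avg(local_params, sizes):
    """Average parameter vectors, weighted by each node's local data size."""
    total = sum(sizes)
    n = len(local_params[0])
    return [sum(p[i] * s for p, s in zip(local_params, sizes)) / total
            for i in range(n)]

node_a = [0.2, 0.4]   # DQN weights learned on fog node A (100 samples)
node_b = [0.6, 0.0]   # DQN weights learned on fog node B (300 samples)
global_model = fed_avg([node_a, node_b], sizes=[100, 300])
print(global_model)   # pulled toward the larger node's parameters
```

The global model is then broadcast back to the fog nodes for the next round of local training, which is what keeps device data on-premises.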

24 pages, 789 KB  
Article
Decentralized Computation Offloading Strategy via Multi-Agent Deep Reinforcement Learning for Multi-Access Edge Computing Systems
by Emmanuella Adu, Yeongmuk Lee, Jihwan Moon, Sooyoung Jang, Inkyu Bang and Taehoon Kim
Sensors 2026, 26(3), 914; https://doi.org/10.3390/s26030914 - 30 Jan 2026
Abstract
Multi-access edge computing (MEC) has been widely recognized as a promising solution for alleviating the computational burden on edge devices, particularly in supporting fast and real-time processing of resource-intensive applications. In this paper, we propose a decentralized offloading decision strategy based on multi-agent deep reinforcement learning (MADRL), aiming to minimize the overall task completion latency experienced by edge devices. Our proposed scheme adopts a grant-free access mechanism during the initialization of offloading in a fully decentralized manner, which serves as the key feature of our strategy. As a result, determining the optimal offloading factor becomes significantly more challenging due to the simultaneous access attempts from multiple edge devices. To resolve this problem, we consider a discrete action space-based deep reinforcement learning (DRL) approach, termed deep Q network (DQN), to enable each edge device to learn a decentralized computation offloading policy based solely on its local observation without requiring global network information. In our design, each edge device dynamically adjusts its offloading factor according to its observed channel state and the number of active users, thereby balancing local and remote computation loads adaptively. Furthermore, the proposed MADRL-based framework jointly accounts for user association and offloading decision optimization to mitigate access collisions and computation bottlenecks in a multi-user environment. We perform extensive computer simulations using MATLAB R2023b to evaluate the performance of the proposed strategy, focusing on the task completion latency under various system configurations. The numerical results demonstrate that our proposed strategy effectively reduces the overall task completion latency and achieves faster convergence of learning performance compared with conventional schemes, confirming the efficiency and scalability of the proposed decentralized approach. Full article
(This article belongs to the Section Communications)
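The discrete offloading decision above can be sketched as a greedy Q-lookup mapping a device's local observation to one of a few offloading factors; the action grid and learned values below are illustrative, not the paper's configuration.

```python
# Hypothetical sketch: each device picks a discrete offloading factor
# (fraction of the task sent to the MEC server) from its own Q-estimates.

OFFLOAD_FACTORS = [0.0, 0.25, 0.5, 0.75, 1.0]

def choose_offloading(q_row):
    """Greedy action over the discrete offloading factors."""
    best = max(range(len(q_row)), key=lambda a: q_row[a])
    return OFFLOAD_FACTORS[best]

# One Q-row per observed local state; here: good channel, few contenders.
q_row = [-5.0, -3.2, -1.1, -2.0, -4.4]  # learned values (negated latencies)
print(choose_offloading(q_row))  # 0.5 minimizes expected completion latency
```

Because each device acts only on its own observation, no central scheduler or global channel knowledge is needed, which is what makes the scheme decentralized.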
28 pages, 15662 KB  
Article
Cable Fire Risk Prediction via Dynamic Q-Learning-Driven Ensemble of Deep Temporal Networks
by Haoxuan Li, Hao Gao, Xuehong Gao and Guozhong Huang
Fire 2026, 9(2), 61; https://doi.org/10.3390/fire9020061 - 29 Jan 2026
Abstract
Cables, which are critical for power and signal transmission in complex buildings and underground infrastructure, are exposed to elevated fire risks during operation, making reliable risk prediction essential for building fire safety. This study proposes a multivariate cable fire risk prediction model that integrates three deep temporal networks (RNN, LSTM, and GRU) through Q-learning-based ensemble learning (QBEL). The model uses current, voltage, power, temperature, humidity, oxygen concentration, and system risk values acquired from an intelligent fire alarm system as inputs. Using a real-world dataset comprising 3060 seven-dimensional time steps collected from a tobacco logistics center, QBEL achieves a test-set MSE of 1.73, RMSE of 1.31, MAE of 0.84, and MAPE of 2.66%, improving the MAE and MAPE of the best single recurrent network by approximately 10–12%. Comparative experiments against conventional ensemble approaches based on XGBoost boosting and stacking (XGBoost Python package, version 3.0.0), as well as recent time-series forecasting models including DLinear, PatchTST, MoLE, and Fredformer, demonstrate that QBEL attains the lowest MAE and MAPE among all methods, while maintaining an MSE close to that of the best linear baseline and a moderate computational cost of approximately 5.5 × 10⁻³ GFLOPs and 45 MB of memory per inference. These results indicate that QBEL provides a favorable balance between prediction accuracy and computational efficiency, supporting its potential use in edge-oriented monitoring pipelines for timely cable fire risk warnings in building environments. Full article
(This article belongs to the Special Issue Building Fire Prediction and Suppression)
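A tabular Q-learning update of the kind that could drive the QBEL selector, where the action picks which base network (RNN/LSTM/GRU) to trust and the negated prediction error serves as reward, can be sketched as follows. The states, reward, and action semantics are assumptions, not the paper's exact formulation:

```python
# Standard Q-learning update: Q[s][a] += alpha * (r + gamma * max_a' Q[s'][a'] - Q[s][a]).

ALPHA, GAMMA = 0.5, 0.9

def q_update(Q, s, a, r, s_next):
    best_next = max(Q[s_next].values())
    Q[s][a] += ALPHA * (r + GAMMA * best_next - Q[s][a])
    return Q[s][a]

models = ["rnn", "lstm", "gru"]
Q = {s: {a: 0.0 for a in models} for s in ("calm", "alarm")}

# Trusting the GRU in the "calm" state gave a small prediction error (-0.2):
new_q = q_update(Q, "calm", "gru", r=-0.2, s_next="alarm")
print(round(new_q, 3))  # 0.5 * (-0.2 + 0.9 * 0.0 - 0.0) = -0.1
```

Over many updates the table learns which base forecaster is most reliable in each regime, which is the ensembling role the abstract attributes to Q-learning.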

20 pages, 1369 KB  
Article
Symmetry-Aware Interpretable Anomaly Alarm Optimization Method for Power Monitoring Systems Based on Hierarchical Attention Deep Reinforcement Learning
by Zepeng Hou, Qiang Fu, Weixun Li, Yao Wang, Zhengkun Dong, Xianlin Ye, Xiaoyu Chen and Fangyu Zhang
Symmetry 2026, 18(2), 216; https://doi.org/10.3390/sym18020216 - 23 Jan 2026
Abstract
With the rapid advancement of smart grids driven by renewable energy integration and the extensive deployment of supervisory control and data acquisition (SCADA) and phasor measurement units (PMUs), addressing the escalating alarm flooding via intelligent analysis of large-scale alarm data is pivotal to safeguarding the safe and stable operation of power grids. To tackle these challenges, this study introduces a pioneering alarm optimization framework based on symmetry-driven crowdsourced active learning and interpretable deep reinforcement learning (DRL). Firstly, an anomaly alarm annotation method integrating differentiated crowdsourcing and active learning is proposed to mitigate the inherent asymmetry in data distribution. Secondly, a symmetrically structured DRL-based hierarchical attention deep Q-network is designed with a dual-path encoder to balance the processing of multi-scale alarm features. Finally, a SHAP-driven interpretability framework is established, providing global and local attribution to enhance decision transparency. Experimental results on a real-world power alarm dataset demonstrate that the proposed method achieves a Fleiss’ Kappa of 0.82 in annotation consistency and an F1-Score of 0.95 in detection performance, significantly outperforming state-of-the-art baselines. Additionally, the false positive rate is reduced to 0.04, verifying the framework’s effectiveness in suppressing alarm flooding while maintaining high recall. Full article
(This article belongs to the Special Issue Symmetry and Asymmetry in Data Analysis)
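The abstract above describes a dual-path encoder whose local and global alarm features are balanced by attention. As a minimal sketch (not the paper's implementation; the feature vectors and the scoring vector `w_attn` are hypothetical), the attention-weighted fusion of two symmetric paths might look like:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def dual_path_fusion(local_feats, global_feats, w_attn):
    """Attention-weighted fusion of two symmetric feature paths.

    local_feats, global_feats: (d,) encodings of an alarm sequence
    w_attn: (d,) learned scoring vector (hypothetical placeholder)
    """
    scores = np.array([w_attn @ local_feats, w_attn @ global_feats])
    alpha = softmax(scores)  # attention weights over the two paths
    return alpha[0] * local_feats + alpha[1] * global_feats

d = 4
rng = np.random.default_rng(0)
fused = dual_path_fusion(rng.normal(size=d), rng.normal(size=d), np.ones(d))
print(fused.shape)  # (4,)
```

In the paper the fused representation would feed the Q-network head; here the fusion is shown in isolation so the symmetric two-path structure is visible.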

26 pages, 5704 KB  
Article
Intent-Aware Collision Avoidance for UAVs in High-Density Non-Cooperative Environments Using Deep Reinforcement Learning
by Xuchuan Liu, Yuan Zheng, Chenglong Li, Bo Jiang and Wenyong Gu
Aerospace 2026, 13(2), 111; https://doi.org/10.3390/aerospace13020111 - 23 Jan 2026
Abstract
Collision avoidance between unmanned aerial vehicles (UAVs) and non-cooperative targets (e.g., off-nominal operations or birds) presents significant challenges in urban air mobility (UAM). This difficulty arises due to the highly dynamic and unpredictable flight intentions of these targets. Traditional collision-avoidance methods primarily focus on cooperative targets or non-cooperative ones with fixed behavior, rendering them ineffective when dealing with highly unpredictable flight patterns. To address this, we introduce a deep reinforcement learning-based collision-avoidance approach leveraging global and local intent prediction. Specifically, we propose a Global and Local Perception Prediction Module (GLPPM) that combines a state-space-based global intent association mechanism with a local feature extraction module, enabling accurate prediction of short- and long-term flight intents. Additionally, we propose a Fusion Sector Flight Control Module (FSFCM) that is trained with a Dueling Double Deep Q-Network (D3QN). The module integrates both predicted future and current intents into the state space and employs a specifically designed reward function, thereby ensuring safe UAV operations. Experimental results demonstrate that the proposed method significantly improves mission success rates in high-density environments, with up to 80 non-cooperative targets per square kilometer. In 1000 flight tests, the mission success rate is 15.2 percentage points higher than that of the baseline D3QN. Furthermore, the approach retains an 88.1% success rate even under extreme target densities of 120 targets per square kilometer. Finally, interpretability analysis via Deep SHAP further verifies the decision-making rationality of the algorithm. Full article
(This article belongs to the Section Aeronautics)
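The FSFCM above is trained with a Dueling Double Deep Q-Network (D3QN). As a hedged sketch of the two standard ingredients that name implies (the toy features and weights below are illustrative, not from the paper): the dueling head aggregates a state value and mean-centred advantages, and the double-DQN target lets the online network select the next action while the target network evaluates it.

```python
import numpy as np

def d3qn_q_values(features, w_value, w_adv):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    v = features @ w_value          # scalar state value V(s)
    a = features @ w_adv            # per-action advantages A(s, .)
    return v + a - a.mean()

def double_dqn_target(r, gamma, next_q_online, next_q_target, done):
    """Double DQN: online net picks the action, target net evaluates it."""
    if done:
        return r
    a_star = int(np.argmax(next_q_online))
    return r + gamma * next_q_target[a_star]

features = np.array([1.0, 0.0])
w_value = np.array([0.5, 0.0])
w_adv = np.array([[1.0, 2.0, 3.0],
                  [0.0, 0.0, 0.0]])
q = d3qn_q_values(features, w_value, w_adv)   # V=0.5, A=[1,2,3] centred

y = double_dqn_target(r=1.0, gamma=0.9,
                      next_q_online=np.array([1.0, 2.0, 0.5]),
                      next_q_target=np.array([0.8, 1.5, 0.9]),
                      done=False)
print(round(y, 3))  # 2.35
```

The paper's contribution lies in the intent-augmented state space and reward design around this learner, not in the D3QN update itself.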

33 pages, 3714 KB  
Article
SADQN-Based Residual Energy-Aware Beamforming for LoRa-Enabled RF Energy Harvesting for Disaster-Tolerant Underground Mining Networks
by Hilary Kelechi Anabi, Samuel Frimpong and Sanjay Madria
Sensors 2026, 26(2), 730; https://doi.org/10.3390/s26020730 - 21 Jan 2026
Abstract
The end-to-end efficiency of radio-frequency (RF)-powered wireless communication networks (WPCNs) in post-disaster underground mine environments can be enhanced through adaptive beamforming. The primary challenges in such scenarios include (i) identifying the most energy-constrained nodes, i.e., nodes with the lowest residual energy, to prevent the loss of tracking and localization functionality; (ii) avoiding reliance on the computationally intensive channel state information (CSI) acquisition process; and (iii) ensuring long-range RF wireless power transfer (LoRa-RFWPT). To address these issues, this paper introduces an adaptive and safety-aware deep reinforcement learning (DRL) framework for energy beamforming in LoRa-enabled underground disaster networks. Specifically, we develop a Safe Adaptive Deep Q-Network (SADQN) that incorporates residual energy awareness to enhance energy harvesting under mobility, and formulate dual-variable updates that mitigate constraint violations associated with fairness, minimum energy thresholds, duty cycle, and uplink utilization. A mathematical model is proposed to capture the dynamics of post-disaster underground mine environments, and the problem is formulated as a constrained Markov decision process (CMDP). To address the inherent NP-hardness of this constrained reinforcement learning (CRL) formulation, we employ a Lagrangian relaxation technique to reduce complexity and derive near-optimal solutions. Comprehensive simulation results demonstrate that SADQN significantly outperforms all baseline algorithms: it increases cumulative harvested energy by approximately 11% versus DQN, 15% versus Safe-DQN, and 40% versus PSO, and achieves substantial gains over random beamforming and non-beamforming approaches. The proposed SADQN framework maintains fairness indices above 0.90, converges 27% faster than Safe-DQN and 43% faster than standard DQN in terms of episodes, and demonstrates superior stability, with 33% lower performance variance than Safe-DQN and 66% lower than DQN after convergence, making it particularly suitable for safety-critical underground mining disaster scenarios where reliable energy delivery and operational stability are paramount. Full article
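The abstract's Lagrangian relaxation of the CMDP can be sketched with the two standard moves that technique implies: penalise the reward with dual variables weighted by constraint costs, and update each dual variable by projected gradient ascent on its constraint violation. The budgets, costs, and learning rate below are illustrative, not values from the paper.

```python
def lagrangian_reward(reward, costs, lambdas):
    """Penalised reward used by the Q-learning update under the relaxed CMDP."""
    return reward - sum(l * c for l, c in zip(lambdas, costs))

def dual_update(lambdas, costs, budgets, lr):
    """Projected gradient ascent on the duals:
    lambda_i <- max(0, lambda_i + lr * (cost_i - budget_i))."""
    return [max(0.0, l + lr * (c - b))
            for l, c, b in zip(lambdas, costs, budgets)]

lam = [0.0, 0.0]
# Episode with fairness cost 0.3 (budget 0.2) and duty-cycle cost 0.05 (budget 0.1):
lam = dual_update(lam, costs=[0.3, 0.05], budgets=[0.2, 0.1], lr=1.0)
print([round(x, 3) for x in lam])  # [0.1, 0.0] -- only the violated constraint is penalised

r_pen = lagrangian_reward(1.0, [0.3, 0.05], lam)
print(round(r_pen, 3))  # 0.97
```

Only the violated constraint accumulates a positive multiplier, so the penalty vanishes for constraints kept within budget, which is what makes the relaxed objective tractable for a standard DQN learner.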
