Search Results (754)

Search Parameters:
Keywords = Markov Decision Process

18 pages, 3066 KB  
Article
A Tree-Based Search Algorithm with Global Pheromone and Local Signal Guidance for Scientific Chart Reasoning
by Min Zhou, Zhiheng Qi, Tianlin Zhu, Jan Vijg and Xiaoshui Huang
Mathematics 2025, 13(17), 2739; https://doi.org/10.3390/math13172739 - 26 Aug 2025
Abstract
Chart reasoning, a critical task for automating data interpretation in domains such as scientific data analysis and medical diagnostics, leverages large-scale vision language models (VLMs) to interpret chart images and answer natural language questions, enabling semantic understanding that enhances knowledge accessibility and supports data-driven decision making across diverse domains. In this work, we formalize chart reasoning as a sequential decision-making problem governed by a Markov Decision Process (MDP), thereby providing a mathematically grounded framework for analyzing visual question answering tasks. While recent advances such as multi-step reasoning with Monte Carlo tree search (MCTS) offer interpretable and stochastic planning capabilities, these methods often suffer from redundant path exploration and inefficient reward propagation. To address these challenges, we propose a novel algorithmic framework that integrates a pheromone-guided search strategy inspired by Ant Colony Optimization (ACO). In our approach, chart reasoning is cast as a combinatorial optimization problem over a dynamically evolving search tree, where path desirability is governed by pheromone concentration functions that capture global phenomena across search episodes and are reinforced through trajectory-level rewards. Transition probabilities are further modulated by local signals, which are evaluations derived from the immediate linguistic feedback of large language models. This enables fine-grained decision making at each step while preserving long-term planning efficacy. Extensive experiments across four benchmark datasets, ChartQA, MathVista, GRAB, and ChartX, demonstrate the effectiveness of our approach, with multi-agent reasoning and pheromone guidance yielding success rate improvements of +18.4% and +7.6%, respectively.
(This article belongs to the Special Issue Multimodal Deep Learning and Its Application in Healthcare)
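The pheromone-guided selection and trajectory-level reinforcement described in the abstract can be illustrated with a generic ACO-style rule. This is a sketch of the standard technique only, not the authors' implementation; the `alpha`, `beta`, and `rho` parameters and the branch/path encoding are illustrative assumptions:

```python
import random

def choose_branch(pheromone, local_signal, alpha=1.0, beta=1.0, rng=None):
    # ACO-style sampling: branch i is chosen with probability
    # proportional to tau_i**alpha * eta_i**beta, where tau is the
    # global pheromone value and eta the local (LLM-feedback) signal
    rng = rng or random.Random()
    weights = [(t ** alpha) * (e ** beta)
               for t, e in zip(pheromone, local_signal)]
    r = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r <= acc:
            return i
    return len(weights) - 1

def reinforce(pheromone, path, reward, rho=0.1):
    # evaporate all pheromone, then deposit the trajectory-level
    # reward along the branches of the sampled reasoning path
    for i in range(len(pheromone)):
        pheromone[i] *= (1.0 - rho)
    for i in path:
        pheromone[i] += reward
```

Repeating sample-then-reinforce episodes concentrates pheromone on branches that repeatedly lead to rewarded answers, which is the mechanism the abstract contrasts with plain MCTS backups.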

38 pages, 6012 KB  
Article
Adaptive Spectrum Management in Optical WSNs for Real-Time Data Transmission and Fault Tolerance
by Mohammed Alwakeel
Mathematics 2025, 13(17), 2715; https://doi.org/10.3390/math13172715 - 23 Aug 2025
Abstract
Optical wireless sensor networks (OWSNs) offer promising capabilities for high-speed, energy-efficient communication, particularly in mission-critical environments such as industrial automation, healthcare monitoring, and smart buildings. However, dynamic spectrum management and fault tolerance remain key challenges in ensuring reliable and timely data transmission. This paper proposes an adaptive spectrum management framework (ASMF) that addresses these challenges through a mathematically grounded and implementation-driven approach. The ASMF formulates the spectrum allocation problem as a constrained Markov decision process and leverages a dual-layer optimization strategy combining Lyapunov drift-plus-penalty for queue stability with deep reinforcement learning for adaptive long-term decision making. Additionally, ASMF integrates a hybrid fault-tolerant mechanism using LSTM-based link failure prediction and lightweight recovery logic, achieving up to 83% prediction accuracy. Experimental evaluations using real-world datasets from industrial, healthcare, and smart infrastructure scenarios demonstrate that ASMF reduces critical traffic latency by 37%, improves reliability by 42% under fault conditions, and enhances energy efficiency by 22.6% compared with state-of-the-art methods. The system also maintains a 99.94% packet delivery ratio for critical traffic and achieves 69.7% faster recovery after link failures. These results confirm the effectiveness of ASMF as a robust and scalable solution for adaptive spectrum management in dynamic, fault-prone OWSN environments.
(This article belongs to the Special Issue Advances in Mobile Network and Intelligent Communication)
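The Lyapunov drift-plus-penalty idea behind ASMF's dual-layer optimization can be sketched generically: each slot, choose the option that best trades queue backlog against a penalty (e.g., energy) weighted by a tunable V. This is a textbook sketch of the standard technique, not the paper's algorithm; the queue values and option set below are made up:

```python
def dpp_action(queues, options, V=5.0):
    # options: list of (service_rates, penalty) pairs.
    # Minimize the one-slot drift-plus-penalty bound:
    #   -sum_i Q_i * mu_i(a) + V * penalty(a)
    # Larger V favors low penalty; smaller V favors queue stability.
    def score(a):
        rates, penalty = options[a]
        return -sum(q * m for q, m in zip(queues, rates)) + V * penalty
    return min(range(len(options)), key=score)

# serving the long queue wins despite its higher energy cost
best = dpp_action([10.0, 1.0], [([1.0, 0.0], 1.0), ([0.0, 1.0], 0.1)])
```

As backlogs shift, the rule switches options automatically, which is what makes the drift-plus-penalty layer suitable as a stability guarantee beneath a learned long-term policy.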

22 pages, 2971 KB  
Article
Cooperative Schemes for Joint Latency and Energy Consumption Minimization in UAV-MEC Networks
by Ming Cheng, Saifei He, Yijin Pan, Min Lin and Wei-Ping Zhu
Sensors 2025, 25(17), 5234; https://doi.org/10.3390/s25175234 - 22 Aug 2025
Abstract
The Internet of Things (IoT) has promoted emerging applications that require massive device collaboration, heavy computation, and stringent latency guarantees. Unmanned aerial vehicle (UAV)-assisted mobile edge computing (MEC) systems can provide flexible services for user devices (UDs) with wide coverage. The optimization of both latency and energy consumption remains a critical yet challenging task due to the inherent trade-off between them. Joint association, offloading, and computing resource allocation are essential to achieving satisfactory system performance. However, these processes are difficult due to the highly dynamic environment and the exponentially increasing complexity of large-scale networks. To address these challenges, we introduce a carefully designed cost function to balance latency and energy consumption, formulate the joint problem as a partially observable Markov decision process, and propose two multi-agent deep-reinforcement-learning-based schemes to tackle the long-term problem. Specifically, the multi-agent proximal policy optimization (MAPPO)-based scheme uses centralized learning and decentralized execution, while the closed-form enhanced multi-armed bandit (CF-MAB)-based scheme decouples association from offloading and computing resource allocation. In both schemes, UDs act as independent agents that learn from environmental interactions and historical decisions, make decisions to maximize their individual reward functions, and achieve implicit collaboration through the reward mechanism. The numerical results validate the effectiveness and superiority of our proposed schemes. The MAPPO-based scheme enables collaborative agent decisions for high performance in complex dynamic environments, while the CF-MAB-based scheme supports independent rapid-response decisions.
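The bandit side of a scheme like CF-MAB can be illustrated with the classic UCB1 rule, where each candidate association is an arm. This is a generic sketch of standard multi-armed bandit selection, not the authors' closed-form enhancement:

```python
import math

class UCB1:
    # classic UCB1: play each arm once, then pick the arm maximizing
    # empirical mean reward + sqrt(2 ln t / n_arm)
    def __init__(self, n_arms):
        self.counts = [0] * n_arms
        self.values = [0.0] * n_arms
        self.t = 0

    def select(self):
        self.t += 1
        for arm, c in enumerate(self.counts):
            if c == 0:
                return arm          # explore each arm once first
        return max(range(len(self.counts)),
                   key=lambda a: self.values[a]
                   + math.sqrt(2.0 * math.log(self.t) / self.counts[a]))

    def update(self, arm, reward):
        # incremental running mean of the arm's observed rewards
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

A UD could treat candidate associations as arms and the negative of the latency–energy cost as the reward, so the exploration bonus shrinks as an association is tried more often.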

28 pages, 2209 KB  
Article
A Reinforcement Learning Hyper-Heuristic with Cumulative Rewards for Dual-Peak Time-Varying Network Optimization in Heterogeneous Multi-Trip Vehicle Routing
by Xiaochuan Wang, Na Li and Xingchen Jin
Algorithms 2025, 18(9), 536; https://doi.org/10.3390/a18090536 - 22 Aug 2025
Abstract
Urban logistics faces complexity due to traffic congestion, fleet heterogeneity, warehouse constraints, and driver workload balancing, especially in the Heterogeneous Multi-Trip Vehicle Routing Problem with Time Windows and Time-Varying Networks (HMTVRPTW-TVN). We develop a mixed-integer linear programming (MILP) model with dual-peak time discretization and exact linearization for heterogeneous fleet coordination. Given the problem's NP-hard nature, we propose a Hyper-Heuristic based on Cumulative Reward Q-Learning (HHCRQL), integrating reinforcement learning with heuristic operators in a Markov Decision Process (MDP). The algorithm dynamically selects operators using a four-dimensional state space and a cumulative reward function combining timestep and fitness. Experiments show that, for small instances, HHCRQL achieves solutions within 3% of Gurobi's optimum when customer nodes exceed 15, outperforming Large Neighborhood Search (LNS) and LNS with Simulated Annealing (LNSSA) with stable, shorter runtime. For large-scale instances, HHCRQL reduces gaps by up to 9.17% versus Iterated Local Search (ILS), 6.74% versus LNS, and 5.95% versus LNSSA, while maintaining relatively stable runtime. In real-world validation on Shanghai logistics data, HHCRQL reduces waiting times by 35.36% and total transportation times by 24.68%, confirming its effectiveness, robustness, and scalability.

6 pages, 1627 KB  
Proceeding Paper
A Reinforcement Learning Solution for Queue Management in Public Utility Services
by Todor Dobrev, Miroslav Markov and Valentina Markova
Eng. Proc. 2025, 104(1), 6; https://doi.org/10.3390/engproc2025104006 - 22 Aug 2025
Abstract
This paper presents a reinforcement learning-based approach for optimizing queue management in public utility service environments. Using one year of real operational data from a utility office, a simulation model is developed to replicate daily service dynamics. A Q-learning agent is trained to allocate service types to counters dynamically, aiming to minimize client waiting time. The model treats the environment as a Markov Decision Process and uses an epsilon-greedy policy for learning optimal actions. Experimental results across multiple counter configurations demonstrate significant reductions in average waiting times, confirming the effectiveness and adaptability of the proposed method in dynamic service environments.
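The Q-learning loop with an epsilon-greedy policy that the abstract describes follows the standard tabular recipe. A minimal sketch, assuming toy state/action sizes and learning rates that are illustrative rather than taken from the paper:

```python
import random

def epsilon_greedy(q_row, epsilon, rng):
    # explore with probability epsilon, otherwise act greedily
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=lambda a: q_row[a])

def q_update(q, s, a, r, s_next, alpha=0.1, gamma=0.95):
    # one-step Q-learning target: r + gamma * max_a' Q(s', a')
    q[s][a] += alpha * (r + gamma * max(q[s_next]) - q[s][a])
```

In this setting a state could encode queue lengths per service type, an action a counter assignment, and the reward the negative observed waiting time, so minimizing wait becomes maximizing return.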

52 pages, 15058 KB  
Article
Optimizing Autonomous Vehicle Navigation Through Reinforcement Learning in Dynamic Urban Environments
by Mohammed Abdullah Alsuwaiket
World Electr. Veh. J. 2025, 16(8), 472; https://doi.org/10.3390/wevj16080472 - 18 Aug 2025
Abstract
Autonomous vehicle (AV) navigation in dynamic urban environments faces challenges such as unpredictable traffic conditions, varying road user behaviors, and complex road networks. This study proposes a novel reinforcement learning-based framework that enhances AV decision making through spatial-temporal context awareness. The framework integrates Proximal Policy Optimization (PPO) and Graph Neural Networks (GNNs) to effectively model urban features like intersections, traffic density, and pedestrian zones. A key innovation is the urban context-aware reward mechanism (UCARM), which dynamically adapts the reward structure based on traffic rules, congestion levels, and safety considerations. Additionally, the framework incorporates a Dynamic Risk Assessment Module (DRAM), which uses Bayesian inference combined with Markov Decision Processes (MDPs) to proactively evaluate collision risks and guide safer navigation. The framework's performance was validated across three datasets—Argoverse, nuScenes, and CARLA. Results demonstrate significant improvements: an average travel time of 420 ± 20 s, a collision rate of 3.1%, and energy consumption of 11,833 ± 550 J in Argoverse; 410 ± 20 s, 2.5%, and 11,933 ± 450 J in nuScenes; and 450 ± 25 s, 3.6%, and 13,000 ± 600 J in CARLA. The proposed method achieved an average navigation success rate of 92.5%, consistently outperforming baseline models in safety, efficiency, and adaptability. These findings indicate the framework's robustness and practical applicability for scalable AV deployment in real-world urban traffic conditions.
(This article belongs to the Special Issue Modeling for Intelligent Vehicles)

20 pages, 2083 KB  
Article
Maritime Mobile Edge Computing for Sporadic Tasks: A PPO-Based Dynamic Offloading Strategy
by Yanglong Sun, Wenqian Luo, Zhiping Xu, Bo Lin, Weijian Xu and Weipeng Liu
Mathematics 2025, 13(16), 2643; https://doi.org/10.3390/math13162643 - 17 Aug 2025
Abstract
Maritime mobile edge computing (MMEC) technology enables the deployment of high-precision, computationally intensive object detection tasks on resource-constrained edge devices. However, dynamic network conditions and limited communication resources significantly degrade the performance of static offloading strategies, leading to increased task blocking probability and delays. This paper proposes a scheduling and offloading strategy tailored for MMEC scenarios driven by object detection tasks, which explicitly considers (1) the hierarchical structure of object detection models, and (2) the sporadic nature of maritime observation tasks. To minimize average task completion time under varying task arrival patterns, we formulate the average blocking delay minimization problem as a Markov Decision Process (MDP). Then, we propose an Orthogonalization-Normalization Proximal Policy Optimization (ON-PPO) algorithm, in which task category states are orthogonally encoded and system states are normalized. Experiments demonstrate that ON-PPO effectively learns policy parameters, mitigates interference between tasks of different categories during training, and adapts efficiently to sporadic task arrivals. Simulation results show that, compared to baseline algorithms, ON-PPO maintains stable task queues and achieves a 22.9% reduction in average task latency.

21 pages, 1936 KB  
Article
A Dynamic Risk Control Methodology for Mission-Critical Systems Under Dependent Fault Processes
by Zijian Kang, Yuhan Ma, Bin Wang and Kaiye Gao
Mathematics 2025, 13(16), 2618; https://doi.org/10.3390/math13162618 - 15 Aug 2025
Abstract
Industrial systems operating in severe mission environments are frequently confronted with intricate failure behaviors arising from internal degradation and extrinsic stresses, posing an escalating challenge to system survivability and mission reliability. Mission termination strategies are attracting increasing attention as an intuitive and effective means of mitigating catastrophic mission-induced risk. However, how to manage coupled risk arising from competing fault processes, particularly when these modes are interdependent, has rarely been addressed in existing work. To bridge this gap, this study investigates a dynamic risk control policy for continuously degrading systems operating in a random shock environment, which yields competing and dependent fault processes. An optimal mission termination policy is developed to minimize risk-centered losses throughout mission execution, whose optimization problem constitutes a finite-time Markov decision process. Several critical structural properties of the optimal policy are derived, and by leveraging these structures, the alerting threshold for initiating the mission termination procedure is formally established. Alternative risk control policies are introduced for comparison, and experimental evaluations substantiate the model's superior capacity for risk mitigation.

28 pages, 2383 KB  
Article
CIM-LP: A Credibility-Aware Incentive Mechanism Based on Long Short-Term Memory and Proximal Policy Optimization for Mobile Crowdsensing
by Sijia Mu and Huahong Ma
Electronics 2025, 14(16), 3233; https://doi.org/10.3390/electronics14163233 - 14 Aug 2025
Abstract
In the field of mobile crowdsensing (MCS), a large number of tasks rely on the participation of ordinary mobile device users for data collection and processing. This model has shown great potential for applications in environmental monitoring, traffic management, public safety, and other areas. However, the enthusiasm of participants and the quality of uploaded data directly affect the reliability and practical value of the sensing results. Therefore, the design of incentive mechanisms has become a core issue in driving the healthy operation of MCS. Existing research, when optimizing long-term utility rewards for participants, has often failed to fully consider dynamic changes in trustworthiness. It has typically relied on historical data from a single point in time, overlooking the long-term dependencies in the time series, which results in suboptimal decision-making and limits the overall efficiency and fairness of sensing tasks. To address this issue, a credibility-aware incentive mechanism based on long short-term memory and proximal policy optimization (CIM-LP) is proposed. The mechanism employs a Markov decision process (MDP) model to describe the decision-making process of the participants. Without access to global information, an incentive model combining long short-term memory (LSTM) networks and proximal policy optimization (PPO), collectively referred to as LSTM-PPO, is utilized to formulate the most reasonable and effective sensing duration strategy for each participant, aiming to maximize the utility reward. After task completion, the participants' credibility is dynamically updated by evaluating the quality of the uploaded data, which then adjusts their utility rewards for the next phase. Simulation results based on real datasets show that compared with several existing incentive algorithms, the CIM-LP mechanism increases the average utility of the participants by 6.56% to 112.76% and the task completion rate by 16.25% to 128.71%, demonstrating its significant advantages in improving data quality and task completion efficiency.

14 pages, 460 KB  
Article
Modeling Local Search Metaheuristics Using Markov Decision Processes
by Rubén Ruiz-Torrubiano, Deepak Dhungana, Sarita Paudel and Himanshu Buckchash
Algorithms 2025, 18(8), 512; https://doi.org/10.3390/a18080512 - 14 Aug 2025
Abstract
Local search metaheuristics like tabu search or simulated annealing are popular heuristic optimization algorithms for finding near-optimal solutions to combinatorial optimization problems. However, it is still challenging for researchers and practitioners to analyze their behavior and systematically choose one metaheuristic over a vast set of alternatives for the particular problem at hand. In this paper, we introduce a theoretical framework based on Markov Decision Processes (MDPs) for analyzing local search metaheuristics. This framework not only helps in providing convergence results for individual algorithms but also provides an explicit characterization of the exploration–exploitation tradeoff and theory-grounded guidance for practitioners in choosing an appropriate metaheuristic for the problem at hand. We present this framework in detail and show how to apply it to hill climbing and simulated annealing, including computational experiments.
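As a concrete instance of the local search algorithms the paper analyzes, simulated annealing with Metropolis acceptance can be sketched as follows. The toy objective, neighborhood, and cooling schedule are illustrative assumptions, not the paper's experimental setup:

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, t0=1.0, cooling=0.95,
                        steps=500, seed=0):
    # Metropolis rule: always accept improvements; accept a worse
    # neighbor with probability exp(-delta / T); cool T geometrically.
    rng = random.Random(seed)
    x, best, t = x0, x0, t0
    for _ in range(steps):
        y = neighbor(x, rng)
        delta = cost(y) - cost(x)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x = y                      # move (possibly uphill)
        if cost(x) < cost(best):
            best = x                   # track the incumbent
        t *= cooling
    return best

# toy: minimize (x - 3)^2 over the integers with +-1 moves
best = simulated_annealing(lambda x: (x - 3) ** 2,
                           lambda x, r: x + r.choice([-1, 1]), x0=10)
```

The temperature parameter makes the exploration–exploitation tradeoff explicit: high T behaves like a random walk over states, while T → 0 recovers greedy hill climbing, which is exactly the kind of policy-level distinction the MDP framework formalizes.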

20 pages, 1694 KB  
Article
Green Network Slicing Architecture Based on 5G-IoT and Next-Generation Technologies
by Mariame Amine, Abdellatif Kobbane, Jalel Ben-Othman and Mohammed El Koutbi
Appl. Sci. 2025, 15(16), 8938; https://doi.org/10.3390/app15168938 - 13 Aug 2025
Abstract
The rapid expansion of device connectivity and the increasing demand for data traffic have become pivotal aspects of our daily lives, especially within the Internet of Things (IoT) ecosystem. Consequently, operators are striving to identify the most innovative and robust solutions capable of accommodating these escalating requirements. The emergence of the sliced fifth-generation mobile network (sliced 5G) offers a promising architecture that leverages a novel Radio Access Technology known as New Radio (NR), which delivers significantly enhanced data rates. By integrating the network slicing (NS) architecture, greater flexibility and isolation are introduced into the preexisting infrastructure. The isolation effect of NS is particularly advantageous in mitigating interference between slices, as it empowers each slice to function independently. This paper addresses the user association challenge within a sliced 5G (NR)-IoT network. To this end, we present an Unconstrained-Markov Decision Process (U-MDP) model formulation of the problem. Subsequently, we propose the U-MDP association algorithm, which aims to determine the optimal user-to-slice associations. Unlike existing approaches that typically rely on static user association or separate optimization strategies, our U-MDP algorithm dynamically optimizes user-to-slice associations within a sliced 5G-IoT architecture, thereby enhancing adaptability to varying network conditions and improving overall system performance. Our numerical simulations validate the theoretical model and demonstrate the effectiveness of our proposed solution, all while upholding the quality-of-service requirements for all devices.

17 pages, 789 KB  
Article
Modeling Marshaling Yard Processes with M/HypoK/1/m Queuing Model Under Failure Conditions
by Abate Sewagegn and Michal Dorda
Appl. Sci. 2025, 15(16), 8873; https://doi.org/10.3390/app15168873 - 12 Aug 2025
Abstract
This study presents a comprehensive analysis of the M/HypoK/1/m queuing model to evaluate the performance of marshaling yards in freight rail classification systems. The model effectively captures the complex, multi-phase nature of service and repair processes by incorporating hypo-exponential probability distributions. The marshaling yard is modeled as a finite-capacity, single-server queue subject to potential server failures, reflecting real-world disruptions. Two complementary methodological frameworks are employed: a mathematical model based on continuous-time Markov chains (CTMCs) and a simulation model constructed using Colored Petri Nets (CPNs). In the analytical approach, both service time and repair time follow hypo-exponential distributions, which are used to approximate the gamma distribution. The simulation model built in CPN Tools allows for dynamic visualization and performance evaluation. In the CPN model, we applied a gamma distribution, which allowed us to evaluate the accuracy of the approximation implemented in the analytical model. The results indicate that utilization of the marshaling yard was approximately 23.81% with primary shunting and 22.53% with secondary shunting. The study confirms that the hypo-exponential distribution can effectively approximate the gamma distribution. This dual-framework approach, combining analytics with simulation, provides a deeper understanding of system behavior, supporting data-driven decisions for capacity planning, failure mitigation, and operational optimization in freight rail networks.
(This article belongs to the Special Issue New Technologies in Public Transport and Logistics)
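The hypo-exponential-to-gamma approximation mentioned in the abstract rests on a simple fact: a sum of K independent exponential phases has mean Σ 1/λᵢ and variance Σ 1/λᵢ², and with equal rates it is an Erlang distribution, i.e., a gamma with integer shape. A sketch with illustrative phase rates (not the yard's fitted parameters):

```python
import random

def hypoexp_sample(rates, rng):
    # one hypo-exponential draw: sum of independent exponential phases
    return sum(rng.expovariate(lam) for lam in rates)

def hypoexp_moments(rates):
    # closed-form mean and variance of the phase sum
    mean = sum(1.0 / lam for lam in rates)
    var = sum(1.0 / lam ** 2 for lam in rates)
    return mean, var

# three equal phases -> Erlang(3), i.e., gamma(shape=3, scale=0.5)
rates = [2.0, 2.0, 2.0]
mean, var = hypoexp_moments(rates)
rng = random.Random(1)
n = 20000
sample_mean = sum(hypoexp_sample(rates, rng) for _ in range(n)) / n
```

Matching the first two moments of a fitted gamma this way is what lets the CTMC model keep Markovian (exponential) building blocks while the CPN simulation uses the gamma directly.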

26 pages, 2752 KB  
Article
Intelligent Impedance Strategy for Force–Motion Control of Robotic Manipulators in Unknown Environments via Expert-Guided Deep Reinforcement Learning
by Hui Shao, Weishi Hu, Li Yang, Wei Wang, Satoshi Suzuki and Zhiwei Gao
Processes 2025, 13(8), 2526; https://doi.org/10.3390/pr13082526 - 11 Aug 2025
Abstract
In robotic force–motion interaction tasks, ensuring stable and accurate force tracking in environments with unknown impedance and time-varying contact dynamics remains a key challenge. To address this, the study presents an intelligent impedance control (IIC) strategy that integrates model-based insights with deep reinforcement learning (DRL) to improve adaptability and robustness in complex manipulation scenarios. The control problem is formulated as a Markov Decision Process (MDP), and the Deep Deterministic Policy Gradient (DDPG) algorithm is employed to learn continuous impedance policies. To accelerate training and improve convergence stability, an expert-guided initialization strategy is introduced based on iterative error feedback, providing a weak-model-based demonstration to guide early exploration. To rigorously assess the impact of contact uncertainties on system behavior, a comprehensive performance analysis is conducted using a combined time- and frequency-domain approach, offering deep insights into how impedance modulation shapes both transient dynamics and steady-state accuracy across varying environmental conditions. A high-fidelity simulation platform based on MATLAB (version 2021b) multi-toolbox co-simulation is developed to emulate realistic robotic contact conditions. Quantitative results show that the IIC framework significantly reduces settling time, overshoot, and undershoot under dynamic contact conditions, while maintaining stability and generalization across a broad range of environments.

24 pages, 1233 KB  
Article
DRL-Based Scheduling for AoI Minimization in CR Networks with Perfect Sensing
by Juan Sun, Shubin Zhang and Xinjie Yu
Entropy 2025, 27(8), 855; https://doi.org/10.3390/e27080855 - 11 Aug 2025
Abstract
Age of Information (AoI) is a recently introduced metric that quantifies the freshness and timeliness of data, playing a crucial role in applications reliant on time-sensitive information. Minimizing AoI through optimal scheduling is challenging, especially in energy-constrained Internet of Things (IoT) networks. In this work, we begin by analyzing a simplified cognitive radio network (CRN) in which a single secondary user (SU) harvests RF energy from the primary user (PU) and transmits status update packets when the PU spectrum is available. Time is divided into equal slots, and the SU performs either energy harvesting, spectrum sensing, or status update transmission in each slot. To optimize the AoI within the CRN, we formulate the sequential decision-making process as a partially observable Markov decision process (POMDP) and employ dynamic programming to determine optimal actions. We then extend our investigation to the long-term average weighted sum of AoIs for a multi-SU CRN. Unlike the single-SU scenario, decisions must be made regarding which SU performs sensing and which SU forwards the status update packets. Given the partially observable nature of the PU spectrum, we propose an enhanced Deep Q-Network (DQN) algorithm. Simulation results demonstrate that the proposed policies significantly outperform the myopic policy. Additionally, we analyze the effect of various parameter settings on system performance.
(This article belongs to the Section Information Theory, Probability and Statistics)
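The slotted AoI dynamics underlying formulations like the one above follow a simple per-slot recursion: the age grows by one each slot and resets when a status update is delivered. A textbook sketch (the delivery pattern below is made up for illustration):

```python
def aoi_step(age, delivered, packet_age=0):
    # AoI grows by one slot; a delivered update resets it to the age
    # the packet accumulated before delivery, plus the current slot
    return packet_age + 1 if delivered else age + 1

# deliveries in slots 3 and 6 pull the age back down to 1
ages, age = [], 0
for delivered in [False, False, True, False, False, True]:
    age = aoi_step(age, delivered)
    ages.append(age)
```

A scheduler minimizes the time average of this sawtooth trajectory, which is why harvesting or sensing in a slot (no delivery) carries an implicit freshness cost.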

27 pages, 1523 KB  
Article
Reinforcement Learning-Based Agricultural Fertilization and Irrigation Considering N2O Emissions and Uncertain Climate Variability
by Zhaoan Wang, Shaoping Xiao, Jun Wang, Ashwin Parab and Shivam Patel
AgriEngineering 2025, 7(8), 252; https://doi.org/10.3390/agriengineering7080252 - 7 Aug 2025
Abstract
Nitrous oxide (N2O) emissions from agriculture are rising due to increased fertilizer use and intensive farming, posing a major challenge for climate mitigation. This study introduces a novel reinforcement learning (RL) framework to optimize farm management strategies that balance crop productivity with environmental impact, particularly N2O emissions. By modeling agricultural decision-making as a partially observable Markov decision process (POMDP), the framework accounts for uncertainties in environmental conditions and observational data. The approach integrates deep Q-learning with recurrent neural networks (RNNs) to train adaptive agents within a simulated farming environment. A Probabilistic Deep Learning (PDL) model was developed to estimate N2O emissions, achieving a high Prediction Interval Coverage Probability (PICP) of 0.937 within a 95% confidence interval on the available dataset. While the PDL model's generalizability is currently constrained by the limited observational data, the RL framework itself is designed for broad applicability, capable of extending to diverse agricultural practices and environmental conditions. Results demonstrate that RL agents reduce N2O emissions without compromising yields, even under climatic variability. The framework's flexibility allows for future integration of expanded datasets or alternative emission models, ensuring scalability as more field data becomes available. This work highlights the potential of artificial intelligence to advance climate-smart agriculture by simultaneously addressing productivity and sustainability goals in dynamic real-world settings.
(This article belongs to the Special Issue Implementation of Artificial Intelligence in Agriculture)
