Search Results (277)

Search Parameters:
Keywords = Markov decision process (MDP)

27 pages, 2189 KB  
Article
Miss-Triggered Content Cache Replacement Under Partial Observability: Transformer-Decoder Q-Learning
by Hakho Kim, Teh-Jen Sun and Eui-Nam Huh
Mathematics 2025, 13(19), 3217; https://doi.org/10.3390/math13193217 - 7 Oct 2025
Viewed by 82
Abstract
Content delivery networks (CDNs) face steadily rising, uneven demand, straining heuristic cache replacement. Reinforcement learning (RL) is promising, but most work assumes a fully observable Markov Decision Process (MDP), which is unrealistic under delayed, partial, and noisy signals. We model cache replacement as a Partially Observable MDP (POMDP) and present the Miss-Triggered Cache Transformer (MTCT), a Transformer-decoder Q-learning agent that encodes recent histories with self-attention. MTCT invokes its policy only on cache misses to align compute with informative events and uses a delayed-hit reward to propagate information from hits. A compact, rank-based action set (12 actions by default) captures popularity–recency trade-offs with complexity independent of cache capacity. We evaluate MTCT on a real trace (MovieLens) and two synthetic workloads (Mandelbrot–Zipf, Pareto) against Adaptive Replacement Cache (ARC), Windowed TinyLFU (W-TinyLFU), classical heuristics, and Double Deep Q-Network (DDQN). MTCT achieves the best or statistically comparable cache-hit rates for most cache sizes; e.g., on MovieLens at M = 600, it reaches 0.4703 (DDQN 0.4436, ARC 0.4513). Miss-triggered inference also lowers mean wall-clock time per episode; Transformer inference is well suited to modern hardware acceleration. Ablations support CL = 50 and show that finer action grids improve stability and final accuracy. Full article
(This article belongs to the Section E1: Mathematics and Computer Science)
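To make the miss-triggered control flow concrete, here is a minimal Python sketch of a cache that consults a learned policy only on misses and evicts by rank over a popularity ordering. The `MissTriggeredCache` class, its random stand-in policy, and the mapping of actions to eviction ranks are illustrative assumptions; the paper's Transformer-decoder Q-network and delayed-hit reward are not reproduced.

```python
# Hedged sketch of a miss-triggered replacement loop; the policy is a
# random stand-in for the Transformer-decoder Q-network.
from collections import OrderedDict
import random

class MissTriggeredCache:
    def __init__(self, capacity, num_actions=12):
        self.capacity = capacity
        self.store = OrderedDict()          # key -> access count
        self.num_actions = num_actions      # compact rank-based action set

    def policy(self, observation):
        # Stand-in for the learned agent: pick one of the rank-based
        # actions (e.g., "evict the k-th least popular item").
        return random.randrange(self.num_actions)

    def evict_by_rank(self, action):
        # Map the discrete action to an eviction rank over a popularity
        # ordering; complexity is independent of cache capacity.
        ranked = sorted(self.store, key=lambda k: self.store[k])
        victim = ranked[min(action, len(ranked) - 1)]
        del self.store[victim]

    def access(self, key):
        if key in self.store:                  # hit: no policy call
            self.store[key] += 1
            return True
        if len(self.store) >= self.capacity:   # miss: invoke policy
            self.evict_by_rank(self.policy(list(self.store.values())))
        self.store[key] = 1
        return False

cache = MissTriggeredCache(capacity=2)
print([cache.access(k) for k in ("a", "b", "a", "c", "a")])
```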

31 pages, 1841 KB  
Article
Joint Scheduling and Placement for Vehicular Intelligent Applications Under QoS Constraints: A PPO-Based Precedence-Preserving Approach
by Wei Shi and Bo Chen
Mathematics 2025, 13(19), 3130; https://doi.org/10.3390/math13193130 - 30 Sep 2025
Viewed by 146
Abstract
The increasing demand for low-latency, computationally intensive vehicular applications, such as autonomous navigation and real-time perception, has led to the adoption of cloud–edge–vehicle infrastructures. These applications are often modeled as Directed Acyclic Graphs (DAGs) with interdependent subtasks, where precedence constraints enforce causal ordering while allowing concurrency. We propose a task offloading framework that decomposes applications into precedence-constrained subtasks and formulates the joint scheduling and offloading problem as a Markov Decision Process (MDP) to capture the latency–energy trade-off. The system state incorporates vehicle positions, wireless link quality, server load, and task-buffer status. To address the high dimensionality and sequential nature of scheduling, we introduce DepSchedPPO, a dependency-aware sequence-to-sequence policy that processes subtasks in topological order and generates placement decisions using action masking to ensure partial-order feasibility. This policy is trained using Proximal Policy Optimization (PPO) with clipped surrogates, ensuring stable and sample-efficient learning under dynamic task dependencies. Extensive simulations show that our approach consistently reduces task latency and energy consumption while satisfying QoS constraints, compared with conventional heuristic and DRL-based methods. The proposed solution demonstrates strong applicability to real-time vehicular scenarios such as autonomous navigation, cooperative sensing, and edge-based perception. Full article
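The precedence-preserving decoding idea can be illustrated with a short sketch: subtasks are visited in topological order and infeasible placements are masked out before selection. `decode_placements`, `feasible_servers`, and the random scoring function are hypothetical stand-ins for the DepSchedPPO policy, assuming Python 3.9+ for `graphlib`.

```python
# Sketch of precedence-preserving decoding with action masking over a DAG
# of subtasks; the scoring network is a placeholder, not the paper's policy.
import numpy as np
from graphlib import TopologicalSorter

def decode_placements(dag, num_servers, score_fn):
    """dag: {task: set of prerequisite tasks}; returns task -> server index."""
    order = list(TopologicalSorter(dag).static_order())  # causal ordering
    placements = {}
    for task in order:
        scores = score_fn(task, placements)              # shape (num_servers,)
        mask = feasible_servers(task, placements, num_servers)
        scores = np.where(mask, scores, -np.inf)         # mask infeasible actions
        placements[task] = int(np.argmax(scores))
    return placements

def feasible_servers(task, placements, num_servers):
    # Placeholder feasibility rule: all servers allowed; a real mask would
    # encode QoS, load, and link constraints from the system state.
    return np.ones(num_servers, dtype=bool)

# toy usage: diamond-shaped DAG, 3 servers, random scores
dag = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
rng = np.random.default_rng(0)
print(decode_placements(dag, 3, lambda t, p: rng.normal(size=3)))
```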

23 pages, 2613 KB  
Article
Learning to Balance Mixed Adversarial Attacks for Robust Reinforcement Learning
by Mustafa Erdem and Nazım Kemal Üre
Mach. Learn. Knowl. Extr. 2025, 7(4), 108; https://doi.org/10.3390/make7040108 - 24 Sep 2025
Viewed by 458
Abstract
Reinforcement learning agents are highly susceptible to adversarial attacks that can severely compromise their performance. Although adversarial training is a common countermeasure, most existing research focuses on defending against single-type attacks targeting either observations or actions. This narrow focus overlooks the complexity of real-world mixed attacks, where an agent’s perceptions and resulting actions are perturbed simultaneously. To systematically study these threats, we introduce the Action and State-Adversarial Markov Decision Process (ASA-MDP), which models the interaction as a zero-sum game between the agent and an adversary attacking both states and actions. Using this framework, we show that agents trained conventionally or against single-type attacks remain highly vulnerable to mixed perturbations. Moreover, we identify a key challenge in this setting: a naive mixed-type adversary often fails to effectively balance its perturbations across modalities during training, limiting the agent’s robustness. To address this, we propose the Action and State-Adversarial Proximal Policy Optimization (ASA-PPO) algorithm, which enables the adversary to learn a balanced strategy, distributing its attack budget across both state and action spaces. This, in turn, enhances the robustness of the trained agent against a wide range of adversarial scenarios. Comprehensive experiments across diverse environments demonstrate that policies trained with ASA-PPO substantially outperform baselines—including standard PPO and single-type adversarial methods—under action-only, observation-only, and, most notably, mixed-attack conditions. Full article
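A minimal sketch of what a budget-balanced mixed adversary might look like, assuming an L-infinity budget `eps` and a learned allocation parameter `alpha`; this illustrates the balancing idea only and is not the ASA-PPO update itself.

```python
# Mixed state/action perturbation with a learned budget split; the alpha
# parameterization and clipping ranges are illustrative assumptions.
import numpy as np

def mixed_perturbation(state, action, alpha, eps, rng):
    """Split an L-inf budget eps between state and action perturbations.
    alpha in [0, 1] is the adversary's learned allocation."""
    eps_s, eps_a = alpha * eps, (1.0 - alpha) * eps
    state_adv = state + rng.uniform(-eps_s, eps_s, size=state.shape)
    action_adv = np.clip(action + rng.uniform(-eps_a, eps_a, size=action.shape),
                         -1.0, 1.0)   # keep actions in a bounded range
    return state_adv, action_adv

rng = np.random.default_rng(1)
s, a = np.zeros(4), np.zeros(2)
print(mixed_perturbation(s, a, alpha=0.6, eps=0.1, rng=rng))
```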

24 pages, 3359 KB  
Article
A Unified Scheduling Model for Agile Earth Observation Satellites Based on DQG and PPO
by Mengmeng Qin, Zhanpeng Xu, Xuesheng Zhao, Wenbin Sun, Wenlan Xie and Qingping Liu
Aerospace 2025, 12(9), 844; https://doi.org/10.3390/aerospace12090844 - 18 Sep 2025
Viewed by 294
Abstract
Agile Earth Observation Satellites (AEOSs), with their maneuverability, can flexibly observe point, line, and area targets. However, existing research typically requires distinct algorithms for each target type, lacking a unified modeling and solution framework, which hinders the rapid, coordinated observation of multiple target types in complex scenarios. To address these issues, this paper proposes a unified scheduling model for agile Earth observation satellites based on the Degenerate Quadtree Grid (DQG) and Proximal Policy Optimization (PPO), termed AEOSSP-USM. First, the DQG is employed to enable unified management and integrated modeling of point, line, and area targets; second, traditional time-window calculations based on longitude and latitude are replaced with grid-code-based computations using the DQG; finally, AEOSSP-USM is formulated as a Markov Decision Process (MDP) and solved with the PPO deep reinforcement learning algorithm. Experimental results demonstrate that the proposed method effectively realizes unified scheduling of heterogeneous targets, improving imaging quality by roughly a factor of three, reducing energy consumption by 10%, decreasing memory usage by more than 90%, and improving computational efficiency 35-fold compared with a conventional longitude–latitude strip algorithm. Full article
(This article belongs to the Section Astronautics & Space Science)
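A toy illustration of grid-code-based indexing in the spirit of DQG follows; a plain quadtree code over a longitude/latitude box stands in, and the degenerate quadtree's pole handling is omitted, so treat it purely as a sketch of why shared code prefixes make heterogeneous-target management uniform.

```python
# Toy quadtree cell coding: points in the same coarse cell share a code
# prefix, so point/line/area targets reduce to sets of grid codes.
def quad_code(lon, lat, level):
    """Recursively subdivide [-180,180] x [-90,90]; one digit 0-3 per level."""
    lon_lo, lon_hi, lat_lo, lat_hi = -180.0, 180.0, -90.0, 90.0
    code = []
    for _ in range(level):
        lon_mid, lat_mid = (lon_lo + lon_hi) / 2, (lat_lo + lat_hi) / 2
        quadrant = (2 if lat >= lat_mid else 0) + (1 if lon >= lon_mid else 0)
        code.append(str(quadrant))
        lon_lo, lon_hi = (lon_mid, lon_hi) if lon >= lon_mid else (lon_lo, lon_mid)
        lat_lo, lat_hi = (lat_mid, lat_hi) if lat >= lat_mid else (lat_lo, lat_mid)
    return "".join(code)

# two nearby targets land in the same level-6 cell (identical codes)
print(quad_code(116.4, 39.9, level=6), quad_code(116.5, 39.8, level=6))
```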

34 pages, 2973 KB  
Article
A Markov Decision Process and Adapted Particle Swarm Optimization-Based Approach for the Hydropower Dispatch Problem—Jirau Hydropower Plant Case Study
by Mateus Santos, Marcelo Fonseca, José Bernardes, Lenio Prado, Thiago Abreu, Edson Bortoni and Guilherme Bastos
Energies 2025, 18(18), 4919; https://doi.org/10.3390/en18184919 - 16 Sep 2025
Viewed by 337
Abstract
This work focuses on optimizing energy dispatch in a hydroelectric power plant (HPP) with a large number of generating units (GUs) and uncertainties caused by sediment accumulation in the water intakes. The study was conducted at the Jirau HPP and integrates Markov Decision Processes (MDPs) and Particle Swarm Optimization (PSO) to minimize losses and enhance the performance of the plant's GUs. Given the complexity of managing so many units (50) and mitigating load losses from sediment accumulation, this approach enables real-time decision-making and optimizes energy dispatch. The methodology involves modeling the operational characteristics of the GUs, developing an objective function that minimizes water consumption and maximizes energy efficiency, and using MDPs and PSO to find globally optimal solutions. Our results show that the methodology improves efficiency, reducing the turbinated flow by 0.9% while increasing energy generation by 0.34% and overall yield by 0.33% compared with the plant's traditional dispatch method over the analyzed period. The strategy can be adapted to varying operational conditions and provides a reliable framework for hydropower dispatch optimization. Full article
(This article belongs to the Section F: Electrical Engineering)
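A compact PSO loop in the spirit of the dispatch optimization is sketched below, hedged heavily: the quadratic unit-efficiency curve, the penalty weight, and the 50-unit setup are invented for illustration and do not reflect Jirau's actual unit models or the paper's MDP coupling.

```python
# Toy PSO minimizing a proxy for turbinated flow subject to a power target
# across n generating units; all constants are illustrative assumptions.
import numpy as np

def pso(objective, dim, n_particles=30, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(0, 1, (n_particles, dim))    # unit setpoints in [0, 1]
    v = np.zeros_like(x)
    pbest, pbest_val = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[np.argmin(pbest_val)]
    for _ in range(iters):
        r1, r2 = rng.uniform(size=x.shape), rng.uniform(size=x.shape)
        v = 0.7 * v + 1.5 * r1 * (pbest - x) + 1.5 * r2 * (gbest - x)
        x = np.clip(x + v, 0, 1)
        vals = np.array([objective(p) for p in x])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)]
    return gbest, objective(gbest)

def dispatch_cost(setpoints, power_target=30.0):
    power = 1.2 * setpoints - 0.3 * setpoints**2   # toy unit efficiency curve
    flow = setpoints.sum()                          # proxy for turbinated flow
    return flow + 100.0 * max(0.0, power_target - power.sum())**2

best, cost = pso(dispatch_cost, dim=50)   # 50 GUs, as at Jirau
print(round(cost, 3))
```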

24 pages, 5614 KB  
Article
Efficient Target Assignment via Binarized SHP Path Planning and Plasticity-Aware RL in Urban Adversarial Scenarios
by Xiyao Ding, Hao Chen, Yu Wang, Dexing Wei, Ke Fu, Linyue Liu, Benke Gao, Quan Liu and Jian Huang
Appl. Sci. 2025, 15(17), 9630; https://doi.org/10.3390/app15179630 - 1 Sep 2025
Viewed by 500
Abstract
Accurate and feasible target assignment in urban environments without road networks remains challenging. Existing methods exhibit critical limitations: computational inefficiency that prevents meeting real-time decision-making requirements, and poor cross-scenario generalization that yields task-specific policies lacking adaptability. To achieve efficient target assignment in urban adversarial scenarios, we propose an efficient traversable-path generation method requiring only binarized images, along with four key constraint models serving as optimization objectives. Moreover, we model this optimization problem as a Markov decision process (MDP) and introduce the generalization sequential proximal policy optimization (GSPPO) algorithm within the reinforcement learning (RL) framework. Specifically, GSPPO integrates an exploration history representation module (EHR) and a neuron-specific plasticity enhancement module (NPE). EHR incorporates exploration history into the policy learning loop, which significantly improves learning efficiency. To mitigate plasticity loss in neural networks, the NPE module boosts the model's representational capability and generalization across diverse tasks. Experiments demonstrate that our approach reduces planning time by four orders of magnitude compared with the online planning method. Against the benchmark algorithm, it achieves 94.16% higher convergence performance, 33.54% shorter assignment path length, 51.96% lower threat value, and 40.71% faster total time. Our approach supports real-time military reconnaissance and can also facilitate rescue operations in complex urban environments. Full article
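To illustrate path generation from binarized maps, a plain breadth-first search over a binary occupancy grid is sketched below; it stands in for the paper's SHP-based method, which is not reproduced here.

```python
# BFS shortest path on a binarized map: 1 = traversable, 0 = blocked.
from collections import deque

def shortest_path(grid, start, goal):
    h, w = len(grid), len(grid[0])
    prev, frontier = {start: None}, deque([start])
    while frontier:
        cur = frontier.popleft()
        if cur == goal:                     # reconstruct route by backtracking
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        r, c = cur
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < h and 0 <= nc < w and grid[nr][nc] and (nr, nc) not in prev:
                prev[(nr, nc)] = cur
                frontier.append((nr, nc))
    return None                             # no traversable route

grid = [[1, 1, 0],
        [0, 1, 0],
        [0, 1, 1]]
print(shortest_path(grid, (0, 0), (2, 2)))
```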

17 pages, 2179 KB  
Article
Federated Multi-Agent DRL for Task Offloading in Vehicular Edge Computing
by Hongwei Zhao, Yu Li, Zhixi Pang and Zihan Ma
Electronics 2025, 14(17), 3501; https://doi.org/10.3390/electronics14173501 - 1 Sep 2025
Viewed by 877
Abstract
With the expansion of vehicle-to-everything (V2X) networks and the rising demand for intelligent services, vehicular edge computing faces heightened requirements for more efficient task offloading. This study proposes a task offloading technique that combines federated collaboration with multi-agent deep reinforcement learning to reduce system latency and energy consumption. First, the task offloading problem is formulated as a Markov decision process (MDP), and a framework based on the Multi-Agent Dueling Double Deep Q-Network (MAD3QN) is developed to help agents make optimal offloading decisions in complex environments. Second, Federated Learning (FL) is applied during the training phase, aggregating local training results from multiple vehicles into the global model and thereby improving the agents' learning efficiency. Experimental results indicate that, compared with conventional baseline algorithms, the proposed method decreases latency and energy consumption by at least 10% and 9%, respectively, while improving the average reward by at least 21%. Full article
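A minimal FedAvg-style aggregation step is sketched below to show how local training results from multiple vehicles could be merged into a global model; plain NumPy arrays stand in for MAD3QN parameters, and the sample-count weighting is an assumption.

```python
# Weighted federated averaging of per-vehicle model parameters.
import numpy as np

def federated_average(local_weights, sizes):
    """local_weights: one parameter list per vehicle; sizes: local sample counts."""
    total = sum(sizes)
    n_layers = len(local_weights[0])
    return [
        sum(w[i] * (n / total) for w, n in zip(local_weights, sizes))
        for i in range(n_layers)
    ]

# three vehicles, two-layer toy models, weighted by local sample counts
vehicles = [[np.ones((2, 2)) * k, np.ones(2) * k] for k in (1.0, 2.0, 3.0)]
global_model = federated_average(vehicles, sizes=[100, 200, 100])
print(global_model[0])   # entries of (1*100 + 2*200 + 3*100) / 400 = 2.0
```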

18 pages, 3066 KB  
Article
A Tree-Based Search Algorithm with Global Pheromone and Local Signal Guidance for Scientific Chart Reasoning
by Min Zhou, Zhiheng Qi, Tianlin Zhu, Jan Vijg and Xiaoshui Huang
Mathematics 2025, 13(17), 2739; https://doi.org/10.3390/math13172739 - 26 Aug 2025
Viewed by 593
Abstract
Chart reasoning, a critical task for automating data interpretation in domains such as scientific data analysis and medical diagnostics, leverages large-scale vision language models (VLMs) to interpret chart images and answer natural language questions, enabling semantic understanding that enhances knowledge accessibility and supports data-driven decision making. In this work, we formalize chart reasoning as a sequential decision-making problem governed by a Markov Decision Process (MDP), thereby providing a mathematically grounded framework for analyzing visual question answering tasks. While recent advances such as multi-step reasoning with Monte Carlo tree search (MCTS) offer interpretable and stochastic planning capabilities, these methods often suffer from redundant path exploration and inefficient reward propagation. To address these challenges, we propose a novel algorithmic framework that integrates a pheromone-guided search strategy inspired by Ant Colony Optimization (ACO). In our approach, chart reasoning is cast as a combinatorial optimization problem over a dynamically evolving search tree, where path desirability is governed by pheromone concentration functions that accumulate global search experience across episodes and are reinforced through trajectory-level rewards. Transition probabilities are further modulated by local signals: evaluations derived from the immediate linguistic feedback of large language models. This enables fine-grained decision making at each step while preserving long-term planning efficacy. Extensive experiments across four benchmark datasets (ChartQA, MathVista, GRAB, and ChartX) demonstrate the effectiveness of our approach, with multi-agent reasoning and pheromone guidance yielding success-rate improvements of +18.4% and +7.6%, respectively. Full article
(This article belongs to the Special Issue Multimodal Deep Learning and Its Application in Healthcare)
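The pheromone mechanics can be sketched compactly: child selection mixes a global pheromone term with a local signal, and trajectory-level reward reinforces the visited path. The constants, the toy action names, and the local-signal stub below are assumptions, not the paper's implementation.

```python
# Pheromone-guided child selection plus trajectory-level reinforcement,
# in the ACO spirit described above; all parameters are illustrative.
import math
import random

def select_child(children, pheromone, local_signal, alpha=1.0, beta=1.0):
    # transition weight = pheromone^alpha * exp(local signal)^beta
    weights = [
        (pheromone.get(c, 1.0) ** alpha) * (math.exp(local_signal(c)) ** beta)
        for c in children
    ]
    return random.choices(children, weights=weights, k=1)[0]

def reinforce(path, reward, pheromone, rho=0.1):
    for node in pheromone:                 # evaporation of old trails
        pheromone[node] *= (1.0 - rho)
    for node in path:                      # trajectory-level deposit
        pheromone[node] = pheromone.get(node, 1.0) + reward

pheromone = {}
path = [select_child(["parse_axis", "read_legend", "compare_bars"],
                     pheromone, local_signal=lambda c: len(c) * 0.01)]
reinforce(path, reward=1.0, pheromone=pheromone)
print(pheromone)
```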

28 pages, 2209 KB  
Article
A Reinforcement Learning Hyper-Heuristic with Cumulative Rewards for Dual-Peak Time-Varying Network Optimization in Heterogeneous Multi-Trip Vehicle Routing
by Xiaochuan Wang, Na Li and Xingchen Jin
Algorithms 2025, 18(9), 536; https://doi.org/10.3390/a18090536 - 22 Aug 2025
Viewed by 817
Abstract
Urban logistics faces mounting complexity from traffic congestion, fleet heterogeneity, warehouse constraints, and driver workload balancing, especially in the Heterogeneous Multi-Trip Vehicle Routing Problem with Time Windows and Time-Varying Networks (HMTVRPTW-TVN). We develop a mixed-integer linear programming (MILP) model with dual-peak time discretization and exact linearization for heterogeneous fleet coordination. Given the problem's NP-hardness, we propose a Hyper-Heuristic based on Cumulative Reward Q-Learning (HHCRQL), integrating reinforcement learning with heuristic operators in a Markov Decision Process (MDP). The algorithm dynamically selects operators using a four-dimensional state space and a cumulative reward function combining timestep and fitness. Experiments show that, for small instances, HHCRQL achieves solutions within 3% of Gurobi's optimum when customer nodes exceed 15, outperforming Large Neighborhood Search (LNS) and LNS with Simulated Annealing (LNSSA) with stable, shorter runtimes. For large-scale instances, HHCRQL reduces gaps by up to 9.17% versus Iterated Local Search (ILS), 6.74% versus LNS, and 5.95% versus LNSSA, while maintaining relatively stable runtimes. In real-world validation on Shanghai logistics data, the method reduces waiting times by 35.36% and total transportation times by 24.68%, confirming HHCRQL's effectiveness, robustness, and scalability. Full article
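A toy Q-learning hyper-heuristic loop is included to make the operator-selection idea concrete; the two operators, the binary improvement state, and the reward shaping are simplified stand-ins for HHCRQL's four-dimensional state and cumulative reward function.

```python
# Q-learning over heuristic operators: the agent picks a low-level operator,
# applies it to the incumbent solution, and is rewarded by improvement.
import random
from collections import defaultdict

def hyper_heuristic(operators, fitness, solution, episodes=500,
                    alpha=0.1, gamma=0.9, eps=0.2, seed=0):
    random.seed(seed)
    q = defaultdict(float)                 # (state, operator index) -> value
    state, best = 0, fitness(solution)
    for _ in range(episodes):
        if random.random() < eps:          # epsilon-greedy operator choice
            a = random.randrange(len(operators))
        else:
            a = max(range(len(operators)), key=lambda i: q[(state, i)])
        solution = operators[a](solution)
        new_fit = fitness(solution)
        reward = best - new_fit            # improvement on a minimization problem
        best = min(best, new_fit)
        next_state = int(reward > 0)       # did the operator improve the incumbent?
        q[(state, a)] += alpha * (reward + gamma *
            max(q[(next_state, i)] for i in range(len(operators))) - q[(state, a)])
        state = next_state
    return solution, best

ops = [lambda x: x + random.uniform(-1, 0.5), lambda x: x * 0.9]
print(hyper_heuristic(ops, fitness=abs, solution=10.0))
```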

52 pages, 15058 KB  
Article
Optimizing Autonomous Vehicle Navigation Through Reinforcement Learning in Dynamic Urban Environments
by Mohammed Abdullah Alsuwaiket
World Electr. Veh. J. 2025, 16(8), 472; https://doi.org/10.3390/wevj16080472 - 18 Aug 2025
Viewed by 994
Abstract
Autonomous vehicle (AV) navigation in dynamic urban environments faces challenges such as unpredictable traffic conditions, varying road user behaviors, and complex road networks. This study proposes a novel reinforcement learning-based framework that enhances AV decision making through spatial-temporal context awareness. The framework integrates Proximal Policy Optimization (PPO) and Graph Neural Networks (GNNs) to effectively model urban features like intersections, traffic density, and pedestrian zones. A key innovation is the urban context-aware reward mechanism (UCARM), which dynamically adapts the reward structure based on traffic rules, congestion levels, and safety considerations. Additionally, the framework incorporates a Dynamic Risk Assessment Module (DRAM), which uses Bayesian inference combined with Markov Decision Processes (MDPs) to proactively evaluate collision risks and guide safer navigation. The framework's performance was validated on three datasets: Argoverse, nuScenes, and CARLA. Results demonstrate significant improvements: an average travel time of 420 ± 20 s, a collision rate of 3.1%, and energy consumption of 11,833 ± 550 J in Argoverse; 410 ± 20 s, 2.5%, and 11,933 ± 450 J in nuScenes; and 450 ± 25 s, 3.6%, and 13,000 ± 600 J in CARLA. The proposed method achieved an average navigation success rate of 92.5%, consistently outperforming baseline models in safety, efficiency, and adaptability. These findings indicate the framework's robustness and practical applicability for scalable AV deployment in real-world urban traffic conditions. Full article
(This article belongs to the Special Issue Modeling for Intelligent Vehicles)
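A hedged sketch of a context-aware reward in the spirit of UCARM follows: progress, safety, congestion, and rule-violation terms, with a safety weight that adapts to the urban context. All feature names and weights here are assumptions, not the paper's reward.

```python
# Illustrative context-adaptive reward: the safety penalty is scaled up
# inside pedestrian zones, mimicking a context-aware reward structure.
def ucarm_style_reward(progress_m, collision, congestion, in_pedestrian_zone,
                       speed_over_limit_ms):
    w_safety = 5.0 if in_pedestrian_zone else 2.0    # context-dependent weight
    reward = 0.1 * progress_m                        # progress toward the goal
    reward -= w_safety * (100.0 if collision else 0.0)
    reward -= 0.5 * congestion                       # discourage congested links
    reward -= 1.0 * max(0.0, speed_over_limit_ms)    # traffic-rule violation
    return reward

print(ucarm_style_reward(progress_m=12.0, collision=False, congestion=0.3,
                         in_pedestrian_zone=True, speed_over_limit_ms=0.0))
```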

20 pages, 2083 KB  
Article
Maritime Mobile Edge Computing for Sporadic Tasks: A PPO-Based Dynamic Offloading Strategy
by Yanglong Sun, Wenqian Luo, Zhiping Xu, Bo Lin, Weijian Xu and Weipeng Liu
Mathematics 2025, 13(16), 2643; https://doi.org/10.3390/math13162643 - 17 Aug 2025
Viewed by 455
Abstract
Maritime mobile edge computing (MMEC) technology enables the deployment of high-precision, computationally intensive object detection tasks on resource-constrained edge devices. However, dynamic network conditions and limited communication resources significantly degrade the performance of static offloading strategies, leading to increased task blocking probability and delays. This paper proposes a scheduling and offloading strategy tailored for MMEC scenarios driven by object detection tasks, which explicitly considers (1) the hierarchical structure of object detection models, and (2) the sporadic nature of maritime observation tasks. To minimize average task completion time under varying task arrival patterns, we formulate the average blocking delay minimization problem as a Markov Decision Process (MDP). Then, we propose an Orthogonalization-Normalization Proximal Policy Optimization (ON-PPO) algorithm, in which task category states are orthogonally encoded and system states are normalized. Experiments demonstrate that ON-PPO effectively learns policy parameters, mitigates interference between tasks of different categories during training, and adapts efficiently to sporadic task arrivals. Simulation results show that, compared to baseline algorithms, ON-PPO maintains stable task queues and achieves a 22.9% reduction in average task latency. Full article
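The two state transformations named in the abstract can be sketched in a few lines: a one-hot (orthogonal) encoding of the task category concatenated with normalized continuous system state. The feature layout and statistics below are illustrative assumptions, not ON-PPO's actual state design.

```python
# Orthogonal task-category encoding plus system-state normalization.
import numpy as np

def encode_state(category, num_categories, continuous, mean, std):
    one_hot = np.zeros(num_categories)
    one_hot[category] = 1.0                           # orthogonal category code
    normalized = (continuous - mean) / (std + 1e-8)   # state normalization
    return np.concatenate([one_hot, normalized])

state = encode_state(category=1, num_categories=4,
                     continuous=np.array([0.4, 120.0, 7.0]),  # e.g., load, delay, queue
                     mean=np.array([0.5, 100.0, 5.0]),
                     std=np.array([0.2, 30.0, 2.0]))
print(state.round(3))
```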

28 pages, 2383 KB  
Article
CIM-LP: A Credibility-Aware Incentive Mechanism Based on Long Short-Term Memory and Proximal Policy Optimization for Mobile Crowdsensing
by Sijia Mu and Huahong Ma
Electronics 2025, 14(16), 3233; https://doi.org/10.3390/electronics14163233 - 14 Aug 2025
Viewed by 319
Abstract
In the field of mobile crowdsensing (MCS), a large number of tasks rely on the participation of ordinary mobile device users for data collection and processing. This model has shown great potential for applications in environmental monitoring, traffic management, public safety, and other areas. However, the enthusiasm of participants and the quality of uploaded data directly affect the reliability and practical value of the sensing results, so the design of incentive mechanisms has become a core issue for the healthy operation of MCS. When optimizing participants' long-term utility rewards, existing research has often failed to account for dynamic changes in credibility, typically relying on historical data from a single point in time and overlooking long-term dependencies in the time series; this results in suboptimal decision-making and limits the overall efficiency and fairness of sensing tasks. To address this issue, a credibility-aware incentive mechanism based on long short-term memory and proximal policy optimization (CIM-LP) is proposed. The mechanism employs a Markov decision process (MDP) model to describe each participant's decision-making. Without access to global information, an incentive model combining long short-term memory (LSTM) networks and proximal policy optimization (PPO), collectively referred to as LSTM-PPO, is used to formulate the most reasonable and effective sensing-duration strategy for each participant, aiming to maximize the utility reward. After task completion, each participant's credibility is dynamically updated by evaluating the quality of the uploaded data, which then adjusts their utility rewards for the next phase. Simulation results based on real datasets show that, compared with several existing incentive algorithms, the CIM-LP mechanism increases the average participant utility by 6.56% to 112.76% and the task completion rate by 16.25% to 128.71%, demonstrating its significant advantages in improving data quality and task completion efficiency. Full article
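A minimal rendering of the credibility loop described above, assuming an exponential-moving-average update of credibility from per-task quality scores that then scales the next-round utility reward; the smoothing factor and scaling rule are assumptions, not the paper's exact update.

```python
# Credibility update after task completion, then reward scaling.
def update_credibility(credibility, quality_score, smoothing=0.2):
    """quality_score in [0, 1], from evaluating the uploaded data."""
    return (1.0 - smoothing) * credibility + smoothing * quality_score

def scaled_reward(base_reward, credibility):
    return base_reward * credibility    # low-credibility participants earn less

cred = 0.8
cred = update_credibility(cred, quality_score=0.95)
print(round(cred, 3), round(scaled_reward(10.0, cred), 3))
```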

14 pages, 460 KB  
Article
Modeling Local Search Metaheuristics Using Markov Decision Processes
by Rubén Ruiz-Torrubiano, Deepak Dhungana, Sarita Paudel and Himanshu Buckchash
Algorithms 2025, 18(8), 512; https://doi.org/10.3390/a18080512 - 14 Aug 2025
Viewed by 371
Abstract
Local search metaheuristics such as tabu search or simulated annealing are popular heuristic optimization algorithms for finding near-optimal solutions to combinatorial optimization problems. However, it is still challenging for researchers and practitioners to analyze their behavior and to systematically choose one from the vast set of possible metaheuristics for the particular problem at hand. In this paper, we introduce a theoretical framework based on Markov Decision Processes (MDPs) for analyzing local search metaheuristics. The framework not only yields convergence results for individual algorithms but also provides an explicit characterization of the exploration–exploitation tradeoff and theory-grounded guidance for practitioners in choosing an appropriate metaheuristic for the problem at hand. We present the framework in detail and show how to apply it to hill climbing and simulated annealing, including computational experiments. Full article
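As a concrete instance of this viewpoint, the sketch below casts simulated annealing as a stochastic policy over an MDP whose states are incumbent solutions and whose actions are neighbor proposals; the objective and neighborhood are toys, not the paper's experiments.

```python
# Simulated annealing as a temperature-indexed stochastic policy: accept
# improving moves always, worsening moves with Boltzmann probability.
import math
import random

def simulated_annealing(objective, x0, neighbor, t0=10.0, cooling=0.995,
                        steps=5000, seed=0):
    random.seed(seed)
    x, fx, t = x0, objective(x0), t0
    for _ in range(steps):
        y = neighbor(x)                   # action: propose a neighbor state
        fy = objective(y)
        # policy: Metropolis acceptance rule
        if fy <= fx or random.random() < math.exp((fx - fy) / t):
            x, fx = y, fy                 # state transition
        t *= cooling                      # anneal: exploration -> exploitation
    return x, fx

x, fx = simulated_annealing(objective=lambda x: (x - 3) ** 2 + math.sin(5 * x),
                            x0=0.0, neighbor=lambda x: x + random.gauss(0, 0.5))
print(round(x, 3), round(fx, 3))
```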

20 pages, 1694 KB  
Article
Green Network Slicing Architecture Based on 5G-IoT and Next-Generation Technologies
by Mariame Amine, Abdellatif Kobbane, Jalel Ben-Othman and Mohammed El Koutbi
Appl. Sci. 2025, 15(16), 8938; https://doi.org/10.3390/app15168938 - 13 Aug 2025
Viewed by 634
Abstract
The rapid expansion of device connectivity and the increasing demand for data traffic have become pivotal aspects of our daily lives, especially within the Internet of Things (IoT) ecosystem. Consequently, operators are striving to identify the most innovative and robust solutions capable of accommodating these escalating requirements. The emergence of the sliced fifth-generation mobile network (sliced 5G) offers an architecture that leverages a novel Radio Access Technology known as New Radio (NR), promising significantly enhanced data rates. By integrating the network slicing (NS) architecture, greater flexibility and isolation are introduced into the preexisting infrastructure. The isolation effect of NS is particularly advantageous in mitigating interference between slices, as it empowers each slice to function independently. This paper addresses the user association challenge within a sliced 5G (NR)-IoT network. To this end, we present an Unconstrained Markov Decision Process (U-MDP) formulation of the problem. Subsequently, we propose the U-MDP association algorithm, which aims to determine the optimal user-to-slice associations. Unlike existing approaches that typically rely on static user association or separate optimization strategies, our U-MDP algorithm dynamically optimizes user-to-slice associations within a sliced 5G-IoT architecture, thereby enhancing adaptability to varying network conditions and improving overall system performance. Our numerical simulations validate the theoretical model and demonstrate the effectiveness of our proposed solution in enhancing overall system performance, all while upholding the quality of service requirements for all devices. Full article
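To make the MDP-based association idea concrete, a generic finite-MDP value-iteration routine is sketched below with a tiny invented two-slice transition model; it illustrates the solution machinery only, not the paper's U-MDP formulation.

```python
# Generic finite-MDP value iteration; states could be (user, slice-load)
# configurations and actions slice assignments. The model is a toy.
import numpy as np

def value_iteration(P, R, gamma=0.95, tol=1e-6):
    """P[a][s][s'] transition probabilities, R[a][s] rewards.
    Returns the optimal value function and a greedy policy."""
    n_actions = len(P)
    V = np.zeros(len(P[0]))
    while True:
        Q = np.array([R[a] + gamma * P[a] @ V for a in range(n_actions)])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)
        V = V_new

# two slices (actions), two load states: the lighter slice pays off
P = np.array([[[0.9, 0.1], [0.5, 0.5]],
              [[0.2, 0.8], [0.1, 0.9]]])
R = np.array([[1.0, 0.2], [0.4, 0.1]])
V, policy = value_iteration(P, R)
print(V.round(3), policy)
```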

26 pages, 2752 KB  
Article
Intelligent Impedance Strategy for Force–Motion Control of Robotic Manipulators in Unknown Environments via Expert-Guided Deep Reinforcement Learning
by Hui Shao, Weishi Hu, Li Yang, Wei Wang, Satoshi Suzuki and Zhiwei Gao
Processes 2025, 13(8), 2526; https://doi.org/10.3390/pr13082526 - 11 Aug 2025
Viewed by 895
Abstract
In robotic force–motion interaction tasks, ensuring stable and accurate force tracking in environments with unknown impedance and time-varying contact dynamics remains a key challenge. To address this challenge, this study presents an intelligent impedance control (IIC) strategy that integrates model-based insights with deep reinforcement learning (DRL) to improve adaptability and robustness in complex manipulation scenarios. The control problem is formulated as a Markov Decision Process (MDP), and the Deep Deterministic Policy Gradient (DDPG) algorithm is employed to learn continuous impedance policies. To accelerate training and improve convergence stability, an expert-guided initialization strategy is introduced based on iterative error feedback, providing a weak-model-based demonstration to guide early exploration. To rigorously assess the impact of contact uncertainties on system behavior, a comprehensive performance analysis is conducted using a combined time- and frequency-domain approach, offering deep insights into how impedance modulation shapes both transient dynamics and steady-state accuracy across varying environmental conditions. A high-fidelity simulation platform based on MATLAB (version 2021b) multi-toolbox co-simulation is developed to emulate realistic robotic contact conditions. Quantitative results show that the IIC framework significantly reduces settling time, overshoot, and undershoot under dynamic contact conditions, while maintaining stability and generalization across a broad range of environments. Full article
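The impedance relation at the heart of such controllers can be written in a few lines; the sketch below assumes the learned policy supplies stiffness and damping gains (K, D) around a desired trajectory and force setpoint, with all numbers illustrative. The DDPG actor that adapts the gains online is not reproduced.

```python
# One tick of an impedance law: commanded force from position/velocity
# errors around a desired setpoint; gains would come from the policy.
def impedance_force(x, x_d, dx, dx_d, K, D, f_d=0.0):
    """F = f_d + K (x_d - x) + D (dx_d - dx): target force plus corrections."""
    return f_d + K * (x_d - x) + D * (dx_d - dx)

# 5 N desired contact force, small tracking errors, illustrative gains
print(impedance_force(x=0.102, x_d=0.100, dx=0.01, dx_d=0.0,
                      K=800.0, D=40.0, f_d=5.0))
```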
