Search Results (9)

Search Parameters:
Keywords = epsilon-greedy algorithm

18 pages, 3117 KiB  
Article
Intelligent Robot in Unknown Environments: Walk Path Using Q-Learning and Deep Q-Learning
by Mouna El Wafi, My Abdelkader Youssefi, Rachid Dakir and Mohamed Bakir
Automation 2025, 6(1), 12; https://doi.org/10.3390/automation6010012 - 18 Mar 2025
Viewed by 398
Abstract
Autonomous navigation is essential for mobile robots to efficiently operate in complex environments. This study investigates Q-learning and Deep Q-learning to improve navigation performance. The research examines their effectiveness in complex maze configurations, focusing on how the epsilon-greedy strategy influences the agent’s ability to reach its goal in minimal time using Q-learning. A distinctive aspect of this work is the adaptive tuning of hyperparameters, where alpha and gamma values are dynamically adjusted throughout training. This eliminates the need for manually fixed parameters and enables the learning algorithm to automatically determine optimal values, ensuring adaptability to diverse environments rather than being constrained to specific cases. By integrating neural networks, Deep Q-learning enhances decision-making in complex navigation tasks. Simulations carried out in MATLAB environments validate the proposed approach, illustrating its effectiveness in resource-constrained systems while preserving robust and efficient decision-making. Experimental results demonstrate that adaptive hyperparameter tuning significantly improves learning efficiency, leading to faster convergence and reduced navigation time. Additionally, Deep Q-learning exhibits superior performance in complex environments, showcasing enhanced decision-making capabilities in high-dimensional state spaces. These findings highlight the advantages of reinforcement learning-based navigation and emphasize how adaptive exploration strategies and dynamic parameter adjustments enhance performance across diverse scenarios.
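Below is a minimal sketch of the epsilon-greedy tabular Q-learning loop with adaptive hyperparameters that this abstract describes; the toy 5x5 maze, reward scheme, and the alpha/gamma schedules are illustrative assumptions, not the authors' settings.

```python
import numpy as np

# Minimal epsilon-greedy tabular Q-learning on a toy grid maze.
# Maze layout, rewards, and decay schedules are illustrative assumptions.
rng = np.random.default_rng(0)
n_states, n_actions = 25, 4          # 5x5 grid; actions: up/down/left/right
Q = np.zeros((n_states, n_actions))
goal = n_states - 1

def step(s, a):
    """Toy grid transition: returns (next_state, reward, done)."""
    r, c = divmod(s, 5)
    dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][a]
    r, c = min(max(r + dr, 0), 4), min(max(c + dc, 0), 4)
    s2 = r * 5 + c
    return s2, (1.0 if s2 == goal else -0.01), s2 == goal

epsilon = 0.3
for episode in range(500):
    s, done = 0, False
    # Adaptive hyperparameters: scheduled over training instead of fixed values.
    alpha = max(0.1, 1.0 / (1 + 0.01 * episode))   # assumed learning-rate schedule
    gamma = min(0.99, 0.8 + 0.001 * episode)       # assumed discount schedule
    while not done:
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s2, reward, done = step(s, a)
        Q[s, a] += alpha * (reward + gamma * np.max(Q[s2]) - Q[s, a])
        s = s2
```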

17 pages, 2599 KiB  
Article
Reinforcement Learning-Enhanced Adaptive Scheduling of Battery Energy Storage Systems in Energy Markets
by Yang Liu, Qiuyu Lu, Zhenfan Yu, Yue Chen and Yinguo Yang
Energies 2024, 17(21), 5425; https://doi.org/10.3390/en17215425 - 30 Oct 2024
Cited by 1 | Viewed by 1194
Abstract
Battery Energy Storage Systems (BESSs) play a vital role in modern power grids by optimally dispatching energy according to the price signal. This paper proposes a reinforcement learning-based model that optimizes BESS scheduling with the proposed Q-learning algorithm combined with an epsilon-greedy strategy. The proposed epsilon-greedy strategy-based Q-learning algorithm can efficiently manage energy dispatching under uncertain price signals and multi-day operations without retraining. Simulations are conducted under different scenarios, considering electricity price fluctuations and battery aging conditions. Results show that the proposed algorithm demonstrates enhanced economic returns and adaptability compared to traditional methods, providing a practical solution for intelligent BESS scheduling that supports grid stability and the efficient use of renewable energy.
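A hedged sketch of the kind of epsilon-greedy Q-learning dispatcher the abstract outlines; the toy price curve, state discretization, and reward are illustrative assumptions rather than the paper's configuration.

```python
import numpy as np

# Q-learning dispatch of a battery against an hourly price signal (toy setup).
rng = np.random.default_rng(1)
prices = 30 + 20 * np.sin(np.arange(24) / 24 * 2 * np.pi)  # assumed day-ahead prices
ACTIONS = (-1, 0, 1)                 # discharge one unit, idle, charge one unit
SOC_MAX = 4                          # battery capacity in units (assumption)
Q = np.zeros((24, SOC_MAX + 1, len(ACTIONS)))
alpha, gamma, epsilon = 0.1, 0.95, 0.1

for day in range(2000):
    soc = 2
    for hour in range(24):
        # Epsilon-greedy over charge/idle/discharge actions.
        if rng.random() < epsilon:
            a = rng.integers(len(ACTIONS))
        else:
            a = int(np.argmax(Q[hour, soc]))
        new_soc = min(max(soc + ACTIONS[a], 0), SOC_MAX)
        energy = new_soc - soc                   # energy actually moved
        reward = -prices[hour] * energy          # pay to charge, earn to discharge
        nxt = (hour + 1) % 24
        Q[hour, soc, a] += alpha * (reward + gamma * np.max(Q[nxt, new_soc])
                                    - Q[hour, soc, a])
        soc = new_soc
```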

22 pages, 5750 KiB  
Article
Deep Q-Learning-Based Smart Scheduling of EVs for Demand Response in Smart Grids
by Viorica Rozina Chifu, Tudor Cioara, Cristina Bianca Pop, Horia Gabriel Rusu and Ionut Anghel
Appl. Sci. 2024, 14(4), 1421; https://doi.org/10.3390/app14041421 - 8 Feb 2024
Cited by 8 | Viewed by 1771
Abstract
Economic and policy factors are driving the continuous increase in the adoption and usage of electrical vehicles (EVs). However, despite being a cleaner alternative to combustion engine vehicles, EVs have negative impacts on the lifespan of microgrid equipment and energy balance due to increased power demands and the timing of their usage. In our view, grid management should leverage EV scheduling flexibility to support local network balancing through active participation in demand response programs. In this paper, we propose a model-free solution, leveraging deep Q-learning to schedule the charging and discharging activities of EVs within a microgrid to align with a target energy profile provided by the distribution system operator. We adapted the Bellman equation to assess the value of a state based on specific rewards for EV scheduling actions and used a neural network to estimate Q-values for available actions and the epsilon-greedy algorithm to balance exploitation and exploration to meet the target energy profile. The results are promising, showing the effectiveness of the proposed solution in scheduling the charging and discharging actions for a fleet of 30 EVs to align with the target energy profile in demand response programs, achieving a Pearson coefficient of 0.99. This solution also demonstrates a high degree of adaptability in effectively managing dynamic EV scheduling situations, influenced by various state-of-charge distributions and e-mobility features. Adaptability is achieved solely through learning from data, without requiring prior knowledge, configurations, or fine-tuning.
(This article belongs to the Special Issue Advances in Neural Networks and Deep Learning)
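A brief sketch of the core DQN components named in the abstract: a neural network estimating Q-values and an epsilon-greedy selector balancing exploitation and exploration, together with a one-step Bellman target. Feature and action counts are assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

# Assumed sizes: state features (e.g., SoC, hour, target-profile gap, ...)
# and three actions (charge, idle, discharge).
N_FEATURES, N_ACTIONS, GAMMA = 8, 3, 0.99

q_net = nn.Sequential(nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

def select_action(state: torch.Tensor, epsilon: float) -> int:
    """Epsilon-greedy: explore with probability epsilon, else pick argmax Q."""
    if torch.rand(1).item() < epsilon:
        return torch.randint(N_ACTIONS, (1,)).item()
    with torch.no_grad():
        return q_net(state).argmax().item()

def bellman_target(reward: float, next_state: torch.Tensor, done: bool) -> float:
    """One-step Bellman target: r + gamma * max_a Q(s', a) for non-terminal s'."""
    with torch.no_grad():
        return reward + (0.0 if done else GAMMA * q_net(next_state).max().item())

# Usage with a random stand-in state:
a = select_action(torch.randn(N_FEATURES), epsilon=0.1)
y = bellman_target(1.0, torch.randn(N_FEATURES), done=False)
```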

23 pages, 12948 KiB  
Article
Improved Robot Path Planning Method Based on Deep Reinforcement Learning
by Huiyan Han, Jiaqi Wang, Liqun Kuang, Xie Han and Hongxin Xue
Sensors 2023, 23(12), 5622; https://doi.org/10.3390/s23125622 - 15 Jun 2023
Cited by 17 | Viewed by 4174
Abstract
With the advancement of robotics, the field of path planning is currently experiencing a period of prosperity. Researchers strive to address this nonlinear problem and have achieved remarkable results through the implementation of the Deep Reinforcement Learning (DRL) algorithm DQN (Deep Q-Network). However, persistent challenges remain, including the curse of dimensionality, difficulties with model convergence, and sparse rewards. To tackle these problems, this paper proposes an enhanced DDQN (Double DQN) path planning approach, in which the information after dimensionality reduction is fed into a two-branch network that incorporates expert knowledge and an optimized reward function to guide the training process. The data generated during the training phase are initially discretized into corresponding low-dimensional spaces. An “expert experience” module is introduced to accelerate the model's early-stage training within the epsilon-greedy algorithm. To handle navigation and obstacle avoidance separately, a dual-branch network structure is presented. We further optimize the reward function, enabling the agent to receive prompt feedback from the environment after performing each action. Experiments conducted in both virtual and real-world environments demonstrate that the enhanced algorithm accelerates model convergence, improves training stability, and generates a smooth, shorter, collision-free path.
(This article belongs to the Section Sensors and Robotics)
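The DDQN (Double DQN) backbone that this approach builds on fits in a few lines: the online network selects the next action and a separate target network evaluates it, which reduces the overestimation bias of plain DQN. A sketch under assumed sizes follows.

```python
import torch
import torch.nn as nn

STATE_DIM, N_ACTIONS, GAMMA = 16, 4, 0.99   # assumed dimensions

def make_net():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

online_net, target_net = make_net(), make_net()
target_net.load_state_dict(online_net.state_dict())   # periodic sync in practice

def ddqn_target(reward: torch.Tensor, next_state: torch.Tensor, done: torch.Tensor):
    """y = r + gamma * Q_target(s', argmax_a Q_online(s', a)) for non-terminal s'."""
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, best_action).squeeze(1)
        return reward + GAMMA * next_q * (1.0 - done)

# Usage on a stand-in batch of 4 transitions:
y = ddqn_target(torch.zeros(4), torch.randn(4, STATE_DIM), torch.zeros(4))
```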

16 pages, 2096 KiB  
Article
A Novel Functional Electrical Stimulation-Induced Cycling Controller Using Reinforcement Learning to Optimize Online Muscle Activation Pattern
by Tiago Coelho-Magalhães, Christine Azevedo Coste and Henrique Resende-Martins
Sensors 2022, 22(23), 9126; https://doi.org/10.3390/s22239126 - 24 Nov 2022
Cited by 4 | Viewed by 2456
Abstract
This study introduces a novel controller based on a Reinforcement Learning (RL) algorithm for real-time adaptation of the stimulation pattern during FES-cycling. Core to our approach is the introduction of an RL agent that interacts with the cycling environment and learns through trial and error how to modulate the electrical charge applied to the stimulated muscle groups according to a predefined policy while tracking a reference cadence. Instead of a static stimulation pattern to be modified by a control law, we hypothesized that a non-stationary baseline set of parameters would better adjust the amount of injected electrical charge to the time-varying characteristics of the musculature. Overground FES-assisted cycling sessions were performed by a subject with spinal cord injury (SCI AIS-A, T8). For tracking a predefined pedaling cadence, two closed-loop control laws were used simultaneously to modulate the pulse intensity of the stimulation channels responsible for evoking the muscle contractions. First, a Proportional-Integral (PI) controller was used to control the current amplitude of the stimulation channels over an initial parameter setting with predefined pulse amplitude, width, and fixed frequency parameters. In parallel, an RL algorithm with a decayed-epsilon-greedy strategy was implemented to randomly explore nine different variations of pulse amplitude and width parameters over the same stimulation setting, aiming to adjust the injected electrical charge according to a predefined policy. The performance of this global control strategy was evaluated in two different RL settings and explored in two different cycling scenarios. The participant was able to pedal overground for distances over 3.5 km, and the results showed that the RL agent learned to modify the stimulation pattern according to the predefined policy while simultaneously tracking the predefined pedaling cadence. Despite the simplicity of our approach and the existence of more sophisticated RL algorithms, our method can be used to reduce the time needed to define stimulation patterns. Our results suggest interesting research possibilities for improving cycling performance in the future, since more efficient stimulation cost dynamics can be defined and implemented for the agent to learn.
(This article belongs to the Section Wearables)
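A bandit-style sketch of the decayed-epsilon-greedy exploration over nine parameter variations described above; the reward signal (a stand-in for cadence-tracking error) and the decay constants are illustrative assumptions.

```python
import numpy as np

# Decayed epsilon-greedy over nine discrete stimulation-parameter variations
# (e.g., 3 amplitude deltas x 3 width deltas). Reward and decay are assumed.
rng = np.random.default_rng(2)
N_ACTIONS = 9
q = np.zeros(N_ACTIONS)           # running value estimate per variation
counts = np.zeros(N_ACTIONS)

def decayed_epsilon(t, eps0=1.0, eps_min=0.05, decay=0.995):
    """Exploration rate shrinks over time, so late choices are mostly greedy."""
    return max(eps_min, eps0 * decay**t)

for t in range(1000):
    eps = decayed_epsilon(t)
    a = rng.integers(N_ACTIONS) if rng.random() < eps else int(np.argmax(q))
    # Stand-in reward: negative cadence-tracking error for the chosen variation.
    reward = -abs(rng.normal(loc=a - 4, scale=1.0))
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]   # incremental mean update
```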

28 pages, 5544 KiB  
Article
Reinforcement Learning Made Affordable for Hardware Verification Engineers
by Alexandru Dinu and Petre Lucian Ogrutan
Micromachines 2022, 13(11), 1887; https://doi.org/10.3390/mi13111887 - 1 Nov 2022
Cited by 4 | Viewed by 2655
Abstract
Constrained random stimulus generation is no longer sufficient to fully simulate the functionality of a digital design. The increasing complexity of today’s hardware devices must be supported by powerful development and simulation environments, powerful computational mechanisms, and appropriate software to exploit them. Reinforcement learning, a powerful technique belonging to the field of artificial intelligence, provides the means to efficiently exploit computational resources to find even the least obvious correlations between configuration parameters, stimuli applied to digital design inputs, and their functional states. This paper presents a novel software system that simplifies the analysis of simulation outputs and the generation of input stimuli through reinforcement learning methods, and provides important details on configuring the proposed method to automate the verification process. By understanding how to properly configure a reinforcement learning algorithm to fit the specifics of a digital design, verification engineers can more quickly adopt this automated and efficient stimulus generation method (compared with classical verification) to bring the digital design to a desired functional state. The results obtained are most promising, requiring up to 52 times fewer steps to reach a target state with reinforcement learning than with constrained random stimulus generation.
(This article belongs to the Special Issue Hardware-Friendly Machine Learning and Its Applications)
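As a rough illustration of the idea, a tabular Q-learning agent can learn which input stimuli drive a design under test toward a target functional state; the toy transition function, state and stimulus counts, and reward below are stand-ins, not the paper's setup.

```python
import numpy as np

# Epsilon-greedy Q-learning over (DUT state, stimulus) pairs (toy model).
rng = np.random.default_rng(3)
N_STATES, N_STIMULI, TARGET = 32, 8, 31
Q = np.zeros((N_STATES, N_STIMULI))
alpha, gamma, epsilon = 0.2, 0.9, 0.2

def dut_step(state, stimulus):
    """Stand-in for 'apply stimulus, read next state from the simulator'."""
    return (state * 5 + stimulus + 1) % N_STATES

for episode in range(2000):
    s = 0
    for _ in range(64):
        a = rng.integers(N_STIMULI) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s2 = dut_step(s, a)
        r = 1.0 if s2 == TARGET else -0.01    # reward only at the target state
        Q[s, a] += alpha * (r + gamma * np.max(Q[s2]) - Q[s, a])
        if s2 == TARGET:
            break
        s = s2
```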

18 pages, 2809 KiB  
Article
Intelligent Scheduling Method for Bulk Cargo Terminal Loading Process Based on Deep Reinforcement Learning
by Changan Li, Sirui Wu, Zhan Li, Yuxiao Zhang, Lijie Zhang and Luis Gomes
Electronics 2022, 11(9), 1390; https://doi.org/10.3390/electronics11091390 - 27 Apr 2022
Cited by 10 | Viewed by 3337
Abstract
Sea freight is one of the most important ways of transporting and distributing coal and other bulk cargo. This paper proposes a method for optimizing the scheduling efficiency of the bulk cargo loading process based on deep reinforcement learning. The process includes a large number of states and possible choices that need to be taken into account, which are currently handled by skillful scheduling engineers on site. In terms of modeling, we extracted important information from actual working data of the terminal to form the state space of the model; the yard information and the demand information of the ship are also considered. The scheduling output for each conveying path from the yard to the cabin is the action of the agent. To avoid conflicts in which two tasks occupy one machine at the same time, certain restrictions are placed on whether an action can be executed. Based on Double DQN, an improved deep reinforcement learning method is proposed with a fully connected network structure, selecting action sets according to the value of the network and the occupancy status of the environment. To make the network converge more quickly, a new epsilon-greedy exploration strategy is also proposed, which uses different exploration rates for completely random selection and feasible random selection of actions. After training, an improved scheduling result is obtained when tasks arrive randomly and the yard state is random. An important contribution of this paper is to integrate the useful features of the bulk cargo terminal's working time into a state set, divide the scheduling process into discrete actions, and thereby reduce the scheduling problem to simple inputs and outputs. Another major contribution is the design of a reinforcement learning algorithm for the bulk cargo terminal scheduling problem with improved training efficiency, providing a practical example of solving bulk cargo terminal scheduling problems using reinforcement learning.
(This article belongs to the Special Issue High Performance Control and Industrial Applications)
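The two-tier exploration idea (separate rates for completely random and feasible-random selection, with greedy selection masked to executable actions) can be sketched as follows; the rates and action counts are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

def select_action(q_values: np.ndarray, feasible: np.ndarray,
                  eps_random: float = 0.05, eps_feasible: float = 0.2) -> int:
    """q_values: per-action values; feasible: boolean mask of executable actions
    (those whose conveying path and machinery are currently free)."""
    u = rng.random()
    feasible_ids = np.flatnonzero(feasible)
    if u < eps_random:                       # tier 1: completely random
        return int(rng.integers(len(q_values)))
    if u < eps_random + eps_feasible:        # tier 2: random among feasible
        return int(rng.choice(feasible_ids))
    masked = np.where(feasible, q_values, -np.inf)   # tier 3: greedy, feasible only
    return int(np.argmax(masked))

# Example: 6 candidate path assignments, two blocked by machine occupancy.
a = select_action(q_values=rng.normal(size=6),
                  feasible=np.array([1, 1, 0, 1, 0, 1], dtype=bool))
```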

16 pages, 904 KiB  
Article
Adaptive Image Thresholding of Yellow Peppers for a Harvesting Robot
by Ahmad Ostovar, Ola Ringdahl and Thomas Hellström
Robotics 2018, 7(1), 11; https://doi.org/10.3390/robotics7010011 - 5 Feb 2018
Cited by 27 | Viewed by 7910
Abstract
The presented work is part of the H2020 project SWEEPER, with the overall goal of developing a sweet pepper harvesting robot for use in greenhouses. As part of the solution, visual servoing is used to direct the manipulator towards the fruit. This requires accurate and stable fruit detection based on video images. To segment an image into background and foreground, thresholding techniques are commonly used. The varying illumination conditions in the unstructured greenhouse environment often cause shadows and overexposure. Furthermore, the color of the fruits to be harvested varies over the season. All this makes it sub-optimal to use fixed pre-selected thresholds. In this paper we suggest an adaptive, image-dependent thresholding method. A variant of reinforcement learning (RL) is used with a reward function that computes the similarity between the segmented image and the labeled image to give feedback for action selection. The RL-based approach requires less computational resources than exhaustive search, which is used as a benchmark, and results in higher performance compared to a Lipschitzian-based optimization approach. The proposed method also requires fewer labeled images compared to other methods. Several exploration-exploitation strategies are compared, and the results indicate that the Decaying Epsilon-Greedy algorithm gives the highest performance for this task. The highest performance with the Epsilon-Greedy algorithm (ϵ = 0.7) reached 87% of the performance achieved by exhaustive search, with 50% fewer iterations than the benchmark. The performance increased to 91.5% with the Decaying Epsilon-Greedy algorithm, using 73% fewer iterations than the benchmark.
(This article belongs to the Special Issue Agriculture Robotics)
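A hedged sketch of the adaptive thresholding loop: a decaying-epsilon-greedy bandit over candidate thresholds, rewarded by the similarity between the segmented image and a labeled mask. The images and the similarity measure here are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(5)
thresholds = np.linspace(0.1, 0.9, 17)        # candidate threshold actions (assumed)
q = np.zeros(len(thresholds))
counts = np.zeros(len(thresholds))

image = rng.random((64, 64))                   # stand-in grayscale image
label = image > 0.55                           # stand-in ground-truth mask

def similarity(mask, truth):
    """Reward: fraction of pixels where the segmentation agrees with the label."""
    return float(np.mean(mask == truth))

epsilon = 1.0
for t in range(300):
    # Decaying epsilon-greedy choice of threshold.
    a = rng.integers(len(thresholds)) if rng.random() < epsilon else int(np.argmax(q))
    reward = similarity(image > thresholds[a], label)
    counts[a] += 1
    q[a] += (reward - q[a]) / counts[a]        # incremental mean update
    epsilon = max(0.05, epsilon * 0.97)        # assumed decay schedule
```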

17 pages, 640 KiB  
Article
Ensemble of Filter-Based Rankers to Guide an Epsilon-Greedy Swarm Optimizer for High-Dimensional Feature Subset Selection
by Mohammad Bagher Dowlatshahi, Vali Derhami and Hossein Nezamabadi-pour
Information 2017, 8(4), 152; https://doi.org/10.3390/info8040152 - 22 Nov 2017
Cited by 36 | Viewed by 5542
Abstract
The main purpose of feature subset selection is to remove irrelevant and redundant features from data so that learning algorithms can be trained on a subset of relevant features. Many algorithms have been developed for feature subset selection, and most of them suffer from two major problems on high-dimensional datasets: first, some search a high-dimensional feature space without any domain knowledge about feature importance; second, most are originally designed for continuous optimization problems, whereas feature selection is a binary optimization problem. To overcome these weaknesses, we propose a novel hybrid filter-wrapper algorithm, called Ensemble of Filter-based Rankers to guide an Epsilon-greedy Swarm Optimizer (EFR-ESO), for high-dimensional feature subset selection. The Epsilon-greedy Swarm Optimizer (ESO) is a novel binary swarm intelligence algorithm introduced in this paper as the wrapper. In the proposed EFR-ESO, we extract knowledge about feature importance from the ensemble of filter-based rankers and then use this knowledge to weight the feature probabilities in the ESO. Experiments on 14 high-dimensional datasets indicate that the proposed algorithm achieves excellent performance in terms of both classification error rate and minimizing the number of features.
(This article belongs to the Special Issue Feature Selection for High-Dimensional Data)
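A compact sketch of the EFR-ESO idea as described in the abstract: averaged filter-ranker scores weight per-feature selection probabilities, and an epsilon-greedy rule occasionally overrides them with random bits. The ranker scores and the subset evaluator here are stand-ins.

```python
import numpy as np

rng = np.random.default_rng(6)
n_features = 50

# Stand-in for an ensemble of filter rankers (e.g., mutual information, chi2):
ranker_scores = rng.random((3, n_features))
importance = ranker_scores.mean(axis=0)
# Scale importances into per-feature selection probabilities (~20% rate, assumed).
probs = np.clip(importance / importance.sum() * (n_features * 0.2), 0.01, 0.99)

def sample_subset(epsilon: float = 0.1) -> np.ndarray:
    """Per feature: random bit with probability epsilon, else importance-weighted bit."""
    explore = rng.random(n_features) < epsilon
    guided = rng.random(n_features) < probs
    random_bits = rng.random(n_features) < 0.5
    return np.where(explore, random_bits, guided)

subset = sample_subset()          # boolean mask over the 50 features
# In the full wrapper, each sampled subset would be scored by a classifier's
# error rate and the swarm's best positions updated accordingly.
```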
