Search Results (20)

Search Parameters:
Keywords = cart-pole

17 pages, 1163 KB  
Article
Decoupled Reinforcement Hybrid PPO–Sliding Control for Underactuated Systems: Application to Cart–Pole and Acrobot
by Yi-Jen Mon
Machines 2025, 13(7), 601; https://doi.org/10.3390/machines13070601 - 11 Jul 2025
Viewed by 402
Abstract
Underactuated systems, such as the Cart–Pole and Acrobot, pose significant control challenges due to their inherent nonlinearity and limited actuation. Traditional control methods often struggle to achieve stable and optimal performance in these complex scenarios. This paper presents a novel stable reinforcement learning (RL) approach for underactuated systems, integrating advanced exploration–exploitation mechanisms and a refined policy optimization framework to address instability issues in RL-based control. The proposed method is validated through extensive experiments on two benchmark underactuated systems: the Cart–Pole and Acrobot. In the Cart–Pole task, the method achieves long-term balance with high stability, outperforming traditional RL algorithms such as Proximal Policy Optimization (PPO) in average episode length and robustness to environmental disturbances. For the Acrobot, the approach enables reliable swing-up and near-vertical stabilization but cannot achieve sustained balance control beyond short time intervals due to residual dynamics and control limitations. A key contribution is the development of a hybrid PPO–sliding mode control strategy that enhances learning efficiency and stability for underactuated systems. Full article
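
The hybrid controller described above can be pictured as a learned policy force plus a sliding-mode correction on the pole angle. The sketch below only illustrates that combination; the surface gain, switching gain, and additive blending rule are assumptions, not the decoupled controller developed in the paper.

```python
import numpy as np

# Illustrative sketch: a smoothed sliding-mode correction on the pole angle
# added to the force proposed by a trained RL policy. The gains and the
# additive blending rule are assumptions, not the paper's hybrid controller.
def hybrid_force(rl_force, theta, theta_dot, lam=5.0, k=2.0):
    s = theta_dot + lam * theta        # sliding surface for the pole angle
    u_smc = -k * np.tanh(10.0 * s)     # tanh instead of sign() limits chattering
    return rl_force + u_smc

print(hybrid_force(rl_force=1.0, theta=0.05, theta_dot=-0.10))
```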

11 pages, 1425 KB  
Article
Invariant-Based Inverse Engineering for Balanced Displacement of a Cartpole System
by Ion Lizuain, Ander Tobalina and Alvaro Rodriguez-Prieto
Mathematics 2025, 13(8), 1220; https://doi.org/10.3390/math13081220 - 8 Apr 2025
Viewed by 436
Abstract
Adiabaticity is a key concept in physics, but its applications in mechanical and control engineering remain underexplored. Adiabatic invariants ensure robust dynamics under slow changes, but they impose impractical time limitations. Shortcuts to Adiabaticity (STA) overcome these limitations by enabling fast operations with minimal final excitations. In this work, we develop an STA strategy based on dynamical invariants and inverse engineering to design the trajectory of a cartpole, a system characterized by its instability and repulsive potential. The trajectories found guarantee a balanced transport of the cartpole within the small oscillations regime. The results are compared with numerical simulations of the exact non-linear model to establish the working domain of the designed protocol. Full article
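
For readers new to inverse engineering, the textbook small-angle model of a pole on an accelerating cart makes the idea concrete: prescribe the angle history and read off the cart trajectory that produces it. The invariant-based construction in the paper is more general; the linearization below is only the standard starting point.

```latex
% Small-angle dynamics of a pole of length l on a cart with acceleration \ddot{x}_c,
% with \theta measured from the upright (repulsive) equilibrium:
\ddot{\theta} = \frac{g}{l}\,\theta - \frac{1}{l}\,\ddot{x}_c
\quad\Longrightarrow\quad
\ddot{x}_c(t) = g\,\theta(t) - l\,\ddot{\theta}(t).
% Inverse engineering: choose a smooth \theta(t) satisfying the boundary
% conditions of the transport and read off the cart acceleration that produces it.
```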

32 pages, 1250 KB  
Article
Exploration-Driven Genetic Algorithms for Hyperparameter Optimisation in Deep Reinforcement Learning
by Bartłomiej Brzęk, Barbara Probierz and Jan Kozak
Appl. Sci. 2025, 15(4), 2067; https://doi.org/10.3390/app15042067 - 16 Feb 2025
Viewed by 1814
Abstract
This paper investigates the application of genetic algorithms (GAs) for hyperparameter optimisation in deep reinforcement learning (RL), focusing on the Deep Q-Learning (DQN) algorithm. This study aims to identify approaches that enhance RL model performance through the effective exploration of the configuration space. By comparing different GA methods for selection, crossover, and mutation, this study focuses on deep RL models. The results indicate that GA techniques emphasising the exploration of the configuration space yield significant improvements in optimisation efficiency, reducing training time and enhancing convergence. The most effective GA improved the fitness function value from 68.26 (initial best chromosome) to 979.16 after 200 iterations, demonstrating the efficacy of the proposed approach. Furthermore, variations in specific hyperparameters, such as learning rate, gamma, and update frequency, were shown to substantially affect the DQN model’s learning ability. These findings suggest that exploration-driven GA strategies outperform GA approaches with limited exploration, underscoring the critical role of selection and crossover methods in enhancing DQN model efficiency and performance. Moreover, a mini case study on the CartPole environment revealed that even a 5% sensor dropout impaired the performance of a GA-optimised RL agent, while a 20% dropout almost entirely halted improvements. Full article
(This article belongs to the Special Issue Recent Advances in Automated Machine Learning: 2nd Edition)
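
As context for how such a search is usually structured, the sketch below runs a minimal genetic algorithm over three of the hyperparameters named in the abstract. The ranges, operators, and the stubbed fitness function are assumptions; in the study itself, fitness comes from training and evaluating a DQN agent.

```python
import random

# Minimal GA over DQN hyperparameters. Ranges, operators, and the stubbed
# fitness are illustrative assumptions, not the configurations from the paper.
RANGES = {"lr": (1e-4, 1e-2), "gamma": (0.90, 0.999), "update_freq": (10, 1000)}

def random_chromosome():
    return {k: random.uniform(*v) for k, v in RANGES.items()}

def fitness(chrom):
    # Stand-in objective; in practice this would be the average episode return
    # of a DQN agent trained on CartPole with these hyperparameters.
    return -abs(chrom["lr"] - 1e-3) - abs(chrom["gamma"] - 0.99)

def crossover(a, b):
    return {k: random.choice((a[k], b[k])) for k in RANGES}

def mutate(chrom, rate=0.2):
    return {k: (random.uniform(*RANGES[k]) if random.random() < rate else v)
            for k, v in chrom.items()}

population = [random_chromosome() for _ in range(20)]
for _ in range(50):
    population.sort(key=fitness, reverse=True)
    parents = population[:10]                      # truncation selection
    children = [mutate(crossover(*random.sample(parents, 2))) for _ in range(10)]
    population = parents + children
print(max(population, key=fitness))
```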

27 pages, 1396 KB  
Article
The Cart-Pole Application as a Benchmark for Neuromorphic Computing
by James S. Plank, Charles P. Rizzo, Chris A. White and Catherine D. Schuman
J. Low Power Electron. Appl. 2025, 15(1), 5; https://doi.org/10.3390/jlpea15010005 - 26 Jan 2025
Cited by 1 | Viewed by 1584
Abstract
The cart-pole application is a well-known control application that is often used to illustrate reinforcement learning algorithms with conventional neural networks. An implementation of the application from OpenAI Gym is ubiquitous and popular. Spiking neural networks are the basis of brain-based, or neuromorphic, computing. They are attractive, especially as agents for control applications, because of their very low size, weight, and power requirements. We are motivated to help researchers in neuromorphic computing to be able to compare their work with common benchmarks, and in this paper we explore using the cart-pole application as a benchmark for spiking neural networks. We propose four parameter settings that scale the application in difficulty, in particular beyond the default parameter settings, which do not pose a difficult test for AI agents. We propose achievement levels for AI agents that are trained with these settings. Next, we perform an experiment that employs the benchmark and its difficulty levels to evaluate the effectiveness of eight neuroprocessor settings on success with the application. Finally, we perform a detailed examination of eight example networks from this experiment that achieve our goals on the difficulty levels, and comment on features that enable them to be successful. Our goal is to help researchers in neuromorphic computing to utilize the cart-pole application as an effective benchmark. Full article
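
For orientation, the classic-control implementation of the environment exposes the physical constants that such difficulty scaling manipulates. The sketch below uses the Gymnasium port and illustrative values; these are not the four settings proposed in the paper.

```python
import gymnasium as gym

# Illustrative only: CartPole exposes its physical constants, so a harder
# variant can be created by overriding them. The values here are assumptions,
# not the paper's four difficulty settings.
env = gym.make("CartPole-v1")
cp = env.unwrapped
cp.length = 0.25                              # half-pole length (default 0.5)
cp.polemass_length = cp.masspole * cp.length  # keep the derived constant consistent
cp.force_mag = 5.0                            # weaker actuation (default 10.0)
cp.tau = 0.04                                 # coarser control interval (default 0.02 s)

obs, _ = env.reset(seed=0)
for _ in range(200):
    obs, reward, terminated, truncated, _ = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, _ = env.reset()
env.close()
```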

17 pages, 2872 KB  
Article
Discrete Space Deep Reinforcement Learning Algorithm Based on Support Vector Machine Recursive Feature Elimination
by Chayoung Kim
Symmetry 2024, 16(8), 940; https://doi.org/10.3390/sym16080940 - 23 Jul 2024
Cited by 1 | Viewed by 1511
Abstract
Algorithms for training agents with experience replay have advanced in several domains, primarily because prioritized experience replay (PER) developed from the double deep Q-network (DDQN) in deep reinforcement learning (DRL) has become a standard. PER-based algorithms have achieved significant success in the image and video domains. However, the exceptional results observed for images and videos do not carry over to many domains with simple action spaces and relatively small states, particularly discrete action spaces with sparse rewards. Moreover, most advanced techniques may improve sampling efficiency using deep learning algorithms rather than reinforcement learning. However, there is growing evidence that deep learning algorithms cannot generalize during training. Therefore, this study proposes an algorithm suitable for discrete action space environments that uses the sample efficiency of PER based on DDQN but incorporates support vector machine recursive feature elimination (SVM-RFE) without enhancing the sampling efficiency through deep learning algorithms. The proposed algorithm exhibited considerable performance improvements in classical OpenAI Gym environments that did not use images or videos as inputs. In particular, simple discrete space environments with reflection symmetry, such as Cart–Pole, exhibited a faster and more stable learning process. These results suggest that the application of SVM-RFE, which leverages the orthogonality of support vector machines (SVMs) across learning patterns, can be appropriate when the data in the reinforcement learning environment demonstrate symmetry. Full article
(This article belongs to the Section Mathematics)
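
To make the SVM-RFE step concrete, the sketch below ranks Cart-Pole's four state variables with scikit-learn's recursive feature elimination and a linear SVM. The synthetic labels are an assumption used only to keep the example self-contained; the paper couples the elimination to samples drawn from DDQN with PER.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

# Sketch of SVM-RFE over Cart-Pole's four state variables. The synthetic
# "push right when the pole leans right" labels are an assumption made only
# to keep the example self-contained.
rng = np.random.default_rng(0)
states = rng.normal(size=(500, 4))              # [x, x_dot, theta, theta_dot]
labels = (states[:, 2] + 0.3 * states[:, 3] > 0).astype(int)

rfe = RFE(estimator=LinearSVC(max_iter=5000), n_features_to_select=2)
rfe.fit(states, labels)
for name, rank in zip(["x", "x_dot", "theta", "theta_dot"], rfe.ranking_):
    print(f"{name}: rank {rank}")               # rank 1 = feature kept by RFE
```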

17 pages, 766 KB  
Article
Robust and Exponential Stabilization of a Cart–Pendulum System via Geometric PID Control
by Zhifei Zhang, Miaoxu Fang, Minrui Fei and Jinrong Li
Symmetry 2024, 16(1), 94; https://doi.org/10.3390/sym16010094 - 11 Jan 2024
Cited by 1 | Viewed by 2072
Abstract
This paper addresses the robust stabilization problem of a cart–pole system. The controlled dynamics of this interconnected system are deduced by following the analytic framework of Lagrangian mechanics, and the residual terms are formulated as a bias depending on the angle and angular velocity. A geometric definition of the Proportional–Integral–Derivative (PID) control algorithm is proposed, and a Lyapunov function is explicitly constructed through two stages of variable change. Local exponential stability of the stable equilibrium is proved, and a criterion for parameter tuning is provided by ensuring an exponential decrease in the Lyapunov function. Enlarging the control parameters toward infinity extends the region of attraction to almost the half circle. The effectiveness of the geometric PID controller and the local exponential stability of the resulting closed-loop system are verified by simulating a numerical example. Full article
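
As background, the discrete-time form of the textbook PID law that the geometric formulation builds on is sketched below. The gains and the angle-only error signal are illustrative; the paper's contribution, the geometric definition and the Lyapunov-based tuning criterion, is not reproduced here.

```python
# Textbook discrete PID on the pendulum angle error. Gains are illustrative,
# not values obtained from the paper's Lyapunov-based tuning criterion.
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral, self.prev_error = 0.0, 0.0

    def step(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=40.0, ki=1.0, kd=8.0, dt=0.02)
print(pid.step(0.05))   # control force for a 0.05 rad deviation from upright
```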

9 pages, 468 KB  
Article
Optimal Shortcuts to Adiabatic Control by Lagrange Mechanics
by Lanlan Ma and Qian Kong
Entropy 2023, 25(5), 719; https://doi.org/10.3390/e25050719 - 26 Apr 2023
Cited by 2 | Viewed by 1748
Abstract
We combined an inverse engineering technique based on Lagrange mechanics and optimal control theory to design an optimal trajectory that can transport a cartpole in a fast and stable way. For classical control, we used the relative displacement between the ball and the trolley as the controller to study the anharmonic effect of the cartpole. Under this constraint, we used the time minimization principle in optimal control theory to find the optimal trajectory, and the time-minimizing solution takes the bang-bang form, which ensures that the pendulum is in a vertically upward position at the initial and final moments and oscillates within a small angle range. Full article
(This article belongs to the Special Issue Quantum Control and Quantum Computing)
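
For context, the bang-bang structure mentioned in the abstract is the standard consequence of Pontryagin's maximum principle for time-optimal problems with a bounded, affinely entering control (barring singular arcs). Stated generically, with the switching times for the cart-pole transport being what the paper actually computes:

```latex
% Generic bang-bang form of a time-optimal control with |u(t)| \le u_{\max}:
u^{*}(t) = u_{\max}\,\operatorname{sign}\!\big(\sigma(t)\big),
% where \sigma(t) is the switching function obtained from the costate equations;
% the control sits on its bounds and switches only at the zeros of \sigma(t).
```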

17 pages, 4023 KB  
Article
Signal Novelty Detection as an Intrinsic Reward for Robotics
by Martin Kubovčík, Iveta Dirgová Luptáková and Jiří Pospíchal
Sensors 2023, 23(8), 3985; https://doi.org/10.3390/s23083985 - 14 Apr 2023
Cited by 3 | Viewed by 2526
Abstract
In advanced robot control, reinforcement learning is a common technique used to transform sensor data into signals for actuators, based on feedback from the robot’s environment. However, the feedback or reward is typically sparse, as it is provided mainly after the task’s completion or failure, leading to slow convergence. Additional intrinsic rewards based on the state visitation frequency can provide more feedback. In this study, an autoencoder deep neural network was utilized as a novelty detector for intrinsic rewards to guide the search process through a state space. The neural network processed signals from various types of sensors simultaneously. It was tested on simulated robotic agents in a benchmark set of classic control OpenAI Gym test environments (including Mountain Car, Acrobot, CartPole, and LunarLander), achieving more efficient and accurate robot control in three of the four tasks (with only slight degradation in the Lunar Lander task) when purely intrinsic rewards were used compared to standard extrinsic rewards. By incorporating autoencoder-based intrinsic rewards, robots could potentially become more dependable in autonomous operations like space or underwater exploration or during natural disaster response, because the system could better adapt to changing environments or unexpected situations. Full article
(This article belongs to the Special Issue Intelligent Sensing System and Robotics)
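
A minimal sketch of the core idea follows: the reconstruction error of a small autoencoder serves as an intrinsic bonus added to the environment reward, and the autoencoder is updated online so that familiar states stop being rewarded. The network size, bonus scale, and per-step update are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

# Sketch: autoencoder reconstruction error as an intrinsic reward. The
# architecture, bonus scale beta, and per-step update are assumptions.
class AutoEncoder(nn.Module):
    def __init__(self, dim=4, hidden=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh())
        self.dec = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.dec(self.enc(x))

ae = AutoEncoder()
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)

def intrinsic_reward(state, beta=0.1):
    x = torch.as_tensor(state, dtype=torch.float32)
    loss = nn.functional.mse_loss(ae(x), x)        # high error = novel state
    opt.zero_grad()
    loss.backward()
    opt.step()                                     # familiar states lose their bonus
    return beta * loss.item()

# Usage in the training loop: total = extrinsic_reward + intrinsic_reward(next_state)
print(intrinsic_reward([0.0, 0.1, 0.02, -0.3]))
```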

15 pages, 1696 KB  
Article
Towards a Broad-Persistent Advising Approach for Deep Interactive Reinforcement Learning in Robotic Environments
by Hung Son Nguyen, Francisco Cruz and Richard Dazeley
Sensors 2023, 23(5), 2681; https://doi.org/10.3390/s23052681 - 1 Mar 2023
Viewed by 2740
Abstract
Deep Reinforcement Learning (DeepRL) methods have been widely used in robotics to learn about the environment and acquire behaviours autonomously. Deep Interactive Reinforcement Learning (DeepIRL) includes interactive feedback from an external trainer or expert giving advice to help learners choose actions to speed up the learning process. However, current research has been limited to interactions that offer actionable advice for only the current state of the agent. Additionally, the information is discarded by the agent after a single use, forcing the process to be repeated when the same state is revisited. In this paper, we present Broad-Persistent Advising (BPA), an approach that retains and reuses the processed information. It not only helps trainers give more general advice relevant to similar states instead of only the current state, but also allows the agent to speed up the learning process. We tested the proposed approach in two continuous robotic scenarios, namely a cart pole balancing task and a simulated robot navigation task. The results demonstrated that the agent’s learning speed increased, with reward gains of up to 37%, while the number of interactions required from the trainer remained the same as in the DeepIRL approach. Full article
(This article belongs to the Special Issue Advances in Intelligent Robotics Systems Based Machine Learning)
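
The retain-and-reuse mechanism can be pictured as a cache of advice keyed by a discretized state, so that advice given once also applies to nearby states later. The coarse rounding used as the similarity key below is an assumption, not the persistence model defined in the paper.

```python
# Illustration of retaining trainer advice and reusing it for similar states.
# The rounding-based similarity key is an assumption.
advice_cache = {}

def state_key(state, decimals=1):
    return tuple(round(s, decimals) for s in state)

def remember_advice(state, action):
    advice_cache[state_key(state)] = action

def advised_action(state, fallback_action):
    return advice_cache.get(state_key(state), fallback_action)

remember_advice([0.02, 0.11, 0.07, -0.20], action=1)
print(advised_action([0.04, 0.12, 0.06, -0.21], fallback_action=0))  # reuses the advice -> 1
```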

17 pages, 2897 KB  
Article
Spiking Neural-Networks-Based Data-Driven Control
by Yuxiang Liu and Wei Pan
Electronics 2023, 12(2), 310; https://doi.org/10.3390/electronics12020310 - 7 Jan 2023
Cited by 8 | Viewed by 4481
Abstract
Machine learning can be effectively applied in control loops to make optimal control decisions robustly. There is increasing interest in using spiking neural networks (SNNs) as the apparatus for machine learning in control engineering because SNNs can potentially offer high energy efficiency, and new SNN-enabling neuromorphic hardware is being rapidly developed. A defining characteristic of control problems is that environmental reactions and delayed rewards must be considered. Although reinforcement learning (RL) provides the fundamental mechanisms to address such problems, implementing these mechanisms in SNN learning has been underexplored. Previously, spike-timing-dependent plasticity learning schemes (STDP) modulated by factors of temporal difference (TD-STDP) or reward (R-STDP) have been proposed for RL with SNN. Here, we designed and implemented an SNN controller to explore and compare these two schemes by considering cart-pole balancing as a representative example. Although the TD-based learning rules are very general, the resulting model exhibits rather slow convergence, producing noisy and imperfect results even after prolonged training. We show that by integrating the understanding of the dynamics of the environment into the reward function of R-STDP, a robust SNN-based controller can be learned much more efficiently than TD-STDP. Full article
(This article belongs to the Special Issue Design, Dynamics and Control of Robots)
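
The kind of dynamics-aware reward the abstract alludes to can be as simple as penalizing the pole angle and angular velocity instead of paying a flat survival bonus. The weights below are assumptions, not the R-STDP reward actually used.

```python
# One common dynamics-aware shaping for cart-pole balancing; weights are assumed.
def shaped_reward(theta, theta_dot, w_angle=1.0, w_rate=0.1):
    return 1.0 - w_angle * theta**2 - w_rate * theta_dot**2

print(shaped_reward(theta=0.05, theta_dot=-0.2))
```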

16 pages, 2493 KB  
Article
Optimal Agent Search Using Surrogate-Assisted Genetic Algorithms
by Seung-Soo Shin and Yong-Hyuk Kim
Mathematics 2023, 11(1), 230; https://doi.org/10.3390/math11010230 - 2 Jan 2023
Cited by 2 | Viewed by 2730
Abstract
An intelligent agent is a program that can make decisions or perform a service based on its environment, user input, and experiences. Due to the complexity of their state and action spaces, agents are approximated by deep neural networks (DNNs), which can be optimized using methods such as deep reinforcement learning and evolution strategies. However, these methods include simulation-based evaluations in the optimization process, and they are inefficient if the simulation cost is high. In this study, we propose surrogate-assisted genetic algorithms (SGAs), in which surrogate models are used in the fitness evaluation of genetic algorithms and also predict cumulative rewards for an agent’s DNN parameters. To improve the SGAs, we applied stepwise improvements that included multiple surrogates, data standardization, and sampling with dimensional reduction. We conducted experiments using the proposed SGAs in benchmark environments such as cart-pole balancing and lunar lander, and successfully found optimal solutions while significantly reducing computing time. The computing time was reduced by 38% and 95% in the cart-pole balancing and lunar lander problems, respectively. For the lunar lander problem, the search even found an agent of approximately 4% better quality than that found by a gradient-based method. Full article
(This article belongs to the Special Issue Swarm and Evolutionary Computation—Bridging Theory and Practice)
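
The surrogate step can be sketched as fitting a regressor from flattened policy parameters to measured episode return and then ranking new candidates with the regressor instead of the simulator. The regressor choice, parameter dimension, and the toy objective below are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Sketch of surrogate-assisted fitness: learn a parameters -> return mapping
# from a few real rollouts, then screen new GA offspring with the surrogate.
# Regressor, dimensions, and the stand-in "true" return are assumptions.
rng = np.random.default_rng(0)
n_params = 32                                    # flattened policy-network weights
evaluated = rng.normal(size=(50, n_params))      # candidates already simulated
measured_return = -np.sum(evaluated**2, axis=1)  # stand-in for episode returns

surrogate = RandomForestRegressor(n_estimators=100, random_state=0)
surrogate.fit(evaluated, measured_return)

offspring = rng.normal(size=(200, n_params))     # candidates proposed by the GA
predicted = surrogate.predict(offspring)
best = offspring[np.argmax(predicted)]           # only this one needs a real rollout
print(predicted.max())
```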

12 pages, 3965 KB  
Article
Integrating the Generative Adversarial Network for Decision Making in Reinforcement Learning for Industrial Robot Agents
by Neelabh Paul, Vaibhav Tasgaonkar, Rahee Walambe and Ketan Kotecha
Robotics 2022, 11(6), 150; https://doi.org/10.3390/robotics11060150 - 9 Dec 2022
Cited by 2 | Viewed by 3442
Abstract
Many robotics systems carrying certain payloads are employed in manufacturing industries for pick and place tasks. The system becomes inefficient if more or less weight is introduced. If a different payload is introduced (either due to a change in the load or a change in the parameters of the robot system), the robot must be re-trained and a new network learned with the new weight/parameters. Parameters such as the robot weight, length of limbs, or new payload may vary for an agent depending on the circumstance. Parameter changes pose a problem to the agent in achieving the same goal it is expected to achieve with the original parameters. Hence, it becomes mandatory to re-train the agent with the new parameters in order for it to achieve its goal. This research proposes a novel framework for the adaptation of a robot agent to varying conditions in a given simulated environment without any retraining. Utilizing the properties of a Generative Adversarial Network (GAN), the agent is trained only once with reinforcement learning, and by tweaking the noise vector of the generator in the GAN network, the agent can adapt to new conditions and demonstrate similar performance as if it were trained with the new physical attributes using reinforcement learning. A simple CartPole environment is considered for the experimentation, and it is shown that with the proposed approach the agent remains stable for more iterations. The approach can be extended to the real world in the future. Full article
(This article belongs to the Special Issue Industrial Robotics in Industry 4.0)

32 pages, 1060 KB  
Article
BEERL: Both Ends Explanations for Reinforcement Learning
by Ahmad Terra, Rafia Inam and Elena Fersman
Appl. Sci. 2022, 12(21), 10947; https://doi.org/10.3390/app122110947 - 28 Oct 2022
Cited by 6 | Viewed by 3398
Abstract
Deep Reinforcement Learning (RL) is a black-box method and is hard to understand because the agent employs a neural network (NN). To explain the behavior and decisions made by the agent, different eXplainable RL (XRL) methods are developed; for example, feature importance methods are applied to analyze the contribution of the input side of the model, and reward decomposition methods are applied to explain the components of the output end of the RL model. In this study, we present a novel method to connect explanations from both input and output ends of a black-box model, which results in fine-grained explanations. Our method exposes the reward prioritization to the user, which in turn generates two different levels of explanation and allows RL agent reconfigurations when unwanted behaviors are observed. The method further summarizes the detailed explanations into a focus value that takes into account all reward components and quantifies the fulfillment of the desired properties. We evaluated our method by applying it to a remote electrical telecom-antenna-tilt use case and two OpenAI Gym environments: lunar lander and cartpole. The results demonstrated fine-grained explanations by detailing input features’ contributions to certain rewards and revealed biases of the reward components, which are then addressed by adjusting the reward’s weights. Full article
(This article belongs to the Special Issue Explainable Artificial Intelligence (XAI))

17 pages, 812 KB  
Article
Weibull-Open-World (WOW) Multi-Type Novelty Detection in CartPole3D
by Terrance E. Boult, Nicolas M. Windesheim, Steven Zhou, Christopher Pereyda and Lawrence B. Holder
Algorithms 2022, 15(10), 381; https://doi.org/10.3390/a15100381 - 18 Oct 2022
Cited by 6 | Viewed by 2517
Abstract
Algorithms for automated novelty detection and management are of growing interest but must address the inherent uncertainty from variations in non-novel environments while detecting the changes from the novelty. This paper expands on a recent unified framework to develop an operational theory for novelty that includes multiple (sub)types of novelty. As an example, this paper explores the problem of multi-type novelty detection in a 3D version of CartPole, wherein the cart Weibull-Open-World control-agent (WOW-agent) is confronted by different sub-types/levels of novelty from multiple independent agents moving in the environment. The WOW-agent must balance the pole and detect and characterize the novelties while adapting to maintain that balance. The approach develops static, dynamic, and prediction-error measures of dissimilarity to address different signals/sources of novelty. The WOW-agent uses Extreme Value Theory, applied per dimension of the dissimilarity measures, to detect outliers and combines different dimensions to characterize the novelty. In blind/sequestered testing, the system detects nearly 100% of the non-nuisance novelties, detects many nuisance novelties, and outperforms novelty detection using a Gaussian-based approach. We also show the WOW-agent’s lookahead collision-avoiding control is significantly better than a baseline Deep Q-learning Network-trained controller. Full article
(This article belongs to the Collection Feature Paper in Algorithms and Complexity Theory)
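
The per-dimension Extreme Value Theory step can be illustrated by fitting a Weibull distribution to dissimilarity scores collected under non-novel conditions and flagging new scores that fall in the extreme tail. The fixed location parameter and the 99.5th-percentile threshold are assumptions, not the calibration used by the WOW-agent.

```python
import numpy as np
from scipy.stats import weibull_min

# Sketch of per-dimension EVT novelty detection: fit a Weibull to dissimilarity
# scores from non-novel runs, then flag scores in the extreme tail. The fixed
# location (floc=0) and the 99.5th-percentile threshold are assumptions.
rng = np.random.default_rng(0)
nominal_scores = rng.weibull(a=1.5, size=2000)          # non-novel dissimilarities

shape, loc, scale = weibull_min.fit(nominal_scores, floc=0)
threshold = weibull_min.ppf(0.995, shape, loc=loc, scale=scale)

new_scores = np.array([0.4, 1.1, 5.0])
print(new_scores > threshold)                           # expect [False False  True]
```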

17 pages, 5138 KB  
Article
Deep Q-Learning Network with Bayesian-Based Supervised Expert Learning
by Chayoung Kim
Symmetry 2022, 14(10), 2134; https://doi.org/10.3390/sym14102134 - 13 Oct 2022
Cited by 4 | Viewed by 2633
Abstract
Deep reinforcement learning (DRL) algorithms interact with the environment and have achieved considerable success in several decision-making problems. However, DRL requires a significant amount of data before it can achieve adequate performance, and it might have limited applicability when DRL agents must learn in a real-world environment. Therefore, some algorithms combine DRL agents with supervised learning and leverage previous additional knowledge. Some have integrated a deep Q-learning network with a behavioral cloning model that can exploit supervised learning as prior learning. The algorithm proposed in this study is also based on these methods and updates the loss function of the existing technique into a Bayesian approach. The supervised loss function used in existing algorithms and the loss function based on the Bayesian method proposed in this study differ in terms of the utilization of prior knowledge, that is, whether prior knowledge is used or not, unlike the cross entropy, which is symmetric. In experiments on various OpenAI Gym environments, such as Cart-Pole and MountainCar, the learning convergence performance was improved. In particular, the proposed method can be applied to achieve fairly stable learning during the early stage, when learning in a sparse environment is uncertain. Full article
(This article belongs to the Section Computer)
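
The existing technique the abstract builds on can be pictured as adding a supervised behavioral-cloning term to the TD loss, with the Q-values treated as logits over the expert's actions. The sketch below shows that standard cross-entropy variant, which the paper replaces with a Bayesian-style loss not reproduced here; network size and the weighting lam are assumptions.

```python
import torch
import torch.nn as nn

# Sketch: DQN TD loss plus a supervised behavioral-cloning term (Q-values
# treated as logits over expert actions). This is the cross-entropy baseline
# the paper modifies; sizes and the weight lam are assumptions.
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

def combined_loss(states, actions, td_targets, expert_states, expert_actions, lam=0.5):
    q_taken = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    td_loss = nn.functional.mse_loss(q_taken, td_targets)
    bc_loss = nn.functional.cross_entropy(q_net(expert_states), expert_actions)
    return td_loss + lam * bc_loss

loss = combined_loss(torch.randn(8, 4), torch.randint(0, 2, (8,)), torch.randn(8),
                     torch.randn(8, 4), torch.randint(0, 2, (8,)))
loss.backward()
print(loss.item())
```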