Search Results (2)

Search Parameters:
Keywords = Deep Dyna-Q

16 pages, 8397 KB  
Article
Accelerated Transfer Learning for Cooperative Transportation Formation Change via SDPA-MAPPO (Scaled Dot Product Attention-Multi-Agent Proximal Policy Optimization)
by Almira Budiyanto, Keisuke Azetsu and Nobutomo Matsunaga
Automation 2024, 5(4), 597-612; https://doi.org/10.3390/automation5040034 - 27 Nov 2024
Cited by 1 | Viewed by 1765
Abstract
Cooperative transportation methods that require formation changes in a traveling environment are gaining interest, and deep reinforcement learning is used for formation changes in multi-robot cases. The MADDPG (Multi-Agent Deep Deterministic Policy Gradient) method is widely used in known environments; in unfamiliar environments, however, MADDPG may require re-learning. Although extensions of MADDPG using model-based learning and imitation learning have been applied to reduce learning time, it is unclear how the learned results transfer when the number of robots changes. For example, in the GASIL-MADDPG (Generative Adversarial Self-Imitation Learning and Multi-Agent Deep Deterministic Policy Gradient) method, it is uncertain how the results of training three robots can be transferred to the neural networks of four robots. Scaled Dot Product Attention (SDPA) has recently attracted attention for its speed and accuracy in natural language processing, and combining transfer learning with such fast computation improves the efficiency of edge-level re-learning. This paper proposes a formation change algorithm that enables easier and faster multi-robot knowledge transfer than other methods by combining SDPA with MAPPO (Multi-Agent Proximal Policy Optimization). The algorithm applies SDPA to multi-robot formation learning and speeds up learning by transferring the acquired formation change knowledge to a different number of robots. The proposed algorithm is verified in simulations of robot formation changes and achieves dramatically faster learning: the proposed SDPA-MAPPO (Scaled Dot Product Attention-Multi-Agent Proximal Policy Optimization) learned 20.83 times faster than the Deep Dyna-Q method. Furthermore, using transfer learning from a three-robot to a five-robot case, the learning time is reduced by about 56.57 percent. The three-robot to five-robot scenario was chosen based on the number of robots commonly used in cooperative transportation.
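
The abstract above centers on scaled dot-product attention as the mechanism behind the fast learning and the transfer between different robot counts. Below is a minimal NumPy sketch of the standard attention operation only; it is not the authors' SDPA-MAPPO implementation, and the per-robot observation framing and embedding sizes are illustrative assumptions.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Standard scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_k)   # (batch, n_query, n_key)
    scores -= scores.max(axis=-1, keepdims=True)        # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over keys
    return weights @ V                                    # (batch, n_query, d_v)

# Illustrative use: self-attention over per-robot observation embeddings.
# The operation itself is defined for any number of keys, which is why the
# same mechanism applies unchanged when the robot count changes (e.g. 3 -> 5).
rng = np.random.default_rng(0)
obs3 = rng.standard_normal((1, 3, 16))   # 3 robots, 16-dim embeddings (assumed sizes)
obs5 = rng.standard_normal((1, 5, 16))   # 5 robots, same embedding width
print(scaled_dot_product_attention(obs3, obs3, obs3).shape)  # (1, 3, 16)
print(scaled_dot_product_attention(obs5, obs5, obs5).shape)  # (1, 5, 16)
```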

22 pages, 27075 KB  
Article
Deep Dyna-Q for Rapid Learning and Improved Formation Achievement in Cooperative Transportation
by Almira Budiyanto and Nobutomo Matsunaga
Automation 2023, 4(3), 210-231; https://doi.org/10.3390/automation4030013 - 10 Jul 2023
Cited by 8 | Viewed by 3441
Abstract
The cooperative multi-agent concept is now applied in academic research, disaster mitigation, industry, and transportation. A cooperative multi-agent system is one in which the agents work together to solve problems or maximise utility. The essential requirement of formation control is that the multiple agents reach the desired point while maintaining their positions in the formation under dynamic conditions and environments. Cooperative multi-agent systems are closely related to the formation change problem: the arrangement of the agents must change with the environmental conditions, for example when avoiding obstacles, following tracks of different sizes and shapes, or moving transport objects of different sizes and shapes. Reinforcement learning is well suited to such formation change problems, but the complex formation control process requires a long learning time. This paper proposes using the Deep Dyna-Q algorithm to speed up the learning process while improving the formation achievement rate by tuning the parameters of the Deep Dyna-Q algorithm. Although the Deep Dyna-Q algorithm has been used in many applications, it has not previously been applied in an actual experiment. The contribution of this paper is the application of the Deep Dyna-Q algorithm to formation control in both simulations and actual experiments. This study implements the proposed method and investigates formation control in simulations and actual experiments. In the actual experiments, the Nexus robot with the Robot Operating System (ROS) was used. To confirm the communication between the PC and the robots, the camera processing, and the motor controllers, the velocities from the simulation were given directly to the robots. The simulations used the same goal points as the actual experiments, so the simulation results approach the actual experimental results. The discount rate and learning rate affected the formation change achievement rate, the number of collisions among agents, and the number of collisions between agents and transport objects. In the learning rate comparison, DDQ (0.01) consistently outperformed DQN: DQN reached the maximum −170 reward in about 130,000 episodes, while DDQ (0.01) reached this value in 58,000 episodes and achieved a maximum reward of −160. Applying an MEC (model error compensator) in the actual experiment reduced the movement error of the robots, so that the robots could execute the formation change appropriately.
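
For context on the algorithm named in the search keyword, the sketch below shows the classic tabular Dyna-Q loop that Deep Dyna-Q extends with neural function approximators for the Q-function and the world model. It is a generic illustration, not the paper's implementation; the environment interface (reset()/step() returning state, reward, done) and all hyperparameter values are assumptions.

```python
import random
from collections import defaultdict

def dyna_q(env, n_actions, episodes=200, planning_steps=10,
           alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Dyna-Q: direct RL from real transitions plus planning
    updates replayed from a learned (here, memorized) model."""
    Q = defaultdict(float)    # Q[(state, action)] -> value estimate
    model = {}                # model[(state, action)] -> (reward, next_state, done)
    actions = list(range(n_actions))

    def greedy(s):
        return max(actions, key=lambda a: Q[(s, a)])

    def update(s, a, r, s2, done):
        target = r + (0.0 if done else gamma * Q[(s2, greedy(s2))])
        Q[(s, a)] += alpha * (target - Q[(s, a)])

    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            a = random.choice(actions) if random.random() < eps else greedy(s)
            s2, r, done = env.step(a)
            update(s, a, r, s2, done)          # direct reinforcement learning
            model[(s, a)] = (r, s2, done)      # world-model learning
            for _ in range(planning_steps):    # planning from simulated experience
                (ps, pa), (pr, ps2, pdone) = random.choice(list(model.items()))
                update(ps, pa, pr, ps2, pdone)
            s = s2
    return Q
```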
