P-DRL: A Framework for Multi-UAVs Dynamic Formation Control under Operational Uncertainty and Unknown Environment
Abstract
1. Introduction
- A new theorem for pairwise UAV formation control is proposed, based on an analysis of the conflict–collision relationship among multiple UAVs. Building on it, the multi-UAV synergetic formation control task is decomposed into multiple pairwise synergetic formation control tasks by the dynamic pairing algorithm we designed, which reduces the training difficulty for the Agent in the DRL model.
- A detailed deep reinforcement learning model for synergetic formation control of a UAV pair is proposed, including a reward function with intensified collision avoidance, the state transformation, and the state–action space.
- A general framework, P-DRL, is proposed to solve the dynamic formation control problem for fleets of 10–20 UAVs; it supports simultaneous, real-time dynamic formation control in complex environments with operational uncertainty and unknown obstacles.
2. Research Basis
2.1. Related Work
2.2. Motivation
- (1) Learning difficulty increases with the number of UAVs controlled simultaneously.
- (2) Training difficulty increases when pushing the non-collision success rate higher.
3. Problem Formulation
3.1. Problem Definition
- (1) Objectives: a group of UAVs (≥10) needs to form or maintain a specific configuration synergetically, starting from random initial states in which the drones may be stationary or in motion.
- (2) Decision variables: the only decision variables are the rotor speeds of the UAV. For example, a hexacopter with six propeller rotors has six decision variables, because the speed of every rotor can in theory be controlled individually; a quadcopter (quadrotor UAV) likewise has four. Drones achieve most maneuvers, such as climbing, descending, and rolling, by adjusting their rotor speeds.
- (3) Constraints: first, every maneuver of a UAV must satisfy its performance constraints; second, UAVs must not collide with each other or with other obstacles in the airspace during the formation process.
- (4) Assumptions: communication factors such as signal transmission interference and delay are not considered, but external interference such as wind, as well as internal control errors of the UAV systems, is considered. All obstacle positions in the airspace are generated randomly and are unknown before the formation task begins. We express the assumptions further as follows (see the sketch after this list):
- Operational uncertainty: the next state of a UAV is not determined completely by the current state and control action; instead, it follows a normal distribution with a specified variance, i.e., $\Pr[s_{t+1} \mid s_t, a_t] \neq 1$ for state $s_t$ and action $a_t$.
- Unknown environment: obstacles in the operating environment cannot be predicted before formation control begins; an obstacle becomes known only when a UAV approaches it during the formation process.
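To make the operational-uncertainty assumption concrete, the following minimal sketch perturbs a nominal state transition with a zero-mean Gaussian position disturbance, using the interference statistics listed later in the parameter table (μ = 0, σ² = 0.5). The state layout and function names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

MU, SIGMA2 = 0.0, 0.5  # position-interference statistics from the parameter table

def nominal_dynamics(state: np.ndarray, rotor_speeds: np.ndarray, dt: float) -> np.ndarray:
    """Placeholder for the deterministic single-UAV model of Section 3.3.
    state = [x, y, z, vx, vy, vz]; a real model would integrate the rigid-body
    equations driven by the rotor speeds (the only decision variables)."""
    next_state = state.copy()
    next_state[:3] += state[3:] * dt  # simple kinematic position update
    return next_state

def stochastic_transition(state, rotor_speeds, dt, rng=np.random.default_rng()):
    """Operational uncertainty: Pr[s_{t+1} | s_t, a_t] != 1, because the next
    position deviates from the nominal one by Gaussian noise."""
    next_state = nominal_dynamics(state, rotor_speeds, dt)
    next_state[:3] += rng.normal(MU, np.sqrt(SIGMA2), size=3)
    return next_state
```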
3.2. Pairwise Control Theorem
- (1) Same objectives.
- (2) Same constraints.
3.3. Single UAV Control Model
3.4. DRL Model for UAV Pairwise Formation Control
3.4.1. Environment
- (a) State of the UAV pair.
- (b) Action of the UAV pair.
- (c) State transformation (see the sketch after this list).
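As a reading aid, a minimal gym-style sketch of how such a pairwise Environment could be organized is given below; the state layout, action encoding, and method names are illustrative assumptions rather than the paper's exact interface.

```python
import numpy as np

class PairwiseFormationEnv:
    """Illustrative Environment for one UAV pair: state, action, transition."""

    def __init__(self, uav_a, uav_b, obstacles, dt=0.2):
        self.uav_a, self.uav_b = uav_a, uav_b  # single-UAV models (Section 3.3)
        self.obstacles = obstacles             # only obstacles already discovered
        self.dt = dt

    def state(self) -> np.ndarray:
        # (a) State of the UAV pair: both UAVs' kinematic states concatenated
        # with their formation targets; a full state would also encode nearby
        # discovered obstacles.
        return np.concatenate([self.uav_a.state, self.uav_b.state,
                               self.uav_a.target, self.uav_b.target])

    def step(self, action_a, action_b):
        # (b) Action of the UAV pair: one rotor-speed command per UAV.
        # (c) State transformation: stochastic, per the uncertainty assumption.
        self.uav_a.apply(action_a, self.dt)
        self.uav_b.apply(action_b, self.dt)
        reward = self.reward()                 # Section 3.4.3
        done = self.uav_a.at_target() and self.uav_b.at_target()
        return self.state(), reward, done

    def reward(self) -> float:
        raise NotImplementedError  # see the reward sketch in Section 3.4.3
```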
3.4.2. Agent
3.4.3. Reward
- (1) UAV safety reward.
- (2) Obstacle safety reward.
- (3) Formation reward (a combined sketch follows this list).
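The sketch below shows one plausible way to combine the three components, using the values reported in the hyperparameter table (penalties of −400/−20/−4, safety distance d = 0.5 m, formation hyperparameter D = 10, prediction horizons of 1/5/10 s). The tiering of the penalties by predicted time-to-violation and the exact form of the formation term are assumptions.

```python
import numpy as np

# Penalty levels from the hyperparameter table; tiering them by predicted
# time-to-violation (1 s / 5 s / 10 s horizons) is an illustrative assumption.
R_SEVERE, R_MEDIUM, R_MILD = -400.0, -20.0, -4.0
D_SAFE = 0.5   # safety distance for a key collision point (m)
D_FORM = 10.0  # hyperparameter in the formation reward

def safety_reward(predicted_min_dist: dict) -> float:
    """UAV/obstacle safety reward: the sooner the predicted separation falls
    below the safety distance d, the harsher the penalty."""
    if predicted_min_dist[1.0] < D_SAFE:    # violation within ~1 s
        return R_SEVERE
    if predicted_min_dist[5.0] < D_SAFE:    # violation within ~5 s
        return R_MEDIUM
    if predicted_min_dist[10.0] < D_SAFE:   # violation within ~10 s
        return R_MILD
    return 0.0

def formation_reward(pos: np.ndarray, target: np.ndarray) -> float:
    """Formation reward: grows as a UAV closes on its slot (illustrative form)."""
    return D_FORM / (1.0 + np.linalg.norm(pos - target))

def total_reward(pred_uav, pred_obs, pos_a, tgt_a, pos_b, tgt_b) -> float:
    return (safety_reward(pred_uav) + safety_reward(pred_obs)
            + formation_reward(pos_a, tgt_a) + formation_reward(pos_b, tgt_b))
```

Here `predicted_min_dist` maps each prediction horizon (in seconds) to the minimum predicted separation from the other UAV of the pair, or from the nearest discovered obstacle.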
3.4.4. Interaction
4. Approach
4.1. P-DRL Framework
- Single UAV control model: defines the performance of each UAV, such as the maximum rotor speed, the range of roll angle, the body configuration used for collision detection, and the state transformation function used in the DRL model.
- DRL model for pairwise formation control: the model used to train the Agent for synergetic control of two UAVs, covering the state and action space settings and reward shaping in the Environment, the decision policy and deep-neural-network architecture in the Agent, and the Environment–Agent interaction mode.
- Algorithm 1 (dynamic pairing): converts the formation control problem of a UAV fleet into a synergetic formation control problem over multiple UAV pairs. This reduces the difficulty of Agent training and makes the control scenario solvable by the Agent.
- Algorithm 2 (Agent training): trains the Agent based on the reward returned by the Environment–Agent interaction.
- Implementation: at every timestep, the dynamic pairing algorithm selects a pair of UAVs and the Agent allocates an action to each of them; this repeats until all UAVs have been paired, after which the process advances to the next timestep (see the sketch below).
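The run-time loop promised above, as a minimal sketch: `pair_fn` stands in for Algorithm 1, `agent.act` for the Agent trained by Algorithm 2, and `build_pair_state` is a hypothetical helper that assembles the pair state of Section 3.4.1.

```python
import numpy as np

def build_pair_state(uav_i, uav_j):
    """Hypothetical helper: assemble the pair state of Section 3.4.1."""
    return np.concatenate([uav_i.state, uav_j.state, uav_i.target, uav_j.target])

def p_drl_control_step(uavs, agent, pair_fn):
    """One timestep of P-DRL: pair the whole fleet (Algorithm 1), then let the
    trained Agent allocate one action per UAV of each pair. Sketch only."""
    actions = {}
    for uav_i, uav_j in pair_fn(uavs):           # Algorithm 1: dynamic pairing
        pair_state = build_pair_state(uav_i, uav_j)
        actions[uav_i.id], actions[uav_j.id] = agent.act(pair_state)
    return actions  # applied simultaneously; then advance to the next timestep
```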
4.2. Dynamic Pairing
Algorithm 1: Dynamic pairing algorithm for n UAVs.
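The paper's pseudocode is not reproduced here; the sketch below pairs the fleet greedily by conflict risk, approximated by inter-UAV separation. The concrete pairing criterion is an assumption in the spirit of the conflict–collision analysis of Section 3.2, not the paper's exact rule.

```python
import itertools
import numpy as np

def dynamic_pairing(uavs):
    """Greedy pairing sketch: repeatedly pair the two unpaired UAVs with the
    highest conflict risk, approximated here by the shortest separation."""
    unpaired = list(uavs)
    pairs = []
    while len(unpaired) >= 2:
        # Choose the pair with minimum separation (a stand-in for the paper's
        # conflict-collision criterion).
        i, j = min(itertools.combinations(range(len(unpaired)), 2),
                   key=lambda ij: np.linalg.norm(unpaired[ij[0]].position
                                                 - unpaired[ij[1]].position))
        a, b = unpaired[i], unpaired[j]
        del unpaired[j], unpaired[i]  # delete j first (j > i) so i stays valid
        pairs.append((a, b))
    return pairs  # an odd UAV out, if any, waits one timestep
```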
4.3. Agent Training
Algorithm 2: Agent (A3C structure) training algorithm for UAV pairwise control.
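As a stand-in for Algorithm 2, the following is a compact one-worker actor–critic update in PyTorch; A3C additionally runs several such workers asynchronously against shared parameters. The network architecture and the discrete action head are assumptions, while γ = 0.95 and the 0.01 learning rates come from the hyperparameter table.

```python
import torch
import torch.nn as nn

GAMMA = 0.95  # discount rate of the reward, from the hyperparameter table

class ActorCritic(nn.Module):
    def __init__(self, state_dim, n_actions, hidden=128):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.actor = nn.Linear(hidden, n_actions)  # action logits
        self.critic = nn.Linear(hidden, 1)         # state value

    def forward(self, s):
        h = self.body(s)
        return self.actor(h), self.critic(h)

def update(net, opt, states, actions, rewards, next_value):
    """One n-step actor-critic update (each A3C worker performs this against
    the shared network). states: [T, state_dim] tensor; actions: [T] int
    tensor; rewards: list of floats; next_value: bootstrap value (float)."""
    returns, R = [], next_value
    for r in reversed(rewards):                 # n-step discounted returns
        R = r + GAMMA * R
        returns.append(R)
    returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)

    logits, values = net(states)
    dist = torch.distributions.Categorical(logits=logits)
    advantage = returns - values.squeeze(-1)
    actor_loss = -(dist.log_prob(actions) * advantage.detach()).mean()
    critic_loss = advantage.pow(2).mean()
    opt.zero_grad()
    (actor_loss + critic_loss).backward()
    opt.step()
```

Since the Actor and Critic learning rates are identical (0.01), a single optimizer such as `torch.optim.Adam(net.parameters(), lr=0.01)` suffices for this sketch.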
4.4. Implementation
5. Simulation and Results
5.1. Background
5.2. Simulations
5.2.1. Training the Agent
5.2.2. Formation Control
- (1) Scenario 1: Formation Shaping Control
- (2) Scenario 2: Formation Reconfiguration Control
5.3. Performance Analysis
5.3.1. Success Rate of Non-Collision Formation Control
- (1) Success rate sensitivity analysis by changing the control frequency.
- (2) Success rate with the collision-avoidance measure.
5.3.2. Average Formation Speed of UAVs
5.3.3. Real-Time Performance
- (1) Software only (without communication).
- (2) Software and hardware (with communication).
6. Discussion and Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Parameter | Meaning | Value (Unit) |
|---|---|---|
| m | Mass of the UAV | 1.375 (kg) |
| θ_min | Minimum pitch angle | −π/6 (rad) |
| θ_max | Maximum pitch angle | +π/6 (rad) |
| φ_min | Minimum roll angle | −π/6 (rad) |
| φ_max | Maximum roll angle | +π/6 (rad) |
| ω_max | Maximum rotating speed of rotors | 8100 (RPM) |
| – | Lift coefficient | 0.484 |
| b | Composite lift parameter | (N/) |
| l | Length of the rotor arm | 0.350 (m) |
| I_x | x-axis moment of inertia | 0.152 (N·m·s²) |
| I_y | y-axis moment of inertia | 0.152 (N·m·s²) |
| I_z | z-axis moment of inertia | 0.0842 (N·m·s²) |
| g | Acceleration of gravity | 9.807 (m/s²) |
| ρ | Density of the atmosphere | 1.29 (kg/m³) |
| r | Length of the propeller | 0.0850 (m) |
| – | Air resistance coefficient | 0.0427 |
| – | Inverse torque coefficient | 0.021 |
| μ | Expectation of the position interference | 0 |
| σ² | Variance of the position interference | 0.5 |
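For reproduction, the physical parameters above can be gathered into one structure; the following dataclass is a convenience sketch with illustrative field names (the composite lift parameter is omitted because its value is garbled in the source).

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class UAVParams:
    """Physical parameters of the simulated UAV (values from the table above)."""
    m: float = 1.375                  # mass (kg)
    pitch_min: float = -math.pi / 6   # rad
    pitch_max: float = math.pi / 6    # rad
    roll_min: float = -math.pi / 6    # rad
    roll_max: float = math.pi / 6     # rad
    rotor_rpm_max: float = 8100.0     # maximum rotor speed (RPM)
    lift_coeff: float = 0.484
    arm_length: float = 0.350         # rotor arm length l (m)
    inertia_xyz: tuple = (0.152, 0.152, 0.0842)
    g: float = 9.807                  # m/s^2
    rho: float = 1.29                 # atmospheric density (kg/m^3)
    prop_length: float = 0.0850      # propeller length r (m)
    drag_coeff: float = 0.0427        # air resistance coefficient
    inv_torque_coeff: float = 0.021
    noise_mu: float = 0.0             # position-interference expectation
    noise_var: float = 0.5            # position-interference variance
```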
| Parameter | Meaning | Value (Unit) |
|---|---|---|
| – | Hyperparameter in the decision policy | 0.03 |
| – | Hyperparameter in the UAV and obstacle safety reward | −400 |
| – | Hyperparameter in the UAV and obstacle safety reward | −20 |
| – | Hyperparameter in the UAV and obstacle safety reward | −4 |
| – | Prediction time for the future state based on the current state | 1 (s) |
| – | Prediction time for the future state based on the current state | 5 (s) |
| – | Prediction time for the future state based on the current state | 10 (s) |
| d | Safety distance for a key collision point of the UAV | 0.5 (m) |
| D | Hyperparameter in the formation reward | 10 |
| α_A | Learning rate of the Actor | 0.01 |
| α_C | Learning rate of the Critic | 0.01 |
| γ | Discount rate of the reward | 0.95 |
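Likewise, the training hyperparameters above can be collected into a sketch configuration; the key names are illustrative, the values are those of the table.

```python
DRL_HYPERPARAMS = {
    "policy_param": 0.03,            # hyperparameter in the decision policy
    "penalty_severe": -400.0,        # safety-reward penalties
    "penalty_medium": -20.0,
    "penalty_mild": -4.0,
    "predict_horizons_s": (1.0, 5.0, 10.0),  # future-state prediction times
    "safety_distance_m": 0.5,        # d
    "formation_param": 10.0,         # D
    "lr_actor": 0.01,
    "lr_critic": 0.01,
    "gamma": 0.95,                   # reward discount rate
}
```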
| UAV ID | Original Destination | Shaping Destination | Flying Destination |
|---|---|---|---|
| UAV1 | [1,3,0] | [20.83,3.33,80] | [35,2.67,60] |
| UAV2 | [3,3,0] | [21.67,3.33,80] | [34,1.33,60] |
| UAV3 | [1,1,0] | [23.33,1.67,80] | [36,1.33,60] |
| UAV4 | [3,1,0] | [25,0,80] | [38,0,60] |
| UAV5 | [1,−3,0] | [20,−5,80] | [34,−4,60] |
| UAV6 | [3,−3,0] | [23.33,−0.83,80] | [36,−1.33,60] |
| UAV7 | [3,−1,0] | [21.67,−3.33,80] | [36,0,60] |
| UAV8 | [1,−1,0] | [20.83,−3.33,80] | [35,−2.67,60] |
| UAV9 | [−3,1,0] | [16.67,0.83,80] | [32,1.33,60] |
| UAV10 | [−3,3,0] | [0.33,1.67,80] | [33,2.67,60] |
| UAV11 | [−1,3,0] | [20,5,80] | [34,4,60] |
| UAV12 | [−1,1,0] | [19.17,3.33,80] | [34,−1.33,60] |
| UAV13 | [−3,−1,0] | [15,0,80] | [30,0,60] |
| UAV14 | [−3,−3,0] | [16.67,−0.83,80] | [32,−1.33,60] |
| UAV15 | [−1,−1,0] | [18.33,−1.67,80] | [32,0,60] |
| UAV16 | [−1,−3,0] | [19.17,−3.33,80] | [33,−2.67,60] |
| UAV17 | [−4,−3,0] | [15,−5,80] | [38,5.67,60] |
| UAV18 | [−4,−1,0] | [25,−5,80] | [30,5.67,60] |
| UAV19 | [−4,1,0] | [15,5,80] | [38,−5.67,60] |
| UAV20 | [−4,3,0] | [25,5,80] | [30,−5,60] |
| Obstacle1 | – | [6,0,0,50,0.8] | [28,2.5,60,85,0.9] |
| Obstacle2 | – | [10,2,0,60,0.6] | [27,−2.5,60,85,0.8] |
| Obstacle3 | – | [12,−2,0,60,0.8] | – |
| Method | Non-Collision Success Rate (%), with Collision-Avoidance Measure | Non-Collision Success Rate (%), without Collision-Avoidance Measure (Frequency: 5 Hz) | Average Speed (m/s) | Average Trajectory Output Time (s), Software Only | Average Trajectory Output Time (s), Software and Hardware |
|---|---|---|---|---|---|
| P-A3C | 91.7–96.2 | 95.32 | 9.24 | 7.104 | 7.171 |
| P-DDQN | 90.0–96.2 | 94.51 | 9.11 | 6.992 | 7.070 |
| P-AC | 90.1–96.3 | 94.49 | 10.11 | 7.104 | 7.170 |
| APF | 100 (theoretically) | – | 7.76 | 1227.6 | – |