Multiple-UAV Reinforcement Learning Algorithm Based on Improved PPO in Ray Framework

Open Access

4 July 2022

journal article
research article
Published by MDPI AG in Drones

Vol. 6 (7), 166
https://doi.org/10.3390/drones6070166

Abstract

Distributed multi-agent collaborative decision-making technology is key to general artificial intelligence. This paper uses a self-developed Unity3D collaborative combat environment as the test scenario and sets a task that requires heterogeneous unmanned aerial vehicles (UAVs) to make distributed decisions and complete a cooperative task. The traditional proximal policy optimization (PPO) algorithm performs poorly in complex multi-agent collaboration scenarios. To address this problem, and building on the distributed training framework Ray, the Critic network in PPO is modified to learn a centralized value function, and the multi-agent proximal policy optimization (MAPPO) algorithm is proposed. In addition, an inheritance training method based on curriculum learning is adopted to improve the generalization performance of the algorithm. In the experiments, MAPPO obtains the highest average accumulated reward among the compared algorithms and, after convergence, completes the task goal in the fewest steps, which indicates that MAPPO outperforms the state-of-the-art algorithms it was compared against.
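The idea of a centralized value function can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the toy linear critic, the observation sizes, and the clipping value 0.2 are all assumptions chosen for illustration, and the Ray and Unity3D parts are left out entirely. The sketch shows only that each agent contributes a local observation, that the critic evaluates the concatenated joint observation, and that the policy is still scored with the standard PPO clipped surrogate.

```python
import numpy as np


def centralized_critic_input(local_obs):
    """Concatenate every agent's local observation into one joint vector.

    Agents may have different observation sizes (heterogeneous UAVs).
    """
    return np.concatenate(
        [np.asarray(o, dtype=np.float64).ravel() for o in local_obs]
    )


def linear_value(joint_obs, weights, bias=0.0):
    """Toy linear critic V(s) = w . s + b (a stand-in for a neural network)."""
    return float(np.dot(weights, joint_obs) + bias)


def ppo_clip_objective(ratio, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate objective, averaged over samples."""
    ratio = np.asarray(ratio, dtype=np.float64)
    advantage = np.asarray(advantage, dtype=np.float64)
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    return float(np.mean(np.minimum(ratio * advantage, clipped * advantage)))


# Two hypothetical UAVs with different observation sizes (assumed values).
obs_uav_a = [0.1, 0.2, 0.3]
obs_uav_b = [1.0, -1.0]

# The critic sees the joint observation; each actor would see only its own.
joint = centralized_critic_input([obs_uav_a, obs_uav_b])
value = linear_value(joint, weights=np.ones(joint.shape[0]))

# Probability ratios and advantages are made-up numbers for the example.
objective = ppo_clip_objective(ratio=[1.5, 0.5], advantage=[1.0, -1.0])
```

In this arrangement the actors remain decentralized at execution time, since only the critic, which is needed during training alone, consumes information from all agents.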

Cited by 12 articles