End-to-end deep reinforcement learning (DRL) for quadrotor control promises many benefits: easy deployment, task generalization, and real-time execution. Prior end-to-end DRL-based methods have deployed learned controllers on single quadrotors or quadrotor teams maneuvering in simple, obstacle-free environments. However, adding obstacles increases the number of possible interactions exponentially, which makes RL policies harder to train. In this work, we propose an end-to-end DRL approach to control quadrotor swarms in environments with obstacles. We provide our agents with a curriculum and a replay buffer of clipped collision episodes to improve performance in obstacle-rich environments. We implement an attention mechanism that attends to neighbor-robot and obstacle interactions, the first successful demonstration of this mechanism on swarm-behavior policies deployed on severely compute-constrained hardware. Ours is the first work to demonstrate that neighbor-avoiding and obstacle-avoiding control policies trained with end-to-end DRL can transfer zero-shot to real quadrotors. Our approach scales to 32 robots with 80% obstacle density in simulation and to 8 robots with 20% obstacle density in physical deployment. Video demonstrations are available on the project website: https://sites.google.com/view/obst-avoid-swarm-rl.