This paper addresses the problem of guiding a quadrotor through a predefined sequence of waypoints in cluttered environments, minimizing flight time while avoiding collisions. Previous approaches either incur long computation times because they must solve complex non-convex optimization problems, or are limited by the inherent smoothness of polynomial trajectory representations, which restricts the flexibility of the resulting motion. In this work, we present a safe reinforcement learning approach for autonomous drone racing with time-optimal flight in cluttered environments. The reinforcement learning policy, trained with safety and terminal rewards specifically designed to enforce near-time-optimal, collision-free flight, outperforms current state-of-the-art algorithms. Experimental results further demonstrate that the proposed approach achieves both the minimum-flight-time and obstacle-avoidance objectives in complex environments, with a $66.7\%$ success rate in unseen, challenging settings.