SPIRAL: Self-Play Incremental Racing Algorithm for Learning in Multi-Drone Competitions

From arXiv. © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

This paper introduces SPIRAL (Self-Play Incremental Racing Algorithm for Learning), a novel approach for training autonomous drones in multi-agent racing competitions. SPIRAL uses self-play to incrementally develop complex racing behaviors in a challenging, dynamic environment: drones continuously compete against increasingly proficient versions of themselves, so the difficulty of competitive interactions escalates naturally. This progression guides agents from mastering fundamental flight control to executing sophisticated cooperative multi-drone racing strategies. The method is designed for versatility and can integrate any state-of-the-art Deep Reinforcement Learning (DRL) algorithm within its self-play framework. Simulations demonstrate the significant advantages of SPIRAL and benchmark the performance of various DRL algorithms operating within it. We thus contribute a versatile, scalable, and self-improving learning framework to the field of autonomous drone racing. SPIRAL's capacity to autonomously generate appropriate and escalating challenges through self-play offers a promising direction for developing robust and adaptive racing strategies in multi-agent environments, and for improving the performance and reliability of autonomous racing drones in increasingly complex and competitive scenarios.
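The abstract does not spell out the training loop, so the following is only a minimal sketch of the general idea of incremental self-play, not the authors' implementation. It assumes a toy setting in which each policy is reduced to a single scalar "skill", a race outcome is drawn from an Elo-style logistic curve, and a noisy hill-climbing step stands in for whatever DRL algorithm would be plugged in. All names (`race`, `win_rate`, `self_play`) and parameters (`promote_at`, `step`, `n_races`) are hypothetical.

```python
import random


def race(skill_a, skill_b, rng):
    """Toy race (assumption): A wins with an Elo-style logistic probability."""
    p_a = 1.0 / (1.0 + 10.0 ** (skill_b - skill_a))
    return rng.random() < p_a


def win_rate(skill, opponent, rng, n_races=200):
    """Estimate the fraction of races `skill` wins against `opponent`."""
    wins = sum(race(skill, opponent, rng) for _ in range(n_races))
    return wins / n_races


def self_play(n_stages=5, promote_at=0.7, step=0.1, seed=0, max_updates=10_000):
    """Return the list of frozen opponent snapshots, oldest first."""
    rng = random.Random(seed)
    learner = 0.0        # current policy, reduced to one scalar for illustration
    snapshots = [0.0]    # frozen past versions; the last one is the opponent
    for _ in range(max_updates):
        if len(snapshots) > n_stages:
            break
        opponent = snapshots[-1]
        # Stand-in for a DRL update: try a random perturbation and keep it
        # only if it races better against the current opponent.
        candidate = learner + rng.uniform(-step, step)
        if win_rate(candidate, opponent, rng) > win_rate(learner, opponent, rng):
            learner = candidate
        # Once the learner reliably beats its opponent, freeze a copy of it
        # as the next, harder opponent.
        if win_rate(learner, opponent, rng) >= promote_at:
            snapshots.append(learner)
    return snapshots


if __name__ == "__main__":
    for stage, skill in enumerate(self_play()):
        print(f"stage {stage}: opponent skill {skill:.2f}")
```

Under these assumptions, each promoted snapshot is stronger than the one before it, which is the sense in which self-play raises the difficulty without a hand-designed curriculum. In a real system the scalar skill would be a policy network and the perturbation step would be replaced by the chosen DRL algorithm's update.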
