Autonomous systems have achieved superhuman performance in isolation or in simulation, yet they remain brittle in shared, dynamic real-world spaces. This failure stems from the dominant single-agent paradigm for physical applications, in which other actors are ignored or treated as environmental noise, preventing effective coordination. Here we show that multi-agent reinforcement learning provides the essential safety scaffolding required for real-world interaction. Using high-speed quadrotor racing as a high-stakes testbed, we train agents to handle complex aerodynamic interactions and strategic maneuvering with a variable number of racers. Through league-based self-play, agents evolve sophisticated anticipatory behaviors, including proactive collision avoidance, overtaking, and handling of multi-agent physical interactions such as aerodynamic downwash. Our agents outperform a champion-level human pilot in multi-player races at speeds exceeding 22 m/s, while simultaneously reducing collision rates by 50% compared to state-of-the-art single-agent baselines. Crucially, training with diverse artificial agents enables zero-shot generalization to safer human interaction. These results suggest that the path to robust robotic co-existence lies not in isolated safety constraints, but in the rigorous demands of multi-agent interaction. Multimedia materials are available at: https://rpg.ifi.uzh.ch/marl