Simulation agents are essential for designing and testing systems that interact with humans, such as autonomous vehicles (AVs). These agents serve various purposes, from benchmarking AV performance to stress-testing the system's limits, but all use cases share a key requirement: reliability. A simulation agent should behave as its designer intends, minimizing unintended actions, such as collisions, that can degrade the signal-to-noise ratio of downstream analyses. As a foundation for reliable sim agents, we propose scaling self-play to thousands of scenarios from the Waymo Open Motion Dataset under semi-realistic limits on human perception and control. Trained from scratch on a single GPU, our agents nearly solve the full training set within a day. They generalize effectively to unseen test scenes, achieving a 99.8% goal completion rate with less than 0.8% combined collision and off-road incidents across 10,000 held-out scenarios. Beyond in-distribution generalization, our agents show partial robustness to out-of-distribution scenes and can be fine-tuned in minutes to reach near-perfect performance in those cases. We open-source both the pre-trained agents and the complete code base. Demonstrations of agent behaviors can be found at https://sites.google.com/view/reliable-sim-agents.
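The headline results are rates over controlled agents in held-out scenarios. The following is a minimal sketch of how such rates might be aggregated. It is not the authors' evaluation code, and it rests on two assumptions. First, each agent's episode is summarized by three hypothetical boolean flags (`reached_goal`, `collided`, `off_road`). Second, the combined incident rate counts an agent once if it collided, went off-road, or both. The paper's exact metric definitions may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AgentOutcome:
    """Hypothetical per-agent episode summary; field names are illustrative only."""
    reached_goal: bool
    collided: bool
    off_road: bool


def summarize(outcomes: list[AgentOutcome]) -> dict[str, float]:
    """Aggregate per-agent outcomes into percentage rates.

    Assumption: the combined incident rate counts an agent once if it
    collided, went off-road, or both (rather than summing two separate rates).
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one agent outcome")
    goals = sum(o.reached_goal for o in outcomes)
    incidents = sum(o.collided or o.off_road for o in outcomes)
    return {
        "goal_rate_pct": 100.0 * goals / n,
        "collision_or_offroad_pct": 100.0 * incidents / n,
    }


if __name__ == "__main__":
    # Synthetic illustration (not real results): 1,000 agents, 998 reach their goal,
    # 3 of those also collide, and 2 agents go off-road without reaching the goal.
    outcomes = (
        [AgentOutcome(True, False, False)] * 995
        + [AgentOutcome(True, True, False)] * 3
        + [AgentOutcome(False, False, True)] * 2
    )
    print(summarize(outcomes))
```

Counting an agent at most once keeps the combined rate bounded by 100% even when a single agent both collides and leaves the road. Summing the two rates separately would double-count such agents.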