Evaluating autonomous vehicle (AV) stacks in simulation typically involves replaying driving logs recorded from real-world traffic. However, agents replayed from offline data are non-reactive and difficult to control intuitively. Existing approaches address these challenges with methods that rely on heuristics or on generative models of real-world data, but these approaches either lack realism or require costly iterative sampling procedures to control the generated behaviours. In this work, we take an alternative approach and propose CtRL-Sim, a method that leverages return-conditioned offline reinforcement learning to efficiently generate reactive and controllable traffic agents. Specifically, we process real-world driving data through a physics-enhanced Nocturne simulator to generate a diverse offline reinforcement learning dataset annotated with various reward terms. With this dataset, we train a return-conditioned multi-agent behaviour model that allows fine-grained manipulation of agent behaviours by modifying the desired returns for the individual reward components. This capability enables the generation of a wide range of driving behaviours beyond the scope of the initial dataset, including adversarial behaviours. We demonstrate that CtRL-Sim can generate diverse and realistic safety-critical scenarios while providing fine-grained control over agent behaviours.
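To make the return-conditioning idea concrete, the sketch below shows how per-component return targets can steer a policy at inference time. This is a hypothetical toy model, not the paper's actual architecture: the linear policy, the feature dimensions, and the reward components (goal progress, collision avoidance, comfort) are illustrative assumptions. The key mechanism it demonstrates is that flipping the desired return for one reward component (e.g. collision avoidance) shifts the action distribution toward behaviours, such as adversarial driving, that were rare in the training data.

```python
import numpy as np

# Hypothetical sketch of return-conditioned action sampling, in the spirit
# of CtRL-Sim's behaviour model. Names, shapes, and the linear policy are
# illustrative assumptions, not the paper's actual design.

rng = np.random.default_rng(0)

STATE_DIM = 8   # agent observation features (assumed)
N_REWARDS = 3   # e.g. goal progress, collision avoidance, comfort (assumed)
N_ACTIONS = 5   # discretised control bins (assumed)

# Toy "behaviour model": logits depend on the state AND on the desired
# per-component returns, so changing the targets changes the behaviour.
W_s = rng.normal(size=(N_ACTIONS, STATE_DIM))
W_g = rng.normal(size=(N_ACTIONS, N_REWARDS))

def action_logits(state, desired_returns):
    """Condition the policy on per-component return targets."""
    return W_s @ state + W_g @ desired_returns

def sample_action(state, desired_returns, temperature=1.0):
    """Sample an action from the return-conditioned distribution."""
    logits = action_logits(state, desired_returns) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(rng.choice(N_ACTIONS, p=probs))

state = rng.normal(size=STATE_DIM)

# Nominal driving: high return targets on every reward component.
nominal = sample_action(state, np.array([1.0, 1.0, 1.0]))

# Adversarial driving: negate the collision-avoidance target to push the
# model toward behaviours outside the nominal data distribution.
adversarial = sample_action(state, np.array([1.0, -1.0, 1.0]))

print(nominal, adversarial)
```

In the full method, the toy linear map would be replaced by a learned multi-agent sequence model trained on the reward-annotated offline dataset, but the control interface, specifying a target return per reward term, is the same.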