Evaluating autonomous vehicle (AV) stacks in simulation typically involves replaying driving logs recorded from real-world traffic. However, agents replayed from offline data are non-reactive and difficult to control intuitively. Existing approaches address these challenges with heuristics or generative models of real-world data, but such methods either lack realism or require costly iterative sampling procedures to control the generated behaviours. In this work, we take an alternative approach and propose CtRL-Sim, a method that leverages return-conditioned offline reinforcement learning (RL) to efficiently generate reactive and controllable traffic agents. Specifically, we process real-world driving data through a physics-enhanced Nocturne simulator to produce a diverse offline RL dataset annotated with multiple reward components. On this dataset, we train a return-conditioned multi-agent behaviour model that permits fine-grained manipulation of agent behaviours by modifying the desired return for each reward component. This capability enables the generation of a wide range of driving behaviours beyond the scope of the initial dataset, including adversarial behaviours. We show that CtRL-Sim can generate realistic safety-critical scenarios while providing fine-grained control over agent behaviours.
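The control mechanism described above can be illustrated with a minimal sketch. This is not the CtRL-Sim model itself — the real system trains a multi-agent behaviour model on offline data — but a hand-crafted stand-in showing the interface: an action distribution conditioned on a per-component target return, where lowering the desired collision-avoidance return shifts probability mass toward aggressive (adversarial) behaviour. The component names and logit weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical reward components annotated in the offline RL dataset.
REWARD_COMPONENTS = ["goal_progress", "collision_avoidance"]


def action_logits(target_returns):
    """Toy stand-in for a learned return-conditioned behaviour model.

    Hand-crafted so that a higher desired goal-progress return favours
    'accelerate' and a higher desired collision-avoidance return favours
    'brake' (illustration only, not the trained model).
    """
    accelerate = 2.0 * target_returns["goal_progress"]
    brake = 2.0 * target_returns["collision_avoidance"]
    return np.array([accelerate, brake])


def action_distribution(target_returns):
    """Softmax over the two toy actions, conditioned on desired returns."""
    logits = action_logits(target_returns)
    probs = np.exp(logits - logits.max())
    return probs / probs.sum()


# Nominal behaviour: high desired collision-avoidance return.
p_safe = action_distribution({"goal_progress": 0.5, "collision_avoidance": 1.0})
# Adversarial behaviour: lower the desired collision-avoidance return,
# steering the same model toward safety-critical actions.
p_adv = action_distribution({"goal_progress": 0.5, "collision_avoidance": -1.0})

print("P(accelerate) safe:", p_safe[0], "adversarial:", p_adv[0])
```

Sampling an action from `p_adv` rather than `p_safe` is, in spirit, how modifying the desired returns generates behaviours (including adversarial ones) beyond those in the initial dataset.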