A longstanding challenge for self-driving development is simulating dynamic driving scenarios seeded from recorded driving logs. In pursuit of this functionality, we apply tools from discrete sequence modeling to model how vehicles, pedestrians and cyclists interact in driving scenarios. With a simple data-driven tokenization scheme, we discretize trajectories to centimeter-level resolution using a small vocabulary. We then model the multi-agent sequence of motion tokens with a GPT-like encoder-decoder that is autoregressive in time and accounts for intra-timestep interaction between agents. Scenarios sampled from our model exhibit state-of-the-art realism; our model tops the Waymo Sim Agents Benchmark, surpassing prior work on the realism meta metric by 3.3% and on the interaction metric by 9.9%. We ablate our modeling choices in full autonomy and partial autonomy settings, and show that the representations learned by our model can quickly be adapted to improve performance on nuScenes. We additionally evaluate the scalability of our model with respect to parameter count and dataset size, and use density estimates from our model to quantify the saliency of context length and intra-timestep interaction for the traffic modeling task.
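To make the idea of discretizing trajectories into motion tokens concrete, the following is a minimal sketch and not the tokenizer used in this work. The abstract does not specify the scheme, so several assumptions are made here for illustration: the vocabulary is a hypothetical uniform grid of planar displacements (rather than a data-driven one), heading is ignored, and the names `make_vocabulary`, `tokenize`, and `detokenize` are invented. The sketch shows one general property of this kind of scheme: choosing each token relative to the reconstructed previous position keeps the quantization error bounded instead of letting it accumulate over time.

```python
import numpy as np


def make_vocabulary(max_step=2.0, n_per_axis=9):
    """Hypothetical vocabulary: a uniform grid of (dx, dy) displacements in metres.

    Assumption for illustration only; a data-driven scheme would fit the
    vocabulary to recorded trajectories instead.
    """
    axis = np.linspace(-max_step, max_step, n_per_axis)
    dx, dy = np.meshgrid(axis, axis, indexing="ij")
    return np.stack([dx.ravel(), dy.ravel()], axis=1)  # (n_per_axis**2, 2)


def tokenize(trajectory, vocab):
    """Greedily map a (T, 2) trajectory to T-1 token ids.

    Each token is the vocabulary entry nearest to the displacement from the
    reconstructed previous position to the true next position, so the error
    made at one step is corrected at the next.
    """
    recon = np.asarray(trajectory[0], dtype=float).copy()
    tokens = []
    for target in trajectory[1:]:
        residual = target - recon
        idx = int(np.argmin(np.linalg.norm(vocab - residual, axis=1)))
        tokens.append(idx)
        recon = recon + vocab[idx]
    return np.array(tokens, dtype=int)


def detokenize(start, tokens, vocab):
    """Roll the token sequence back out into a (T, 2) trajectory."""
    steps = vocab[tokens]
    return np.vstack([start, start + np.cumsum(steps, axis=0)])


# Usage: tokenize a smooth synthetic path and measure the reconstruction error.
vocab = make_vocabulary()
t = np.linspace(0.0, 1.0, 20)
traj = np.stack([10.0 * t, 2.0 * np.sin(3.0 * t)], axis=1)
tokens = tokenize(traj, vocab)
recon = detokenize(traj[0], tokens, vocab)
max_error = float(np.max(np.linalg.norm(recon - traj, axis=1)))
```

With this coarse 81-token grid (0.5 m spacing), the per-axis error stays within half the grid spacing as long as the true per-step motion fits inside the grid. Reaching centimeter-level resolution with a small vocabulary, as described above, would require a vocabulary fitted to the data rather than a uniform grid.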