Simulating realistic behaviors of traffic agents is pivotal for efficiently validating the safety of autonomous driving systems. Existing data-driven simulators primarily use an encoder-decoder architecture to encode the historical trajectories before decoding the future. However, the heterogeneity between encoders and decoders complicates the models, and the manual separation of historical and future trajectories leads to low data utilization. Given these limitations, we propose BehaviorGPT, a homogeneous and fully autoregressive Transformer designed to simulate the sequential behavior of multiple agents. Crucially, our approach discards the traditional separation between "history" and "future" by modeling each time step as the "current" one for motion generation, leading to a simpler, more parameter- and data-efficient agent simulator. We further introduce the Next-Patch Prediction Paradigm (NP3) to mitigate the negative effects of autoregressive modeling, in which models are trained to reason at the patch level of trajectories and capture long-range spatial-temporal interactions. Despite having merely 3M model parameters, BehaviorGPT won first place in the 2024 Waymo Open Sim Agents Challenge with a realism score of 0.7473 and a minADE score of 1.4147, demonstrating its exceptional performance in traffic agent simulation.
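To make the patch-level idea concrete, the following is a minimal sketch of how a single agent's trajectory might be grouped into patches and paired for next-patch prediction. It is an illustration under stated assumptions, not the BehaviorGPT implementation: the function names `to_patches` and `next_patch_pairs`, the patch length of 2, and the toy 2-D state are all hypothetical, and the abstract does not specify how patches are actually built or embedded.

```python
import numpy as np


def to_patches(traj, patch_len):
    """Split a (T, D) trajectory into (T // patch_len, patch_len, D) patches.

    Trailing steps that do not fill a whole patch are dropped (an assumption
    made here for simplicity).
    """
    traj = np.asarray(traj)
    n_patches = traj.shape[0] // patch_len
    usable = traj[: n_patches * patch_len]
    return usable.reshape(n_patches, patch_len, traj.shape[1])


def next_patch_pairs(traj, patch_len):
    """Build (input, target) pairs where each patch predicts the one after it."""
    patches = to_patches(traj, patch_len)
    # Every patch except the last is an input; its target is the following patch.
    return patches[:-1], patches[1:]


# Toy trajectory: 10 time steps of a hypothetical 2-D state (e.g. x, y).
traj = np.arange(20, dtype=float).reshape(10, 2)
inputs, targets = next_patch_pairs(traj, patch_len=2)
```

In this sketch, every patch except the last serves as a "current" input, which loosely mirrors the abstract's point that no fixed split between history and future is needed; a real model would additionally condition each prediction on all earlier patches, on other agents, and on the map.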