Simulating realistic interactions among traffic agents is crucial for efficiently validating the safety of autonomous driving systems. Leading simulators primarily use an encoder-decoder structure that encodes historical trajectories in order to simulate future motion. However, this paradigm complicates the model architecture, and the manual separation of history and future trajectories leads to low data utilization. To address these challenges, we propose Behavior Generative Pre-trained Transformers (BehaviorGPT), a decoder-only, autoregressive architecture designed to simulate the sequential motion of multiple agents. Crucially, our approach discards the traditional separation between "history" and "future" and treats each time step as the "current" one, resulting in a simpler, more parameter- and data-efficient design that scales seamlessly with data and computation. Additionally, we introduce the Next-Patch Prediction Paradigm (NP3), which enables models to reason at the patch level of trajectories and capture long-range spatial-temporal interactions. BehaviorGPT ranks first across several metrics on the Waymo Sim Agents Benchmark, demonstrating its exceptional performance in multi-agent and agent-map interactions. It outperforms state-of-the-art models with a realism score of 0.741 and improves the minADE metric to 1.540, while using approximately 91.6% fewer model parameters.
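To make the patch-level idea concrete, the following is a minimal sketch of how a trajectory might be grouped into patches and turned into next-patch training pairs. It is not the BehaviorGPT implementation: the abstract does not specify the patch size, the state representation, or how patches are tokenized, so the `(x, y)` state, the `patch_size` value, and the helper names `to_patches` and `next_patch_pairs` are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical state: an (x, y) position at one time step.
State = Tuple[float, float]
Patch = List[State]


def to_patches(trajectory: Sequence[State], patch_size: int) -> List[Patch]:
    """Split a trajectory into consecutive, non-overlapping patches.

    Trailing steps that do not fill a whole patch are dropped.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    n_patches = len(trajectory) // patch_size
    return [
        list(trajectory[i * patch_size:(i + 1) * patch_size])
        for i in range(n_patches)
    ]


def next_patch_pairs(patches: List[Patch]) -> List[Tuple[List[Patch], Patch]]:
    """Pair every causal prefix of patches with the patch that follows it.

    No fixed history/future split is used: each patch except the last
    appears as the most recent ("current") context of one pair.
    """
    return [(patches[: i + 1], patches[i + 1]) for i in range(len(patches) - 1)]


# Toy trajectory of 12 time steps for a single agent (assumed values).
trajectory = [(float(t), 0.5 * t) for t in range(12)]
patches = to_patches(trajectory, patch_size=4)
pairs = next_patch_pairs(patches)

for context, target in pairs:
    print(f"{len(context)} context patch(es) -> target starts at {target[0]}")
```

In this sketch a single trajectory yields one training pair per patch boundary, which illustrates why dropping the fixed history/future split can raise data utilization. A real model would predict the next patch with a causally masked decoder over all agents and map elements rather than enumerating prefixes explicitly.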