Traffic simulation aims to learn a policy for traffic agents that, when unrolled in closed-loop, faithfully recovers the joint distribution of trajectories observed in the real world. Inspired by large language models, tokenized multi-agent policies have recently become the state-of-the-art in traffic simulation. However, they are typically trained through open-loop behavior cloning, and thus suffer from covariate shift when executed in closed-loop during simulation. In this work, we present Closest Among Top-K (CAT-K) rollouts, a simple yet effective closed-loop fine-tuning strategy to mitigate covariate shift. CAT-K fine-tuning only requires existing trajectory data, without reinforcement learning or generative adversarial imitation. Concretely, CAT-K fine-tuning enables a small 7M-parameter tokenized traffic simulation policy to outperform a 102M-parameter model from the same model family, achieving the top spot on the Waymo Sim Agent Challenge leaderboard at the time of submission. The code is available at https://github.com/NVlabs/catk.
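To make the core idea concrete, here is a minimal, hedged sketch of the CAT-K selection rule as described in the abstract: at each closed-loop rollout step, the policy's K most likely next tokens are shortlisted, and the one whose decoded state lies closest to the ground-truth next state is executed (the function name, 2D token positions, and Euclidean distance metric are illustrative assumptions, not the paper's exact formulation).

```python
import math

def cat_k_select(logits, token_positions, gt_next_pos, k=5):
    """Closest-Among-Top-K (CAT-K) token selection, sketched.

    logits          -- policy scores for each candidate motion token
    token_positions -- assumed (x, y) position each token decodes to
    gt_next_pos     -- ground-truth next position from logged data
    k               -- size of the top-K shortlist
    """
    # Shortlist the K most likely tokens under the current policy,
    # so the rollout only ever executes high-likelihood actions.
    topk = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]

    # Among the shortlist, pick the token closest to the ground truth,
    # keeping the closed-loop rollout near the logged trajectory.
    def dist_to_gt(i):
        x, y = token_positions[i]
        gx, gy = gt_next_pos
        return math.hypot(x - gx, y - gy)

    return min(topk, key=dist_to_gt)
```

During fine-tuning, states are generated by unrolling these CAT-K picks closed-loop, while the supervision signal remains the standard behavior-cloning target on the logged data, so no reinforcement learning or adversarial objective is needed.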