Supervised open-loop training has been widely adopted for training traffic simulation models; however, it fails to capture the inherently dynamic, multi-agent interactions common in complex driving scenarios. We introduce RLFTSim, a reinforcement-learning-based fine-tuning framework that enhances scenario realism by aligning simulator rollouts with real-world data distributions and provides a method for distilling goal-conditioned controllability in scenario generation. We instantiate RLFTSim on top of a pre-trained simulation model, design a reward that balances fidelity and controllability, and perform comprehensive experiments on the Waymo Open Motion Dataset. Our results show improved realism, achieving state-of-the-art performance. Compared with heuristic search-based fine-tuning methods, RLFTSim requires significantly fewer samples owing to its low-variance, dense reward signal, and it addresses the realism alignment issue directly by design. We also demonstrate the effectiveness of our approach for distilling traffic simulation controllability through goal conditioning. The project page is available at https://ehsan-ami.github.io/rlftsim.