Current trajectory prediction models are primarily trained in an open-loop manner, which often leads to covariate shift and compounding errors when they are deployed in real-world, closed-loop settings. Furthermore, relying on static datasets or non-reactive log-replay simulators severs the interaction loop, preventing the ego agent from learning to actively negotiate with surrounding traffic. In this work, we propose an on-policy closed-loop training paradigm optimized for high-frequency, receding-horizon ego prediction. To ground the ego prediction in a realistic representation of traffic interactions and to achieve reactive consistency, we introduce a goal-oriented, transformer-based scene decoder, which yields an inherently reactive training simulation. By exposing the ego agent to a mixture of open-loop data and simulated, self-induced states, the model learns recovery behaviors that correct its own execution errors. Extensive evaluation demonstrates that closed-loop training significantly improves collision avoidance at high replanning frequencies, yielding relative collision rate reductions of up to 27.0% on nuScenes and 79.5% in dense DeepScenario intersections compared to open-loop baselines. Additionally, we show that a hybrid simulation combining reactive and non-reactive surrounding agents strikes the best balance between immediate interactivity and long-term behavioral stability.
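The abstract does not specify how open-loop data and self-induced states are mixed during training. The following is a minimal toy sketch of one common way to do it: a DAgger-style scheme in which the logged trajectory serves as the expert label. It assumes a 1-D ego position, a hypothetical `policy` callable, and a hypothetical per-step `mix_ratio`. It is not the authors' implementation, and it omits the scene decoder and the surrounding agents.

```python
import random
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class TrainingStep:
    state: float         # ego position fed to the model at this replanning step
    target: float        # supervision: next position on the logged trajectory
    self_induced: bool   # True if `state` came from the model's own rollout


def collect_mixed_rollout(
    log: List[float],
    policy: Callable[[float], float],
    mix_ratio: float,
    rng: random.Random,
) -> List[TrainingStep]:
    """Collect training steps along one logged trajectory (toy sketch).

    Hypothetical scheme: each state is paired with the logged next position
    as its label. With probability `mix_ratio` the rollout then continues from
    the policy's own prediction (a self-induced state); otherwise it is reset
    to the logged state (open-loop data).
    """
    steps: List[TrainingStep] = []
    state, self_induced = log[0], False
    for t in range(len(log) - 1):
        # Pairing a possibly drifted state with the logged next position
        # supervises the model to steer back toward the log (recovery).
        steps.append(TrainingStep(state, log[t + 1], self_induced))
        predicted = policy(state)
        if rng.random() < mix_ratio:
            state, self_induced = predicted, True     # keep own execution
        else:
            state, self_induced = log[t + 1], False   # reset to logged state
    return steps


def biased_policy(x: float) -> float:
    # Toy policy that under-shoots the logged speed of 1.0 per step.
    return x + 0.5


if __name__ == "__main__":
    log = [0.0, 1.0, 2.0, 3.0]
    for step in collect_mixed_rollout(log, biased_policy, 1.0, random.Random(0)):
        print(step.state, step.target, step.self_induced)
```

With `mix_ratio=0.0` every state comes from the log, which reduces to ordinary open-loop training. With `mix_ratio=1.0` the rollout is fully on-policy. In the toy run, the biased policy falls further behind the log at each step (states 0.0, 0.5, 1.0 against targets 1.0, 2.0, 3.0). That is the kind of self-induced drift that the recovery labels are meant to correct.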