Current open-loop trajectory models struggle in real-world autonomous driving because minor initial deviations often cascade into compounding errors, pushing the agent into out-of-distribution states. While fully differentiable closed-loop simulators attempt to address this, they suffer from shortcut learning: the loss gradients flow backward through the induced state inputs, inadvertently leaking future ground-truth information directly into the model's own previous predictions. The model exploits these signals to artificially avoid drift, non-causally "regretting" past mistakes rather than learning genuinely reactive recovery. To address this, we introduce a detached receding-horizon rollout. Explicitly severing the computation graph between simulation steps forces the model to learn genuine recovery behaviors from drifted states, so that it "rectifies" mistakes rather than non-causally optimizing past predictions. Extensive evaluations on the nuScenes and DeepScenario datasets show that our approach yields more robust recovery strategies, reducing target collisions by up to 33.24% compared to fully differentiable closed-loop training at high replanning frequencies. Furthermore, compared to standard open-loop baselines, our non-differentiable framework decreases collisions by up to 27.74% in dense environments while simultaneously improving multi-modal prediction diversity and lane alignment.
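To make the difference between the two training regimes concrete, the following is a minimal sketch and not the paper's model. It assumes a hypothetical one-dimensional toy policy with a single parameter `theta` that moves the agent a fraction of the way toward a lane centre, and it tracks the gradient by hand in forward mode. The names `rollout_grad`, `centre`, and `detach` are illustrative assumptions. With `detach=False` the gradient also flows through the induced state, as in a fully differentiable rollout; with `detach=True` each step treats its input state as a constant, which is the role a stop-gradient operation (for example `torch.Tensor.detach()` or `jax.lax.stop_gradient`) would play in an autodiff framework.

```python
def rollout_grad(theta, x0, centre, gt, detach):
    """Return (loss, d loss / d theta) for a closed-loop rollout of a toy policy.

    Hypothetical setup: the state x is a lateral offset, and the policy moves
    a fraction `theta` of the way toward `centre` at every step.
    """
    x, dx = x0, 0.0          # state and its tangent d x / d theta
    loss, grad = 0.0, 0.0
    for target in gt:
        err = centre - x
        x_next = x + theta * err
        # Chain rule: the direct term (err) plus the path through the
        # induced state x, which carries the tangent dx from earlier steps.
        dx_next = err + (1.0 - theta) * dx
        resid = x_next - target
        loss += resid ** 2
        grad += 2.0 * resid * dx_next
        x = x_next
        # Detached rollout: the next step sees the drifted state as a plain
        # input, so no gradient reaches the predictions that produced it.
        dx = 0.0 if detach else dx_next
    return loss, grad


if __name__ == "__main__":
    args = dict(theta=0.5, x0=1.0, centre=0.0, gt=[0.0, 0.0, 0.0])
    print("fully differentiable:", rollout_grad(detach=False, **args))
    print("detached:            ", rollout_grad(detach=True, **args))
```

Both calls simulate the same trajectory and report the same loss; only the gradient differs, because the detached variant drops every term that credits an earlier prediction for a later step's error.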