IDOL: Inverse-Dynamics-Guided Future Prediction for End-to-End Autonomous Driving

End-to-end autonomous driving has emerged as a compelling paradigm for learning planning directly from sensor observations, and recent world-model-based approaches further enrich this paradigm by enabling explicit reasoning about how the scene may evolve. Yet future prediction alone does not guarantee better planning unless the predicted evolution can be converted into planning-relevant trajectory updates. Many current methods still forecast future scene states without explicitly decoding the motion implied by state transitions. As a result, future reasoning often remains descriptively useful but only weakly coupled to executable motion generation. To address this limitation, we propose IDOL, an inverse-dynamics-guided future prediction framework for world-model-based end-to-end planning in a latent bird's-eye-view (BEV) space, where inverse dynamics serves as the key bridge between future prediction and trajectory optimization. IDOL first predicts multiple future latent scene states with a BEV world model. It then applies an inverse dynamics model to each pair of adjacent latent futures to decode transition-aware trajectory features and recover planning-relevant motion deltas that explain how the latent world evolves over time. These inverse-dynamics-derived signals are used to optimize the planned trajectory, turning future forecasting from passive scene anticipation into actionable planning guidance. A lightweight closed-loop refinement module further improves long-horizon consistency by reusing the optimized trajectory for another round of future-aware reasoning. By introducing inverse dynamics into latent future reasoning, IDOL tightens the coupling between world modeling and planning. Extensive experiments on the NAVSIM v1 and NAVSIM v2 benchmarks show that IDOL achieves state-of-the-art performance among comparable methods.
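
The pipeline described in the abstract can be illustrated with a toy program. This is a minimal sketch under stated assumptions, not IDOL's implementation. The latent BEV state is replaced by a 2-D ego position. `world_model` is a hand-written rule, assumed here to be conditioned on the planned motion, rather than a learned BEV world model. `inverse_dynamics` is simply the difference between adjacent latent futures. `optimize_plan` moves each planned delta halfway toward the recovered delta. The function names, the blend `step`, the lane-centering rule, and the `inconsistency` measure are all hypothetical and chosen only to make the control flow visible.

```python
from typing import List, Tuple

Vec = Tuple[float, float]  # toy stand-in for a latent state or motion delta: (x, y)


def world_model(z: Vec, delta: Vec) -> Vec:
    """Hypothetical toy 'BEV world model' (assumption, not IDOL's network).

    The latent state is just the ego (x, y) position. The predicted next state
    applies the planned delta, then halves the lateral offset, as if the scene
    pulls the ego toward the lane centre y = 0.
    """
    x, y = z
    dx, dy = delta
    return (x + dx, 0.5 * (y + dy))


def inverse_dynamics(z_t: Vec, z_next: Vec) -> Vec:
    """Hypothetical inverse dynamics: the motion delta that explains the
    transition between two adjacent latent futures."""
    return (z_next[0] - z_t[0], z_next[1] - z_t[1])


def predict_futures(z0: Vec, deltas: List[Vec]) -> List[Vec]:
    """Roll the toy world model forward to obtain multiple future states."""
    futures = [z0]
    for d in deltas:
        futures.append(world_model(futures[-1], d))
    return futures


def recovered_deltas(z0: Vec, deltas: List[Vec]) -> List[Vec]:
    """Apply inverse dynamics to every adjacent pair of predicted futures."""
    futures = predict_futures(z0, deltas)
    return [inverse_dynamics(a, b) for a, b in zip(futures, futures[1:])]


def optimize_plan(z0: Vec, deltas: List[Vec], step: float = 0.5) -> List[Vec]:
    """One round of inverse-dynamics-guided trajectory optimization:
    move each planned delta a fraction `step` toward the recovered delta."""
    recovered = recovered_deltas(z0, deltas)
    return [
        (p[0] + step * (r[0] - p[0]), p[1] + step * (r[1] - p[1]))
        for p, r in zip(deltas, recovered)
    ]


def inconsistency(z0: Vec, deltas: List[Vec]) -> float:
    """Largest gap between the planned deltas and the deltas implied by the
    predicted futures (0.0 means plan and predicted future agree)."""
    recovered = recovered_deltas(z0, deltas)
    return max(
        max(abs(p[0] - r[0]), abs(p[1] - r[1])) for p, r in zip(deltas, recovered)
    )


def closed_loop_refine(z0: Vec, deltas: List[Vec], rounds: int = 2) -> List[Vec]:
    """Closed-loop refinement: feed each optimized trajectory back in for
    another round of future-aware reasoning."""
    for _ in range(rounds):
        deltas = optimize_plan(z0, deltas)
    return deltas


if __name__ == "__main__":
    z0 = (0.0, 2.0)          # ego starts 2 units left of the lane centre
    plan = [(1.0, 0.0)] * 3  # initial plan: drive straight for 3 steps
    for k in range(5):
        refined = closed_loop_refine(z0, plan, rounds=k)
        print(k, round(inconsistency(z0, refined), 4))
```

In this toy, every extra refinement round reduces the gap between the planned deltas and the deltas recovered from the predicted futures, while the longitudinal component of the plan stays unchanged. The fixed blend `step` is used only to keep the example transparent. It is not a claim about how IDOL performs the trajectory update.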

翻译：端到端自动驾驶已成为直接从传感器观测学习规划的一种有前景范式，而近期基于世界模型的方法通过显式推理场景未来演化进一步丰富了这一范式。然而，仅依赖未来预测并不能保证更好的规划，除非预测的演化结果能被转化为与规划相关的轨迹更新。当前许多方法仍直接预测未来场景状态，而未显式解码状态转换中隐含的运动含义，导致未来推理虽具有描述性价值，却与可执行的运动生成耦合薄弱。针对这一局限，我们提出IDOL——一种面向基于世界模型的端到端规划的逆动力学引导未来预测框架，在潜在BEV空间中运行，其中逆动力学作为连接未来预测与轨迹优化的关键桥梁。IDOL首先通过BEV世界模型预测多个未来潜在场景状态，随后利用相邻潜在未来状态的逆动力学模型解码具有过渡感知特性的轨迹特征，并恢复可解释潜在世界随时间演化的规划相关运动增量。这些逆动力学衍生信号被用于优化规划轨迹，将未来预测从被动的场景预测转化为可执行的规划指导。轻量级闭环细化模块通过复用优化轨迹进行新一轮未来感知推理，进一步提升长期一致性。通过在潜在未来推理中引入逆动力学，IDOL强化了世界建模与规划之间的耦合。在NAVSIM v1和NAVSIM v2基准上的大量实验表明，IDOL在同类方法中达到了最优性能。