End-to-end autonomous driving (E2E-AD) has made significant progress by unifying perception, prediction, and planning within a single learning framework, achieving strong performance in short-horizon decision making. However, most existing E2E-AD methods remain confined to short-horizon planning and do not model long-term temporal dependencies, which severely limits their generalization and safety in complex, highly interactive driving scenarios. In this work, we propose GraphWorld, an E2E-AD framework that explicitly enhances long-horizon planning through latent world modeling. First, we introduce an Ego-Centric Interaction Graph, which adaptively models critical neighboring agents according to their spatial proximity to the ego vehicle and propagates their relational context to the planning queries via cross-node cross-attention. Second, we present World-State-Conditioned Planning, which learns an ego-centric latent world representation by modeling the interactions between the ego vehicle and its surrounding agents. This latent world state captures key interaction dynamics and safety-relevant semantics, and serves as a conditioning signal that guides long-horizon, safety-aware trajectory planning. Extensive experiments on Bench2Drive, NAVSIM v1/v2, and nuScenes show that GraphWorld significantly reduces collision rates and improves long-horizon planning performance, validating its effectiveness in complex driving environments.
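The interaction-graph step can be illustrated with a minimal, dependency-free sketch. It assumes a fixed search radius and a top-k cutoff as a simple stand-in for the adaptive, proximity-based neighbor selection, single-head scaled dot-product attention with no learned projections (node features serve as both keys and values), and a residual update of the planning queries. All names (`select_neighbors`, `cross_attention`, `plan_with_interaction_graph`) and defaults (`radius=30.0`, `k=4`) are hypothetical illustrations, not GraphWorld's actual implementation.

```python
import math
from typing import List, Sequence

Vector = List[float]


def select_neighbors(positions: Sequence[Sequence[float]], radius: float, k: int) -> List[int]:
    """Hypothetical node selection: indices of up to k agents within `radius`
    of the ego (placed at the origin), nearest first."""
    in_range = sorted(
        (math.hypot(x, y), i) for i, (x, y) in enumerate(positions)
        if math.hypot(x, y) <= radius
    )
    return [i for _, i in in_range[:k]]


def softmax(scores: Sequence[float]) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Sequence[Vector], keys: Sequence[Vector],
                    values: Sequence[Vector]) -> List[Vector]:
    """Single-head scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d) for key in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def plan_with_interaction_graph(plan_queries: Sequence[Vector],
                                agent_positions: Sequence[Sequence[float]],
                                agent_features: Sequence[Vector],
                                radius: float = 30.0, k: int = 4) -> List[Vector]:
    """Hypothetical pipeline: select nearby agent nodes, let the planning queries
    attend to them, and add the attended context back as a residual."""
    idx = select_neighbors(agent_positions, radius, k)
    if not idx:  # no agent in range: planning queries pass through unchanged
        return [list(q) for q in plan_queries]
    nodes = [agent_features[i] for i in idx]
    context = cross_attention(plan_queries, nodes, nodes)
    return [[q + c for q, c in zip(qv, cv)] for qv, cv in zip(plan_queries, context)]


if __name__ == "__main__":
    queries = [[1.0, 0.0], [0.0, 1.0]]
    positions = [(5.0, 0.0), (100.0, 0.0), (0.0, 3.0)]  # the agent at 100 m is dropped
    features = [[1.0, 0.0], [9.0, 9.0], [0.0, 1.0]]
    print(select_neighbors(positions, radius=30.0, k=4))  # nearest first: [2, 0]
    print(plan_with_interaction_graph(queries, positions, features))
```

Restricting attention to a small set of nearby nodes keeps the cost proportional to k rather than to the total number of agents in the scene. A learned model would replace the fixed radius and cutoff with the adaptive selection described above and add learned query, key, and value projections.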