Data-efficient learning remains a central challenge in autonomous driving because large-scale real-world interaction is costly and unsafe. Although world-model-based reinforcement learning enables policy optimization through latent imagination, existing approaches often lack explicit mechanisms for encoding the spatial and kinematic structure that is essential for driving tasks. In this work, we build on the Recurrent State-Space Model (RSSM) and propose a kinematics-aware latent world model framework for autonomous driving. Vehicle kinematic information is incorporated into the observation encoder to ground latent transitions in physically meaningful motion dynamics, while geometry-aware supervision regularizes the RSSM latent state to capture task-relevant spatial structure beyond pixel reconstruction. The resulting structured latent dynamics improve long-horizon imagination fidelity and stabilize policy optimization. Experiments on a driving simulation benchmark demonstrate consistent gains over both model-free and pixel-based world-model baselines in sample efficiency and driving performance. Ablation studies further confirm that the proposed design improves the quality of spatial representations within the latent space. These results suggest that integrating kinematic grounding into RSSM-based world models provides a scalable and physically grounded paradigm for autonomous driving policy learning.
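The RSSM described above keeps a deterministic recurrent state h_t and a stochastic state z_t with a prior p(z_t | h_t) and a posterior q(z_t | h_t, e_t) computed from the observation embedding e_t. The NumPy sketch below is a hedged illustration of where kinematic grounding and geometry-aware supervision could enter one such step; it is not the authors' implementation. Everything specific here is a hypothetical assumption: the class name `KinematicsAwareRSSM`, the four-element kinematic vector (speed, yaw rate, steering, acceleration), a plain tanh recurrence standing in for the GRU used in common RSSM implementations, the three-dimensional geometry target, and the unit loss weight.

```python
import numpy as np

rng = np.random.default_rng(0)

def linear(in_dim, out_dim):
    """Hypothetical dense layer: (weights, bias) with small random init."""
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)

def apply(layer, x):
    W, b = layer
    return x @ W + b

class KinematicsAwareRSSM:
    """Illustrative sketch only; all dimensions and layer choices are assumptions."""

    def __init__(self, img_dim=32, kin_dim=4, act_dim=2, h_dim=16, z_dim=8, geo_dim=3, emb_dim=32):
        self.enc = linear(img_dim + kin_dim, emb_dim)        # fused image + kinematics encoder
        self.rnn = linear(h_dim + z_dim + act_dim, h_dim)    # h_t = f(h_{t-1}, z_{t-1}, a_{t-1})
        self.prior = linear(h_dim, 2 * z_dim)                # p(z_t | h_t): mean and log-std
        self.post = linear(h_dim + emb_dim, 2 * z_dim)       # q(z_t | h_t, e_t): mean and log-std
        self.geo_head = linear(h_dim + z_dim, geo_dim)       # geometry-aware auxiliary head

    def encode(self, img_feat, kin):
        # Kinematic grounding: vehicle state is concatenated with visual features before embedding.
        return np.tanh(apply(self.enc, np.concatenate([img_feat, kin], axis=-1)))

    def step(self, h, z, a, img_feat, kin):
        # Deterministic path (tanh cell here instead of a GRU, for brevity).
        h = np.tanh(apply(self.rnn, np.concatenate([h, z, a], axis=-1)))
        prior_mu, prior_logstd = np.split(apply(self.prior, h), 2, axis=-1)
        e = self.encode(img_feat, kin)
        post_mu, post_logstd = np.split(apply(self.post, np.concatenate([h, e], axis=-1)), 2, axis=-1)
        # Reparameterized sample from the posterior.
        z = post_mu + np.exp(post_logstd) * rng.standard_normal(post_mu.shape)
        return h, z, (prior_mu, prior_logstd), (post_mu, post_logstd)

    def geometry_loss(self, h, z, geo_target):
        # Supervises the latent state (h, z) to predict spatial quantities, not pixels.
        pred = apply(self.geo_head, np.concatenate([h, z], axis=-1))
        return float(np.mean((pred - geo_target) ** 2))

def gaussian_kl(mu_q, logstd_q, mu_p, logstd_p):
    """KL(q || p) between diagonal Gaussians, summed over latent dimensions."""
    var_q, var_p = np.exp(2 * logstd_q), np.exp(2 * logstd_p)
    return float(np.sum(logstd_p - logstd_q + (var_q + (mu_q - mu_p) ** 2) / (2 * var_p) - 0.5))

# Usage: one training step on hypothetical inputs.
model = KinematicsAwareRSSM()
h, z, a = np.zeros(16), np.zeros(8), np.zeros(2)
img_feat = rng.standard_normal(32)            # stand-in for CNN image features
kin = np.array([8.0, 0.05, 0.1, 0.0])         # hypothetical speed, yaw rate, steering, acceleration
h, z, prior, post = model.step(h, z, a, img_feat, kin)
kl = gaussian_kl(*post, *prior)
geo = model.geometry_loss(h, z, np.array([1.5, 1.8, 20.0]))  # hypothetical lane/obstacle distances
loss = kl + 1.0 * geo                         # pixel reconstruction term omitted in this sketch
```

Placing the kinematics in the encoder means the posterior, and through training also the prior, is conditioned on physical motion state, while the geometry head adds a loss that depends on spatial structure rather than on pixel detail. A practical version would typically add a reconstruction or reward term, constrain the standard deviations (for example with a softplus and a minimum value), and batch over sequences.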