Latent Geometry Beyond Search: Amortizing Planning in World Models

Modern vision-based world models can represent observations as compact yet expressive latent manifolds, but fast goal-oriented planning in these spaces remains challenging. This raises a central question: when does a learned representation simplify control rather than merely enable prediction? We study this question in a pretrained LeWorldModel, whose latent geometry is regularized for smoothness and uniformity. Our key insight is that, under such geometry, planning can be amortized into a latent inverse-dynamics mapping instead of requiring online search. We therefore replace iterative planning with a lightweight Goal-Conditioned Inverse Dynamics Model (GC-IDM) that maps the current latent state, the goal latent state, and the remaining horizon directly to the next action. Empirically, across four benchmark environments spanning navigation, contact-rich manipulation, and continuous control, our controller matches or exceeds the Cross-Entropy Method (CEM) in seven of eight environment-protocol settings while reducing per-decision cost by 100-130×. A broader sweep over test-time planners (CEM, MPPI, iCEM, and gradient-based methods) shows that this result is not specific to a particular optimizer. These findings suggest that much of the structure recovered by test-time planning is already locally encoded in the latent representation. More broadly, our results indicate that sufficiently structured latent spaces can shift part of the planning burden from online optimization to learned inference. Our code is publicly available at https://github.com/hdnndh/Latent-Geometry-Beyond-Search-Amortizing-Planning-in-World-Models .
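The following is a minimal, hypothetical sketch in standard-library Python of the GC-IDM interface and the closed-loop controller it enables. It is not the authors' implementation. The two-layer MLP, the hidden width, the input layout `[z_t, z_g, h / H]`, the tanh action bound, and the names `GCIDM`, `control_loop`, `toy_step`, and `toy_oracle_idm` are all illustrative assumptions, and the network is left untrained. The sketch is meant to show the cost structure. Each decision is a single forward pass, whereas a sampling planner such as CEM rolls many candidate action sequences through the world model for every decision.

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


class GCIDM:
    """Hypothetical goal-conditioned inverse dynamics model: (z_t, z_g, h) -> a_t.

    Illustrative two-layer ReLU MLP with random, untrained weights. The input
    layout [z_t, z_g, h / max_horizon] and the tanh action bound are assumptions.
    """

    def __init__(self, latent_dim: int, action_dim: int, hidden: int = 64,
                 max_horizon: int = 50, seed: int = 0) -> None:
        rng = random.Random(seed)
        in_dim = 2 * latent_dim + 1  # current latent, goal latent, normalised horizon
        self.max_horizon = max_horizon
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(in_dim)) for _ in range(in_dim)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [[rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
                   for _ in range(action_dim)]
        self.b2 = [0.0] * action_dim

    def __call__(self, z_t: Vector, z_goal: Vector, h_remaining: int) -> Vector:
        x = list(z_t) + list(z_goal) + [h_remaining / self.max_horizon]
        # Hidden layer: ReLU(W1 x + b1).
        hid = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
               for row, b in zip(self.w1, self.b1)]
        # Output layer: tanh keeps each action component in [-1, 1].
        return [math.tanh(sum(w * v for w, v in zip(row, hid)) + b)
                for row, b in zip(self.w2, self.b2)]


Policy = Callable[[Vector, Vector, int], Vector]
Step = Callable[[Vector, Vector], Vector]


def control_loop(policy: Policy, step: Step, z0: Vector, z_goal: Vector,
                 horizon: int) -> Tuple[Vector, List[Vector]]:
    """Closed-loop control: one policy query per decision, no candidate sampling."""
    z = list(z0)
    actions: List[Vector] = []
    for t in range(horizon):
        a = policy(z, z_goal, horizon - t)  # remaining horizon counts down to 1
        actions.append(a)
        z = step(z, a)
    return z, actions


def toy_step(z: Vector, a: Vector) -> Vector:
    """Toy latent dynamics z' = z + a (an assumption, not the LeWorldModel predictor)."""
    return [zi + ai for zi, ai in zip(z, a)]


def toy_oracle_idm(z: Vector, z_goal: Vector, h_remaining: int) -> Vector:
    """Exact inverse map for toy_step: spread the remaining gap evenly over h steps."""
    return [(g - zi) / h_remaining for zi, g in zip(z, z_goal)]


if __name__ == "__main__":
    z_final, _ = control_loop(toy_oracle_idm, toy_step, [0.0, 0.0], [1.0, -2.0], 5)
    print(z_final)  # close to [1.0, -2.0] up to floating-point rounding
```

In the toy dynamics `z' = z + a`, the inverse map has a closed form, so `toy_oracle_idm` reaches the goal after exactly `horizon` steps. In a learned latent space no such closed form is available. A trained GC-IDM would have to approximate the mapping from data, and that approximation is the amortization the abstract describes.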

翻译：现代基于视觉的世界模型能够将观测表示为紧凑且富有表现力的潜在流形，但在这些空间中进行快速目标导向规划仍具挑战性。这引出一个核心问题：学习的表示何时能简化控制，而不仅仅是实现预测？我们在预训练的LeWorld模型中研究该问题，该模型的潜在几何被正则化为平滑且均匀。关键洞见在于，在此类几何下，规划可被摊销为潜在逆动力学映射，而无需在线搜索。因此，我们用轻量级目标条件逆动力学模型（GC-IDM）替代迭代规划，该模型将当前潜在状态、目标潜在状态和剩余时域直接映射到下一动作。实验表明，在涵盖导航、接触式操作和连续控制的四个基准环境中，我们的控制器在八种环境-协议设置中的七项中达到或超越了CEM，同时将每步决策成本降低100-130倍。针对测试时规划器（CEM、MPPI、iCEM及梯度方法）的更广泛对比显示，该结果并非特定于某类优化器。这些发现表明，测试时规划恢复的结构大多已局部编码于潜在表示中。更广泛而言，我们的结果表明，足够结构化的潜在空间可将部分规划负担从在线优化转移至学习推理。代码开源于https://github.com/hdnndh/Latent-Geometry-Beyond-Search-Amortizing-Planning-in-World-Models 。