Learning good representations is essential for latent planning with world models. While pretrained visual encoders produce strong semantic visual features, they are not tailored to planning and contain information that is irrelevant, or even detrimental, to it. Inspired by the perceptual straightening hypothesis in human visual processing, we introduce temporal straightening to improve representation learning for latent planning. Using a curvature regularizer that encourages locally straightened latent trajectories, we jointly learn the encoder and the predictor of a Joint-Embedding Predictive Architecture (JEPA) world model. We show that reducing curvature in this way makes the Euclidean distance in latent space a better proxy for the geodesic distance and improves the conditioning of the planning objective. We demonstrate empirically that temporal straightening makes gradient-based planning more stable and yields significantly higher success rates across a suite of goal-reaching tasks. Our code is available at https://agenticlearning.ai/temporal-straightening.
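To make the idea of a curvature regularizer concrete, the following is a minimal sketch of one possible curvature measure on a latent trajectory. The abstract does not specify the exact form of the regularizer, so the choice here (one minus the cosine similarity between consecutive latent velocities, averaged over the trajectory) is an assumption made for illustration, and the function name `mean_curvature` and the `eps` parameter are hypothetical. It is written in plain Python for readability; a training implementation would presumably use a differentiable framework and add this term to the world-model loss.

```python
import math


def mean_curvature(trajectory, eps=1e-12):
    """Average of (1 - cosine similarity) between consecutive latent velocities.

    trajectory: sequence of latent vectors z_1..z_T (each a list of floats), T >= 3.
    Returns 0.0 for a perfectly straight path and 2.0 when every step reverses
    the previous one. This is an illustrative measure, not the source's exact loss.
    """
    if len(trajectory) < 3:
        raise ValueError("need at least three latent states")

    # Velocities v_t = z_{t+1} - z_t between consecutive latent states.
    velocities = [
        [b - a for a, b in zip(z0, z1)]
        for z0, z1 in zip(trajectory, trajectory[1:])
    ]

    total, count = 0.0, 0
    for v0, v1 in zip(velocities, velocities[1:]):
        n0 = math.sqrt(sum(x * x for x in v0))
        n1 = math.sqrt(sum(x * x for x in v1))
        if n0 < eps or n1 < eps:
            continue  # skip stationary steps, where direction is undefined
        cos = sum(a * b for a, b in zip(v0, v1)) / (n0 * n1)
        cos = max(-1.0, min(1.0, cos))  # guard against rounding error
        total += 1.0 - cos
        count += 1

    return total / count if count else 0.0


straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
right_angle = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
print(mean_curvature(straight))     # straight path: no change in direction
print(mean_curvature(right_angle))  # one 90-degree turn
```

Under this measure, a trajectory whose steps all point in the same direction has zero curvature, which is the situation in which the straight-line Euclidean distance between two latent states matches the length of the path between them.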