Real-world reinforcement learning demands adaptation to unseen environmental conditions without costly retraining. Contextual Markov Decision Processes (cMDPs) model this challenge, but existing methods often require explicit context variables (e.g., friction, gravity), which limits their use when contexts are latent or hard to measure. We introduce Dynamics-Aligned Latent Imagination (DALI), a framework integrated within the Dreamer architecture that infers latent context representations from agent-environment interactions. By training a self-supervised encoder to predict forward dynamics, DALI generates actionable representations that condition the world model and policy, bridging perception and control. We theoretically prove that this encoder is essential for efficient context inference and robust generalization. DALI's latent space enables counterfactual consistency: perturbing a gravity-encoding dimension alters imagined rollouts in physically plausible ways. On challenging cMDP benchmarks, DALI achieves significant gains over context-unaware baselines and often surpasses context-aware baselines on extrapolation tasks, enabling zero-shot generalization to unseen contextual variations.
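The idea of inferring a latent context from forward dynamics can be illustrated with a toy sketch. This is not DALI's implementation: the one-dimensional dynamics `s' = s + c * a`, the closed-form least-squares "encoder" standing in for a learned network, and all function names (`step`, `collect`, `encode_context`, `imagine`) are assumptions made purely for illustration. The sketch shows a hidden context `c` recovered from a short window of transitions, and an imagined rollout that changes predictably when the inferred latent is perturbed.

```python
import random


def step(s, a, c):
    """Toy dynamics (assumed): the hidden context c scales the action's effect."""
    return s + c * a


def collect(c, n, rng):
    """Roll out n random-action transitions (s, a, s_next) under hidden context c."""
    s, transitions = 0.0, []
    for _ in range(n):
        a = rng.uniform(-1.0, 1.0)
        s_next = step(s, a, c)
        transitions.append((s, a, s_next))
        s = s_next
    return transitions


def encode_context(transitions):
    """Infer z by minimizing the forward-prediction error
    sum((s + z * a - s_next)^2) over the window (closed-form least squares).
    A learned encoder would replace this in practice."""
    num = sum(a * (s_next - s) for s, a, s_next in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den if den > 0 else 0.0


def imagine(s0, actions, z):
    """Imagined rollout from a forward model conditioned on the latent z."""
    traj = [s0]
    for a in actions:
        traj.append(traj[-1] + z * a)
    return traj


rng = random.Random(0)
window = collect(c=2.5, n=16, rng=rng)   # the agent never observes c directly
z = encode_context(window)               # latent context inferred from interaction

plan = [1.0, 1.0, 1.0]
rollout = imagine(0.0, plan, z)          # imagination under the inferred context
perturbed = imagine(0.0, plan, z + 1.0)  # counterfactual: perturb the latent

print(z, rollout[-1], perturbed[-1])
```

In this noise-free toy the inferred `z` matches the hidden context up to floating-point error, and raising `z` by one shifts the final imagined state by the sum of the planned actions, a simple analogue of the counterfactual behaviour described in the abstract.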