Learning representations of underlying environmental dynamics from partial observations is a critical challenge in machine learning. In the context of Partially Observable Markov Decision Processes (POMDPs), state representations are often inferred from the history of past observations and actions. We demonstrate that incorporating future information is essential to accurately capture causal dynamics and enhance state representations. To exploit this insight, we introduce a Dynamical Variational Auto-Encoder (DVAE) designed to learn causal Markovian dynamics from offline trajectories in a POMDP. Our method employs an extended hindsight framework that integrates past, current, and multi-step future information within a factored-POMDP setting. Empirical results reveal that this approach uncovers the causal graph governing hidden state transitions more effectively than history-based and typical hindsight-based models.
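The abstract does not specify the DVAE architecture, so the following is only an illustrative toy sketch of the core idea: an inference network that conditions the posterior over the hidden state not just on past and current observations but also on a multi-step future (hindsight) window, q(s_t | o_{≤t}, o_{t+1:t+k}). All names here (`encode_state`, `W_mu`, the averaging summaries standing in for recurrent encoders) are hypothetical and not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_state(past_obs, current_obs, future_obs, params):
    """Toy hindsight-style inference: q(s_t | past, current, future).

    Past and future observation windows are summarized by simple means
    (a stand-in for recurrent encoders), concatenated with the current
    observation, and mapped to the parameters of a Gaussian posterior.
    """
    past_summary = past_obs.mean(axis=0)      # summary of o_{<t}
    future_summary = future_obs.mean(axis=0)  # multi-step hindsight window
    h = np.concatenate([past_summary, current_obs, future_summary])
    mu = params["W_mu"] @ h + params["b_mu"]
    log_var = params["W_lv"] @ h + params["b_lv"]
    eps = rng.standard_normal(mu.shape)
    z = mu + np.exp(0.5 * log_var) * eps      # reparameterization trick
    return z, mu, log_var

obs_dim, state_dim = 4, 2
params = {
    "W_mu": rng.standard_normal((state_dim, 3 * obs_dim)) * 0.1,
    "b_mu": np.zeros(state_dim),
    "W_lv": rng.standard_normal((state_dim, 3 * obs_dim)) * 0.1,
    "b_lv": np.zeros(state_dim),
}
past = rng.standard_normal((5, obs_dim))     # observations before time t
current = rng.standard_normal(obs_dim)       # observation at time t
future = rng.standard_normal((3, obs_dim))   # o_{t+1:t+3}: the hindsight input
z, mu, log_var = encode_state(past, current, future, params)
```

A purely history-based model would drop `future_summary` from the concatenation; the abstract's claim is that including it yields state representations that better recover the causal transition structure.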