Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications. In factored state spaces, one approach to achieving both goals is to learn state abstractions, which keep only the variables necessary for learning the tasks at hand. This paper introduces Causal Bisimulation Modeling (CBM), a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction. CBM leverages and improves implicit modeling to train a high-fidelity causal dynamics model that can be reused for all tasks in the same environment. Empirical validation on manipulation environments and the DeepMind Control Suite shows that CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones. Furthermore, the derived state abstractions allow a task learner to achieve near-oracle levels of sample efficiency and to outperform baselines on all tasks.
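
To make the idea of a task-specific abstraction concrete, the following is a minimal sketch, not the authors' implementation. It assumes the causal graphs are already known and hand-specified (CBM learns them from data), and it assumes the abstraction keeps every state variable that can influence the reward, either directly or through the dynamics. The function name `derive_abstraction`, the dictionary layout, and the toy variable names are all hypothetical.

```python
from collections import deque


def derive_abstraction(dynamics_parents, reward_parents):
    """Return the state variables that can causally influence the reward.

    dynamics_parents: dict mapping each state variable to the set of
        current-step state variables that affect its next-step value
        (a hypothetical, hand-specified causal graph).
    reward_parents: set of state variables the reward depends on directly.
    """
    keep = set(reward_parents)
    frontier = deque(keep)
    # Walk the dynamics graph backwards from the reward's direct parents,
    # collecting every variable that can eventually affect the reward.
    while frontier:
        var = frontier.popleft()
        for parent in dynamics_parents.get(var, ()):
            if parent not in keep:
                keep.add(parent)
                frontier.append(parent)
    return keep


# Toy manipulation-style example (illustrative values only).
dynamics_parents = {
    "gripper": {"gripper"},
    "block": {"block", "gripper"},
    "distractor": {"distractor", "gripper"},
    "target": {"target"},
}
reward_parents = {"block", "target"}

abstraction = derive_abstraction(dynamics_parents, reward_parents)
print(sorted(abstraction))
```

In this toy graph the distractor is dropped because nothing on the path to the reward depends on it, while the gripper is kept because it moves the block. Under the stated assumptions, the same dynamics graph can be reused across tasks; only `reward_parents` changes per task.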