Goal-conditioned planning benefits from learned low-dimensional representations of rich, high-dimensional observations. Compact latent representations, typically learned with variational autoencoders or inverse dynamics, enable goal-conditioned planning, but they ignore state affordances, which limits the sample efficiency of planning. In this paper, we learn a representation that associates reachable states with one another for effective onward planning. We first learn a latent representation with multi-step inverse dynamics (to remove distracting information), and then transform this representation to associate reachable states with one another in $\ell_2$ space. We evaluate our approach in several simulation testbeds. Numerical results in reward-based and reward-free settings show significant improvements in sample efficiency, and the learned representation yields layered state abstractions that enable computationally efficient hierarchical planning.
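As a rough illustration of the second stage, the sketch below shows one way a latent representation could be scored on how well it groups reachable states in $\ell_2$ space. It is a minimal sketch under stated assumptions, not the objective used in this paper: reading "associated" as "close in Euclidean distance", the contrastive hinge form, the `margin` parameter, and the names `pairwise_l2` and `reachability_loss` are all hypothetical choices made for illustration.

```python
import numpy as np


def pairwise_l2(z):
    """Return the (n, n) matrix of Euclidean distances between rows of z."""
    diff = z[:, None, :] - z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def reachability_loss(z, reachable, margin=1.0):
    """Hypothetical contrastive loss (an assumption, not the paper's loss).

    Pulls pairs marked reachable together and pushes all other pairs
    at least `margin` apart. `reachable` is an (n, n) boolean matrix.
    """
    d = pairwise_l2(z)
    off_diag = ~np.eye(len(z), dtype=bool)  # ignore each state paired with itself
    pos = reachable & off_diag
    neg = ~reachable & off_diag
    pull = (d[pos] ** 2).sum()
    push = (np.maximum(0.0, margin - d[neg]) ** 2).sum()
    return (pull + push) / off_diag.sum()


# Toy latents: states 0 and 1 are mutually reachable, state 2 is far from both.
z = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]])
reachable = np.array([
    [True, True, False],
    [True, True, False],
    [False, False, True],
])
loss = reachability_loss(z, reachable)

# Collapsing every state to one point is penalized by the push term.
collapsed_loss = reachability_loss(np.zeros((3, 2)), reachable)
```

In this toy case the well-separated embedding scores a lower loss than the collapsed one, which is the behavior a reachability-aware transform would be expected to reward under these assumptions.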