Sample-efficiency and reliability remain major bottlenecks toward wide adoption of reinforcement learning algorithms in continuous settings with high-dimensional perceptual inputs. Toward addressing these challenges, we introduce a new theoretical framework, RichCLD (Rich-Observation RL with Continuous Latent Dynamics), in which the agent performs control based on high-dimensional observations, but the environment is governed by low-dimensional latent states and Lipschitz continuous dynamics. Our main contribution is a new algorithm for this setting that is provably statistically and computationally efficient. The core of our algorithm is a new representation learning objective; we show that prior representation learning schemes tailored to discrete dynamics do not naturally extend to the continuous setting. Our new objective is amenable to practical implementation, and empirically, we find that it compares favorably to prior schemes in a standard evaluation protocol. We further provide several insights into the statistical complexity of the RichCLD framework, in particular proving that certain notions of Lipschitzness that admit sample-efficient learning in the absence of rich observations are insufficient in the rich-observation setting.