We focus on the problem of imitation learning from visual observations, where the learning agent has access to videos of experts as its sole learning source. The challenges of this framework include the absence of expert actions and the partial observability of the environment, as the ground-truth states can only be inferred from pixels. To tackle this problem, we first conduct a theoretical analysis of imitation learning in partially observable environments. We establish upper bounds on the suboptimality of the learning agent with respect to the divergence between the expert and the agent latent state-transition distributions. Motivated by this analysis, we introduce an algorithm called Latent Adversarial Imitation from Observations, which combines off-policy adversarial imitation techniques with a learned latent representation of the agent's state from sequences of observations. In experiments on high-dimensional continuous robotic tasks, we show that our algorithm matches state-of-the-art performance while providing significant computational advantages. Additionally, we show how our method can be used to improve the efficiency of reinforcement learning from pixels by leveraging expert videos. To ensure reproducibility, we provide free access to our code.
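To make the adversarial idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes that observation sequences have already been encoded into latent states (here a toy 1-D latent stands in for the learned representation), that a linear logistic discriminator stands in for a neural network, and that the imitation reward takes the form -log(1 - D(z, z')), one common choice in adversarial imitation. All names (`train_discriminator`, `imitation_reward`, the toy data) are hypothetical.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def features(z, z_next):
    # Discriminator input: a latent transition (z, z') plus a bias term.
    return list(z) + list(z_next) + [1.0]


def score(w, z, z_next):
    # Estimated probability that the transition (z, z') came from the expert.
    x = features(z, z_next)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))


def train_discriminator(expert, agent, epochs=200, lr=0.1, seed=0):
    # Logistic regression via SGD: label 1 for expert transitions, 0 for agent.
    rng = random.Random(seed)
    data = [(z, zn, 1.0) for z, zn in expert] + [(z, zn, 0.0) for z, zn in agent]
    w = [0.0] * len(features(*expert[0]))
    for _ in range(epochs):
        rng.shuffle(data)
        for z, zn, y in data:
            x = features(z, zn)
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
            for i in range(len(w)):
                w[i] += lr * (y - p) * x[i]
    return w


def imitation_reward(w, z, z_next):
    # Assumed reward form: large when the discriminator believes the
    # latent transition looks expert-like.
    return -math.log(1.0 - score(w, z, z_next) + 1e-8)


# Toy data (an assumption for illustration): expert transitions move the
# latent by about +1, agent transitions by about -1.
rng = random.Random(0)
expert = [([z], [z + 1.0 + rng.gauss(0.0, 0.05)])
          for z in (rng.uniform(-1.0, 1.0) for _ in range(50))]
agent = [([z], [z - 1.0 + rng.gauss(0.0, 0.05)])
         for z in (rng.uniform(-1.0, 1.0) for _ in range(50))]

w = train_discriminator(expert, agent)
r_expert = imitation_reward(w, [0.0], [1.0])   # expert-like transition
r_agent = imitation_reward(w, [0.0], [-1.0])   # agent-like transition
print(r_expert > r_agent)
```

In a full pipeline, a reward of this kind would be passed to an off-policy reinforcement learning update while the discriminator and the latent encoder are trained alongside the policy; the sketch above leaves those parts out.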