Visual imitation learning enables reinforcement learning agents to learn behaviors from expert visual demonstrations, such as videos or image sequences, without explicit, well-defined rewards. Previous research either adopts supervised learning techniques or induces simple, coarse scalar rewards from pixels, neglecting the dense information contained in the image demonstrations. In this work, we propose to measure the expertise of various local regions of image samples, called \textit{patches}, and to recover multi-dimensional \textit{patch rewards} accordingly. A patch reward is a more precise reward characterization that serves as both a fine-grained expertise measurement and a visual explainability tool. Specifically, we present Adversarial Imitation Learning with Patch Rewards (PatchAIL), which employs a patch-based discriminator to measure the expertise of different local parts of given images and to provide patch rewards. The patch-based knowledge is also used to regularize the aggregated reward and stabilize training. We evaluate our method on DeepMind Control Suite and Atari tasks. Experimental results demonstrate that PatchAIL outperforms baseline methods and provides valuable interpretations of visual demonstrations.
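To make the idea of patch rewards concrete, the following is a minimal sketch and not the PatchAIL implementation. It assumes a grayscale image given as a list of rows, non-overlapping square patches, a hypothetical linear scorer (`weights`, `bias`) standing in for the learned patch-based discriminator, the common adversarial-imitation reward $-\log(1 - \sigma(d))$ applied per patch logit $d$, and a plain mean as the aggregation. All function and parameter names are illustrative.

```python
import math


def split_into_patches(image, patch):
    """Split a 2D image (list of rows) into non-overlapping patch x patch blocks.

    Returns a grid (list of rows) where each cell is the flattened patch.
    """
    height, width = len(image), len(image[0])
    grid = []
    for top in range(0, height - patch + 1, patch):
        row = []
        for left in range(0, width - patch + 1, patch):
            row.append([image[top + i][left + j]
                        for i in range(patch) for j in range(patch)])
        grid.append(row)
    return grid


def softplus(x):
    """Numerically stable log(1 + exp(x)), which equals -log(1 - sigmoid(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def patch_rewards(image, weights, bias, patch):
    """Return one reward per patch instead of a single scalar per image.

    The linear scorer (weights, bias) is a stand-in (assumption) for a
    learned patch discriminator; each patch logit d is mapped to softplus(d).
    """
    grid = split_into_patches(image, patch)
    return [[softplus(sum(w * x for w, x in zip(weights, cell)) + bias)
             for cell in row]
            for row in grid]


def aggregate(rewards):
    """Collapse the patch reward grid to a scalar by averaging (assumption)."""
    flat = [r for row in rewards for r in row]
    return sum(flat) / len(flat)


# Usage: a 4x4 image whose top-left 2x2 patch is brighter than the rest.
image = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
rewards = patch_rewards(image, weights=[0.25] * 4, bias=0.0, patch=2)
scalar_reward = aggregate(rewards)
```

The grid `rewards` keeps the spatial layout of the patches, so it can be inspected as a coarse heat map of which regions the scorer rates as more expert-like, while `scalar_reward` is the kind of single value a standard reinforcement learning update would consume.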