Approaches for teaching learning agents via human demonstrations have been widely studied and successfully applied to multiple domains. However, the majority of imitation learning work utilizes only behavioral information from the demonstrator, i.e., which actions were taken, and ignores other useful information. In particular, eye gaze information can offer valuable insight into where the demonstrator is allocating visual attention, and holds the potential to improve agent performance and generalization. In this work, we propose Gaze Regularized Imitation Learning (GRIL), a novel context-aware imitation learning architecture that learns concurrently from both human demonstrations and eye gaze to solve tasks where visual attention provides important context. We apply GRIL to a visual navigation task, in which an unmanned quadrotor is trained to search for and navigate to a target vehicle in a photorealistic simulated environment. We show that GRIL outperforms several state-of-the-art gaze-based imitation learning algorithms, simultaneously learns to predict human visual attention, and generalizes to scenarios not present in the training data. Supplemental videos can be found on the project website at https://sites.google.com/view/gaze-regularized-il/ and the code at https://github.com/ravikt/gril.
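As a rough illustration of what learning concurrently from demonstrations and eye gaze could look like, the sketch below combines an imitation error on actions with a weighted gaze-prediction error. The abstract does not specify the form of the objective, so everything here is an assumption made for illustration: the function name `gaze_regularized_loss`, the use of mean squared error for both terms, the representation of gaze as 2D coordinates, and the `gaze_weight` parameter. It should not be read as the GRIL implementation; the authors' code is in the repository linked above.

```python
import numpy as np


def gaze_regularized_loss(pred_actions, demo_actions, pred_gaze, human_gaze,
                          gaze_weight=1.0):
    """Hypothetical joint objective: action imitation error plus a weighted
    gaze-prediction error. Not the authors' loss; an illustrative assumption."""
    pred_actions = np.asarray(pred_actions, dtype=float)
    demo_actions = np.asarray(demo_actions, dtype=float)
    pred_gaze = np.asarray(pred_gaze, dtype=float)
    human_gaze = np.asarray(human_gaze, dtype=float)

    # Behavioral term: how far the predicted actions are from the demonstrated ones.
    action_loss = np.mean((pred_actions - demo_actions) ** 2)

    # Gaze term: how far the predicted gaze location is from the recorded human gaze.
    gaze_loss = np.mean((pred_gaze - human_gaze) ** 2)

    # The gaze term acts as a regularizer whose strength is set by gaze_weight.
    total = action_loss + gaze_weight * gaze_loss
    return total, action_loss, gaze_loss


if __name__ == "__main__":
    total, action_loss, gaze_loss = gaze_regularized_loss(
        pred_actions=[[1.0, 0.0]],
        demo_actions=[[0.0, 0.0]],
        pred_gaze=[[0.5, 0.5]],
        human_gaze=[[0.5, 0.0]],
        gaze_weight=2.0,
    )
    print(total, action_loss, gaze_loss)
```

In a sketch like this, setting `gaze_weight` to zero reduces the objective to plain behavioral cloning, which makes it easy to see how much the gaze term contributes.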