There is growing interest in developing automated agents that can work alongside humans. In addition to completing its assigned task, such an agent will be expected to behave in a manner that the human prefers, which requires the human to communicate their preferences to the agent. Current approaches either require the user to specify a reward function or learn the preference interactively from queries that ask the user to compare behaviors. The former can be challenging if the agent's internal representation is inscrutable to the human, while the latter is unnecessarily cumbersome if the user's preference can be specified more easily in symbolic terms. In this work, we propose PRESCA (PREference Specification through Concept Acquisition), a system that allows users to specify their preferences in terms of concepts that they understand. PRESCA maintains a set of such concepts in a shared vocabulary. If the relevant concept is not in the shared vocabulary, PRESCA learns it. To make learning a new concept more feedback-efficient, PRESCA leverages causal associations between the target concept and concepts that are already known. In addition, we use a novel data augmentation approach to further reduce the feedback required. We evaluate PRESCA in a Minecraft environment and show that it can effectively align the agent with the user's preference.
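The shared-vocabulary workflow summarized in the abstract (look up a concept, learn it only if it is missing, then express the preference in terms of it) can be illustrated with a minimal sketch. This is not PRESCA's implementation: the names `SharedVocabulary`, `get_or_learn`, and `avoid`, the dictionary-based state, and the "avoid states where the concept holds" form of preference are all assumptions made for illustration. The concept learner is passed in as a stub, so the causal-association and data-augmentation steps are not modeled here.

```python
from typing import Callable, Dict

# Hypothetical state representation: a plain dict of features.
State = dict
# A concept is modeled here as a binary classifier over states (an assumption).
Concept = Callable[[State], bool]


class SharedVocabulary:
    """Maps concept names the user understands to classifiers over agent states."""

    def __init__(self) -> None:
        self._concepts: Dict[str, Concept] = {}

    def add(self, name: str, classifier: Concept) -> None:
        """Register a concept that is already known."""
        self._concepts[name] = classifier

    def get_or_learn(self, name: str, learn: Callable[[str], Concept]) -> Concept:
        """Return the named concept, calling `learn` only if it is missing."""
        if name not in self._concepts:
            # `learn` stands in for the feedback-driven concept learner.
            self._concepts[name] = learn(name)
        return self._concepts[name]


def avoid(concept: Concept, penalty: float = 1.0) -> Callable[[State], float]:
    """Turn 'avoid states where the concept holds' into a reward term (illustrative)."""
    return lambda state: -penalty if concept(state) else 0.0
```

In this sketch, a preference such as "avoid `in_water`" (a hypothetical concept name) would be obtained with `avoid(vocab.get_or_learn("in_water", learner))`, and the resulting term added to the agent's task reward.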