When demonstrating a task, human tutors modify their behavior pedagogically, either by "showing" the task rather than just "doing" it (exaggerating the relevant parts of the demonstration) or by giving the demonstrations that best disambiguate the goal being communicated. Analogously, human learners pragmatically infer the tutor's communicative intent: they interpret what the tutor is trying to teach them and deduce the information relevant for learning. Without such mechanisms, traditional Learning from Demonstration (LfD) algorithms treat such demonstrations as sub-optimal. In this paper, we investigate how to implement these mechanisms in a tutor-learner setup in which both participants are artificial agents in a multi-goal environment. Combining pedagogy from the tutor with pragmatism from the learner, we show substantial improvements over standard LfD.