We consider the problem of learning to perform a task from demonstrations given by teachers or experts, when some of the experts' demonstrations might be adversarial and demonstrate an incorrect way to perform the task. We propose a novel technique that identifies the parts of demonstrated trajectories that have not been significantly modified by the adversary and uses them for learning through temporally extended policies, or options. We first define a trajectory divergence measure, based on the spatial and temporal features of demonstrated trajectories, to detect and discard the parts of trajectories that have been significantly modified by an adversarial expert and could degrade the learner's performance if used for learning. We then use an options-based algorithm that partitions trajectories and learns only from the parts that have been determined to be admissible. We provide theoretical results showing that repairing partial trajectories improves the sample efficiency of the demonstrations without degrading the learner's performance. We then evaluate the proposed algorithm on learning to play LunarLander, an Atari-like computer game, in the presence of different types and degrees of adversarial attacks on the demonstrated trajectories. Our experimental results show that our technique can identify adversarially modified parts of the demonstrated trajectories and successfully prevent the learning performance from degrading due to adversarial demonstrations.