We consider the problem of learning to perform a task from demonstrations given by teachers or experts, when some of the experts' demonstrations might be adversarial and demonstrate an incorrect way to perform the task. We propose a novel technique that identifies the parts of demonstrated trajectories that have not been significantly modified by the adversary and uses them for learning through temporally extended policies, or options. We first define a trajectory divergence measure, based on the spatial and temporal features of demonstrated trajectories, to detect and discard parts of trajectories that have been significantly modified by an adversarial expert and that could degrade the learner's performance if used for learning. We then use an options-based algorithm that partitions trajectories and learns only from the parts that have been determined to be admissible. We provide theoretical results showing that repairing partial trajectories improves the sample efficiency of the demonstrations without degrading the learner's performance. We then evaluate the proposed algorithm on learning to play an Atari-like computer game called LunarLander in the presence of different types and degrees of adversarial attacks on demonstrated trajectories. Our experimental results show that our technique can identify adversarially modified parts of the demonstrated trajectories and successfully prevent the learning performance from degrading due to adversarial demonstrations.
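To make the idea of discarding significantly modified trajectory parts concrete, the following is a minimal sketch, not the divergence measure defined in the paper. It assumes each trajectory is a list of `(t, x, y)` points, that a trusted reference trajectory is available for comparison, and that trajectories are split into fixed-length segments; the names `segment_divergence`, `admissible_segments`, `seg_len`, `threshold`, and `time_weight` are all hypothetical.

```python
from math import dist


def segment_divergence(demo_seg, ref_seg, time_weight=1.0):
    """Illustrative divergence: mean spatial gap plus weighted mean timestamp gap.

    Each point is (t, x, y). This is an assumed stand-in, not the paper's measure.
    """
    pairs = list(zip(demo_seg, ref_seg))  # compare points index by index
    spatial = sum(dist(d[1:], r[1:]) for d, r in pairs) / len(pairs)
    temporal = sum(abs(d[0] - r[0]) for d, r in pairs) / len(pairs)
    return spatial + time_weight * temporal


def admissible_segments(demo, ref, seg_len, threshold, time_weight=1.0):
    """Return (start, end) index ranges of demo segments close to the reference."""
    limit = min(len(demo), len(ref))  # only compare the overlapping prefix
    kept = []
    for start in range(0, limit, seg_len):
        end = min(start + seg_len, limit)
        score = segment_divergence(demo[start:end], ref[start:end], time_weight)
        if score <= threshold:
            kept.append((start, end))
    return kept


# Reference moves along the x-axis; the demonstration is perturbed at steps 2 and 3.
ref = [(t, float(t), 0.0) for t in range(6)]
demo = [(t, float(t), 5.0 if t in (2, 3) else 0.0) for t in range(6)]
kept = admissible_segments(demo, ref, seg_len=2, threshold=1.0)
print(kept)
```

In this toy case only the middle segment is rejected, so a learner would still train on the first and last segments instead of discarding the whole demonstration. The threshold and the weighting of temporal against spatial differences are free parameters in this sketch and would need tuning for a real task.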