Some imitation learning methods combine behavioural cloning with self-supervision to infer actions from state pairs. However, most rely on a large number of expert trajectories to improve generalisation, and on human intervention to capture key aspects of the problem, such as domain constraints. In this paper, we propose Continuous Imitation Learning from Observation (CILO), a new method that augments imitation learning with two important features: (i) exploration, which allows for more diverse state transitions, requires fewer expert trajectories, and results in fewer training iterations; and (ii) path signatures, which allow constraints to be encoded automatically through non-parametric representations of agent and expert trajectories. We compared CILO with a baseline and two leading imitation learning methods in five environments. CILO had the best overall performance of all methods in all environments, outperforming the expert in two of them.
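To make the path-signature feature concrete: the signature of a trajectory is the sequence of its iterated integrals, and truncating it at a fixed depth yields a fixed-size, non-parametric summary of a variable-length path. The sketch below (not CILO's implementation, just a minimal illustration of the construction) computes the depth-2 signature of a piecewise-linear trajectory with NumPy, using Chen's identity to concatenate segments; production code would typically use a dedicated library such as `esig` or `iisignature` and higher truncation depths.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 path signature of a piecewise-linear trajectory.

    path: (T, d) array of states. Returns (level1, level2), where
    level1 (shape (d,)) is the total displacement and level2
    (shape (d, d)) collects the second-order iterated integrals,
    whose antisymmetric part is the signed (Levy) area.
    """
    path = np.asarray(path, dtype=float)
    d = path.shape[1]
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for delta in np.diff(path, axis=0):
        # Chen's identity for appending one linear segment:
        # level 2 gains the cross term s1 (outer) delta plus the
        # segment's own contribution delta (outer) delta / 2.
        s2 += np.outer(s1, delta) + np.outer(delta, delta) / 2.0
        s1 += delta
    return s1, s2
```

For example, for the L-shaped path (0,0) -> (1,0) -> (1,1), level 1 is the displacement (1, 1), and the symmetric part of level 2 satisfies the shuffle identity `s2 + s2.T == outer(s1, s1)`, which is one way such representations stay consistent across reparameterisations of the same path.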