Imitation learning considerably simplifies policy synthesis compared to alternative approaches by exploiting access to expert demonstrations. For such imitation policies, errors away from the training samples are particularly critical. Even rare slip-ups in the policy action outputs can compound quickly over time, since they lead to unfamiliar future states where the policy is still more likely to err, eventually causing task failures. We revisit simple supervised ``behavior cloning'' for conveniently training the policy from nothing more than pre-recorded demonstrations, but carefully design the model class to counter the compounding error phenomenon. Our ``memory-consistent neural network'' (MCNN) outputs are hard-constrained to stay within clearly specified permissible regions anchored to prototypical ``memory'' training samples. We provide a guaranteed upper bound for the sub-optimality gap induced by MCNN policies. Using MCNNs on 9 imitation learning tasks, with MLP, Transformer, and Diffusion backbones, spanning dexterous robotic manipulation and driving, proprioceptive inputs and visual inputs, and varying sizes and types of demonstration data, we find large and consistent gains in performance, validating that MCNNs are better suited than vanilla deep neural networks for imitation learning applications. Website: https://sites.google.com/view/mcnn-imitation