Imitation Learning (IL) is a sample-efficient paradigm for robot learning from expert demonstrations. However, policies learned through IL suffer from state distribution shift at test time: errors in action prediction compound and drive the robot into previously unseen states. Choosing an action representation for the policy that minimizes this distribution shift is therefore critical in imitation learning. Prior work proposes temporal action abstractions to reduce compounding errors, but these approaches often sacrifice policy dexterity or require domain-specific knowledge. To address these trade-offs, we introduce HYDRA, a method that leverages a hybrid action space with two levels of action abstraction: sparse high-level waypoints and dense low-level actions. HYDRA dynamically switches between the two abstractions at test time to enable both coarse and fine-grained control of a robot. In addition, HYDRA employs action relabeling to increase the consistency of actions in the dataset, further reducing distribution shift. HYDRA outperforms prior imitation learning methods by 30-40% on seven challenging simulated and real-world environments, including long-horizon real-world tasks such as making coffee and toasting bread. Videos are available on our website: https://tinyurl.com/3mc6793z
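As a rough illustration of the test-time switching between sparse waypoints and dense actions described above, the following is a minimal sketch on a toy 2D point robot. It is not the HYDRA implementation: `HybridAction`, `toy_policy`, `rollout`, `switch_radius`, and `step_size` are hypothetical names, and the policy here is a hand-written rule standing in for a learned network that would predict the mode, the waypoint, and the dense action from observations.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class HybridAction:
    use_waypoint: bool        # True: sparse waypoint mode, False: dense mode
    waypoint: np.ndarray      # target position, used only in waypoint mode
    dense_action: np.ndarray  # per-step displacement, used only in dense mode


def toy_policy(state, goal, switch_radius=0.5):
    """Hypothetical stand-in for a learned policy (assumption, not HYDRA's model).

    Far from the goal it requests a coarse waypoint just inside the switch
    radius; near the goal it emits small, clipped dense displacements.
    """
    offset = goal - state
    dist = np.linalg.norm(offset)
    if dist > switch_radius:
        waypoint = goal - offset / dist * (0.9 * switch_radius)
        return HybridAction(True, waypoint, np.zeros_like(state))
    return HybridAction(False, goal, np.clip(offset, -0.05, 0.05))


def rollout(state, goal, policy, max_steps=200, step_size=0.2, tol=1e-3):
    """Query the policy, then execute whichever action abstraction it selects."""
    state = np.asarray(state, dtype=float)
    goal = np.asarray(goal, dtype=float)
    modes = []
    for _ in range(max_steps):
        if np.linalg.norm(goal - state) < tol:
            break
        action = policy(state, goal)
        if action.use_waypoint:
            # Coarse control: a simple controller moves in bounded steps
            # until the waypoint is reached, without re-querying the policy.
            while np.linalg.norm(action.waypoint - state) > tol:
                delta = action.waypoint - state
                d = np.linalg.norm(delta)
                state = state + delta * min(1.0, step_size / d)
            modes.append("waypoint")
        else:
            # Fine-grained control: apply one dense action, then re-query.
            state = state + action.dense_action
            modes.append("dense")
    return state, modes


final_state, modes = rollout([0.0, 0.0], [2.0, 1.0], toy_policy)
print(modes[0], modes[-1], len(modes))
```

In this sketch the policy is queried once per waypoint but once per step in dense mode, which is the intuition behind using waypoints to shorten the effective decision horizon while keeping dense actions for the precise parts of a task.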