Reinforcement Learning (RL) has shown promising results in learning policies for complex tasks, but it often suffers from low sample efficiency and limited transfer. We introduce the Hierarchy of Interaction Skills (HIntS) algorithm, which uses learned interaction detectors to discover and train a hierarchy of skills that manipulate factors in factored environments. Inspired by Granger causality, these unsupervised detectors capture key events between factors, which lets HIntS learn useful skills sample-efficiently and transfer them to related tasks, a setting in which many reinforcement learning techniques struggle. We evaluate HIntS on a robotic pushing task with obstacles, a challenging domain where other RL and hierarchical RL (HRL) methods fall short. The learned skills not only demonstrate transfer on variants of Breakout, a common RL benchmark, but also show a 2-3x improvement in both sample efficiency and final performance over comparable RL baselines. Together, these results make HIntS a proof of concept for using Granger-causal relationships for skill discovery.
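To make the Granger-causal intuition concrete, the following is a minimal sketch and not the HIntS detector itself. It assumes a classical linear setting with two scalar factors: a source factor is said to Granger-cause a target factor if adding the source's past to a model of the target's next value reduces prediction error, compared with a "passive" model that sees only the target's own past. The names `fit_residuals` and `granger_score` and the synthetic dynamics are hypothetical and chosen only for illustration.

```python
import numpy as np

def fit_residuals(features, target):
    """Least-squares fit of target on features plus a bias term; returns residuals."""
    X = np.column_stack([features, np.ones(len(features))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return target - X @ coef

def granger_score(source, target):
    """Log ratio of passive to active residual variance.

    Passive model: predict target[t+1] from target[t] only.
    Active model:  predict target[t+1] from target[t] and source[t].
    A clearly positive score suggests the source helps predict the target.
    """
    passive = fit_residuals(target[:-1, None], target[1:])
    active = fit_residuals(np.column_stack([target[:-1], source[:-1]]), target[1:])
    return float(np.log(passive.var() / active.var()))

# Synthetic data (an assumption for illustration): `source` drives `target`,
# while `unrelated` is independent noise with no effect on `target`.
rng = np.random.default_rng(0)
T = 2000
source = rng.normal(size=T)
unrelated = rng.normal(size=T)
target = np.zeros(T)
for t in range(T - 1):
    target[t + 1] = 0.5 * target[t] + 0.8 * source[t] + 0.1 * rng.normal()

score_related = granger_score(source, target)
score_unrelated = granger_score(unrelated, target)
print(f"related: {score_related:.3f}, unrelated: {score_unrelated:.3f}")
```

In this toy setting the score for the driving factor should be far larger than the score for the unrelated factor. A learned detector of the kind the abstract describes would presumably replace the linear fits with learned forward models; that substitution is an assumption here, not a description of the authors' implementation.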