While reinforcement learning (RL) has achieved great success in acquiring complex skills solely from environmental interactions, it assumes that resets to the initial state are readily available at the end of each episode. This assumption hinders the autonomous learning of embodied agents, because resetting in the physical world requires time-consuming and cumbersome workarounds. Hence, there has been growing interest in autonomous RL (ARL) methods that can learn from non-episodic interactions. However, existing ARL methods are limited by their reliance on prior data and cannot learn in environments where task-relevant interactions are sparse. In contrast, we propose a demonstration-free ARL algorithm via Implicit and Bi-directional Curriculum (IBC). With an auxiliary agent that is conditionally activated based on learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods, including those that leverage demonstrations.