While reinforcement learning (RL) has achieved great success in acquiring complex skills solely from environmental interactions, it assumes that resets to the initial state are readily available at the end of each episode. This assumption hinders autonomous learning by embodied agents, because resetting in the physical world requires time-consuming and cumbersome workarounds. Hence, there has been growing interest in autonomous RL (ARL) methods that can learn from non-episodic interactions. However, existing ARL methods rely on prior data and are unable to learn in environments where task-relevant interactions are sparse. In contrast, we propose a demonstration-free ARL algorithm via Implicit and Bi-directional Curriculum (IBC). With an auxiliary agent that is conditionally activated based on learning progress and a bidirectional goal curriculum based on optimal transport, our method outperforms previous methods, even those that leverage demonstrations.