We present a fully autonomous real-world RL framework for mobile manipulation that can learn policies without extensive instrumentation or human supervision. This is enabled by 1) task-relevant autonomy, which guides exploration towards object interactions and prevents stagnation near goal states, 2) efficient policy learning that leverages basic task knowledge through behavior priors, and 3) generic rewards that combine human-interpretable semantic information with low-level, fine-grained observations. We demonstrate that our approach allows Spot robots to continually improve their performance on a set of four challenging mobile manipulation tasks, obtaining an average success rate of 80% across tasks, a 3-4× improvement over existing approaches. Videos can be found at https://continual-mobile-manip.github.io/