In this work, we first formulate the problem of robotic water scooping using goal-conditioned reinforcement learning. This task is particularly challenging due to the complex dynamics of fluids and the need to achieve multi-modal goals. The policy must reach both position goals and water amount goals, which leads to a large, convoluted goal state space. To overcome these challenges, we introduce Goal Sampling Adaptation for Scooping (GOATS), a curriculum reinforcement learning method that learns an effective and generalizable policy for robot scooping tasks. Specifically, we use a goal-factorized reward formulation and interpolate the position goal distributions and amount goal distributions to create a curriculum throughout the learning process. As a result, our proposed method outperforms the baselines in simulation and achieves 5.46% and 8.71% amount errors on bowl scooping and bucket scooping tasks, respectively, under 1000 variations of initial water states in the tank and a large goal state space. Besides being effective in simulation environments, our method can efficiently adapt to noisy real-robot water-scooping scenarios with diverse physical configurations and unseen settings, demonstrating superior efficacy and generalizability. The videos of this work are available on our project page: https://sites.google.com/view/goatscooping.
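The abstract does not give the exact form of the goal interpolation or the reward, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes one-dimensional position and amount goals drawn from uniform intervals, a linear blend between an "easy" interval and the full target interval, and a reward that is a weighted sum of negative absolute errors. All function names, interval values, and weights here are hypothetical and are not taken from GOATS.

```python
import numpy as np


def interpolate_bounds(easy, target, alpha):
    """Linearly blend two (low, high) intervals.

    alpha = 0 returns the easy interval, alpha = 1 the target interval.
    (Assumed interpolation scheme, for illustration only.)
    """
    easy = np.asarray(easy, dtype=float)
    target = np.asarray(target, dtype=float)
    return (1.0 - alpha) * easy + alpha * target


def sample_goal(rng, alpha_pos, alpha_amt,
                pos_easy=(0.4, 0.5), pos_target=(0.1, 0.9),
                amt_easy=(0.0, 0.2), amt_target=(0.0, 1.0)):
    """Sample a (position, amount) goal from the interpolated intervals.

    Position and amount have separate curriculum coefficients, so each
    goal factor can be made harder at its own pace. The interval values
    are hypothetical placeholders.
    """
    pos_low, pos_high = interpolate_bounds(pos_easy, pos_target, alpha_pos)
    amt_low, amt_high = interpolate_bounds(amt_easy, amt_target, alpha_amt)
    return rng.uniform(pos_low, pos_high), rng.uniform(amt_low, amt_high)


def factorized_reward(pos, amt, goal_pos, goal_amt, w_pos=1.0, w_amt=1.0):
    """Reward split into a position term and an amount term (assumed form)."""
    return -w_pos * abs(pos - goal_pos) - w_amt * abs(amt - goal_amt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Widen both goal distributions as training progresses.
    for alpha in (0.0, 0.5, 1.0):
        goal_pos, goal_amt = sample_goal(rng, alpha_pos=alpha, alpha_amt=alpha)
        print(alpha, goal_pos, goal_amt)
```

Keeping separate coefficients for the two goal factors is one simple way to let the position curriculum and the amount curriculum advance independently; how the coefficients are scheduled is left open here.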