In this work, we first formulate the problem of robotic water scooping using goal-conditioned reinforcement learning. This task is particularly challenging because of the complex dynamics of fluids and the need to achieve multi-modal goals: the policy must reach both position goals and water-amount goals, which leads to a large, convoluted goal state space. To overcome these challenges, we introduce Goal Sampling Adaptation for Scooping (GOATS), a curriculum reinforcement learning method that learns an effective and generalizable policy for robot scooping tasks. Specifically, we use a goal-factorized reward formulation and interpolate the position goal distributions and amount goal distributions to create a curriculum throughout the learning process. As a result, our proposed method outperforms the baselines in simulation, achieving amount errors of 5.46% and 8.71% on the bowl scooping and bucket scooping tasks, respectively, under 1000 variations of initial water states in the tank and a large goal state space. Beyond simulation, our method adapts efficiently to noisy real-robot water-scooping scenarios with diverse physical configurations and unseen settings, demonstrating superior efficacy and generalizability. Videos of this work are available on our project page: https://sites.google.com/view/goatscooping.
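The two ingredients named in the abstract, a goal-factorized reward and a curriculum built by interpolating goal distributions, can be illustrated with a minimal sketch. The abstract does not specify the actual reward terms, goal ranges, or interpolation schedule, so everything below is an assumption made for illustration: the function names, the absolute-error penalties, the uniform goal ranges, and the linear blending of range bounds are hypothetical and are not taken from the GOATS implementation.

```python
import random


def factorized_reward(pos_error, amount_error, w_pos=1.0, w_amount=1.0):
    """Hypothetical goal-factorized reward: one penalty term per goal factor.

    pos_error and amount_error are signed distances to the position goal and
    the water-amount goal. The weights are illustrative assumptions.
    """
    return -(w_pos * abs(pos_error) + w_amount * abs(amount_error))


def interpolate_range(easy, target, alpha):
    """Linearly blend two (low, high) ranges.

    alpha = 0 returns the easy range, alpha = 1 returns the target range.
    """
    low = (1.0 - alpha) * easy[0] + alpha * target[0]
    high = (1.0 - alpha) * easy[1] + alpha * target[1]
    return low, high


def sample_goal(alpha_pos, alpha_amount, rng,
                easy_pos=(0.10, 0.12), target_pos=(0.05, 0.30),
                easy_amount=(0.40, 0.50), target_amount=(0.0, 1.0)):
    """Sample a (position goal, amount goal) pair from the blended ranges.

    Each goal factor has its own curriculum coefficient, so the position
    and amount distributions can be widened independently. All ranges are
    made-up placeholder values in arbitrary units.
    """
    pos_low, pos_high = interpolate_range(easy_pos, target_pos, alpha_pos)
    amt_low, amt_high = interpolate_range(easy_amount, target_amount, alpha_amount)
    return rng.uniform(pos_low, pos_high), rng.uniform(amt_low, amt_high)


if __name__ == "__main__":
    rng = random.Random(0)
    # Widen both goal distributions from easy to target over training.
    for step in range(5):
        alpha = step / 4
        position_goal, amount_goal = sample_goal(alpha, alpha, rng)
        print(f"alpha={alpha:.2f} position={position_goal:.3f} amount={amount_goal:.3f}")
```

In this sketch the curriculum coefficients would be raised over the course of training so that early episodes draw goals from a narrow, easy range and later episodes cover the full target range. A mixture of the easy and target distributions would be an equally plausible reading of "interpolate"; the bound-blending variant is used here only because it is the simplest to state.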