Grasping moving objects is a challenging task that combines multiple submodules, such as an object pose predictor and an arm motion planner. Each submodule operates under its own set of meta-parameters. Examples include how far into the future the pose predictor should look (i.e., the look-ahead time) and the maximum amount of time the motion planner can spend planning a motion (i.e., the time budget). Many previous works assign fixed values to these parameters, either heuristically or through grid search. However, the optimal values can vary from moment to moment within a single episode of dynamic grasping, depending on the current scene. In this work, we use reinforcement learning to train a meta-controller that adjusts the look-ahead time and time budget dynamically. Our extensive experiments show that, compared to the strongest baseline, the meta-controller improves the grasping success rate (by up to 12% in the most cluttered environment) and reduces grasping time. The meta-controller learns to reason about the reachable workspace and keeps the predicted pose within the reachable region. In addition, it assigns a small but sufficient time budget to the motion planner. Our method can handle different target objects, trajectories, and obstacles. Despite being trained only with 3-6 randomly generated cuboidal obstacles, our meta-controller generalizes well to 7-9 obstacles and to more realistic, out-of-domain household setups with unseen obstacle shapes. A video is available at https://youtu.be/CwHq77wFQqI.
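To make the two meta-parameters concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes the meta-controller's policy emits one action per control step with two components in [-1, 1], which are mapped linearly onto a look-ahead time and a planner time budget. The parameter ranges, the names `MetaParams`, `action_to_meta_params`, and `predict_position`, and the constant-velocity predictor are all hypothetical stand-ins chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class MetaParams:
    look_ahead: float   # seconds into the future the pose predictor extrapolates
    time_budget: float  # maximum seconds the motion planner may spend planning


def scale(a: float, lo: float, hi: float) -> float:
    """Clip a in [-1, 1], then map it linearly onto [lo, hi]."""
    a = max(-1.0, min(1.0, a))
    return lo + (a + 1.0) * 0.5 * (hi - lo)


def action_to_meta_params(action, look_ahead_range=(0.0, 2.0),
                          budget_range=(0.05, 1.0)) -> MetaParams:
    """Turn one (hypothetical) policy action into this step's meta-parameters.

    The ranges are illustrative assumptions, not values from the paper.
    """
    return MetaParams(
        look_ahead=scale(action[0], *look_ahead_range),
        time_budget=scale(action[1], *budget_range),
    )


def predict_position(pos, vel, look_ahead: float):
    """Hypothetical constant-velocity predictor: where the target will be
    look_ahead seconds from now, assuming its velocity stays fixed."""
    return tuple(p + v * look_ahead for p, v in zip(pos, vel))


# Each control step, the meta-controller can choose different values,
# instead of using one fixed (look_ahead, time_budget) pair per episode.
params = action_to_meta_params((0.0, -1.0))
goal = predict_position((0.5, 0.0, 0.1), (0.0, 0.2, 0.0), params.look_ahead)
```

With the assumed ranges, the action `(0.0, -1.0)` yields a look-ahead time of 1.0 s and the minimum time budget of 0.05 s, and a target moving at 0.2 m/s along y is extrapolated to `(0.5, 0.2, 0.1)`. A learned policy would pick these values from the current scene at every step; the fixed mapping above only shows the interface between such a policy and the two submodules.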