In this work, we address long-horizon manipulation tasks in densely cluttered scenes. Such tasks require policies to handle severe occlusions among objects and to continually produce actions based on visual observations. We propose a vision-based Hierarchical policy for Cluttered-scene Long-horizon Manipulation (HCLM). It employs a high-level policy and three options to select and instantiate three parameterized action primitives: push, pick, and place. We first train the pick and place options by behavior cloning (BC). Subsequently, we use hierarchical reinforcement learning (HRL) to train the high-level policy and the push option. During HRL, we propose a Spatially Extended Q-update (SEQ) to augment the updates for the push option and a Two-Stage Update Scheme (TSUS) to alleviate the non-stationary transition problem when updating the high-level policy. We demonstrate that HCLM significantly outperforms baseline methods in success rate and efficiency across diverse tasks. We also highlight our method's ability to generalize to more densely cluttered environments containing additional blocks.
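The two-level decision described above, where a high-level policy chooses a primitive and the corresponding option instantiates its parameters, can be sketched as follows. This is a minimal illustration under stated assumptions, not the HCLM implementation: the epsilon-greedy high-level selection, the per-primitive pixel score maps, the (row, col) parameterization, and the names `act`, `OPTIONS`, and `argmax_pixel` are all hypothetical. The sketch does not model the BC or HRL training, SEQ, or TSUS.

```python
import numpy as np
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# The three parameterized primitives named in the abstract.
PRIMITIVES = ("push", "pick", "place")


@dataclass
class Action:
    primitive: str
    params: Tuple[int, int]  # hypothetical: (row, col) of the target pixel


def argmax_pixel(score_map: np.ndarray) -> Tuple[int, int]:
    """Return the (row, col) index of the highest score in a 2-D map."""
    r, c = np.unravel_index(int(np.argmax(score_map)), score_map.shape)
    return int(r), int(c)


# Hypothetical options: each maps its own score map to primitive parameters.
# In a learned system each option would be a separate trained network.
OPTIONS: Dict[str, Callable[[np.ndarray], Tuple[int, int]]] = {
    name: argmax_pixel for name in PRIMITIVES
}


def act(high_level_q: np.ndarray, score_maps: Dict[str, np.ndarray],
        epsilon: float, rng: np.random.Generator) -> Action:
    """Two-level decision: pick a primitive, then let its option instantiate it."""
    if rng.random() < epsilon:
        idx = int(rng.integers(len(PRIMITIVES)))  # explore: random primitive
    else:
        idx = int(np.argmax(high_level_q))        # exploit: best high-level value
    name = PRIMITIVES[idx]
    return Action(name, OPTIONS[name](score_maps[name]))
```

For example, with `epsilon=0.0`, high-level values `[0.1, 0.9, 0.2]`, and a pick score map whose maximum lies at pixel (2, 3), `act` returns `Action("pick", (2, 3))`. Separating primitive selection from parameter instantiation is what lets the options be trained independently of the high-level policy, as the abstract describes.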