We investigate a skill-based framework for humanoid box rearrangement that enables long-horizon execution by sequencing reusable skills at the task level. In our architecture, all skills execute through a shared, task-agnostic whole-body controller (WBC), providing a consistent closed-loop interface for skill composition, in contrast to non-shared designs that use separate low-level controllers per skill. We find that naively reusing the same pretrained WBC can reduce robustness over long horizons, as new skills and their compositions induce shifted state and command distributions. We address this with a simple data aggregation procedure that augments shared-WBC training with rollouts from closed-loop skill execution under domain randomization. To evaluate the approach, we introduce Humanoid Hanoi, a long-horizon Tower-of-Hanoi box rearrangement benchmark, and report results in simulation and on the Digit V3 humanoid robot, demonstrating fully autonomous rearrangement over extended horizons and quantifying the benefits of the shared-WBC approach over non-shared baselines. Project page: https://osudrl.github.io/Humanoid_Hanoi/
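The abstract does not specify the task-level planner or the exact skill library. To show what "sequencing reusable skills at the task level" through one shared controller could look like, here is a minimal, illustrative sketch. It assumes a classic recursive Tower-of-Hanoi solver as the task planner and a fixed expansion of each box move into four skill calls. The skill names (`walk_to`, `pick_up`, `place`) and the `SharedWBC` class are hypothetical stand-ins, not the authors' interfaces. The controller here only logs which skill issued which target. That logged trace is the kind of closed-loop data a data-aggregation step could collect, although the paper's procedure additionally uses physics rollouts under domain randomization, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (box index, source stack, target stack); box 1 is the smallest box.
Move = Tuple[int, int, int]


def hanoi_moves(n: int, src: int = 0, dst: int = 2, aux: int = 1) -> List[Move]:
    """Recursive Tower-of-Hanoi plan; returns 2**n - 1 moves."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, aux, dst)
            + [(n, src, dst)]
            + hanoi_moves(n - 1, aux, dst, src))


@dataclass(frozen=True)
class SkillCall:
    skill: str   # hypothetical skill name
    target: int  # index of the stack (table) the skill acts on


def move_to_skills(move: Move) -> List[SkillCall]:
    """Expand one box move into a fixed sequence of reusable skills (assumed)."""
    _, src, dst = move
    return [SkillCall("walk_to", src), SkillCall("pick_up", src),
            SkillCall("walk_to", dst), SkillCall("place", dst)]


class SharedWBC:
    """Hypothetical stand-in for one task-agnostic whole-body controller.

    Every skill is executed through this single object. A real WBC would
    track low-level commands; this stub only logs each skill call.
    """

    def __init__(self) -> None:
        self.log: List[SkillCall] = []

    def execute(self, call: SkillCall) -> None:
        self.log.append(call)


def run(n: int) -> Tuple[List[List[int]], SharedWBC]:
    """Execute the plan on a symbolic 3-stack world, enforcing Hanoi rules."""
    stacks = [list(range(n, 0, -1)), [], []]  # each list is bottom -> top
    wbc = SharedWBC()
    for move in hanoi_moves(n):
        box, src, dst = move
        if not stacks[src] or stacks[src][-1] != box:
            raise ValueError("box must be on top of its source stack")
        if stacks[dst] and stacks[dst][-1] < box:
            raise ValueError("cannot place a larger box on a smaller one")
        for call in move_to_skills(move):
            wbc.execute(call)  # all skills share the same controller
        stacks[dst].append(stacks[src].pop())
    return stacks, wbc


stacks, wbc = run(3)
print(stacks)        # [[], [], [3, 2, 1]]
print(len(wbc.log))  # 28 (7 moves x 4 skill calls)
```

Because every skill goes through the same `execute` entry point, the sequence of commands the controller sees depends on how skills are composed. This is the distribution shift the abstract attributes to naively reusing a pretrained WBC over long horizons.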