We investigate a skill-based framework for humanoid box rearrangement that enables long-horizon execution by sequencing reusable skills at the task level. In our architecture, all skills execute through a shared, task-agnostic whole-body controller (WBC), providing a consistent closed-loop interface for skill composition, in contrast to non-shared designs that use separate low-level controllers per skill. We find that naively reusing the same pretrained WBC can reduce robustness over long horizons, as new skills and their compositions induce shifted state and command distributions. We address this with a simple data aggregation procedure that augments shared-WBC training with rollouts from closed-loop skill execution under domain randomization. To evaluate the approach, we introduce \emph{Humanoid Hanoi}, a long-horizon Tower-of-Hanoi box rearrangement benchmark, and report results in simulation and on the Digit V3 humanoid robot, demonstrating fully autonomous rearrangement over extended horizons and quantifying the benefits of the shared-WBC approach over non-shared baselines.
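To make the idea of task-level skill sequencing concrete, the following is a minimal sketch of how a Tower-of-Hanoi plan could be expanded into a flat sequence of reusable skills. It is an illustration under stated assumptions, not the authors' implementation: the skill names (`walk_to`, `pick`, `place`), the stack labels `A`/`B`/`C`, and the convention that box 1 is the smallest are all hypothetical, and the abstract does not say how many boxes the benchmark uses. Each skill in the resulting list would be executed through the shared WBC in the architecture described above.

```python
from typing import List, Tuple

Move = Tuple[int, str, str]   # (box index, source stack, target stack)
Skill = Tuple[str, object]    # (hypothetical skill name, argument)


def hanoi_moves(n: int, src: str = "A", dst: str = "C", aux: str = "B") -> List[Move]:
    """Classic recursive Tower-of-Hanoi solution for n boxes (box 1 is the smallest)."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, src, aux, dst)    # clear the n-1 smaller boxes onto the spare stack
        + [(n, src, dst)]                    # move the largest remaining box
        + hanoi_moves(n - 1, aux, dst, src)  # restack the smaller boxes on top of it
    )


def to_skill_sequence(moves: List[Move]) -> List[Skill]:
    """Expand each box move into four hypothetical skills (names are assumptions)."""
    skills: List[Skill] = []
    for box, src, dst in moves:
        skills += [
            ("walk_to", src),  # approach the source stack
            ("pick", box),     # grasp the top box
            ("walk_to", dst),  # carry it to the target stack
            ("place", box),    # set it down
        ]
    return skills


if __name__ == "__main__":
    plan = to_skill_sequence(hanoi_moves(3))
    print(len(plan), "skills; first four:", plan[:4])
```

The plan length grows as 4(2^n - 1) skills for n boxes, which gives a sense of why robustness of the shared controller over long horizons matters even for a small number of boxes.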