We investigate a skill-based framework for humanoid box rearrangement that enables long-horizon execution by sequencing reusable skills at the task level. In our architecture, all skills execute through a shared, task-agnostic whole-body controller (WBC), providing a consistent closed-loop interface for skill composition, in contrast to non-shared designs that use separate low-level controllers per skill. We find that naively reusing the same pretrained WBC can reduce robustness over long horizons, as new skills and their compositions induce shifted state and command distributions. We address this with a simple data aggregation procedure that augments shared-WBC training with rollouts from closed-loop skill execution under domain randomization. To evaluate the approach, we introduce \emph{Humanoid Hanoi}, a long-horizon Tower-of-Hanoi box rearrangement benchmark, and report results in simulation and on the Digit V3 humanoid robot, demonstrating fully autonomous rearrangement over extended horizons and quantifying the benefits of the shared-WBC approach over non-shared baselines.
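To make the task-level composition concrete, the sketch below shows one way a Tower-of-Hanoi rearrangement could be expanded into a sequence of reusable skills, all of which issue commands to one shared controller object. It also shows how closed-loop rollouts could be logged as extra training samples. Everything here is an illustrative assumption, not the authors' implementation: the skill names (`walk_to`, `pick`, `place`), the `SharedWBC` class with its 1-D tracking update, and the Gaussian disturbance are all hypothetical. The disturbance is only a crude stand-in for domain randomization, and the step that retrains the WBC on the aggregated data is omitted.

```python
import random


def hanoi_moves(n, src=0, dst=2, aux=1):
    """Optimal Tower-of-Hanoi solution as (box, from_peg, to_peg); length 2**n - 1."""
    if n == 0:
        return []
    return (hanoi_moves(n - 1, src, aux, dst)
            + [(n, src, dst)]
            + hanoi_moves(n - 1, aux, dst, src))


def skills_for_move(box, src, dst):
    """Hypothetical decomposition of one box move into reusable skills."""
    return [("walk_to", src), ("pick", box), ("walk_to", dst), ("place", box)]


class SharedWBC:
    """Toy stand-in for a single task-agnostic controller that every skill commands."""

    def __init__(self):
        self.position = 0.0  # toy 1-D base position, measured in peg coordinates

    def track(self, command, disturbance):
        # Placeholder closed-loop update: move most of the way toward the target.
        self.position += 0.9 * (command - self.position) + disturbance
        return self.position


def rollout(n_boxes, rng, dataset):
    """Run the full skill sequence through one shared controller, logging samples."""
    wbc = SharedWBC()
    for box, src, dst in hanoi_moves(n_boxes):
        for skill, arg in skills_for_move(box, src, dst):
            # In this toy, only walk_to sets a new target; pick/place hold position.
            command = float(arg) if skill == "walk_to" else wbc.position
            state_before = wbc.position
            # Random disturbance: a crude stand-in for domain randomization.
            wbc.track(command, disturbance=rng.gauss(0.0, 0.05))
            dataset.append((skill, state_before, command))
    return dataset


def aggregate(n_boxes, num_rollouts, seed=0):
    """Pool (skill, state, command) samples from several randomized rollouts."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(num_rollouts):
        rollout(n_boxes, rng, dataset)
    return dataset
```

Because the optimal solution has 2**n - 1 moves, the horizon grows quickly. Three boxes already need 7 moves (28 skill calls here), and ten boxes need 1023 moves. Routing every skill through the same controller means all of these calls, and the state and command pairs they visit, land in a single dataset that could be appended to that controller's training data.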