Humanoid robots promise general-purpose assistance, yet real-world humanoid loco-manipulation remains challenging because it demands whole-body stability, end-effector dexterity, and contact-aware interaction under frequent contact changes. In this work, we study dexterous, contact-rich humanoid loco-manipulation. We first develop an RL-based lower-body controller that serves as the stability backbone for whole-body execution during complex manipulation. Building on this controller, we develop a VR-based whole-body humanoid data collection system that integrates dexterous hands and tactile sensing for contact-rich manipulation. We then propose Humanoid Transformer with Touch Dreaming (HTD), a multimodal encoder-decoder Transformer that models touch as a core modality alongside multi-view vision and proprioception. HTD is trained in a single stage with behavioral cloning augmented by touch dreaming: in addition to predicting action chunks, the policy predicts future hand-joint forces and future tactile latents. The tactile-latent targets are provided by an exponential moving average (EMA) target encoder, so no separate tactile pretraining stage is required. This auxiliary prediction encourages the policy to learn contact-aware representations for dexterous manipulation. Across five real-world contact-rich tasks, HTD achieves a 90.9% relative improvement in average success rate over the stronger baseline. Ablation results further show that latent-space tactile prediction is more effective than raw tactile prediction, yielding a 30% relative gain in success rate. These results demonstrate that our touch-dreaming-enhanced learning system enables versatile, high-dexterity humanoid manipulation in the real world. More information and open-source materials are available at: humanoid-touch-dream.github.io.
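As a rough illustration of the single-stage objective described above, the following minimal sketch combines a behavioral cloning term with the two touch-dreaming terms and shows an EMA update for the target encoder. The abstract does not specify loss types, loss weights, or the EMA decay, so the mean squared error terms, the `force_weight` and `latent_weight` parameters, the `decay` value, and all function names are illustrative assumptions, not the authors' implementation. Parameters and predictions are plain lists of floats to keep the sketch dependency-free.

```python
from typing import List

Vector = List[float]


def mse(pred: Vector, target: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def ema_update(target_params: Vector, online_params: Vector,
               decay: float = 0.99) -> Vector:
    """Move the target-encoder parameters toward the online encoder.

    target <- decay * target + (1 - decay) * online
    The decay value is an assumption; the abstract does not state it.
    """
    return [decay * t + (1.0 - decay) * o
            for t, o in zip(target_params, online_params)]


def touch_dreaming_loss(action_pred: Vector, action_target: Vector,
                        force_pred: Vector, force_target: Vector,
                        latent_pred: Vector, latent_target: Vector,
                        force_weight: float = 1.0,
                        latent_weight: float = 1.0) -> float:
    """Behavioral cloning loss plus two auxiliary touch-dreaming terms.

    latent_target is assumed to come from the EMA target encoder and to be
    treated as a constant (no gradient flows into it), which is the usual
    practice for EMA targets. Weights and MSE are hypothetical choices.
    """
    bc_loss = mse(action_pred, action_target)      # action-chunk imitation
    force_loss = mse(force_pred, force_target)     # future hand-joint forces
    latent_loss = mse(latent_pred, latent_target)  # future tactile latents
    return bc_loss + force_weight * force_loss + latent_weight * latent_loss
```

In a training loop under these assumptions, `touch_dreaming_loss` would be minimized with respect to the policy and online tactile encoder, and `ema_update` would be called after each optimizer step to refresh the target encoder that produces the tactile-latent targets.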