Humanoid robots that autonomously interact with physical environments over extended horizons represent a central goal of embodied intelligence. Existing approaches rely on reference motions or task-specific rewards, which tightly couple policies to particular object geometries and preclude multi-skill generalization within a single framework. A unified interaction representation that enables reference-free inference, geometric generalization, and long-horizon skill composition within one policy remains an open challenge. Here we show that the Distance Field (DF) provides such a representation. LessMimic conditions a single whole-body policy on DF-derived geometric cues (surface distances, gradients, and velocity decompositions), removing the need for motion references. Interaction latents are encoded with a Variational Auto-Encoder (VAE) and post-trained using Adversarial Interaction Priors (AIP) under Reinforcement Learning (RL). Through DAgger-style distillation that aligns DF latents with egocentric depth features, LessMimic further transfers seamlessly to vision-only deployment without motion capture (MoCap) infrastructure. A single LessMimic policy achieves 80-100% success on PickUp and SitStand across object scales from 0.4x to 1.6x, where baselines degrade sharply; it attains 62.1% success on trajectories of 5 task instances and remains viable for up to 40 sequentially composed tasks. By grounding interaction in local geometry rather than demonstrations, LessMimic offers a scalable path toward humanoid robots that generalize, compose skills, and recover from failures in unstructured environments.
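The abstract names three DF-derived cues: surface distance, its gradient, and a velocity decomposition. The sketch below shows what such cues can look like for a single body point, such as a hand. It makes several simplifying assumptions that do not come from the paper. The object is an analytic axis-aligned box. The gradient is taken by central finite differences. The "velocity decomposition" is modeled as splitting the point's velocity relative to the object into components normal and tangential to the surface. The helpers `box_sdf`, `sdf_gradient`, and `df_cues` are hypothetical and are not LessMimic's actual feature pipeline.

```python
import numpy as np


def box_sdf(p, center, half_extents):
    """Signed distance from point p to an axis-aligned box (negative inside).

    Hypothetical stand-in for an object's distance field.
    """
    q = np.abs(np.asarray(p, dtype=float) - center) - half_extents
    outside = np.linalg.norm(np.maximum(q, 0.0), axis=-1)  # distance when outside
    inside = np.minimum(np.max(q, axis=-1), 0.0)            # negative depth when inside
    return outside + inside


def sdf_gradient(sdf, p, eps=1e-4):
    """Central finite-difference gradient of a scalar SDF at one 3D point."""
    p = np.asarray(p, dtype=float)
    grad = np.zeros(3)
    for i in range(3):
        e = np.zeros(3)
        e[i] = eps
        grad[i] = (sdf(p + e) - sdf(p - e)) / (2.0 * eps)
    return grad


def df_cues(sdf, p, v_rel):
    """Return (distance, unit normal, normal velocity, tangential velocity).

    v_rel is the body point's velocity relative to the object (assumption).
    """
    d = float(sdf(p))
    g = sdf_gradient(sdf, p)
    n = g / max(np.linalg.norm(g), 1e-8)  # approximate outward surface normal
    v_rel = np.asarray(v_rel, dtype=float)
    v_n = np.dot(v_rel, n) * n            # approach/retreat component
    v_t = v_rel - v_n                     # sliding component along the surface
    return d, n, v_n, v_t


if __name__ == "__main__":
    hand = np.array([0.5, 0.0, 0.0])
    v_rel = np.array([-1.0, 0.5, 0.0])
    # Rescale the same box to mimic the object-scale sweep (values are illustrative).
    for scale in (0.4, 1.0, 1.6):
        half = scale * np.array([0.2, 0.2, 0.2])
        sdf = lambda x, h=half: box_sdf(x, np.zeros(3), h)
        d, n, v_n, v_t = df_cues(sdf, hand, v_rel)
        print(f"scale={scale}: d={d:.3f}, |v_n|={np.linalg.norm(v_n):.3f}, "
              f"|v_t|={np.linalg.norm(v_t):.3f}")
```

In this toy setting, rescaling the box changes only the distance value. The normal direction and the normal/tangential split of the hand's velocity stay the same. This illustrates, under the stated assumptions, why local geometric cues can be less tied to one specific object size than a reference trajectory would be.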