Distilling humanoid locomotion control from offline datasets into deployable policies remains challenging, as existing methods rely on privileged full-body states that require complex and often unreliable state estimation. We present Sensor-Conditioned Diffusion Policies (SCDP), which enable humanoid locomotion using only onboard sensors, eliminating the need for explicit state estimation. SCDP decouples sensing from supervision through mixed-observation training: the diffusion model conditions on sensor histories while being supervised to predict privileged future state-action trajectories, forcing the model to infer motion dynamics under partial observability. We further develop restricted denoising, context distribution alignment, and context-aware attention masking to encourage implicit state estimation within the model and to prevent train-deploy mismatch. We validate SCDP on velocity-commanded locomotion and motion-reference tracking tasks. In simulation, SCDP achieves near-perfect success on velocity control (99-100%) and 93% tracking success on the AMASS test set, performing comparably to privileged baselines while using only onboard sensors. Finally, we deploy the trained policy on a real G1 humanoid at 50 Hz, demonstrating robust real-robot locomotion without external sensing or state estimation.
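The core idea of mixed-observation training (condition on sensor histories, supervise on privileged future state-action trajectories) can be sketched as a standard denoising-diffusion training step. This is a minimal illustrative sketch, not the paper's implementation: the linear "denoiser", the noise schedule, and names such as `sensor_history`, `priv_traj`, and the dimensions are all assumptions made for the example.

```python
# Toy sketch of SCDP-style mixed-observation diffusion training.
# ASSUMPTIONS: linear DDPM-style beta schedule, a toy linear denoiser,
# and illustrative variable names/dimensions not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)

T = 100                                   # number of diffusion timesteps
betas = np.linspace(1e-4, 0.02, T)        # assumed linear noise schedule
alphas_bar = np.cumprod(1.0 - betas)      # cumulative signal retention

H, F, D_s, D_x = 8, 16, 32, 24            # history len, horizon, sensor dim, state-action dim

def toy_denoiser(noisy_traj, sensor_history, t, W):
    """Toy linear stand-in for the diffusion model: predicts the noise in the
    privileged trajectory, conditioned on the onboard sensor history."""
    cond = np.concatenate([noisy_traj.ravel(), sensor_history.ravel(), [t / T]])
    return (W @ cond).reshape(noisy_traj.shape)

def training_step(priv_traj, sensor_history, W):
    """One mixed-observation step: the model only sees sensors as context,
    but the supervision target is the privileged future state-action trajectory."""
    t = int(rng.integers(0, T))
    eps = rng.standard_normal(priv_traj.shape)
    # Forward-noise the privileged trajectory (the supervision side).
    noisy = np.sqrt(alphas_bar[t]) * priv_traj + np.sqrt(1.0 - alphas_bar[t]) * eps
    # Predict the noise from sensor context (the sensing side).
    eps_hat = toy_denoiser(noisy, sensor_history, t, W)
    return float(np.mean((eps_hat - eps) ** 2))  # MSE on the true noise

# Fake data standing in for one dataset sample.
priv_traj = rng.standard_normal((F, D_x))        # privileged states + actions
sensor_history = rng.standard_normal((H, D_s))   # onboard sensor readings
W = 0.01 * rng.standard_normal((F * D_x, F * D_x + H * D_s + 1))

loss = training_step(priv_traj, sensor_history, W)
print(loss)
```

At deployment, only `sensor_history` would be available, which is why the paper's restricted denoising and context-alignment components matter: the model must recover the privileged trajectory from partial observations alone.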