CLOT: Closed-Loop Global Motion Tracking for Whole-Body Humanoid Teleoperation

Long-horizon whole-body humanoid teleoperation remains challenging due to accumulated global pose drift, particularly on full-sized humanoids. Although recent learning-based tracking methods enable agile and coordinated motions, they typically operate in the robot's local frame and neglect global pose feedback, leading to drift and instability during extended execution. In this work, we present CLOT, a real-time whole-body humanoid teleoperation system that achieves closed-loop global motion tracking via high-frequency localization feedback. CLOT synchronizes operator and robot poses in a closed loop, enabling drift-free human-to-humanoid mimicry over long time horizons. However, directly imposing global tracking rewards in reinforcement learning often results in aggressive and brittle corrections. To address this, we propose a data-driven randomization strategy that decouples observation trajectories from reward evaluation, enabling smooth and stable global corrections. We further regularize the policy with an adversarial motion prior to suppress unnatural behaviors. To support CLOT, we collect 20 hours of carefully curated human motion data for training the humanoid teleoperation policy. We design a transformer-based policy and train it for over 1300 GPU hours. The policy is deployed on a full-sized humanoid with 31 DoF (excluding hands). Both simulation and real-world experiments demonstrate highly dynamic motion, high-precision tracking, and strong robustness in sim-to-real humanoid teleoperation. Motion data, demos, and code can be found on our website.
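
To make the notion of global pose feedback concrete, the following is a minimal sketch, not the CLOT implementation. It assumes a simplified planar root pose `(x, y, yaw)` for both the operator reference and the localized robot, given in a shared world frame, and computes the reference pose error expressed in the robot's heading frame. The function names `wrap_angle` and `global_tracking_error` are hypothetical and chosen only for illustration.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def global_tracking_error(ref_pose, robot_pose):
    """Planar root-pose error of the reference relative to the robot.

    Both poses are (x, y, yaw) tuples in a shared world frame (an assumed
    simplification). The position error is rotated into the robot's
    heading frame, so +x means "the reference is ahead of the robot".
    """
    ref_x, ref_y, ref_yaw = ref_pose
    x, y, yaw = robot_pose

    # World-frame position offset from robot to reference.
    dx, dy = ref_x - x, ref_y - y

    # Rotate the offset by -yaw (inverse of the robot's heading rotation).
    c, s = math.cos(yaw), math.sin(yaw)
    err_forward = c * dx + s * dy
    err_left = -s * dx + c * dy

    # Heading error, wrapped so that small errors stay small near +/-pi.
    err_yaw = wrap_angle(ref_yaw - yaw)
    return err_forward, err_left, err_yaw


if __name__ == "__main__":
    # Robot at the origin facing +y; reference one metre further along +y.
    print(global_tracking_error((0.0, 1.0, math.pi / 2), (0.0, 0.0, math.pi / 2)))
```

A tracker that works only in the robot's local frame never observes an error of this kind, so small per-step deviations can accumulate. Feeding such an error back at each control step is one plausible way to close the loop on global pose.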
