Training language-conditioned whole-body controllers for humanoid robots demands large-scale motion-language datasets. Existing approaches based on motion capture are costly and limited in diversity, while text-to-motion generative models produce purely kinematic outputs that are not guaranteed to be physically feasible. We present CLAW, a pipeline for scalable generation of language-annotated whole-body motion data for the Unitree G1 humanoid robot. CLAW composes motion primitives from a kinematic planner, each parameterized by movement, heading, speed, pelvis height, and duration. It provides two browser-based interfaces, a real-time keyboard mode and a timeline-based sequence editor, for exploratory and batch data collection. A low-level controller tracks the resulting reference motions in MuJoCo simulation, yielding physically grounded trajectories. In parallel, a template-based engine generates diverse natural-language annotations at both the segment and trajectory levels. To support scalable generation of paired motion-language data for humanoid robot learning, we make our system publicly available at https://github.com/JianuoCao/CLAW
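As a rough illustration of the idea described above, the sketch below shows how a motion primitive with the five listed parameters might be represented, and how a template-based engine could produce segment-level and trajectory-level annotations. This is a minimal sketch, not CLAW's actual code: the class `MotionPrimitive`, the functions `annotate_segment`, `annotate_trajectory`, and `total_duration`, the templates, the units (degrees, m/s, metres, seconds), and the example values are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionPrimitive:
    """Hypothetical container for the five parameters named in the abstract."""
    movement: str         # e.g. "walk" or "stand"; the vocabulary is assumed
    heading_deg: float    # target heading, assumed to be in degrees
    speed: float          # assumed to be in m/s
    pelvis_height: float  # assumed to be in metres
    duration: float       # assumed to be in seconds


# Illustrative templates; varying the template is one simple way to
# diversify the wording of otherwise identical segments.
SEGMENT_TEMPLATES = [
    "{movement} at {speed:.1f} m/s heading {heading:.0f} degrees for {duration:.1f} s",
    "for {duration:.1f} s, {movement} toward {heading:.0f} degrees at {speed:.1f} m/s",
]


def annotate_segment(p: MotionPrimitive, rng: random.Random) -> str:
    """Fill a randomly chosen template with one primitive's parameters."""
    template = rng.choice(SEGMENT_TEMPLATES)
    text = template.format(
        movement=p.movement,
        speed=p.speed,
        heading=p.heading_deg,
        duration=p.duration,
    )
    return f"{text} with the pelvis at {p.pelvis_height:.2f} m"


def annotate_trajectory(prims: list[MotionPrimitive], rng: random.Random) -> str:
    """Join segment-level annotations into one trajectory-level description."""
    return "; then ".join(annotate_segment(p, rng) for p in prims) + "."


def total_duration(prims: list[MotionPrimitive]) -> float:
    """Total length of a composed sequence, in the assumed unit of seconds."""
    return sum(p.duration for p in prims)


# Usage: compose two primitives and print a trajectory-level annotation.
rng = random.Random(0)  # seeded so repeated runs give the same wording
sequence = [
    MotionPrimitive("walk", 0.0, 0.8, 0.75, 3.0),
    MotionPrimitive("stand", 90.0, 0.0, 0.70, 2.0),
]
print(annotate_trajectory(sequence, rng))
```

In a pipeline of the kind described, each annotation would be paired with the simulated trajectory obtained by tracking the corresponding reference, rather than with the kinematic plan itself.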