This paper introduces MotionGlot, a model that generates motion across multiple embodiments with different action dimensions, such as quadruped robots and human bodies. By leveraging well-established training procedures from large language models (LLMs), we introduce an instruction-tuning template designed specifically for motion-related tasks. Our approach demonstrates that the principles underlying LLM training can be successfully adapted to learn a wide range of motion generation tasks across embodiments. We demonstrate the abilities of MotionGlot on a set of six tasks and report an average improvement of 35.3% across tasks. Additionally, we contribute two new datasets: (1) a dataset of expert-controlled quadruped locomotion comprising approximately 48,000 trajectories paired with direction-based text annotations, and (2) a dataset of over 23,000 situational text prompts for human motion generation tasks. Finally, we conduct hardware experiments to validate the capabilities of our system in real-world applications.