Humanoid robots are well suited for human habitats due to their morphological similarity to humans, but developing controllers for them is challenging because it involves multiple sub-problems, such as control, planning, and perception. In this paper, we introduce a method that simplifies controller design by enabling users to train and fine-tune robot control policies using natural language commands. We first learn a neural network policy that generates behaviors given a natural language command, such as "walk forward", by combining Large Language Models (LLMs), motion retargeting, and motion imitation. Based on the synthesized motion, we iteratively fine-tune the policy by updating the text prompt and querying LLMs to find the best checkpoint associated with the closest motion in history. We validate our approach on a simulated Digit humanoid robot and demonstrate learning of diverse motions, such as walking, hopping, and kicking, without the burden of complex reward engineering. In addition, we show that our iterative refinement learns 3x faster than a naive formulation that learns from scratch.