Expressive robotic behavior is essential for the widespread acceptance of robots in social environments. Recent advances in learned legged locomotion controllers have enabled more dynamic and versatile robot behaviors. However, determining the optimal behavior for interactions with different users across varied scenarios remains a challenge. Current methods either rely on natural language input, which is efficient but low-resolution, or learn from human preferences, which, although high-resolution, is sample-inefficient. This paper introduces a novel approach that leverages priors generated by pre-trained large language models (LLMs) alongside the precision of preference learning. Our method, termed Language-Guided Preference Learning (LGPL), uses LLMs to generate initial behavior samples, which are then refined through preference-based feedback to learn behaviors that closely align with human expectations. Our core insight is that LLMs can guide the sampling process for preference learning, leading to a substantial improvement in sample efficiency. We demonstrate that LGPL can quickly learn accurate and expressive behaviors with as few as four queries, outperforming both purely language-parameterized models and traditional preference-learning approaches. Website with videos: https://lgpl-gaits.github.io/
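To make the core insight concrete, the following is a minimal, hypothetical sketch of an LGPL-style loop, not the authors' implementation: an LLM seeds a handful of candidate behavior parameterizations from a language request, and a short sequence of pairwise preference queries selects among them. All names (`llm_propose_candidates`, `human_prefers`, the gait-parameter vector) are illustrative assumptions; the LLM call and the human are stubbed so the script runs standalone.

```python
"""Hypothetical sketch of language-guided preference learning (LGPL-style).

Assumptions: behaviors are parameterized by a small gait-parameter vector
[step_height, step_frequency, body_pitch]; a pre-trained LLM maps a language
request to a few candidate parameter sets; a human answers pairwise
preference queries. Both the LLM and the human are simulated here.
"""

import numpy as np


def llm_propose_candidates(request: str, n: int = 5) -> list[np.ndarray]:
    """Stand-in for prompting an LLM to turn a request (e.g. "walk happily")
    into candidate gait parameters. A real system would parse structured LLM
    output; here we just sample plausible values so the sketch is runnable."""
    rng = np.random.default_rng(0)
    return [rng.uniform([0.05, 1.0, -0.1], [0.15, 3.0, 0.1]) for _ in range(n)]


def human_prefers(a: np.ndarray, b: np.ndarray) -> bool:
    """Stand-in for one pairwise preference query: True if the user prefers
    behavior `a` over `b`. Simulated with a hidden target gait."""
    target = np.array([0.12, 2.5, 0.05])  # hypothetical "happy" gait
    return np.linalg.norm(a - target) < np.linalg.norm(b - target)


def lgpl_select(request: str, n_candidates: int = 5):
    """LLM-seeded candidates refined by a short tournament of preference
    queries; with noiseless answers, n_candidates - 1 queries (here 4) pick
    the best candidate."""
    candidates = llm_propose_candidates(request, n_candidates)
    best, n_queries = candidates[0], 0
    for cand in candidates[1:]:
        if human_prefers(cand, best):
            best = cand
        n_queries += 1
    return best, n_queries


if __name__ == "__main__":
    params, queries = lgpl_select("walk happily toward the user")
    print(f"selected gait parameters {params} after {queries} queries")
```

The point of the sketch is the sample-efficiency argument: because the LLM prior restricts the search to a few plausible candidates, only a handful of preference comparisons are needed, rather than the many queries a preference learner would need when sampling behaviors from scratch.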