Language-conditioned robot manipulation is an emerging field aimed at enabling seamless communication and cooperation between humans and robotic agents by teaching robots to comprehend and execute instructions conveyed in natural language. This interdisciplinary area integrates scene understanding, language processing, and policy learning to bridge the gap between human instructions and robotic actions. In this comprehensive survey, we systematically explore recent advancements in language-conditioned robotic manipulation. We categorize existing methods into language-conditioned reward shaping, language-conditioned policy learning, neuro-symbolic artificial intelligence, and the utilization of foundation models (FMs) such as large language models (LLMs) and vision-language models (VLMs). Specifically, we analyze state-of-the-art techniques concerning semantic information extraction, environments and evaluation, auxiliary tasks, and task representation strategies. Through a comparative analysis, we highlight the strengths and limitations of current approaches in bridging language instructions and robot actions. Finally, we discuss open challenges and future research directions, focusing on enhancing generalization capabilities and addressing safety issues in language-conditioned robot manipulation. The GitHub repository of this paper can be found at https://github.com/hk-zh/language-conditioned-robot-manipulation-models.