Language-conditioned robotic manipulation is an emerging research area that enables communication and cooperation between humans and robotic agents by teaching robotic systems to comprehend and execute instructions conveyed in natural language. Achieving this requires robust language understanding models that can extract actionable information from textual input. In this survey, we systematically review recent advances in language-conditioned approaches to robotic manipulation. We organize these approaches by learning paradigm: reinforcement learning, imitation learning, and the integration of foundation models such as large language models and vision-language models. We then present a comparative analysis covering semantic information extraction, environment and evaluation, auxiliary tasks, and task representation. Finally, we outline future research directions for language-conditioned learning in robotic manipulation, focusing on generalization capabilities and safety issues.