Recent developments in foundation models, like Large Language Models (LLMs) and Vision-Language Models (VLMs), trained on extensive data, facilitate flexible application across different tasks and modalities. Their impact spans various fields, including healthcare, education, and robotics. This paper provides an overview of the practical application of foundation models in real-world robotics, with a primary emphasis on the replacement of specific components within existing robot systems. The summary encompasses the perspective of input-output relationships in foundation models, as well as their role in perception, motion planning, and control within the field of robotics. This paper concludes with a discussion of future challenges and implications for practical robot applications.
翻译:近期以大规模语言模型(LLMs)和视觉-语言模型(VLMs)为代表的基础模型,凭借海量数据的训练,展现出跨任务、跨模态的灵活应用能力。其影响已广泛覆盖医疗、教育及机器人等多个领域。本文系统综述了基础模型在真实世界机器人中的实际应用,重点探讨了其对现有机器人系统特定组件的替代作用。总结涵盖基础模型的输入-输出关系视角,及其在机器人感知、运动规划与控制中的功能角色。最后,本文展望了未来挑战及其对机器人实际应用的意义。