Embodied AI with Foundation Models for Mobile Service Robots: A Systematic Review

Rapid advancements in foundation models, including Large Language Models, Vision-Language Models, Multimodal Large Language Models, and Vision-Language-Action Models, have opened new avenues for embodied AI in mobile service robotics. By combining foundation models with the principles of embodied AI, where intelligent systems perceive, reason, and act through physical interaction, mobile service robots can achieve more flexible understanding, adaptive behavior, and robust task execution in dynamic real-world environments. Despite this progress, embodied AI for mobile service robots continues to face fundamental challenges related to the translation of natural language instructions into executable robot actions, multimodal perception in human-centered environments, uncertainty estimation for safe decision-making, and computational constraints for real-time onboard deployment. In this paper, we present the first systematic review focused specifically on the integration of foundation models in mobile service robotics. We analyze how recent advances in foundation models address these core challenges through language-conditioned control, multimodal sensor fusion, uncertainty-aware reasoning, and efficient model scaling. We further examine real-world applications in domestic assistance, healthcare, and service automation, highlighting how foundation models enable context-aware, socially responsive, and generalizable robot behaviors. Beyond technical considerations, we discuss ethical, societal, and human-interaction implications associated with deploying foundation model-enabled service robots in human environments. Finally, we outline future research directions emphasizing reliability and lifelong adaptation, privacy-aware and resource-constrained deployment, and governance and human-in-the-loop frameworks required for safe, scalable, and trustworthy mobile service robotics.
