Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace and the physical world. Recently, Multi-modal Large Models (MLMs) and World Models (WMs) have attracted significant attention for their remarkable perception, interaction, and reasoning capabilities, making them a promising architecture for the brain of embodied agents. However, there is no comprehensive survey of Embodied AI in the era of MLMs. In this survey, we provide a comprehensive exploration of the latest advancements in Embodied AI. Our analysis first reviews representative cutting-edge work on embodied robots and simulators to clarify current research focuses and their limitations. We then analyze four main research targets: 1) embodied perception, 2) embodied interaction, 3) embodied agents, and 4) sim-to-real adaptation, covering state-of-the-art methods, essential paradigms, and comprehensive datasets. Additionally, we explore the complexities of MLMs in virtual and real embodied agents, highlighting their significance in facilitating interactions in dynamic digital and physical environments. Finally, we summarize the challenges and limitations of Embodied AI and discuss potential future directions. We hope this survey will serve as a foundational reference for the research community and inspire continued innovation. The associated project can be found at https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List.