Quadruped robots can traverse a wide range of complex terrain with high flexibility. As highly mobile ground-based intelligent platforms, they can be equipped with modules for navigation control, environmental perception, and intelligent interaction, and can therefore serve as real-world mobile deployment platforms for a variety of algorithms. In this paper, we introduce Y-BotFrame, an extensible embodied platform that turns a robot into an intelligent ground assistant. Y-BotFrame integrates multimodal perception, including speech, vision, and LiDAR, and employs a large language model as the cognitive core for environmental understanding, contextual reasoning, and task planning. The system maps users' natural-language instructions into executable embodied task units that the robot can carry out. Y-BotFrame supports natural interaction through voice commands and visual feedback, removing the need for a remote controller and enabling efficient human-robot collaboration. Its highly extensible framework supports plug-and-play integration of new functional modules, as well as modular upgrades and iterative development, offering a reference implementation for the real-world deployment of general-purpose, instruction-driven embodied agents. The supplementary video is available at https://xdei-group.github.io/Y-BotFrame/.
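To make the idea of mapping an instruction to executable task units and registering modules in a plug-and-play fashion more concrete, the following is a minimal sketch only. It is not the Y-BotFrame implementation: the names `TaskUnit`, `SkillRegistry`, `navigate`, `speak`, and `plan_from_instruction` are hypothetical, and a keyword rule stands in for the large language model that the abstract describes as the planner.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskUnit:
    """One executable step: a skill name plus its arguments (hypothetical schema)."""
    skill: str
    args: Dict[str, str] = field(default_factory=dict)


class SkillRegistry:
    """Hypothetical plug-and-play registry: modules add skills by name."""

    def __init__(self) -> None:
        self._skills: Dict[str, Callable[..., str]] = {}

    def register(self, name: str) -> Callable:
        # Used as a decorator so a new module only has to declare its skill.
        def decorator(fn: Callable[..., str]) -> Callable[..., str]:
            self._skills[name] = fn
            return fn
        return decorator

    def execute(self, plan: List[TaskUnit]) -> List[str]:
        # Run each task unit in order and collect the textual results.
        results = []
        for unit in plan:
            if unit.skill not in self._skills:
                raise KeyError(f"no module provides skill {unit.skill!r}")
            results.append(self._skills[unit.skill](**unit.args))
        return results


registry = SkillRegistry()


@registry.register("navigate")
def navigate(target: str) -> str:
    # Placeholder for a navigation module; a real robot would move here.
    return f"navigating to {target}"


@registry.register("speak")
def speak(text: str) -> str:
    # Placeholder for a speech-output module.
    return f"saying: {text}"


def plan_from_instruction(instruction: str) -> List[TaskUnit]:
    """Stand-in for the LLM planner: a keyword rule, NOT a language model."""
    plan: List[TaskUnit] = []
    if "kitchen" in instruction.lower():
        plan.append(TaskUnit("navigate", {"target": "kitchen"}))
    plan.append(TaskUnit("speak", {"text": "task complete"}))
    return plan


if __name__ == "__main__":
    for line in registry.execute(plan_from_instruction("Go to the kitchen")):
        print(line)
```

In this sketch, adding a capability means registering one more function under a new skill name; the planner and executor are unchanged, which is the property the abstract refers to as plug-and-play integration.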