We propose LEO-RobotAgent, a general-purpose, language-driven agent framework for robots. Within this framework, large language models (LLMs) can operate different types of robots to complete unpredictable, complex tasks across varied scenarios. The framework offers strong generalization, robustness, and efficiency. The application-level system built around it improves bidirectional human-robot intent understanding and lowers the barrier to human-robot interaction. In robot task planning, the vast majority of existing studies apply large models to single-task scenarios and single robot types; the resulting algorithms often have complex structures and generalize poorly. LEO-RobotAgent therefore keeps its structure as streamlined as possible, so that large models can think, plan, and act independently within a clear framework. We provide a modular toolset in which new tools are easy to register, allowing large models to call tools flexibly to meet different requirements. The framework also incorporates a human-robot interaction mechanism that lets the algorithm collaborate with humans as a partner. Experiments verify that the framework adapts readily to mainstream robot platforms, including unmanned aerial vehicles (UAVs), robotic arms, and wheeled robots, and efficiently executes a variety of carefully designed tasks of varying complexity. Our code is available at https://github.com/LegendLeoChen/LEO-RobotAgent.
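To make the idea of a modular, easily registrable toolset concrete, the following is a minimal sketch of one way such a registry could look. It is not taken from the LEO-RobotAgent repository: the names `TOOLS`, `register_tool`, `call_tool`, `move_to`, and `ask_human`, as well as the `{"tool": ..., "args": ...}` action format, are all hypothetical and chosen only for illustration. The tools here return strings instead of commanding real hardware.

```python
from typing import Callable, Dict

# Hypothetical registry mapping tool names to callables.
# Illustrative only; not the LEO-RobotAgent API.
TOOLS: Dict[str, Callable[..., str]] = {}


def register_tool(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(fn: Callable[..., str]) -> Callable[..., str]:
        if name in TOOLS:
            raise ValueError(f"tool {name!r} is already registered")
        TOOLS[name] = fn
        return fn
    return decorator


@register_tool("move_to")
def move_to(x: float, y: float) -> str:
    # Stand-in for a real motion command on a UAV, arm, or wheeled base.
    return f"moved to ({x:.1f}, {y:.1f})"


@register_tool("ask_human")
def ask_human(question: str) -> str:
    # Stand-in for the human-robot interaction channel.
    return f"asked human: {question}"


def call_tool(action: dict) -> str:
    """Dispatch one model-produced action (assumed format) to a registered tool.

    Errors are returned as text so the model can read them and re-plan.
    """
    name = action.get("tool")
    if name not in TOOLS:
        return f"error: unknown tool {name!r}; available: {sorted(TOOLS)}"
    try:
        return TOOLS[name](**action.get("args", {}))
    except TypeError as exc:
        return f"error: bad arguments for {name!r}: {exc}"
```

Under these assumptions, adding a capability for a new robot platform amounts to writing one decorated function, and the agent loop only needs to pass each action the model emits to `call_tool` and feed the returned text back as an observation.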