Tool use is a hallmark of advanced intelligence, exemplified in both animal behavior and robotic capabilities. This paper investigates the feasibility of imbuing robots with the ability to creatively use tools in tasks that involve implicit physical constraints and long-term planning. Leveraging Large Language Models (LLMs), we develop RoboTool, a system that accepts natural language instructions and outputs executable code for controlling robots in both simulated and real-world environments. RoboTool incorporates four pivotal components: (i) an "Analyzer" that interprets natural language to discern key task-related concepts, (ii) a "Planner" that generates comprehensive strategies based on the language input and key concepts, (iii) a "Calculator" that computes parameters for each skill, and (iv) a "Coder" that translates these plans into executable Python code. Our results show that RoboTool can not only comprehend explicit or implicit physical constraints and environmental factors but also demonstrate creative tool use. Unlike traditional Task and Motion Planning (TAMP) methods that rely on explicit optimization, our LLM-based system offers a more flexible, efficient, and user-friendly solution for complex robotics tasks. Through extensive experiments, we validate that RoboTool is proficient in handling tasks that would otherwise be infeasible without the creative use of tools, thereby expanding the capabilities of robotic systems. Demos are available on our project page: https://creative-robotool.github.io/.
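The four-component flow described above can be sketched as a chain of LLM calls, each stage feeding the next. This is a minimal illustration, not RoboTool's actual implementation: the `LLM` callable type, the prompt wording, the `PipelineResult` container, and the choice of which earlier outputs each stage receives are all assumptions made for the sketch. Only the stage names and their order come from the abstract.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function mapping a prompt string to a completion.
LLM = Callable[[str], str]


@dataclass
class PipelineResult:
    key_concepts: str  # Analyzer output
    plan: str          # Planner output
    parameters: str    # Calculator output
    code: str          # Coder output


def run_pipeline(instruction: str, llm: LLM) -> PipelineResult:
    """Chain the four stages; the prompts below are illustrative placeholders."""
    # (i) Analyzer: discern key task-related concepts from the instruction.
    key_concepts = llm(
        f"[Analyzer] Identify the key task-related concepts.\n"
        f"Instruction: {instruction}"
    )
    # (ii) Planner: build a strategy from the instruction and the key concepts.
    plan = llm(
        f"[Planner] Produce a step-by-step plan.\n"
        f"Instruction: {instruction}\nKey concepts: {key_concepts}"
    )
    # (iii) Calculator: compute parameters for each skill in the plan.
    parameters = llm(
        f"[Calculator] Compute parameters for each skill.\nPlan: {plan}"
    )
    # (iv) Coder: translate the plan and parameters into executable Python.
    code = llm(
        f"[Coder] Write executable Python for the plan.\n"
        f"Plan: {plan}\nParameters: {parameters}"
    )
    return PipelineResult(key_concepts, plan, parameters, code)


def echo_llm(prompt: str) -> str:
    """Stand-in for a real model: echoes the stage tag so data flow is traceable."""
    tag = prompt.split("]")[0].lstrip("[")
    return f"<{tag} output>"


if __name__ == "__main__":
    result = run_pipeline("Move the box onto the shelf.", echo_llm)
    print(result)
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client; a real system would substitute an API-backed function for `echo_llm` and would execute the Coder output against the robot's skill library.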