Vision-language models (VLMs) have demonstrated remarkable capabilities in robotic planning, particularly for long-horizon tasks that require a holistic understanding of the environment for task decomposition. Existing methods typically rely on prior environmental knowledge or carefully designed task-specific prompts, so they struggle with dynamic scene changes or unexpected task conditions, e.g., a robot that attempts to put a carrot in the microwave but finds the door closed. Such challenges underscore two critical issues: adaptability and efficiency. To address them, we propose an adaptive multi-agent planning framework, termed REMAC, that enables efficient, scene-agnostic multi-robot long-horizon task planning and execution through continuous reflection and self-evolution. REMAC incorporates two key modules: a self-reflection module that performs pre-condition and post-condition checks in the loop to evaluate progress and refine plans, and a self-evolvement module that dynamically adapts plans based on scene-specific reasoning. It offers several appealing benefits: 1) Robots can initially explore and reason about the environment without complex prompt design. 2) Robots can keep reflecting on potential planning errors and adapting the plan based on task-specific insights. 3) After several iterations, a robot can call another one to coordinate tasks in parallel, maximizing task execution efficiency. To validate REMAC's effectiveness, we build a multi-agent environment for long-horizon robot manipulation and navigation based on RoboCasa, featuring 4 task categories with 27 task styles and 50+ different objects. On this environment, we further benchmark state-of-the-art reasoning models, including DeepSeek-R1, o3-mini, QwQ, and Grok3, and show that REMAC boosts average success rates by 40% and execution efficiency by 52.7% over the single-robot baseline.
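The in-the-loop pre-condition and post-condition checks described in the abstract can be illustrated with a toy sketch. This is not REMAC's implementation: in the framework the checks and plan refinements come from VLM reasoning, whereas here they are hand-written predicates over a dictionary state. The names `Step`, `run_with_reflection`, and the `repair` field are hypothetical and introduced only for illustration, using the carrot-and-microwave example from the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, bool]


@dataclass
class Step:
    """One plan step with explicit checks (hypothetical structure, not REMAC's)."""
    name: str
    pre: Callable[[State], bool]     # must hold before the action runs
    effect: Callable[[State], None]  # simulated action on the toy state
    post: Callable[[State], bool]    # must hold after the action runs
    repair: List["Step"] = field(default_factory=list)  # steps to insert if `pre` fails


def run_with_reflection(plan: List[Step], state: State, max_repairs: int = 3) -> List[str]:
    """Execute a plan, checking pre/post-conditions and patching the plan on failure."""
    log: List[str] = []
    queue = list(plan)
    repairs = 0
    while queue:
        step = queue.pop(0)
        if not step.pre(state):
            # Pre-condition failed: either give up or refine the plan.
            if repairs >= max_repairs or not step.repair:
                log.append(f"abort: {step.name}")
                return log
            repairs += 1
            log.append(f"reflect: {step.name}")
            # Run the repair steps first, then retry the original step.
            queue = list(step.repair) + [step] + queue
            continue
        step.effect(state)
        if not step.post(state):
            # Post-condition failed: the action did not achieve its goal.
            log.append(f"failed: {step.name}")
            return log
        log.append(f"done: {step.name}")
    return log


# Toy scene: the microwave door starts closed.
open_door = Step(
    name="open microwave door",
    pre=lambda s: True,
    effect=lambda s: s.update(door_open=True),
    post=lambda s: s["door_open"],
)
place_carrot = Step(
    name="put carrot in microwave",
    pre=lambda s: s["door_open"],
    effect=lambda s: s.update(carrot_in_microwave=True),
    post=lambda s: s["carrot_in_microwave"],
    repair=[open_door],
)

state = {"door_open": False, "carrot_in_microwave": False}
log = run_with_reflection([place_carrot], state)
for line in log:
    print(line)
```

In this sketch the failed pre-condition triggers a plan refinement (opening the door) before the original step is retried. A real system would replace the fixed `repair` list with a model-generated revision, which is an assumption about how such a loop could be wired rather than a description of the authors' code.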