This paper addresses the planning of complex manipulation tasks in which multiple robots with different end-effectors and capabilities, informed by computer vision, must plan and execute chained sequences of actions on a variety of objects that can appear in arbitrary positions and configurations in unstructured scenes. We propose an intent-driven planning pipeline that robustly constructs such action sequences with varying degrees of supervisory input from a human operator, given as simple language instructions. The pipeline integrates: (i) perception-to-text scene encoding, (ii) an ensemble of large language models (LLMs) that generate candidate removal sequences based on the operator's intent, (iii) an LLM-based verifier that enforces formatting and precedence constraints, and (iv) a deterministic consistency filter that rejects hallucinated objects. The pipeline is evaluated on an example task in which two robot arms collaborate to dismantle an electric vehicle (EV) battery for recycling. A variety of components must be grasped and removed in specific sequences, determined by human instructions and/or by task-order feasibility decisions made by the autonomous system. On 200 real scenes with 600 operator prompts across five component classes, we evaluated and compared five LLM-based planners (including ablation analyses of pipeline components) using two metrics: full-sequence correctness and next-task correctness. We also evaluated the LLM-based human interface in human-participant experiments, measuring time to execution and NASA-TLX workload. Results indicate that our ensemble-with-verification approach reliably maps operator intent to safe, executable multi-robot plans while maintaining low user effort.
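To make stage (iv) and the two planner metrics concrete, the following is a minimal sketch and not the authors' implementation. It assumes a plan is an ordered list of object identifiers and that the perception stage yields the list of detected identifiers. The function names, the identifiers (`bolt_1`, `cable_1`, and so on), and the exact metric definitions are illustrative assumptions: the abstract names the metrics but does not define them, so the code takes one plausible reading, in which full-sequence correctness is an exact match with a reference sequence and next-task correctness is a match on the first step only.

```python
from typing import List, Sequence


def is_consistent(plan: Sequence[str], scene_objects: Sequence[str]) -> bool:
    """Deterministic check: every step must name an object detected in the scene."""
    known = set(scene_objects)
    return all(step in known for step in plan)


def filter_candidates(
    candidates: Sequence[Sequence[str]], scene_objects: Sequence[str]
) -> List[List[str]]:
    """Drop candidate plans that mention objects absent from the scene (hallucinations)."""
    return [list(plan) for plan in candidates if is_consistent(plan, scene_objects)]


def full_sequence_correct(predicted: Sequence[str], reference: Sequence[str]) -> bool:
    """Assumed definition: the whole predicted sequence equals the reference."""
    return list(predicted) == list(reference)


def next_task_correct(predicted: Sequence[str], reference: Sequence[str]) -> bool:
    """Assumed definition: only the first predicted step must equal the reference's."""
    return bool(predicted) and bool(reference) and predicted[0] == reference[0]


# Hypothetical scene and candidate plans, for illustration only.
scene = ["bolt_1", "bolt_2", "cable_1", "module_1"]
candidates = [
    ["bolt_1", "bolt_2", "cable_1", "module_1"],
    ["bolt_1", "cover_9", "module_1"],  # "cover_9" was never detected
]
reference = ["bolt_1", "bolt_2", "cable_1", "module_1"]

kept = filter_candidates(candidates, scene)
partial = ["bolt_1", "cable_1"]  # right first step, wrong overall sequence
```

Because the filter is a plain set-membership test rather than another model call, a given scene and plan always produce the same verdict, which is what makes this stage deterministic. In this sketch the second candidate is rejected, and `partial` counts as next-task correct without being full-sequence correct.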