Recent work leverages the capabilities and commonsense priors of generative models for robot control. In this paper, we present an agentic control system in which a reasoning-capable language model plans and executes tasks by selecting and invoking robot skills within an iterative planner-executor loop. We deploy the system on two physical robot platforms in two settings: (i) tabletop grasping, placement, and box insertion in indoor mobile manipulation (Mobipick) and (ii) autonomous agricultural navigation and sensing (Valdemar). Both settings involve uncertainty, partial observability, sensor noise, and ambiguous natural-language commands. The system exposes structured introspection of its planning and decision process, reacts to exogenous events via explicit event checks, and supports operator interventions that modify or redirect ongoing execution. Across both platforms, our proof-of-concept experiments reveal substantial fragility, including non-deterministic suboptimal behavior, instruction-following errors, and high sensitivity to prompt specification. At the same time, the architecture is flexible: transferring it to a different robot and task domain largely came down to updating the system prompt (domain model, affordances, and action catalogue) and re-binding the same tool interface to the platform-specific skill API.
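The planner-executor loop and the re-bindable tool interface described above can be illustrated with a minimal sketch. It is not the implementation used on Mobipick or Valdemar. All names in it (`ToolInterface`, `run_loop`, `scripted_planner`, the skills `move_to`, `grasp`, `place`, and the `operator_stop` event) are hypothetical. A fixed script stands in for the language model so that the example runs deterministically.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Step = Tuple[str, dict]  # (skill name, keyword arguments)


@dataclass
class ToolInterface:
    """Maps skill names from the action catalogue to platform-specific callables."""
    skills: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def bind(self, name: str, fn: Callable[..., str]) -> None:
        # Re-binding for another robot only swaps the callable behind the name.
        self.skills[name] = fn

    def invoke(self, name: str, **kwargs) -> str:
        if name not in self.skills:
            return f"error: unknown skill '{name}'"
        return self.skills[name](**kwargs)


# Hypothetical fixed plan; a real system would query a language model instead.
PLAN: List[Step] = [
    ("move_to", {"target": "table"}),
    ("grasp", {"obj": "cube"}),
    ("place", {"target": "box"}),
]


def scripted_planner(goal: str, history: List[dict]) -> Optional[Step]:
    """Stand-in planner: returns the next step, or None to stop."""
    # An operator intervention recorded as an event halts execution.
    if any(h["kind"] == "event" and h["detail"] == "operator_stop" for h in history):
        return None
    n_done = sum(1 for h in history if h["kind"] == "action")
    return PLAN[n_done] if n_done < len(PLAN) else None


def run_loop(goal: str,
             planner: Callable[[str, List[dict]], Optional[Step]],
             tools: ToolInterface,
             check_events: Callable[[], Optional[str]],
             max_steps: int = 10) -> List[dict]:
    """Iterate: check events, ask the planner for one step, execute it, record it."""
    history: List[dict] = []  # doubles as an inspectable trace of decisions
    for _ in range(max_steps):
        event = check_events()  # explicit event check before each planning step
        if event is not None:
            history.append({"kind": "event", "detail": event})
        step = planner(goal, history)
        if step is None:
            break
        name, args = step
        result = tools.invoke(name, **args)
        history.append({"kind": "action", "skill": name, "args": args, "result": result})
    return history


def make_tools() -> ToolInterface:
    """Bind the hypothetical skills to dummy functions instead of a robot API."""
    tools = ToolInterface()
    tools.bind("move_to", lambda target: f"moved to {target}")
    tools.bind("grasp", lambda obj: f"grasped {obj}")
    tools.bind("place", lambda target: f"placed at {target}")
    return tools


def demo(events: List[Optional[str]]) -> List[dict]:
    """Run the loop with a scripted sequence of event-check results."""
    stream = iter(events)
    return run_loop("put the cube in the box", scripted_planner,
                    make_tools(), lambda: next(stream, None))


if __name__ == "__main__":
    for entry in demo([]):
        print(entry)
```

Calling `demo([])` runs all three scripted steps, whereas `demo([None, "operator_stop"])` executes one step and then stops at the second event check. In this sketch, moving to another platform would mean binding different callables in `make_tools` and supplying a different planner prompt. The loop itself would stay unchanged.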