Code-as-Policies (CaP) has shown that large language models (LLMs) can write code to solve robotics tasks by composing perception, planning, and control primitives. Recent CaP systems, however, rely on multi-turn code-generation loops at test time, which are often infeasible for real-time robot control. We introduce Robotics Harness Optimization (RHO), a novel paradigm in which tool-enabled coding agents propose and search, at training time, for interpretable, neurosymbolic multi-file policy repositories (Repositories-as-Policies) that compose these primitives, rather than for a single prompt, function, or file. RHO guides its search with reflective feedback from environment reward and execution rather than teleoperation demonstrations. It generalizes to perturbed pick-and-place settings such as LIBERO-PRO, where OpenVLA scores 0.0% and $\pi_{0.5}$ averages 12.83%. Using the same low-level primitives, RHO reaches a 45.0% success rate, 2.5x that of the strongest multi-turn agentic system and 3.5x that of $\pi_{0.5}$. On Robosuite, RHO sets a new state of the art of 70.0%, exceeding the prior multi-turn record of 68.29% while using single-turn execution with no corrective LLM code edits at deployment. When an LLM is used in the control loop, as on RAI's O3DE benchmark, RHO optimizes the deployed agent's multi-file harness of prompts, tools, and control code, improving held-out success from 23.5% to 44.3% with 20% less wall-clock time and 27% fewer tool calls.
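The headline comparisons mix two kinds of improvement: multiples of a baseline success rate (the 3.5x figure) and differences in percentage points (Robosuite, O3DE). As a minimal sketch for checking the arithmetic, the snippet below recomputes both from the rates quoted above. The constant and function names are illustrative assumptions, not part of RHO; only the numeric values come from the abstract.

```python
# Success rates in percent, copied from the abstract.
# All names here are hypothetical and exist only for this check.
RHO_LIBERO_PRO = 45.0
PI05_LIBERO_PRO = 12.83
RHO_ROBOSUITE = 70.0
PRIOR_ROBOSUITE = 68.29
O3DE_BEFORE = 23.5
O3DE_AFTER = 44.3


def ratio(new: float, old: float) -> float:
    """Improvement expressed as a multiple of the baseline (new / old)."""
    if old == 0:
        # A 0.0% baseline (e.g. OpenVLA on LIBERO-PRO) has no finite multiple.
        raise ValueError("ratio is undefined for a zero baseline")
    return new / old


def point_gain(new: float, old: float) -> float:
    """Improvement expressed in absolute percentage points (new - old)."""
    return new - old


if __name__ == "__main__":
    print(f"LIBERO-PRO, RHO vs pi_0.5: {ratio(RHO_LIBERO_PRO, PI05_LIBERO_PRO):.2f}x")
    print(f"Robosuite, RHO vs prior:   +{point_gain(RHO_ROBOSUITE, PRIOR_ROBOSUITE):.2f} points")
    print(f"O3DE, after vs before:     +{point_gain(O3DE_AFTER, O3DE_BEFORE):.1f} points "
          f"({ratio(O3DE_AFTER, O3DE_BEFORE):.2f}x)")
```

Under these definitions, 45.0 / 12.83 is about 3.51, which matches the stated 3.5x, while the Robosuite margin over the prior record is 1.71 percentage points.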