Demonstration-Free Robotic Control via LLM Agents

Robotic manipulation has increasingly adopted vision-language-action (VLA) models, which achieve strong performance but typically require task-specific demonstrations and fine-tuning, and often generalize poorly under domain shift. We investigate whether general-purpose large language model (LLM) agent frameworks, originally developed for software engineering, can serve as an alternative control paradigm for embodied manipulation. We introduce FAEA (Frontier Agent as Embodied Agent), which applies an LLM agent framework directly to embodied manipulation without modification. The same iterative reasoning that lets software agents debug code lets FAEA reason through manipulation strategies. We evaluate an unmodified frontier agent, the Claude Agent SDK, on the LIBERO, ManiSkill3, and MetaWorld benchmarks. With privileged access to environment state, FAEA achieves success rates of 84.9%, 85.7%, and 96%, respectively. These success rates approach those of VLA models trained with fewer than 100 demonstrations per task, yet FAEA requires neither demonstrations nor fine-tuning. With one round of human feedback as an optional optimization, performance on LIBERO increases to 88.2%. This demonstration-free capability has immediate practical value: FAEA can autonomously explore novel scenarios in simulation and generate successful trajectories for training data augmentation in embodied learning. Our results indicate that general-purpose agents are sufficient for a class of manipulation tasks dominated by deliberative, task-level planning. This opens a path for robotics systems to leverage actively maintained agent infrastructure and benefit directly from ongoing advances in frontier models. Code is available at https://github.com/robiemusketeer/faea-sim
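
The iterative propose, run, observe, revise loop described above can be illustrated with a minimal toy sketch. This is not the FAEA implementation and uses neither the Claude Agent SDK nor any of the benchmarks: `ToyReachEnv`, `propose_gain`, and `solve` are hypothetical names, the environment is a one-dimensional stand-in for a simulator with privileged state access, and the agent's reasoning step is replaced by a fixed rule so that the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class ToyReachEnv:
    """Hypothetical 1-D stand-in for a simulated task (not a real benchmark)."""
    target: float = 0.7       # privileged state: position of the object to reach
    tolerance: float = 0.02   # success if the gripper ends this close to the target
    max_steps: int = 20       # episode length

    def rollout(self, gain: float) -> tuple[bool, float]:
        """Run one episode of a proportional controller; return (success, final error)."""
        gripper = 0.0
        for _ in range(self.max_steps):
            gripper += gain * (self.target - gripper)
        error = abs(self.target - gripper)
        return error <= self.tolerance, error


def propose_gain(history: list[tuple[float, float]]) -> float:
    """Stub for the agent's reasoning step; a real system would query an LLM here."""
    if not history:
        return 0.05  # deliberately weak first attempt
    last_gain, _last_error = history[-1]
    # Assumed revision rule: the last attempt fell short, so double the gain.
    return min(1.0, last_gain * 2)


def solve(env: ToyReachEnv, max_attempts: int = 5):
    """Propose a policy, run it, feed the outcome back, and retry until success."""
    history: list[tuple[float, float]] = []
    for attempt in range(1, max_attempts + 1):
        gain = propose_gain(history)
        success, error = env.rollout(gain)
        history.append((gain, error))  # feedback available to the next proposal
        if success:
            return attempt, gain
    return None  # no successful policy within the attempt budget


if __name__ == "__main__":
    print(solve(ToyReachEnv()))
```

With the default settings, the first two proposals (gains 0.05 and 0.1) leave the gripper too far from the target, and the third (gain 0.2) succeeds, so `solve` returns on attempt 3. The point of the sketch is only the control flow: each failed rollout becomes input to the next proposal, and no demonstrations are involved.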
