AgentSpec: Understanding Embodied Agent Scaffolds Through Controlled Composition

LLM agents are increasingly built not as single model calls but as scaffolded systems that combine reasoning, memory, reflection, action execution, and learning. Although such scaffolds often improve performance, they are typically embedded in tightly coupled pipelines, which makes it difficult to isolate component contributions, compare alternative designs, or understand how module interactions shape agent behavior. We introduce AgentSpec, a modular specification framework that represents embodied agents as typed compositions of reusable policy components. AgentSpec standardizes the interfaces among perception, memory, reasoning, reflection, action, and optional learning, so that components can be swapped and recombined under controlled conditions. We instantiate this framework on DeliveryBench, ALFRED, MiniGrid, and RoboTHOR, and analyze reasoning, memory, reflection, and reinforcement-learning (RL) modules across model backbones. Our results show that agent performance is governed by scaffold compatibility and interaction effects rather than by isolated module strength. In particular, structured multi-granularity memory improves long-horizon state tracking, reasoning and memory interact non-uniformly across environments, reflection trades off correction against cost, and RL-trained policies compose best when optimized with the deployment-time scaffold structure. AgentSpec provides a controlled foundation for studying, comparing, and designing composable LLM agents. Our code, baselines, and interactive playground are publicly available at https://agentspec-embodied.github.io.
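
To make the idea of typed composition concrete, the following is a minimal Python sketch of an agent assembled from components behind fixed interfaces. It is not AgentSpec's actual API: every name here (`Memory`, `Reasoner`, `NoMemory`, `WindowMemory`, `EchoReasoner`, `Agent`, `run`) is hypothetical, and the reasoner is a toy stand-in for a model call. It only illustrates how one component can be swapped while the rest of the scaffold stays fixed.

```python
from dataclasses import dataclass, field
from typing import Protocol


class Memory(Protocol):
    """Hypothetical memory interface: read context, write new items."""
    def read(self) -> list[str]: ...
    def write(self, item: str) -> None: ...


class Reasoner(Protocol):
    """Hypothetical reasoning interface: map observation + context to an action."""
    def decide(self, observation: str, context: list[str]) -> str: ...


@dataclass
class NoMemory:
    """Ablation baseline: stores nothing and returns no context."""
    def read(self) -> list[str]:
        return []

    def write(self, item: str) -> None:
        pass


@dataclass
class WindowMemory:
    """Keeps every item but exposes only the most recent `size` of them."""
    size: int = 3
    items: list[str] = field(default_factory=list)

    def read(self) -> list[str]:
        return self.items[-self.size:]

    def write(self, item: str) -> None:
        self.items.append(item)


@dataclass
class EchoReasoner:
    """Toy stand-in for an LLM call; reports how much context it received."""
    def decide(self, observation: str, context: list[str]) -> str:
        return f"act({observation}|ctx={len(context)})"


@dataclass
class Agent:
    """An agent is a composition of components that satisfy the interfaces."""
    memory: Memory
    reasoner: Reasoner

    def step(self, observation: str) -> str:
        context = self.memory.read()          # context seen before this step
        action = self.reasoner.decide(observation, context)
        self.memory.write(observation)        # record the step afterwards
        return action


def run(agent: Agent, observations: list[str]) -> list[str]:
    """Run one episode over a fixed observation sequence."""
    return [agent.step(o) for o in observations]


if __name__ == "__main__":
    obs = ["a", "b", "c", "d"]
    # Same reasoner, different memory: only one component varies.
    print(run(Agent(NoMemory(), EchoReasoner()), obs))
    print(run(Agent(WindowMemory(size=2), EchoReasoner()), obs))
```

Because both agents share the reasoner and the observation sequence, any difference in their outputs is attributable to the memory component alone, which is the kind of controlled comparison the abstract describes.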
