Agent Primitives: Reusable Latent Building Blocks for Multi-Agent Systems

Existing multi-agent systems (MAS) can handle complex problems by enabling collaboration among multiple agents, but they are often highly task-specific: they rely on manually crafted agent roles and interaction prompts, which increases architectural complexity and limits reusability across tasks. Moreover, most MAS communicate primarily through natural language, which makes them vulnerable to error accumulation and instability across long-context, multi-stage interactions recorded in the agents' internal histories. In this work, we propose \textbf{Agent Primitives}, a set of reusable latent building blocks for LLM-based MAS. Inspired by neural network design, where complex models are built from reusable components, we observe that many existing MAS architectures can be decomposed into a small number of recurring internal computation patterns. Based on this observation, we instantiate three primitives: Review, Voting and Selection, and Planning and Execution. All primitives communicate internally via the key-value (KV) cache, which improves both robustness and efficiency by mitigating information degradation across multi-stage interactions. To enable automatic system construction, an Organizer agent selects and composes primitives for each query, guided by a lightweight knowledge pool of previously successful configurations, forming a primitive-based MAS. Experiments show that primitive-based MAS improve average accuracy by 12.0-16.5\% over single-agent baselines and reduce token usage and inference latency by approximately 3$\times$-4$\times$ compared to text-based MAS, while incurring only 1.3$\times$-1.6$\times$ overhead relative to single-agent inference and providing more stable performance across model backbones.
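
As a rough illustration of the composition step described above, the following Python sketch shows how an Organizer might pick a sequence of primitives from a knowledge pool and chain them for a query. It is a toy under stated assumptions, not the authors' implementation: the keyword-overlap lookup, the fallback to a lone Review primitive, the `KnowledgePool`, `organize`, and `run` names, and the use of a plain list of strings in place of a real KV cache are all hypothetical choices made for readability.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# ASSUMPTION: a list of strings stands in for the latent state that the
# primitives would actually exchange through the KV cache.
State = List[str]
Primitive = Callable[[State], State]


def review(state: State) -> State:
    # Placeholder for the Review primitive: just records that it ran.
    return state + ["review"]


def voting_selection(state: State) -> State:
    # Placeholder for the Voting and Selection primitive.
    return state + ["voting_selection"]


def planning_execution(state: State) -> State:
    # Placeholder for the Planning and Execution primitive.
    return state + ["planning_execution"]


PRIMITIVES: Dict[str, Primitive] = {
    "review": review,
    "voting_selection": voting_selection,
    "planning_execution": planning_execution,
}


@dataclass
class KnowledgePool:
    # Each entry pairs query keywords with a primitive sequence that is
    # assumed to have worked before. Keyword matching is a hypothetical
    # retrieval rule chosen only to keep the sketch short.
    entries: List[Tuple[FrozenSet[str], List[str]]]

    def lookup(self, query: str) -> List[str]:
        words = set(query.lower().split())
        best = max(self.entries, key=lambda e: len(e[0] & words), default=None)
        if best is None or not (best[0] & words):
            return ["review"]  # ASSUMPTION: fallback when nothing matches
        return best[1]


def organize(query: str, pool: KnowledgePool) -> List[Primitive]:
    # The Organizer selects primitives by name and returns them in order.
    return [PRIMITIVES[name] for name in pool.lookup(query)]


def run(query: str, pool: KnowledgePool) -> State:
    # Thread one shared state through the composed primitives.
    state: State = [f"query:{query}"]
    for primitive in organize(query, pool):
        state = primitive(state)
    return state


pool = KnowledgePool(entries=[
    (frozenset({"math", "prove"}), ["planning_execution", "review"]),
    (frozenset({"code", "debug"}), ["planning_execution", "voting_selection"]),
])
```

Calling `run("prove this math theorem", pool)` threads the state through Planning and Execution and then Review, while an unmatched query falls back to Review alone. In a real system the shared state would be the model's cached keys and values rather than text, which is the property the abstract credits for the reduced token usage.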
