Industry practitioners and academic researchers regularly use multi-agent systems to accelerate their work, yet the frameworks through which these systems operate do not provide a simple, unified mechanism for scalably managing the critical aspects of the agent harness, impacting both the quality of individual human-agent interactions and the capacity for practitioners to coordinate toward common goals through shared agent infrastructure. Agent frameworks have enabled increasingly sophisticated multi-agent systems, but the behavioral specifications that define what these agents can do remain fragmented across prose instruction files, framework-internal configuration, and mechanisms like MCP servers that operate separately from individual agent definitions, making these specifications difficult to share, version, or collaboratively maintain across teams and projects. Applying the ALARA principle from radiation safety (exposures kept as low as reasonably achievable) to agent context, we introduce a declarative context-agent-tool (CAT) data layer expressed through interrelated files that scope each agent's tool access and context to the minimum its role requires, and \texttt{npcsh}, a command-line shell for executing it. Because the system parses and enforces these files structurally, modifying an agent's tool list produces a guaranteed behavioral change rather than a suggestion the model may or may not follow. We evaluate 22 locally-hosted models from 0.6B to 35B parameters across 115 practical tasks spanning file operations, web search, multi-step scripting, tool chaining, and multi-agent delegation, characterizing which model families succeed at which task categories and where they break down across $\sim$2500 total executions.
翻译:工业从业者和学术研究者经常使用多智能体系统来加速工作,但支撑这些系统运行的框架并未提供简单统一的机制来可扩展地管理智能体编排的关键方面,这不仅影响了个人人-智能体交互的质量,也限制了从业者通过共享智能体基础设施协调实现共同目标的能力。现有智能体框架虽已实现日益复杂的多智能体系统,但定义智能体行为的规范仍分散在文本指令文件、框架内部配置以及独立于个体智能体定义的MCP服务器等机制中,导致这些规范难以在团队和项目间共享、版本控制或协作维护。借鉴辐射安全领域中的ALARA原则(辐射暴露应尽可能低),我们将其应用于智能体上下文,提出一种通过相互关联的文件表达的声明式上下文-智能体-工具数据层,该数据层将每个智能体的工具访问权限和上下文限制为其角色所需的最小范围,并配套开发了命令行执行外壳npcsh。由于系统以结构化方式解析并强制执行这些文件,修改智能体的工具列表将产生确定性的行为变更,而非模型可能遵守也可能不遵守的建议。我们针对22个本地部署模型(参数量从0.6B到35B)在115项实际任务(涵盖文件操作、网页搜索、多步脚本编写、工具链调用和多智能体委派)中进行了评估,在总计约2500次执行中刻画了各模型族在哪些任务类别上表现成功、又在哪些环节出现能力不足。