Operating LLMs as coordinated multi-agent research systems over multi-hour runs surfaces failure modes that single-shot evaluation cannot reveal: upstream providers throttle without warning, and sub-agents drift from the task to fit the tools they can access, narrate machinery instead of using it, open revision iterations with self-apology, or treat upstream context as executable directives. We present PRIMA, whose primary contributions are three operational patterns for surviving these failure modes: (1) a resilience-and-recovery layer that detects upstream rate-limit signals, persists a typed pause record to disk, and resumes long-running runs without re-executing converged work, even across process restarts; (2) a sub-agent operating discipline that encodes task-fidelity, tool-use, revision, and inter-step context-boundary norms as a structural prompt layer; (3) a multi-phase application pattern for structured engineering deliverables that pairs orthogonal draft steps with an explicit cross-document harmonization pass before final synthesis. These patterns sit atop a foundational protocol: a research-program specification language with explicit convergence criteria, a dual-metric scoring engine (LLM-judged rubric plus sandboxed code), an outer meta-optimization loop, event-driven persistence, hook-based middleware, context compaction, and a multi-provider LLM abstraction. Agent identities derive from prime powers, giving collision-free identifiers and trivially verifiable cluster membership without a central registry. Theoretical guarantees include $O(k)$ verification, $O(V+E)$ DAG validation, and collision-free identities via the Fundamental Theorem of Arithmetic. A Graph Isomorphism case study grounds the architectural claims in a generated artifact: a six-step protocol that produced a research paper proposing a new canonical-form algorithm, with three theorems and five conjectures.
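The abstract claims that prime-power identities give collision-free identifiers and registry-free membership checks in $O(k)$. The sketch below illustrates one way this could work, under assumptions of our own rather than PRIMA's stated scheme: each cluster is keyed by a distinct prime $p$, the $k$-th agent in that cluster receives the identity $p^k$, and membership is checked by repeatedly dividing out $p$, which takes $k$ divisions for a member. The function names `agent_id` and `membership_exponent` are hypothetical.

```python
"""Hypothetical sketch of prime-power agent identities (not PRIMA's code)."""
from typing import Optional


def is_prime(p: int) -> bool:
    """Trial-division primality test; adequate for small cluster primes."""
    if p < 2:
        return False
    d = 2
    while d * d <= p:
        if p % d == 0:
            return False
        d += 1
    return True


def agent_id(cluster_prime: int, k: int) -> int:
    """Assumed scheme: the k-th agent (k >= 1) of a cluster gets cluster_prime**k."""
    if not is_prime(cluster_prime) or k < 1:
        raise ValueError("need a prime cluster key and k >= 1")
    return cluster_prime ** k


def membership_exponent(identity: int, cluster_prime: int) -> Optional[int]:
    """Return k if identity == cluster_prime**k for some k >= 1, else None.

    For a member this performs exactly k divisions, i.e. O(k) arithmetic steps,
    using only the identifier and the cluster prime (no registry lookup).
    """
    if cluster_prime < 2 or identity < cluster_prime:
        return None  # also guards against 0, negatives, and non-dividing loops
    k = 0
    while identity % cluster_prime == 0:
        identity //= cluster_prime
        k += 1
    return k if identity == 1 and k >= 1 else None


if __name__ == "__main__":
    ids = {(p, k): agent_id(p, k) for p in (2, 3, 5, 7) for k in range(1, 6)}
    assert len(set(ids.values())) == len(ids)  # no two (prime, k) pairs collide
    print(membership_exponent(agent_id(3, 4), 3))  # 4
```

Collision freedom in this sketch follows because $p^a = q^b$ with distinct primes $p \neq q$ would give one integer two different prime factorizations, contradicting the Fundamental Theorem of Arithmetic; any agent holding an identifier and a cluster prime can therefore verify membership locally.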