Multi-component LLM agents assemble probabilistic claims from components that each see only part of a joint problem; the composition can violate basic probability axioms even when every component is locally coherent. We formalise this "locally coherent, globally incoherent" failure via the compositional residual eps*, the L2 distance from the composed quote to the joint coherent polytope, which is computable at runtime from system output and the declared cross-component coupling constraints. A product-structure dichotomy characterises when local coherence suffices, and a Rayleigh-quotient prediction matches the observed residual within 7% on three of four relation classes. A hierarchical Boyle-Dykstra projection repairs the composition deterministically; an anytime-valid e-process gives sequential coherence monitoring. Across 1,876 ensemble cliques on a four-LLM mid-tier panel (frontier-panel rerun in Section 5.5), eps* > 0 on 33-94% of cliques, translating to +0.115 nats per bet of regret on 1,770 resolved bets under the proportional allocation rule (the gain collapses to +0.006 under bettors that themselves coherentise). Three intuitive LLM-side mitigations (retrieval, partition-aware prompting, and an aggregator LLM) each fail or regress.
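To make the residual concrete, the following is a minimal sketch, not the paper's hierarchical procedure. It assumes a toy clique in which one component quotes P(A) and another quotes P(A and B), with the single declared coupling constraint P(A and B) <= P(A). For this pair the coherent set is the box [0, 1]^2 intersected with one halfspace, so plain Dykstra alternating projection recovers the nearest coherent quote, and eps* is the L2 distance to it. All function names and the example quote values are hypothetical.

```python
import math

def project_box(x):
    """Project onto [0, 1]^n: every quote must be a valid probability."""
    return [min(1.0, max(0.0, v)) for v in x]

def make_halfspace_projector(a, b):
    """Return the Euclidean projector onto the halfspace {x : a . x <= b}."""
    norm_sq = sum(ai * ai for ai in a)

    def project(x):
        violation = sum(ai * xi for ai, xi in zip(a, x)) - b
        if violation <= 0.0:
            return list(x)  # already satisfies the constraint
        step = violation / norm_sq
        return [xi - step * ai for ai, xi in zip(a, x)]

    return project

def dykstra(x0, projectors, n_iter=200):
    """Dykstra's algorithm: projection onto an intersection of convex sets.

    Unlike naive alternating projection, the per-set correction terms make
    the iterates converge to the nearest point of the intersection.
    """
    x = list(x0)
    corrections = [[0.0] * len(x) for _ in projectors]
    for _ in range(n_iter):
        for i, proj in enumerate(projectors):
            y = [xi + ci for xi, ci in zip(x, corrections[i])]
            x = proj(y)
            corrections[i] = [yi - xi for yi, xi in zip(y, x)]
    return x

# Hypothetical composed quote: [P(A), P(A and B)].
# Each value is a valid probability on its own, but P(A and B) > P(A).
quote = [0.30, 0.50]

# Assumed coupling constraint P(A and B) - P(A) <= 0, written as a . x <= b.
projectors = [project_box, make_halfspace_projector([-1.0, 1.0], 0.0)]

repaired = dykstra(quote, projectors)
eps_star = math.sqrt(sum((q - r) ** 2 for q, r in zip(quote, repaired)))

print("repaired quote:", repaired)
print("eps*:", eps_star)
```

In this toy case each component's quote is locally coherent, yet the composition is not, so eps* is strictly positive; a quote that already satisfies the coupling constraint would be returned unchanged with eps* equal to zero.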