Multi-agent LLM systems are increasingly used to solve complex tasks through decomposition, debate, specialization, and ensemble reasoning. However, these systems are usually evaluated in terms of robustness: whether performance is preserved under perturbation. This paper studies a different question: whether semantic stress exposes structured variation that could support future antifragile learning. We introduce CAFE, a statistical framework for detecting antifragility-compatible regimes in multi-agent architectures. CAFE models a controlled expected distribution of semantic stressors, reconstructs an architecture-specific observed effective stress distribution from multi-dimensional judge signals, and compares the two distributions using a distributional Jensen Gap under a convex stress potential. A positive gap does not imply immediate performance improvement; instead, it indicates a convex-expansive deformation of the observed stress distribution relative to the expected one, suggesting that the architecture exposes learnable stress structure. We evaluate CAFE on a banking-risk analysis benchmark with five multi-agent architectures: flat, hierarchical, debate, meta-adaptive, and ensemble. Across all architectures, semantic stress reduces average judged quality by roughly one third. Yet all five architectures exhibit positive distributional Jensen Gaps whose bootstrap confidence intervals lie above zero. These results show that immediate quality degradation can coexist with statistically detectable antifragility-compatible stress geometry. CAFE is therefore not an antifragile learner itself, but a measurement layer for identifying when and where antifragile learning may be worth applying.
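To make the comparison step concrete, the following is a minimal sketch of how a distributional Jensen Gap and its bootstrap confidence interval might be computed. The abstract does not state the exact definition, so everything here is an assumption made for illustration: the gap is taken to be the difference in mean convex potential between the observed and expected stress samples, the potential `phi` is taken to be quadratic, the interval is a percentile bootstrap, and the stress samples are synthetic rather than drawn from the banking-risk benchmark. The function names are hypothetical and are not CAFE's API.

```python
import random
from statistics import fmean


def phi(s):
    """Hypothetical convex stress potential (assumption: quadratic)."""
    return s * s


def jensen_gap(expected, observed, potential=phi):
    """Assumed gap: mean potential of observed minus mean potential of expected."""
    mean_obs = fmean([potential(s) for s in observed])
    mean_exp = fmean([potential(s) for s in expected])
    return mean_obs - mean_exp


def bootstrap_ci(expected, observed, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the gap, resampling each sample independently."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_boot):
        e = rng.choices(expected, k=len(expected))  # resample with replacement
        o = rng.choices(observed, k=len(observed))
        gaps.append(jensen_gap(e, o))
    gaps.sort()
    lower = gaps[int((alpha / 2) * n_boot)]
    upper = gaps[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic illustration only: same mean stress, wider observed spread.
rng = random.Random(42)
expected_stress = [rng.gauss(0.5, 0.1) for _ in range(1000)]
observed_stress = [rng.gauss(0.5, 0.3) for _ in range(1000)]

gap = jensen_gap(expected_stress, observed_stress)
ci_low, ci_high = bootstrap_ci(expected_stress, observed_stress)
print(f"gap={gap:.4f}, 95% CI=({ci_low:.4f}, {ci_high:.4f})")
```

Under these assumptions, two samples with the same mean but different spread yield a positive gap: for instance, `jensen_gap([1, 1], [0, 2])` evaluates to `1.0`. This matches the reading of a positive gap as a convex-expansive deformation rather than a shift in average stress or an improvement in quality.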