Hierarchical Certified Semantic Commitment for Byzantine-Resilient LLM-Agent Collaboration

Byzantine collaboration among large-language-model agents requires a finality-control primitive: given a set of delivered proposals that are stochastic, structured, and written in natural language, the protocol must decide whether the round supports a commit and, if so, of what kind, or whether it should end in a typed safe abort. Naive aggregation hides this choice behind a single verdict; classical Byzantine fault tolerance hides it behind a byte-identity requirement that LLM proposals do not satisfy. We introduce Hierarchical Certified Semantic Commitment (H-CSC), a BFT-inspired protocol that converts embedding-derived finality signals over verdict-conditioned proposal groups into one of three typed outcomes: a semantic_commit (a 2f+1 within-verdict semantic core backs the verdict, emitting a parameter-bound digest over the quantised aggregate), a verdict_commit (the verdict margin is strong but the semantic rationale is dispersed, emitting a verdict-level certificate without claiming a semantic aggregate), or an explicit abort with a typed reason. The contribution is typed finality, not raw commit accuracy. On a controlled semantic-poisoning diagnostic (BCS_v1, 120 episodes), H-CSC commits with low angular deviation on BFT-feasible buckets (0.31 to 2.04 degrees) and aborts 100% of beyond-BFT rounds (n<3f+1), as intended. On a real LLM-agent claim-verification benchmark (MVR-50, 50 tasks) under paired static and rushing Byzantine attacks, H-CSC reaches commit rates of 0.90/0.92 with honest-reference-invalid rates of 0.02/0.00, statistically matching a strong certificate-emitting verdict-only baseline. Unlike that baseline, H-CSC also emits an embedding-backed semantic_commit digest on 74%/72% of rounds, supplying typed provenance. A strict-semantic ablation reaches commit rates of only 0.54/0.48, showing that the verdict-level fallback is necessary for coverage (+0.36/+0.44) at the same <=0.04 safety floor; a 100-task cross-model check across four LLMs keeps invalid_hmaj within 0.00 to 0.03.
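To make the three typed outcomes concrete, the following is a minimal sketch of a decision rule with that shape. It is an illustration under stated assumptions, not the H-CSC implementation: the function name `decide`, the cosine threshold, the quantisation step, the abort reason strings, the use of a 2f+1 verdict quorum as the "strong margin" test, and the greedy choice of semantic core are all hypothetical. Certificates and signatures are omitted.

```python
import hashlib
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Proposal = Tuple[str, Sequence[float]]  # (verdict label, embedding vector)


@dataclass(frozen=True)
class Outcome:
    kind: str                      # "semantic_commit", "verdict_commit", or "abort"
    verdict: Optional[str] = None
    digest: Optional[str] = None   # set only for semantic_commit
    reason: Optional[str] = None   # set only for abort


def _unit(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are left as zeros)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _cos(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two unit vectors."""
    return sum(x * y for x, y in zip(a, b))


def decide(proposals: Sequence[Proposal], f: int,
           cos_threshold: float = 0.9,      # assumed, not from the paper
           quant_step: float = 0.01) -> Outcome:  # assumed, not from the paper
    n = len(proposals)
    if n < 3 * f + 1:
        # Beyond-BFT round: too few participants for f faults.
        return Outcome("abort", reason="beyond_bft")

    # Group unit-normalised embeddings by verdict (verdict-conditioned groups).
    groups = defaultdict(list)
    for verdict, emb in proposals:
        groups[verdict].append(_unit(emb))
    verdict, members = max(sorted(groups.items()), key=lambda kv: len(kv[1]))

    quorum = 2 * f + 1
    if len(members) < quorum:
        # Assumption: "strong verdict margin" is modelled as a 2f+1 quorum.
        return Outcome("abort", reason="no_verdict_quorum")

    # Assumed core rule: the largest set of members within the cosine
    # threshold of some member acting as centre.
    core = max(([m for m in members if _cos(c, m) >= cos_threshold]
                for c in members), key=len)
    if len(core) < quorum:
        # Verdict is backed, but the rationale embeddings are dispersed.
        return Outcome("verdict_commit", verdict=verdict)

    # Quantise the core mean and bind the digest to the parameters used.
    dim = len(core[0])
    mean = [sum(m[i] for m in core) / len(core) for i in range(dim)]
    quantised = [round(x / quant_step) for x in mean]
    payload = repr((verdict, f, cos_threshold, quant_step, quantised)).encode()
    return Outcome("semantic_commit", verdict=verdict,
                   digest=hashlib.sha256(payload).hexdigest())
```

With n=4 and f=1, three "true" proposals whose embeddings point in nearly the same direction would yield a semantic_commit in this sketch, three "true" proposals with mutually orthogonal embeddings would fall back to a verdict_commit, and any round with n=3 would abort as beyond-BFT. Hashing the parameters together with the quantised aggregate is one plausible reading of "parameter-bound digest"; the actual digest construction may differ.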
