Multiagent AI systems require consistent communication, but we lack methods to verify that agents share the same understanding of the terms they use. Natural language is interpretable but vulnerable to semantic drift, while learned protocols are efficient but opaque. We propose a certification protocol based on the stimulus-meaning model, in which agents are tested on shared observable events and a term is certified if empirical disagreement on it falls below a statistical threshold. Under this protocol, agents that restrict their reasoning to certified terms ("core-guarded reasoning") achieve provably bounded disagreement. We also outline mechanisms for detecting drift (recertification) and recovering shared vocabulary (renegotiation). In simulations with varying degrees of semantic divergence, core-guarding reduces disagreement by 72-96%. In a validation with fine-tuned language models, disagreement is reduced by 51%. Our framework provides a first step towards verifiable agent-to-agent communication.
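The certification step described above can be illustrated with a minimal sketch. It is not the protocol's actual implementation: the abstract does not say which statistical test is used, so the one-sided Hoeffding bound below is an assumption, as are all names (`certify_terms`, `core_guarded`, `epsilon`, `delta`) and the simplification of modelling each agent as a function that assents to or dissents from a term for a given event.

```python
import math
from typing import Callable, Iterable, Sequence, Set

# Assumption: an agent is a function (term, event) -> bool, i.e. assent or dissent.
Agent = Callable[[str, int], bool]


def hoeffding_upper_bound(disagreements: int, n: int, delta: float) -> float:
    """One-sided Hoeffding upper confidence bound on the true disagreement rate."""
    return disagreements / n + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def certify_terms(
    agent_a: Agent,
    agent_b: Agent,
    terms: Iterable[str],
    events: Sequence[int],
    epsilon: float,
    delta: float,
) -> Set[str]:
    """Return the terms whose disagreement bound is at most epsilon.

    Hypothetical criterion: a term is certified when, with confidence
    1 - delta, its disagreement rate on the shared events is below epsilon.
    """
    core = set()
    for term in terms:
        # Count shared events on which the two agents give different verdicts.
        disagreements = sum(agent_a(term, e) != agent_b(term, e) for e in events)
        if hoeffding_upper_bound(disagreements, len(events), delta) <= epsilon:
            core.add(term)
    return core


def core_guarded(message_terms: Iterable[str], core: Set[str]) -> bool:
    """Core-guarded reasoning: accept a message only if it uses certified terms."""
    return all(term in core for term in message_terms)


# Toy example: the agents agree on "large" but have diverged on "hot".
def agent_a(term: str, event: int) -> bool:
    return event >= 500 if term == "large" else event % 2 == 0


def agent_b(term: str, event: int) -> bool:
    return event >= 500 if term == "large" else event % 3 == 0


events = list(range(1000))
core = certify_terms(agent_a, agent_b, ["large", "hot"], events, epsilon=0.1, delta=0.05)
# core == {"large"}: zero observed disagreements on "large", about 50% on "hot".
```

In this sketch, recertification would amount to rerunning `certify_terms` on fresh events and dropping any term that no longer passes; how the actual protocol schedules or weights such checks is not specified in the abstract.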