Multiagent AI systems require consistent communication, but we lack methods to verify that agents share the same understanding of the terms they use. Natural language is interpretable but vulnerable to semantic drift, while learned protocols are efficient but opaque. We propose a certification protocol based on the stimulus-meaning model: agents are tested on shared observable events, and a term is certified if their empirical disagreement falls below a statistical threshold. Under this protocol, agents that restrict their reasoning to certified terms ("core-guarded reasoning") achieve provably bounded disagreement. We also outline mechanisms for detecting drift (recertification) and recovering shared vocabulary (renegotiation). In simulations with varying degrees of semantic divergence, core-guarding reduces disagreement by 72–96%; in a validation with fine-tuned language models, it reduces disagreement by 51%. Our framework provides a first step towards verifiable agent-to-agent communication.
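To make the certification step concrete, below is a minimal Python sketch of the kind of test the abstract describes: two agents label the same observable events, and a term is certified only if the empirical disagreement rate, inflated by a one-sided confidence margin, stays below a threshold. The function name `certify_term`, the parameters `epsilon` and `delta`, and the use of a Hoeffding bound are illustrative assumptions, not the paper's exact statistical test.

```python
import math
from typing import Any, Callable, Sequence


def certify_term(
    agent_a: Callable[[Any], bool],
    agent_b: Callable[[Any], bool],
    stimuli: Sequence[Any],
    epsilon: float = 0.05,  # illustrative disagreement threshold
    delta: float = 0.05,    # illustrative confidence parameter
) -> bool:
    """Certify a term if the agents' empirical disagreement on shared
    stimuli, plus a one-sided Hoeffding margin, stays below epsilon.

    Each agent is modeled as a predicate mapping an observable event to
    True/False ("does the term apply?"); this interface and the choice of
    concentration bound are assumptions made for illustration.
    """
    n = len(stimuli)
    disagreements = sum(agent_a(s) != agent_b(s) for s in stimuli)
    empirical_rate = disagreements / n
    # With probability at least 1 - delta, the true disagreement rate is
    # at most empirical_rate + margin (one-sided Hoeffding inequality).
    margin = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return empirical_rate + margin <= epsilon


if __name__ == "__main__":
    # Hypothetical example: threshold classifiers for the term "hot".
    events = [i * 0.01 for i in range(6000)]
    same = lambda t: t > 30.0
    print(certify_term(same, same, events))  # True: no disagreement
    drifted = lambda t: t > 35.0  # semantically drifted boundary
    print(certify_term(same, drifted, events))  # False: ~8% disagreement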