Undetectable Conversations Between AI Agents via Pseudorandom Noise-Resilient Key Exchange

AI agents are increasingly deployed to interact with other agents on behalf of users and organizations. We ask whether two such agents, operated by different entities, can carry out a parallel secret conversation while still producing a transcript that is computationally indistinguishable from an honest interaction, even to a strong passive auditor that knows the full model descriptions, the protocol, and the agents' private contexts. Building on recent work on watermarking and steganography for LLMs, we first show that if the parties possess an interaction-unique secret key, they can conduct an optimal-rate covert conversation: the hidden conversation can exploit essentially all of the entropy present in the honest message distributions. Our main contributions concern extending this to the keyless setting, where the agents begin with no shared secret. We show that covert key exchange, and hence covert conversation, is possible even when each model has an arbitrary private context and the messages are short and fully adaptive, assuming only that sufficiently many individual messages have at least constant min-entropy. This stands in contrast to prior work on covert communication, which relied on the min-entropy of each individual message growing with the security parameter. To obtain this, we introduce a new cryptographic primitive, which we call pseudorandom noise-resilient key exchange: a key-exchange protocol whose public transcript is pseudorandom and which remains correct under constant noise. We study this primitive, giving several constructions relevant to our application as well as strong limitations showing that more naive variants are impossible or vulnerable to efficient attacks. These results show that transcript auditing alone cannot rule out covert coordination between AI agents, and they identify a new cryptographic theory that may be of independent interest.
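
To make the min-entropy assumption concrete, the following minimal sketch computes the standard min-entropy H_∞(X) = -log2(max_x Pr[X = x]) of a message distribution and counts how many messages in a conversation clear a constant threshold. The function names, the example distributions, and the 1-bit threshold are illustrative assumptions and are not taken from the paper; the sketch illustrates only the entropy condition, not the key-exchange protocol itself.

```python
import math

def min_entropy(probs):
    """Min-entropy in bits: H_inf(X) = -log2(max_x Pr[X = x])."""
    if not probs or abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probs must be a non-empty distribution summing to 1")
    return -math.log2(max(probs))

def count_usable_messages(distributions, threshold_bits=1.0):
    """Count messages whose min-entropy is at least a constant threshold.

    The threshold of 1 bit is a hypothetical choice for illustration.
    """
    return sum(1 for p in distributions if min_entropy(p) >= threshold_bits)

# Hypothetical distributions over an agent's possible next messages.
near_deterministic = [0.97, 0.01, 0.01, 0.01]  # almost no entropy to exploit
balanced = [0.25, 0.25, 0.25, 0.25]            # exactly 2 bits of min-entropy
coin_flip = [0.5, 0.5]                         # exactly 1 bit of min-entropy

conversation = [near_deterministic, balanced, coin_flip, near_deterministic]
usable = count_usable_messages(conversation)
```

In this toy conversation only two of the four messages reach the threshold, which mirrors the setting described above: no single message needs high entropy, provided that sufficiently many messages have at least a constant amount.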
