A meaningful text can be hidden inside another, completely different yet still coherent and plausible, text of the same length. For example, a tweet containing a harsh critique of a political leader could be embedded in a tweet that celebrates that same leader, or an ordinary product review could conceal a secret manuscript. This uncanny state of affairs is now possible thanks to Large Language Models, and in this paper we present Calgacus, a simple and efficient protocol that achieves it. We show that even modest 8-billion-parameter open-source LLMs are sufficient to obtain high-quality results, and that a message as long as this abstract can be encoded and decoded locally on a laptop in seconds. The existence of such a protocol demonstrates a radical decoupling of text from authorial intent, further eroding trust in written communication, which is already shaken by the rise of LLM chatbots. We illustrate this with a concrete scenario: a company could covertly deploy an unfiltered LLM by encoding its answers within the compliant responses of a safe model. This possibility raises urgent questions for AI safety and challenges our understanding of what it means for a Large Language Model to know something.