Safeguarding LLMs Against Misuse and AI-Driven Malware Using Steganographic Canaries

AI-powered malware increasingly exploits cloud-hosted generative-AI services and large language models (LLMs) as analysis engines for reconnaissance and code generation. At the same time, enterprise uploads expose sensitive documents to third-party AI vendors. Both threats converge at the ingestion boundary of AI services, yet existing defenses focus on endpoints and network perimeters, leaving organizations with limited visibility once plaintext reaches an LLM service. To address this gap, we present a framework based on steganographic canary files: realistic documents that carry cryptographically derived identifiers embedded through complementary encoding channels. A pre-ingestion filter extracts and verifies these identifiers before LLM processing, enabling passive, format-agnostic detection without semantic classification. The framework supports two modes of operation. Mode A marks existing sensitive documents with layered symbolic encodings (whitespace substitution, zero-width character insertion, and homoglyph substitution), while Mode B generates synthetic canary documents using linguistic steganography (arithmetic coding over GPT-2), augmented with compatible symbolic layers. We model increasing levels of document pre-processing and adversarial capability for both modes with a four-tier transport-transform taxonomy. All methods achieve 100% identifier recovery under benign and sanitization workflows (Tiers 1-2), and the hybrid Mode B maintains 97% recovery under targeted adversarial transforms (Tier 3). An end-to-end case study against an LLM-orchestrated ransomware pipeline confirms that both modes detect and block canary-bearing uploads before file encryption begins. To our knowledge, this is the first framework to systematically combine symbolic and linguistic text steganography into layered canary documents for detecting unauthorized LLM processing, evaluated against a transport-threat taxonomy tailored to AI malware.
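
The abstract does not give encoding parameters. The following is a minimal Python sketch, under stated assumptions, of how two of the Mode A symbolic layers and a pre-ingestion check could fit together.

- **Identifier:** an HMAC-SHA256 identifier truncated to 64 bits. The key handling, labels, and truncation length are assumptions.
- **Zero-width layer:** uses U+200B for 0 and U+200C for 1.
- **Homoglyph layer:** each Latin `a`/`e`/`o` carries one bit, with its Cyrillic look-alike standing for 1.
- **Filter:** `should_block` is a hypothetical stand-in for the pre-ingestion filter, not the authors' implementation.

Whitespace substitution and the GPT-2 arithmetic-coding channel of Mode B are omitted.

```python
from __future__ import annotations

import hashlib
import hmac
from typing import Optional

# Illustrative assumptions only; the paper's actual parameters are not stated.
ID_BYTES = 8                                   # 64-bit truncated identifier
ID_BITS = 8 * ID_BYTES
ZW_BITS = {"\u200b": "0", "\u200c": "1"}       # zero-width space / non-joiner
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Latin -> Cyrillic
LATIN_OF = {v: k for k, v in HOMOGLYPHS.items()}


def derive_canary_id(key: bytes, label: str) -> bytes:
    """Per-document identifier: HMAC-SHA256(key, label), truncated to ID_BYTES."""
    return hmac.new(key, label.encode("utf-8"), hashlib.sha256).digest()[:ID_BYTES]


def to_bits(ident: bytes) -> str:
    return "".join(f"{b:08b}" for b in ident)


def from_bits(bits: str) -> Optional[bytes]:
    """Decode the first ID_BITS bits, or return None if too few were recovered."""
    if len(bits) < ID_BITS:
        return None
    bits = bits[:ID_BITS]
    return bytes(int(bits[i:i + 8], 2) for i in range(0, ID_BITS, 8))


def embed_zero_width(text: str, ident: bytes) -> str:
    """Layer 1: hide the bits as invisible characters after the first word."""
    zw_for = {bit: ch for ch, bit in ZW_BITS.items()}
    payload = "".join(zw_for[bit] for bit in to_bits(ident))
    head, sep, tail = text.partition(" ")
    return head + payload + sep + tail


def embed_homoglyphs(text: str, ident: bytes) -> str:
    """Layer 2: each eligible Latin letter carries one bit (Cyrillic look-alike = 1)."""
    bits = iter(to_bits(ident))
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and next(bits, None) == "1":
            ch = HOMOGLYPHS[ch]
        out.append(ch)
    return "".join(out)


def extract_zero_width(text: str) -> Optional[bytes]:
    return from_bits("".join(ZW_BITS[ch] for ch in text if ch in ZW_BITS))


def extract_homoglyphs(text: str) -> Optional[bytes]:
    bits = "".join("1" if ch in LATIN_OF else "0"
                   for ch in text if ch in LATIN_OF or ch in HOMOGLYPHS)
    return from_bits(bits)


def should_block(text: str, key: bytes, labels: list[str]) -> bool:
    """Hypothetical pre-ingestion check: block if ANY layer yields a known canary ID."""
    expected = [derive_canary_id(key, label) for label in labels]
    for extractor in (extract_zero_width, extract_homoglyphs):
        found = extractor(text)
        if found is not None and any(hmac.compare_digest(found, e) for e in expected):
            return True
    return False


# Usage: mark a document with both layers, then simulate a sanitizer
# that deletes zero-width characters before the upload.
KEY = b"demo-key-not-for-production"
doc = "Internal memo: " + "the payroll records are stored on the reserved archive volume. " * 5
ident = derive_canary_id(KEY, "memo-001")
canary = embed_zero_width(embed_homoglyphs(doc, ident), ident)
stripped = canary.replace("\u200b", "").replace("\u200c", "")
print(should_block(canary, KEY, ["memo-001"]))    # True
print(should_block(stripped, KEY, ["memo-001"]))  # True: homoglyph layer survives
print(should_block(doc, KEY, ["memo-001"]))       # False
```

In this sketch the check accepts a match from any layer. Deleting the zero-width characters alone therefore does not remove the mark, which illustrates why layering channels helps. The decoding here is exact-match and tolerates no bit errors, so paraphrasing or re-typing the text would defeat both layers of this toy version.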
