AI-powered malware increasingly exploits cloud-hosted generative-AI services and large language models (LLMs) as analysis engines for reconnaissance and code generation. Simultaneously, enterprise uploads expose sensitive documents to third-party AI vendors. Both threats converge at the AI service ingestion boundary, yet existing defenses focus on endpoints and network perimeters, leaving organizations with limited visibility once plaintext reaches an LLM service. To address this gap, we present a framework based on steganographic canary files: realistic documents carrying cryptographically derived identifiers embedded via complementary encoding channels. A pre-ingestion filter extracts and verifies these identifiers before LLM processing, enabling passive, format-agnostic detection without semantic classification. We support two modes of operation. Mode A marks existing sensitive documents with layered symbolic encodings (whitespace substitution, zero-width character insertion, and homoglyph substitution). Mode B generates synthetic canary documents using linguistic steganography (arithmetic coding over GPT-2), augmented with compatible symbolic layers. We model increasing document pre-processing and adversarial capability for both modes with a four-tier transport-transform taxonomy. All methods achieve 100% identifier recovery under benign and sanitization workflows (Tiers 1-2), and the hybrid Mode B maintains 97% recovery through targeted adversarial transforms (Tier 3). An end-to-end case study against an LLM-orchestrated ransomware pipeline confirms that both modes detect and block canary-bearing uploads before file encryption begins. To our knowledge, this is the first framework to systematically combine symbolic and linguistic text steganography into layered canary documents for detecting unauthorized LLM processing, evaluated against a transport-threat taxonomy tailored to AI malware.
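As an illustration of one symbolic channel named above (zero-width character insertion) and of the pre-ingestion check, the following is a minimal sketch and not the framework's actual implementation. It assumes the identifier is a truncated HMAC-SHA256 tag over a document ID, that U+200B and U+200C encode bits 0 and 1, and that one bit is appended after each of the first words of the text. The names `derive_tag`, `embed`, `extract`, `check_upload`, and the 32-bit tag length are all hypothetical choices made for this example.

```python
import hashlib
import hmac
from typing import Iterable, Optional

ZW0 = "\u200b"  # zero-width space, assumed here to encode bit 0
ZW1 = "\u200c"  # zero-width non-joiner, assumed here to encode bit 1
TAG_BITS = 32   # illustrative tag length, not taken from the paper


def derive_tag(key: bytes, doc_id: str) -> str:
    """Derive a bit string from a keyed hash of the document ID."""
    digest = hmac.new(key, doc_id.encode("utf-8"), hashlib.sha256).digest()
    bits = "".join(f"{byte:08b}" for byte in digest)
    return bits[:TAG_BITS]


def embed(text: str, bits: str) -> str:
    """Append one zero-width character per bit after the leading words."""
    words = text.split(" ")
    if len(words) < len(bits):
        raise ValueError("cover text has too few words for the tag")
    marked = [
        word + (ZW1 if bits[i] == "1" else ZW0) if i < len(bits) else word
        for i, word in enumerate(words)
    ]
    return " ".join(marked)


def extract(text: str) -> str:
    """Read back every zero-width marker, in order, as a bit string."""
    return "".join("1" if ch == ZW1 else "0" for ch in text if ch in (ZW0, ZW1))


def check_upload(text: str, key: bytes, doc_ids: Iterable[str]) -> Optional[str]:
    """Pre-ingestion check: return the matching document ID, or None."""
    bits = extract(text)[:TAG_BITS]
    if len(bits) < TAG_BITS:
        return None  # no complete tag present
    for doc_id in doc_ids:
        # Constant-time comparison against each registered canary tag.
        if hmac.compare_digest(bits, derive_tag(key, doc_id)):
            return doc_id
    return None
```

A filter built this way would call `check_upload` on the plaintext of each upload and block the request when a document ID is returned. A single channel like this one is removed by any transform that strips zero-width characters, which is why the abstract describes layering several complementary channels.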