Hidden LLM prompts have appeared in online documents with increasing frequency. Their goal is to trigger indirect prompt injection attacks while evading human oversight, manipulating LLM-powered automated document processing systems in applications ranging from résumé screening to academic peer review. Detecting hidden LLM prompts is therefore important for ensuring trust in AI-assisted human decision making. This paper presents the first principled approach to detecting hidden LLM prompts in structured documents. We implement our approach in a prototype tool called PhantomLint. We evaluate PhantomLint against a corpus of 3,402 documents, including both PDF and HTML documents, covering academic paper preprints, CVs, theses and more. We find that our approach is generally applicable against a wide range of methods for hiding LLM prompts from visual inspection, has a very low false positive rate (approx. 0.092%), and is practically useful for detecting hidden LLM prompts in real documents, while achieving acceptable performance.