Prompt injection defenses are often framed as semantic understanding problems and delegated to increasingly large neural detectors. For the first screening layer, however, the requirements are different: the detector runs on every request and therefore must be fast, deterministic, non-promptable, and auditable. We introduce Mirror, a data-curation design pattern that organizes prompt injection corpora into matched positive and negative cells so that a classifier learns control-plane attack mechanics rather than incidental corpus shortcuts. Using 5,000 strictly curated open-source samples -- the largest corpus supportable under our public-data validity contract -- we define a 32-cell mirror topology, fill 31 of those cells with public data, train a sparse character n-gram linear SVM, compile its weights into a static Rust artifact, and obtain 95.97\% recall and 92.07\% F1 on a 524-case holdout at sub-millisecond latency with no external model runtime dependencies. On the same holdout, our next line of defense, a 22-million-parameter Prompt Guard~2 model reaches 44.35\% recall and 59.14\% F1 at 49\,ms median and 324\,ms p95 latency. Linear models still leave residual semantic ambiguities such as use-versus-mention for later pipeline layers, but within that scope our results show that for L1 prompt injection screening, strict data geometry can matter more than model scale.
翻译:提示注入防御常被视作语义理解问题,并委托给日益庞大的神经检测器处理。然而,对于第一层筛选环节,其需求有所不同:检测器需处理每个请求,因此必须具备快速性、确定性、抗提示性及可审计性。本文提出Mirror(镜像)这一数据治理设计模式,通过将提示注入语料库组织为匹配的正负样本单元,使分类器学习控制平面攻击机制而非偶然的语料库捷径。基于5,000个严格筛选的开源样本(在公共数据有效性协议下可支持的最大规模语料库),我们定义了32单元镜像拓扑结构,使用公共数据填充其中31个单元,训练稀疏字符n-gram线性支持向量机,将其权重编译为静态Rust组件,最终在524例保留测试集上实现95.97%召回率与92.07% F1值,亚毫秒级延迟且无需外部模型运行时依赖。在同一测试集上,我们的下一层防御——拥有2,200万参数的Prompt Guard~2模型仅达到44.35%召回率与59.14% F1值,中位延迟49毫秒,p95延迟324毫秒。线性模型虽会遗留使用与提及等残余语义歧义问题供后续流水线处理,但实验结果表明:在L1提示注入筛查范畴内,严格的数据几何结构可能比模型规模更具决定性影响。