Current open-source prompt-injection detectors converge on two architectural choices: regular-expression pattern matching and fine-tuned transformer classifiers. Both share failure modes that recent work has made concrete. Regular expressions miss paraphrased attacks. Fine-tuned classifiers are vulnerable to adaptive adversaries: a 2025 NAACL Findings study reported that eight published indirect-injection defenses were bypassed with greater than fifty percent attack success rates under adaptive attacks. This work proposes seven detection techniques that each port a specific mechanism from a discipline outside large-language-model security: forensic linguistics, materials-science fatigue analysis, deception technology from network security, local-sequence alignment from bioinformatics, mechanism design from economics, spectral signal analysis from epidemiology, and taint tracking from compiler theory. Three of the seven techniques are implemented in the prompt-shield v0.4.1 release (Apache 2.0) and evaluated in a four-configuration ablation across six datasets including deepset/prompt-injections, NotInject, LLMail-Inject, AgentHarm, and AgentDojo. The local-alignment detector lifts F1 on deepset from 0.033 to 0.378 with zero additional false positives. The stylometric detector adds 11.1 percentage points of F1 on an indirect-injection benchmark. The fatigue tracker is validated via a probing-campaign integration test. All code, data, and reproduction scripts are released under Apache 2.0.
翻译:当前开源提示注入检测器集中在两种架构选择上:正则表达式模式匹配与微调后的Transformer分类器。二者共享的失效模式已被近期研究具体揭示:正则表达式无法识别改写型攻击;微调分类器易遭受自适应对手攻击——2025年NAACL Findings研究指出,在自适应攻击下,八种已发表的间接注入防御均被以超过50%的攻击成功率绕过。本文提出七项检测技术,每项技术分别移植自大语言模型安全领域之外的特定机制:司法语言学、材料科学中的疲劳分析、网络安全中的欺骗技术、生物信息学中的局部序列比对、经济学中的机制设计、流行病学中的频谱信号分析,以及编译器理论中的污点追踪。其中三项技术已在prompt-shield v0.4.1版本(Apache 2.0许可)中实现,并在包括deepset/prompt-injections、NotInject、LLMail-Inject、AgentHarm及AgentDojo在内的六个数据集上进行了四种配置的消融评估。局部比对检测器在零额外误报条件下,将deepset数据集上的F1值从0.033提升至0.378;笔迹风格检测器在间接注入基准测试上使F1值增加11.1个百分点;疲劳追踪器通过探测式活动集成测试完成验证。所有代码、数据及复现脚本均以Apache 2.0许可公开发布。