Large Language Model (LLM) applications are vulnerable to prompt injection and context manipulation attacks that traditional security models cannot prevent. We introduce two novel primitives, authenticated prompts and authenticated context, that provide cryptographically verifiable provenance across LLM workflows. Authenticated prompts enable self-contained lineage verification, while authenticated context uses tamper-evident hash chains to ensure the integrity of dynamic inputs. Building on these primitives, we formalize a policy algebra with four proven theorems that provide protocol-level Byzantine resistance: even adversarial agents cannot violate organizational policies. Five complementary defenses, ranging from lightweight resource controls to LLM-based semantic validation, deliver layered, preventative security with formal guarantees. Evaluation against representative attacks spanning six exhaustive categories achieves 100% detection with zero false positives and negligible overhead. To our knowledge, this is the first approach combining cryptographically enforced prompt lineage, tamper-evident context, and provable policy reasoning, shifting LLM security from reactive detection to preventative guarantees.
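The tamper-evident hash chain behind authenticated context can be illustrated with a minimal sketch. The function names, genesis value, and entry encoding below are assumptions for illustration only, not the system's actual interface; each link commits to the previous digest and the new entry, so modifying any earlier entry changes every subsequent digest.

```python
import hashlib

GENESIS = b"\x00" * 32  # assumed starting digest, not the paper's constant

def chain_append(prev_digest: bytes, entry: bytes) -> bytes:
    """Extend the chain: the new digest commits to the previous digest
    and the appended context entry (illustrative sketch)."""
    return hashlib.sha256(prev_digest + entry).digest()

def chain_digest(entries: list[bytes]) -> bytes:
    """Recompute the final digest over a full context history."""
    d = GENESIS
    for e in entries:
        d = chain_append(d, e)
    return d

# Two histories differing in a single mid-chain entry yield different
# final digests, so tampering anywhere in the context is detectable
# by comparing against the recorded digest.
clean = [b"system prompt", b"user: summarize", b"tool: web result"]
tampered = [b"system prompt", b"user: summarize", b"tool: INJECTED"]
assert chain_digest(clean) != chain_digest(tampered)
```

A verifier holding only the final digest can detect any insertion, deletion, or edit of earlier entries by recomputing the chain, which is what makes the context tamper-evident rather than merely logged.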