SMSR: Certified Defence Against Runtime Memory Poisoning in Persistent LLM Agent Systems

Retrieval-augmented generation (RAG) agents increasingly run with persistent memory that accumulates across user sessions. This creates a new attack surface: an adversary interacting only through normal channels can inject crafted memories that, once retrieved, steer the agent's responses for future users, without touching model weights or code. We call this Multi-Session Memory Poisoning (MSMP) and show that no existing defence certifies against it: static-corpus defences (RobustRAG, ReliabilityRAG) assume a fixed knowledge base, and heuristic filters are bypassed by fluent enterprise-style text. We present Signed Memory with Smoothed Retrieval (SMSR), the first defence with a certified robustness bound for this setting. Component 1 adds HMAC-SHA256 provenance at write time, blocking unsigned injection. Component 2 applies randomised memory ablation with verdict-based majority voting at query time, bounding the influence of authenticated adversaries. We prove that no provenance-free retrieval-time filter can certify against adaptive injection, derive a hypergeometric certificate for Component 2, and formalise the Consistent Minority Effect, whereby a consistent adversarial answer wins string-based voting despite being a numerical minority, whereas verdict-based voting removes this advantage. Across 15 enterprise scenarios (3,150 repeated trials), Component 1 cuts attack success from 93-100% to 0% for all unsigned variants. For an authenticated adversary with a single injection, Component 2 holds success to 8.0% (95% CI [5.8, 10.9], n=450), below the certified worst case. In an end-to-end query-only attack on a live agent stack, in which the agent itself writes the poison rather than the poison being pre-seeded, SMSR reduces success from 65.3% to 5.3% (n=150, non-overlapping CIs). Clean-query utility is 90% with Component 1 alone and 85% with both components combined.
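
To make the write-time provenance idea in Component 1 concrete, the following is a minimal sketch, not the authors' implementation. It assumes each memory is a JSON-serialisable record, that a single secret key is held by the trusted write path, and that retrieval discards any entry whose tag does not verify. The names `sign_memory`, `write_memory`, `retrieve_verified`, and the record layout are hypothetical.

```python
import hashlib
import hmac
import json


def sign_memory(key: bytes, record: dict) -> str:
    """Return a hex HMAC-SHA256 tag over a canonical JSON encoding of the record."""
    # sort_keys gives a stable byte encoding, so equal records get equal tags.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def write_memory(store: list, key: bytes, record: dict) -> None:
    """Trusted write path (assumed): store the record together with its tag."""
    store.append({"record": record, "tag": sign_memory(key, record)})


def retrieve_verified(store: list, key: bytes) -> list:
    """Return only records whose stored tag matches a freshly computed tag."""
    verified = []
    for entry in store:
        tag = entry.get("tag")
        if not isinstance(tag, str):
            continue  # unsigned entry: dropped
        expected = sign_memory(key, entry["record"]).encode("utf-8")
        # compare_digest is a constant-time comparison on bytes.
        if hmac.compare_digest(tag.encode("utf-8"), expected):
            verified.append(entry["record"])
    return verified


if __name__ == "__main__":
    demo_key = b"hypothetical-demo-key"  # illustration only; never hard-code real keys
    store: list = []
    write_memory(store, demo_key, {"text": "Refund window is 30 days."})
    store.append({"record": {"text": "Refund window is 365 days."}})  # unsigned injection
    print(retrieve_verified(store, demo_key))
```

A scheme of this kind only blocks writers who lack the key. An adversary whose content is signed by the legitimate write path still passes verification, which is the case Component 2 is meant to bound.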

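
The abstract does not state the hypergeometric certificate for Component 2, so the following is only an illustrative calculation under assumptions that are not taken from the paper: each ablation round keeps a uniformly random subset of `subset_size` memories out of `n_total` retrieved ones, `n_poison` of those are adversarial, rounds are independent, and a round is counted as lost whenever its subset contains at least one adversarial memory (a pessimistic simplification). The function names are hypothetical.

```python
from math import comb


def contamination_prob(n_total: int, n_poison: int, subset_size: int) -> float:
    """P(a random subset of size subset_size contains at least one poisoned memory).

    This is the hypergeometric complement 1 - C(n_total - n_poison, m) / C(n_total, m).
    """
    if not (0 <= n_poison <= n_total and 0 <= subset_size <= n_total):
        raise ValueError("need 0 <= n_poison <= n_total and 0 <= subset_size <= n_total")
    # math.comb returns 0 when the subset cannot fit among the clean memories.
    all_clean = comb(n_total - n_poison, subset_size)
    return 1.0 - all_clean / comb(n_total, subset_size)


def majority_contaminated_prob(p: float, rounds: int) -> float:
    """P(more than half of `rounds` independent rounds are contaminated).

    Binomial tail with per-round contamination probability p (assumed model).
    """
    need = rounds // 2 + 1  # strict majority
    return sum(
        comb(rounds, j) * p**j * (1.0 - p) ** (rounds - j)
        for j in range(need, rounds + 1)
    )


if __name__ == "__main__":
    # One poisoned memory among 10, subsets of 3: contamination probability is 3/10.
    p = contamination_prob(n_total=10, n_poison=1, subset_size=3)
    print(p, majority_contaminated_prob(p, rounds=11))
```

Under this simplified model, smaller subsets lower the per-round contamination probability, and adding rounds drives the majority-flip probability down whenever that probability is below one half. The paper's actual certificate may be tighter or differently parameterised.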