Post-hoc context erasing over the KV cache is challenging because a local edit has a global consequence: once a span has been processed, its influence propagates into the cached states of all subsequent tokens. This issue arises naturally in long-context LLM applications, where stale retrieved facts, incorrect tool observations, retracted user preferences, or harmful prompt injections may be identified only after prefill. Exact erasing must then recompute all tokens after the deleted span, making its computational cost depend on suffix length rather than erased-span length. We introduce KVEraser, a learned KV-cache editing method for efficient localized context erasing. Given a processed context and a span to remove, KVEraser replaces only the KV states of the erased interval with learned steering states while reusing the remaining cache unchanged. To learn a transferable erasing mechanism, we build a two-stage training pipeline: generic span-neighbor pre-training teaches the eraser to suppress the influence of the erased span, while task-specific fine-tuning adapts this capability to downstream scenarios. Experiments show that KVEraser nearly matches full recomputation in post-erasure performance on in-domain tasks across 1K--32K context lengths, while its latency increases by only 24% compared with a 17.6x increase for full recomputation. KVEraser also generalizes to unseen long-document QA tasks with harmful factual distractors, achieving the best performance among approximate baselines with a 3--4x speedup over full recomputation.
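The localized edit described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single-layer cache layout `(seq_len, num_heads, head_dim)`, the function names, and the steering arrays are all assumptions made for illustration. In KVEraser the steering states are learned, whereas here they are simply passed in as placeholders. The sketch shows only the two structural points made in the abstract: the edit overwrites the erased interval and leaves the rest of the cache unchanged, and exact erasure must instead reprocess every token after the span.

```python
import numpy as np


def erase_span_in_place(keys, values, start, end, steer_k, steer_v):
    """Overwrite cache positions [start, end) with steering states.

    keys, values: arrays of shape (seq_len, num_heads, head_dim) for one
    layer (an assumed layout). steer_k, steer_v: placeholder steering
    states with the same shape as the erased slice. Returns edited copies;
    the prefix and suffix entries are reused unchanged.
    """
    if not 0 <= start < end <= keys.shape[0]:
        raise ValueError("span must satisfy 0 <= start < end <= seq_len")
    if steer_k.shape != keys[start:end].shape:
        raise ValueError("steer_k must match the shape of the erased key slice")
    if steer_v.shape != values[start:end].shape:
        raise ValueError("steer_v must match the shape of the erased value slice")

    new_keys, new_values = keys.copy(), values.copy()
    new_keys[start:end] = steer_k
    new_values[start:end] = steer_v
    return new_keys, new_values


def positions_touched(seq_len, start, end):
    """Count cache positions each strategy has to process for one erasure."""
    return {
        "localized_edit": end - start,      # only the erased interval
        "full_recompute": seq_len - end,    # every token after the span
    }


# Toy example: an 8-token cache with tokens 2 and 3 erased.
rng = np.random.default_rng(0)
k = rng.normal(size=(8, 2, 4))
v = rng.normal(size=(8, 2, 4))
zero_steer = np.zeros((2, 2, 4))  # placeholder, not a learned steering state
k_edit, v_edit = erase_span_in_place(k, v, 2, 4, zero_steer, zero_steer)
```

Under these assumptions, erasing a 100-token span that ends at position 1,100 of a 32,768-token context touches 100 cache positions with the localized edit, while exact erasure reprocesses the 31,668 tokens that follow the span.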