Adaptive Focus Memory for Language Models

Large language models (LLMs) are increasingly deployed in multi-turn dialogue settings, yet their behavior remains bottlenecked by naive history management strategies. Replaying the full conversation at every turn is simple but costly, while recency-based truncation or static summarization often causes early, high-impact user constraints to drift out of effective context. As a result, models may retain text without reliably applying it when it matters. We present Adaptive Focus Memory (AFM), a lightweight context management system that dynamically assigns each past message one of three fidelity levels: Full, Compressed, or Placeholder, based on semantic relevance, temporal decay, and importance classification. AFM packs messages chronologically under a fixed token budget, preserving critical constraints at high fidelity while allowing low-importance context to degrade gracefully. We evaluate AFM on two multi-turn dialogue benchmarks designed to stress long-horizon constraint preservation: a safety-critical travel scenario involving a user with a severe peanut allergy, and a policy-critical tax compliance scenario involving an illegal evasion request. Under strict grading that requires both explicit constraint recall and appropriately conditioned generation, AFM succeeds in 83.3 percent of allergy runs where all baseline strategies fail, and preserves correct refusal behavior on the tax benchmark. These results demonstrate that effective dialogue memory requires more than retaining prior text. Selectively allocating fidelity across past messages enables reliable constraint preservation under bounded context growth, without modifying model weights or introducing external retrieval infrastructure. We release an open-source implementation of AFM compatible with OpenAI-style chat APIs to support reproducible research and practical deployment.
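The fidelity-assignment and packing steps described above can be illustrated with a minimal sketch. It is not the released AFM implementation. Every name here (`Message`, `render`, `pack_history`), the thresholds, the half-life decay, the truncation-based compression, and the whitespace token count are assumptions chosen for illustration. The `relevance` and `important` fields stand in for the outputs of an embedding-similarity step and an importance classifier, which the abstract mentions but does not specify.

```python
from dataclasses import dataclass

# Hypothetical fidelity tiers, ordered so that "downgrading" is tier - 1.
PLACEHOLDER, COMPRESSED, FULL = 0, 1, 2


@dataclass
class Message:
    role: str
    content: str
    relevance: float         # assumed: similarity to the current query, in [0, 1]
    important: bool = False  # assumed: output of an importance classifier


def count_tokens(text: str) -> int:
    # Crude whitespace count; a real system would use the model's tokenizer.
    return len(text.split())


def render(msg: Message, tier: int) -> str:
    """Return the message text at the requested fidelity tier."""
    if tier == FULL:
        return msg.content
    if tier == COMPRESSED:
        # Truncation stands in for a real summarizer (an assumption).
        words = msg.content.split()
        return " ".join(words[:6]) + (" ..." if len(words) > 6 else "")
    return "[omitted]"


def pack_history(history, budget, half_life=4.0,
                 full_threshold=0.5, compressed_threshold=0.15):
    """Assign a tier to each message, then downgrade until the budget fits."""
    n = len(history)
    # Score = relevance discounted by age in turns (exponential decay).
    scores = [m.relevance * 0.5 ** ((n - 1 - i) / half_life)
              for i, m in enumerate(history)]
    tiers = []
    for m, s in zip(history, scores):
        if m.important or s >= full_threshold:
            tiers.append(FULL)
        elif s >= compressed_threshold:
            tiers.append(COMPRESSED)
        else:
            tiers.append(PLACEHOLDER)

    def total() -> int:
        return sum(count_tokens(render(m, t)) for m, t in zip(history, tiers))

    # Over budget: downgrade the lowest-scoring non-important message by one tier.
    while total() > budget:
        candidates = [i for i in range(n)
                      if not history[i].important and tiers[i] > PLACEHOLDER]
        if not candidates:
            break  # only important messages remain at full fidelity
        victim = min(candidates, key=lambda i: scores[i])
        tiers[victim] -= 1

    # Chronological order is preserved; the output uses OpenAI-style chat dicts.
    return [{"role": m.role, "content": render(m, t)}
            for m, t in zip(history, tiers)]


history = [
    Message("user", "I have a severe peanut allergy so never suggest food "
                    "containing peanuts", 0.3, important=True),
    Message("user", "Tell me about the weather in Bangkok during the rainy "
                    "season please", 0.1),
    Message("assistant", "The rainy season usually runs from roughly May "
                         "through October with frequent showers", 0.1),
    Message("user", "What street food should I try near my hotel tonight", 0.9),
]
packed = pack_history(history, budget=30)
for m in packed:
    print(m["role"], "|", m["content"])
```

In this sketch the allergy constraint stays at full fidelity regardless of its age, while the low-relevance weather exchange collapses to placeholders. If only important messages remain and the budget is still exceeded, the loop stops rather than degrading them, which is one possible design choice among several.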
