Large language model (LLM)-based agents have been successfully deployed in many tool-augmented settings, but their scalability is fundamentally constrained by context length. Existing context-folding methods mitigate this issue by summarizing past interactions, yet they are typically designed for single-query or single-intent scenarios. In more realistic user-centric dialogues, we identify two major failure modes: (i) they irreversibly discard fine-grained constraints and intermediate facts that are crucial for later decisions, and (ii) their summaries fail to track evolving user intent, leading to omissions and erroneous actions. To address these limitations, we propose U-Fold, a dynamic context-folding framework tailored to user-centric tasks. U-Fold retains the full user--agent dialogue and tool-call history but, at each turn, uses two core components to produce an intent-aware, evolving dialogue summary and a compact, task-relevant tool log. Extensive experiments on $\tau$-bench, $\tau^2$-bench, VitaBench, and harder context-inflated settings show that U-Fold consistently outperforms ReAct (achieving a 71.4% win rate in long-context settings) and prior folding baselines (with improvements of up to 27.0%), particularly on long, noisy, multi-turn tasks. Our study demonstrates that U-Fold is a promising step toward transferring context-management techniques from single-query benchmarks to realistic user-centric applications.