Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Qizheng Zhang, Changran Hu, Shubhangi Upasani, Boyuan Ma, Fenglu Hong, Vamsidhar Kamanuru, Jay Rainton, Chen Wu, Mengmeng Ji, Hanchen Li, Urmish Thakker, James Zou, Kunle Olukotun

From arXiv; ICLR 2026; 32 pages.

Large language model (LLM) applications such as agents and domain-specific reasoning increasingly rely on context adaptation: modifying inputs with instructions, strategies, or evidence rather than updating model weights. Prior approaches improve usability but often suffer from brevity bias, which drops domain insights in favor of concise summaries, and from context collapse, where iterative rewriting erodes details over time. We introduce ACE (Agentic Context Engineering), a framework that treats contexts as evolving playbooks that accumulate, refine, and organize strategies through a modular process of generation, reflection, and curation. ACE prevents collapse with structured, incremental updates that preserve detailed knowledge and scale with long-context models. Across agent and domain-specific benchmarks, ACE optimizes contexts both offline (e.g., system prompts) and online (e.g., agent memory), consistently outperforming strong baselines by +10.6% on agent benchmarks and +8.6% on finance benchmarks, while significantly reducing adaptation latency and rollout cost. Notably, ACE can adapt effectively without labeled supervision, relying instead on natural execution feedback. On the AppWorld leaderboard, ACE matches the top-ranked production-level agent on the overall average and surpasses it on the harder test-challenge split, despite using a smaller open-source model. These results show that comprehensive, evolving contexts enable scalable, efficient, and self-improving LLM systems with low overhead.
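
The abstract attributes ACE's resistance to context collapse to structured, incremental updates applied through a generation, reflection, and curation loop, rather than wholesale rewrites of the context. The Python sketch below illustrates that idea under stated assumptions: the `Bullet`, `DeltaOp`, and `Playbook` types, the helpful/harmful counters, the exact-match deduplication, and the `adaptation_step` interface are all hypothetical illustrations, not the authors' implementation. The `generator` and `reflector` arguments stand in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Bullet:
    """One itemized playbook entry (hypothetical schema)."""
    bullet_id: str
    section: str
    content: str
    helpful: int = 0
    harmful: int = 0


@dataclass
class DeltaOp:
    """A single incremental edit proposed by curation (hypothetical)."""
    kind: str  # "add", "mark_helpful", or "mark_harmful"
    section: str = ""
    content: str = ""
    bullet_id: str = ""


@dataclass
class Playbook:
    bullets: Dict[str, Bullet] = field(default_factory=dict)
    next_id: int = 0

    def apply(self, ops: List[DeltaOp]) -> None:
        """Merge delta ops locally; existing entries are never rewritten wholesale."""
        for op in ops:
            if op.kind == "add":
                # Skip exact duplicates so repeated lessons do not bloat the context.
                if any(b.content == op.content for b in self.bullets.values()):
                    continue
                bid = f"b{self.next_id}"
                self.next_id += 1
                self.bullets[bid] = Bullet(bid, op.section, op.content)
            elif op.kind == "mark_helpful" and op.bullet_id in self.bullets:
                self.bullets[op.bullet_id].helpful += 1
            elif op.kind == "mark_harmful" and op.bullet_id in self.bullets:
                self.bullets[op.bullet_id].harmful += 1

    def render(self) -> str:
        """Serialize the playbook into a prompt-ready context string, grouped by section."""
        lines: List[str] = []
        for section in sorted({b.section for b in self.bullets.values()}):
            lines.append(f"## {section}")
            for b in self.bullets.values():
                if b.section == section:
                    lines.append(f"[{b.bullet_id}] {b.content} (+{b.helpful}/-{b.harmful})")
        return "\n".join(lines)


def adaptation_step(
    playbook: Playbook,
    task: str,
    generator: Callable[[str, str], str],
    reflector: Callable[[str, str], List[str]],
    feedback: str,
) -> None:
    """One generate -> reflect -> curate step (hypothetical interfaces)."""
    trajectory = generator(task, playbook.render())
    lessons = reflector(trajectory, feedback)
    # Curation here is a plain, non-LLM mapping from lessons to "add" ops.
    playbook.apply([DeltaOp("add", "lessons", lesson) for lesson in lessons])


# Usage with deterministic stubs in place of LLM calls.
def stub_generator(task: str, context: str) -> str:
    return f"attempted: {task}"


def stub_reflector(trajectory: str, feedback: str) -> List[str]:
    # Turn execution feedback (no labels) into a reusable lesson.
    if "incomplete" in feedback:
        return ["Verify pagination before assuming a list is complete."]
    return []


pb = Playbook()
adaptation_step(pb, "list all contacts", stub_generator, stub_reflector, "result incomplete")
adaptation_step(pb, "list all contacts", stub_generator, stub_reflector, "result incomplete")
pb.apply([DeltaOp("mark_helpful", bullet_id="b0")])
print(pb.render())
```

Because each step only appends or annotates entries, earlier details are not lost to a summarizing rewrite, which is the failure mode the abstract calls context collapse. The second call above produces a duplicate lesson that is skipped, so the playbook keeps a single entry whose counter records that it proved useful.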
