Large Language Model (LLM) agents are increasingly deployed in complex, multi-step workflows involving planning, tool use, reflection, and interaction with external knowledge systems. These workflows generate rapidly expanding contexts that must be curated, transformed, and compressed to maintain fidelity, avoid attention dilution, and reduce inference cost. Prior work on summarization and query-aware compression largely ignores the multi-step, plan-aware nature of agentic reasoning. In this work, we introduce PAACE (Plan-Aware Automated Context Engineering), a unified framework for optimizing the evolving state of LLM agents through next-k-task relevance modeling, plan-structure analysis, instruction co-refinement, and function-preserving compression. PAACE comprises (1) PAACE-Syn, a large-scale generator of synthetic agent workflows annotated with stepwise compression supervision, and (2) PAACE-FT, a family of distilled, plan-aware compressors trained from successful teacher demonstrations. Experiments on long-horizon benchmarks (AppWorld, OfficeBench, and 8-Objective QA) demonstrate that PAACE consistently improves agent correctness while substantially reducing context load. On AppWorld, PAACE achieves higher accuracy than all baselines while lowering peak context and cumulative dependency. On OfficeBench and multi-hop QA, PAACE improves both accuracy and F1, achieving fewer steps, lower peak tokens, and reduced attention dependency. Distilled PAACE-FT retains 97 percent of the teacher's performance while reducing inference cost by over an order of magnitude, enabling practical deployment of plan-aware compression with compact models.