Large Language Model (LLM)-based agentic systems rely on in-context policy documents encoding diverse business rules. As requirements grow, these documents expand rapidly, causing high computational overhead. This motivates developing internalization methods that embed policy documents into model priors while preserving performance. Prior prompt compression work targets generic prompts, but agentic policy documents span multiple complexity levels and require deeper reasoning, making internalization harder. We introduce CC-Gen, an agentic benchmark generator with Controllable Complexity across four levels, enabling systematic evaluation of agents' ability to handle complexity and offering a unified framework for assessing policy internalization. Our analysis shows that complex policy specifications governing workflows pose major reasoning challenges. Supporting internalization via supervised fine-tuning (SFT) on gold user-agent interaction trajectories with chain-of-thought (CoT) annotations is data-intensive, and its performance degrades sharply as policy complexity increases. To mitigate data and reasoning burdens, we propose Category-Aware Policy Continued Pretraining (CAP-CPT). Our automated pipeline parses policy documents to extract key specifications, grouping them into factual, behavioral, and conditional categories and isolating the complex conditions that drive workflow complexity. This guides targeted data synthesis and enables agents to internalize policy information through an autoregressive pretraining loss. Experiments show that CAP-CPT improves over SFT baselines in all settings, with up to 41% and 22% gains on Qwen-3-32B, achieving 97.3% prompt length reduction on CC-Gen and further enhancing tau-Bench performance with minimal SFT data.
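The category-aware grouping step can be illustrated with a minimal sketch. The marker lists, function names, and example specifications below are hypothetical stand-ins: the paper's pipeline parses policy documents automatically, whereas this sketch uses a simple keyword heuristic only to show how specifications might be routed into the three categories named in the abstract.

```python
# Hypothetical illustration of CAP-CPT's category-aware grouping step:
# each extracted policy specification is routed into one of three
# buckets (factual, behavioral, conditional) before targeted data
# synthesis. The keyword heuristic is a stand-in, not the actual parser.

CONDITIONAL_MARKERS = ("if ", "when ", "unless ", "only after ")
BEHAVIORAL_MARKERS = ("must ", "should ", "always ", "never ")

def categorize(spec: str) -> str:
    """Assign a policy specification to one of three categories."""
    s = spec.lower()
    if any(m in s for m in CONDITIONAL_MARKERS):
        return "conditional"  # drives workflow complexity; isolated for synthesis
    if any(m in s for m in BEHAVIORAL_MARKERS):
        return "behavioral"   # prescribes how the agent must act
    return "factual"          # static domain facts

# Example (invented) specifications from a customer-service policy:
specs = [
    "Refunds are processed within 5 business days.",
    "The agent must verify the user's identity before any account change.",
    "If the order has shipped, offer a return label instead of cancellation.",
]
buckets = {c: [] for c in ("factual", "behavioral", "conditional")}
for spec in specs:
    buckets[categorize(spec)].append(spec)
```

Once grouped, each bucket can receive its own synthesis strategy; in particular, the conditional bucket isolates the workflow-governing rules that the analysis identifies as the main reasoning bottleneck.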