Large language models (LLMs) exhibit strong capabilities as decision-making agents by interleaving reasoning and actions, as seen in ReAct-style frameworks. Yet, their practical deployment is constrained by high inference costs and large model sizes. We propose Structured Agent Distillation, a framework that compresses large LLM-based agents into smaller student models while preserving both reasoning fidelity and action consistency. Unlike standard token-level distillation, our method segments trajectories into [REASON] and [ACT] spans, applying segment-specific losses to align each component with the teacher's behavior. This structure-aware supervision enables compact agents to better replicate the teacher's decision process. Experiments on ALFWorld, HotPotQA-ReAct, and WebShop show that our approach consistently outperforms token-level and imitation learning baselines, achieving significant compression with minimal performance drop. Scaling and ablation results further highlight the importance of span-level alignment for efficient and deployable agents.
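The segment-specific supervision described above can be illustrated with a minimal sketch. The abstract does not specify the loss functions, so everything here is an assumption made for illustration: a KL-divergence term on [REASON] tokens, a cross-entropy term on [ACT] tokens, per-span averaging, and the names `segment_losses`, `structured_loss`, `reason_weight`, and `act_weight`. Distributions are plain Python lists over a toy vocabulary rather than model logits.

```python
import math

# Illustrative sketch only: the loss choices and all names below are
# assumptions, not the formulation used in the paper.

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def cross_entropy(target_index, q, eps=1e-12):
    """Negative log-probability the student assigns to the teacher's token."""
    return -math.log(q[target_index] + eps)

def segment_losses(spans, teacher_probs, student_probs, teacher_tokens):
    """Return (reason_loss, act_loss), each averaged over its own span tokens.

    spans          -- per-token label, either "REASON" or "ACT"
    teacher_probs  -- per-token teacher distribution over the vocabulary
    student_probs  -- per-token student distribution over the vocabulary
    teacher_tokens -- per-token index of the token the teacher emitted
    """
    reason_terms, act_terms = [], []
    for span, p, q, tok in zip(spans, teacher_probs, student_probs, teacher_tokens):
        if span == "REASON":
            # Assumed: match the teacher's full distribution on reasoning tokens.
            reason_terms.append(kl_divergence(p, q))
        elif span == "ACT":
            # Assumed: match the teacher's chosen token on action tokens.
            act_terms.append(cross_entropy(tok, q))
        else:
            raise ValueError(f"unknown span label: {span!r}")

    def mean(xs):
        return sum(xs) / len(xs) if xs else 0.0

    return mean(reason_terms), mean(act_terms)

def structured_loss(spans, teacher_probs, student_probs, teacher_tokens,
                    reason_weight=1.0, act_weight=1.0):
    """Weighted sum of the two span-level losses (weights are hypothetical)."""
    reason_loss, act_loss = segment_losses(
        spans, teacher_probs, student_probs, teacher_tokens
    )
    return reason_weight * reason_loss + act_weight * act_loss

# Toy trajectory: two reasoning tokens followed by one action token,
# over a two-symbol vocabulary.
spans = ["REASON", "REASON", "ACT"]
teacher_probs = [[0.7, 0.3], [0.5, 0.5], [0.9, 0.1]]
student_probs = [[0.7, 0.3], [0.5, 0.5], [0.5, 0.5]]
teacher_tokens = [0, 0, 0]

reason_loss, act_loss = segment_losses(
    spans, teacher_probs, student_probs, teacher_tokens
)
total = structured_loss(spans, teacher_probs, student_probs, teacher_tokens)
```

In this toy trajectory the student matches the teacher on both reasoning tokens but is uncertain on the action token, so only the action term contributes to the total. Averaging each span separately keeps long reasoning spans from swamping short action spans, which is one plausible motivation for span-level rather than token-level supervision.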