ET-Agent: Incentivizing Effective Tool-Integrated Reasoning Agent via Behavior Calibration

Large Language Models (LLMs) can extend the limits of their parametric knowledge by adopting the Tool-Integrated Reasoning (TIR) paradigm. However, existing LLM-based agent training frameworks often focus on answer accuracy and overlook targeted alignment of behavior patterns. Consequently, agents often exhibit ineffective actions during TIR tasks, such as redundant or insufficient tool calls. How to calibrate erroneous behavior patterns during TIR tasks, and thereby explore effective trajectories, remains an open problem. In this paper, we propose ET-Agent, a training framework that calibrates an agent's tool-use behavior from two synergistic perspectives: a Self-evolving Data Flywheel and Behavior Calibration Training. Specifically, we introduce a self-evolving data flywheel that generates enhanced data, which is used to fine-tune the LLM and improve its exploration ability. Building on this, we implement a two-phase behavior calibration training framework designed to progressively calibrate erroneous behavior patterns toward optimal behaviors. In-depth experiments further confirm the superiority of ET-Agent across multiple dimensions, including correctness, efficiency, reasoning conciseness, and tool execution accuracy. Our ET-Agent framework provides practical insights for research in the TIR field. Code is available at https://github.com/asilverlight/ET-Agent
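The abstract does not define how "redundant" or "insufficient" tool calls are measured. The Python below is a purely illustrative sketch, not ET-Agent's actual reward or training objective. It shows one way a behavior-aware score could differ from an accuracy-only score: a toy trajectory earns credit for a correct answer and loses credit for repeated queries and for falling short of an assumed number of required calls. `Trajectory`, `behavior_score`, `required_calls`, and the penalty weights are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Trajectory:
    """Hypothetical toy TIR trajectory: tool queries issued, and answer correctness."""
    tool_queries: list[str] = field(default_factory=list)
    answer_correct: bool = False


def count_redundant_calls(queries: list[str]) -> int:
    """Count calls that repeat an earlier query (case/whitespace-insensitive).

    This is a crude, assumed proxy for redundancy, not the paper's definition.
    """
    seen: set[str] = set()
    redundant = 0
    for q in queries:
        key = q.strip().lower()
        if key in seen:
            redundant += 1
        else:
            seen.add(key)
    return redundant


def behavior_score(
    traj: Trajectory,
    required_calls: int,
    redundancy_penalty: float = 0.1,     # hypothetical weight
    insufficiency_penalty: float = 0.3,  # hypothetical weight
) -> float:
    """Correctness reward minus penalties for redundant and insufficient tool use.

    `required_calls` is an assumed per-task estimate of how many distinct
    tool calls a good trajectory needs (e.g. the hop count of a multi-hop question).
    """
    correctness = 1.0 if traj.answer_correct else 0.0
    redundant = count_redundant_calls(traj.tool_queries)
    distinct = len(traj.tool_queries) - redundant
    missing = max(0, required_calls - distinct)
    return correctness - redundancy_penalty * redundant - insufficiency_penalty * missing


# Usage: three toy trajectories for an assumed two-hop question.
efficient = Trajectory(["capital of France", "population of Paris"], answer_correct=True)
repetitive = Trajectory(
    ["capital of France", "Capital of France ", "population of Paris"], answer_correct=True
)
lazy = Trajectory([], answer_correct=False)

for name, t in [("efficient", efficient), ("repetitive", repetitive), ("lazy", lazy)]:
    print(name, round(behavior_score(t, required_calls=2), 2))
```

An accuracy-only reward would score `efficient` and `repetitive` identically; the penalty terms separate them, which is the kind of behavioral distinction the abstract argues accuracy-focused training overlooks.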
