Public research results on large-scale supervised finetuning of AI agents remain relatively rare, since the collection of agent training data presents unique challenges. In this work, we argue that the bottleneck is not a lack of underlying data sources, but that a large variety of data is fragmented across heterogeneous formats, tools, and interfaces. To this end, we introduce the agent data protocol (ADP), a lightweight representation language that serves as an "interlingua" between agent datasets in diverse formats and unified agent training pipelines downstream. The design of ADP is expressive enough to capture a large variety of tasks, including API/tool use, browsing, coding, software engineering, and general agentic workflows, while remaining simple to parse and train on without per-dataset engineering. In experiments, we unified a broad collection of 13 existing agent training datasets into ADP format, and converted the standardized ADP data into training-ready formats for multiple agent frameworks. We performed SFT on these data, demonstrating an average performance gain of ~20% over the corresponding base models and delivering state-of-the-art or near-state-of-the-art performance on standard coding, browsing, tool use, and research benchmarks, without domain-specific tuning. All code and data are released publicly, in the hope that ADP can help lower the barrier to standardized, scalable, and reproducible agent training.
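To make the "interlingua" idea concrete, the following is a minimal, hypothetical sketch of what a unified trajectory record in the spirit of ADP might look like. The field names (`task_id`, `source_dataset`, `steps`, `role`, `kind`) and the step taxonomy are illustrative assumptions, not the actual ADP schema; the point is only that heterogeneous agent datasets can be normalized into one simple, parseable structure.

```python
import json
from dataclasses import dataclass, asdict, field
from typing import List

# NOTE: hypothetical schema for illustration only; not the actual ADP spec.
@dataclass
class Step:
    role: str      # e.g. "user", "assistant", or "tool"
    kind: str      # e.g. "message", "tool_call", "observation"
    content: str   # natural language, code, or serialized tool output

@dataclass
class Trajectory:
    task_id: str
    source_dataset: str          # which upstream dataset this came from
    steps: List[Step] = field(default_factory=list)

    def to_json(self) -> str:
        # A single serialization shared by all source datasets is what
        # lets downstream SFT pipelines avoid per-dataset engineering.
        return json.dumps(asdict(self), indent=2)

# Example: a tool-use episode normalized into the unified record format.
traj = Trajectory(
    task_id="example-001",
    source_dataset="some-tool-use-dataset",
    steps=[
        Step("user", "message", "List the files in /tmp"),
        Step("assistant", "tool_call", 'execute_bash(command="ls /tmp")'),
        Step("tool", "observation", "a.txt  b.txt"),
        Step("assistant", "message", "There are two files: a.txt and b.txt."),
    ],
)
print(traj.to_json())
```

A converter per source dataset would emit records like this, and a single trainer-side adapter would then render them into each framework's chat or tool-calling template.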