The construction of function-calling agents has emerged as a promising avenue for extending model capabilities. A major challenge for this task is obtaining high-quality, diverse training data. Prior work emphasizes diversity in functions, invocation patterns, and interaction turns, yet the linguistic diversity of requests and the coverage of arguments (e.g., \texttt{city\_name}, \texttt{stock\_ticker}) remain underexplored. We propose a method that generates synthetic datasets by optimizing general-purpose diversity metrics over both queries and arguments, without relying on hand-crafted rules or taxonomies, making it robust across use cases. We demonstrate the effectiveness of our technique through both intrinsic and extrinsic evaluation, comparing it to state-of-the-art data generation methods. Our method surpasses the baselines in diversity while maintaining comparable correctness. Moreover, when used as a training set, our dataset yields a model with superior out-of-distribution performance compared to analogous models trained on data from the baseline generation methods. In particular, we achieve a $7.4\%$ accuracy improvement on the BFCL benchmark over comparable counterparts.
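To make the core idea concrete, the following is a minimal, hypothetical sketch of diversity-driven selection over synthetic function-calling samples. It uses a simple token-level Jaccard distance over the concatenated query and serialized arguments as a stand-in diversity metric, and greedy max-min selection; the actual metrics and optimization procedure in the paper may differ.

```python
# Illustrative sketch (not the paper's method): greedily select synthetic
# samples so that both queries and argument values are maximally diverse.

def tokens(text):
    """Tokenize a query-plus-arguments string into a set of lowercase tokens."""
    return set(text.lower().split())

def distance(a, b):
    """Jaccard distance between two samples; 1.0 means fully disjoint."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return 1.0 - len(ta & tb) / len(union) if union else 0.0

def select_diverse(candidates, k):
    """Greedy max-min selection: repeatedly pick the candidate that is
    farthest from everything already selected."""
    selected = [candidates[0]]
    while len(selected) < k:
        remaining = [c for c in candidates if c not in selected]
        if not remaining:
            break
        best = max(remaining,
                   key=lambda c: min(distance(c, s) for s in selected))
        selected.append(best)
    return selected

# Candidate pool with an exact duplicate that greedy selection skips.
pool = [
    "weather in Paris city_name=Paris",
    "weather in Paris city_name=Paris",
    "stock price stock_ticker=AAPL",
    "forecast for Tokyo city_name=Tokyo",
]
print(select_diverse(pool, 3))
```

In practice, an embedding-based distance would replace the lexical one, but the structure of the optimization (score a pool of generated samples, keep a subset maximizing pairwise diversity across queries and argument values) is the same.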