Large Language Models (LLMs) have extended their capabilities beyond language generation to interacting with external systems through tool calling, offering powerful potential for real-world applications. However, tool hallucinations, which occur when a model selects an inappropriate tool or misuses a correct one, pose critical challenges that can lead to flawed task execution and increased operational costs. This paper investigates the concept of reliable tool calling and highlights the necessity of addressing tool hallucinations. We systematically categorize tool hallucinations into two main types: tool selection hallucination and tool usage hallucination. To mitigate these issues, we propose a reliability-focused alignment framework that enhances the model's ability to accurately assess tool relevance and correct usage. Using a proposed suite of evaluation metrics and experiments on StableToolBench, we further demonstrate that our framework mitigates tool hallucination and improves the overall reliability of LLM tool calling.