Large Language Models (LLMs) have shown remarkable capabilities in tool calling and tool usage, but they suffer from hallucinations: they select incorrect tools, supply malformed parameters, and exhibit ``tool bypass'' behavior, simulating results and generating outputs themselves instead of invoking the specialized tools or external systems. These failures undermine the reliability of LLM-based agents in production systems, as they produce inconsistent results and bypass security and audit controls; such hallucinations in agent tool selection therefore require early detection and error handling. Unlike existing hallucination detection methods, which require multiple forward passes or external validation, we present a computationally efficient framework that detects tool-calling hallucinations in real time by leveraging the LLM's internal representations from the same forward pass used for generation. We evaluate this approach on reasoning tasks across multiple domains and demonstrate strong detection performance (up to 86.4\% accuracy) with minimal computational overhead while preserving real-time inference. The framework is particularly effective at detecting parameter-level hallucinations and inappropriate tool selections, both of which are critical for reliable agent deployment.
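To make the mechanism concrete, the sketch below illustrates the general idea of scoring a drafted tool call with a lightweight classifier over hidden states taken from the same forward pass. It is not the paper's implementation: the model name, the choice of probe layer, and the untrained linear probe are illustrative assumptions; in practice the probe would be trained on labeled correct versus hallucinated tool calls.
\begin{verbatim}
# Minimal sketch (illustrative, not the paper's implementation):
# score a drafted tool call for hallucination risk using hidden
# states produced by the same forward pass that generated it.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # assumed; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME,
                                             torch_dtype=torch.bfloat16)
model.eval()

# Hypothetical linear probe over the final-token hidden state of one layer.
PROBE_LAYER = -1
probe = torch.nn.Linear(model.config.hidden_size, 1, dtype=torch.bfloat16)

def hallucination_score(prompt: str, drafted_tool_call: str) -> float:
    """Return a score in [0, 1]; higher means more likely hallucinated."""
    inputs = tokenizer(prompt + drafted_tool_call, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # Hidden state of the last token at the chosen layer.
    h = out.hidden_states[PROBE_LAYER][0, -1]
    return torch.sigmoid(probe(h)).item()

score = hallucination_score(
    "User: convert 100 USD to EUR.\nAssistant tool call: ",
    '{"tool": "get_weather", "args": {"city": "100 USD"}}',  # wrong tool, bad args
)
print(f"hallucination risk: {score:.2f}")
\end{verbatim}
Because the probe reads representations already computed during generation, the added cost is a single linear projection, which is consistent with the real-time, low-overhead setting described above.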