As LLM-based agents carry out sequential multi-step reasoning, hallucinations that arise at intermediate steps risk propagating along the trajectory and degrading overall reliability. Unlike hallucination detection in single-turn responses, diagnosing hallucinations in multi-step workflows requires identifying the step at which the trajectory first diverges. To fill this gap, we propose a new research task, automated hallucination attribution for LLM-based agents, which aims to identify the step responsible for a hallucination and explain why it occurs. To support this task, we introduce AgentHallu, a comprehensive benchmark comprising: (1) 693 high-quality trajectories spanning 7 agent frameworks and 5 domains; (2) a hallucination taxonomy organized into 5 categories (Planning, Retrieval, Reasoning, Human-Interaction, and Tool-Use) and 14 sub-categories; and (3) human-curated multi-level annotations covering binary hallucination labels, hallucination-responsible steps, and causal explanations. We evaluate 13 leading models and find the task challenging even for top-tier models such as GPT-5 and Gemini-2.5-Pro: the best-performing model achieves only 41.1\% step-localization accuracy, and tool-use hallucinations are the hardest at just 11.6\%. We believe AgentHallu will catalyze future research on robust, transparent, and reliable agentic systems.
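To make the task setup concrete, here is a minimal sketch of how a multi-level annotated trajectory and the step-localization metric could be represented; the `AgentHalluExample` record and its field names are our own illustrative assumptions, not the benchmark's actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record mirroring the abstract's multi-level annotations:
# a binary hallucination label, the responsible step (if any), a category
# from the 5-class taxonomy, and a free-text causal explanation.
@dataclass
class AgentHalluExample:
    trajectory: list[str]            # ordered agent steps (thoughts/actions/observations)
    has_hallucination: bool          # binary label
    responsible_step: Optional[int]  # index of the step that first diverges
    category: Optional[str]          # e.g., "Planning", "Tool-Use"
    explanation: Optional[str]       # human-written causal explanation

def step_localization_accuracy(examples: list[AgentHalluExample],
                               predictions: list[Optional[int]]) -> float:
    """Fraction of hallucinated trajectories for which the predicted
    responsible step exactly matches the annotated one."""
    scored = [(ex, pred) for ex, pred in zip(examples, predictions)
              if ex.has_hallucination]
    if not scored:
        return 0.0
    correct = sum(pred == ex.responsible_step for ex, pred in scored)
    return correct / len(scored)
```

Under this reading, the reported 41.1\% corresponds to exact-match step localization over hallucinated trajectories; other plausible variants (e.g., tolerance windows around the annotated step) would score differently.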