Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering. As these systems become more autonomous and are deployed at scale, understanding why an agent takes a particular action becomes increasingly important for accountability and governance. However, existing research predominantly focuses on \textit{failure attribution}, which localizes explicit errors in unsuccessful trajectories and is therefore insufficient for explaining \textbf{the reasons behind agent behaviors}. To bridge this gap, we propose a novel framework for \textbf{general agentic attribution}, designed to identify the internal factors driving agent actions regardless of task outcome. Our framework operates hierarchically to manage the complexity of agent interactions. Specifically, at the \textit{component level}, we employ temporal likelihood dynamics to identify critical interaction steps; at the \textit{sentence level}, we then refine this localization with perturbation-based analysis to isolate the specific textual evidence. We validate our framework across a diverse suite of agentic scenarios, including standard tool use and subtle reliability risks such as memory-induced bias. Experimental results demonstrate that the proposed framework reliably pinpoints the pivotal historical events and sentences behind agent behavior, offering a critical step toward safer and more accountable agentic systems. Code is available at https://github.com/AI45Lab/AgentDoG.
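The hierarchical procedure can be sketched in miniature as follows. This is a minimal illustration, not the paper's implementation: `action_log_likelihood` is a hypothetical stand-in for scoring an action under an LLM (here, a toy word-overlap score), and all variable names are invented for illustration. The step level ablates one interaction step at a time and measures the drop in the action's score; the sentence level repeats the perturbation within the most influential step.

```python
import re

def action_log_likelihood(history, action):
    # Toy stand-in for an LLM-based score of the action given the history:
    # counts word overlap between the action and each history step.
    action_words = set(re.findall(r"[a-z]+", action.lower()))
    overlap = 0
    for step in history:
        overlap += len(action_words & set(re.findall(r"[a-z]+", step.lower())))
    return float(overlap)

def step_attribution(history, action):
    # Component level: ablate each step and record the likelihood drop.
    base = action_log_likelihood(history, action)
    deltas = []
    for i in range(len(history)):
        ablated = history[:i] + history[i + 1:]
        deltas.append(base - action_log_likelihood(ablated, action))
    return deltas

def sentence_attribution(history, action, step_idx):
    # Sentence level: perturb one sentence at a time inside the chosen step.
    sentences = [s.strip() for s in history[step_idx].split(".") if s.strip()]
    base = action_log_likelihood(history, action)
    deltas = []
    for j in range(len(sentences)):
        kept = ". ".join(sentences[:j] + sentences[j + 1:])
        perturbed = history[:step_idx] + [kept] + history[step_idx + 1:]
        deltas.append(base - action_log_likelihood(perturbed, action))
    return sentences, deltas

history = [
    "User asks for the weather in Paris.",
    "Tool returns: sunny, 22 degrees in Paris.",
    "Unrelated log line.",
]
action = "Reply that Paris is sunny"
deltas = step_attribution(history, action)
critical = max(range(len(deltas)), key=deltas.__getitem__)  # most influential step
sentences, s_deltas = sentence_attribution(history, action, critical)
```

In this toy trace, the tool-result step receives the largest likelihood drop when ablated, and the sentence-level pass then isolates the supporting text within it; the real framework replaces the overlap score with model likelihoods over the agent's trajectory.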