Large Language Model (LLM)-based agents are widely used in real-world applications such as customer service, web navigation, and software engineering. As these systems become more autonomous and are deployed at scale, understanding why an agent takes a particular action becomes increasingly important for accountability and governance. However, existing research predominantly focuses on \textit{failure attribution}, which localizes explicit errors in unsuccessful trajectories and is therefore insufficient for explaining the reasoning behind agent behavior in general. To bridge this gap, we propose a novel framework for \textbf{general agentic attribution}, designed to identify the internal factors driving agent actions regardless of task outcome. Our framework operates hierarchically to manage the complexity of agent interactions. Specifically, at the \textit{component level}, we employ temporal likelihood dynamics to identify critical interaction steps; then, at the \textit{sentence level}, we refine this localization using perturbation-based analysis to isolate the specific textual evidence. We validate our framework across a diverse suite of agentic scenarios, including standard tool use and subtle reliability risks such as memory-induced bias. Experimental results demonstrate that the proposed framework reliably pinpoints the pivotal historical events and sentences behind agent behavior, offering a critical step toward safer and more accountable agentic systems.
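For concreteness, the hierarchical procedure can be instantiated as follows; the notation here is an illustrative sketch consistent with the description above, not necessarily the exact formulation used in the paper. Let $h_{1:t}$ denote the first $t$ steps of the interaction history and $a$ the agent action under study. At the component level, each step is scored by the change in action likelihood that its inclusion induces,
\[
\Delta_t = \log p_\theta\!\left(a \mid h_{1:t}\right) - \log p_\theta\!\left(a \mid h_{1:t-1}\right), \qquad t^\star = \arg\max_t \Delta_t .
\]
At the sentence level, each sentence $s_i$ within the critical step $t^\star$ is then ablated from the context, and the resulting likelihood drop $\delta_i = \log p_\theta(a \mid h_{1:t^\star}) - \log p_\theta(a \mid h_{1:t^\star} \setminus s_i)$ identifies the specific textual evidence as the sentence with the largest $\delta_i$.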