As multi-agent systems powered by Large Language Models (LLMs) are increasingly adopted in real-world workflows, users with diverse technical backgrounds are now building and refining their own agentic processes. However, these systems can fail in opaque ways, making it difficult for users to observe, understand, and correct errors. We conducted formative interviews with 12 practitioners to identify mismatches between existing debugging tools and users' needs. Based on these insights, we designed XAgen, an explainability tool that supports users with varying AI expertise through three core capabilities: log visualization for glanceable workflow understanding, human-in-the-loop feedback to capture expert judgment, and automatic error detection via an LLM-as-a-judge. In a user study with 8 participants, XAgen helped users locate failures more easily, attribute them to specific agents or steps, and iteratively improve agent configurations. Our findings surface human-centered design guidelines for explainable agentic AI development and highlight opportunities for more context-aware interactive debugging.
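To make the LLM-as-a-judge capability concrete, the sketch below shows one plausible shape for step-level error detection over a multi-agent log. It is a minimal illustration, not XAgen's actual implementation: the `AgentStep` record, the `JUDGE_PROMPT` wording, and the `call_llm` callable are all hypothetical stand-ins for whatever model client and log schema a real system would use.

```python
# Minimal sketch: LLM-as-a-judge error detection over a multi-agent workflow log.
# AgentStep, JUDGE_PROMPT, judge_step, and call_llm are hypothetical names;
# the paper does not specify XAgen's internal design.
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class AgentStep:
    agent: str   # which agent produced this step
    task: str    # what the step was supposed to accomplish
    output: str  # the agent's actual output, taken from the log

JUDGE_PROMPT = """You are reviewing one step of a multi-agent workflow.
Agent: {agent}
Task: {task}
Output: {output}

Did the output accomplish the task? Reply with JSON only:
{{"error": true or false, "reason": "<one sentence>"}}"""

def judge_step(step: AgentStep, call_llm: Callable[[str], str]) -> dict:
    """Ask a judge model whether a single step failed; return its parsed verdict."""
    raw = call_llm(JUDGE_PROMPT.format(
        agent=step.agent, task=step.task, output=step.output))
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Surface unparseable verdicts as inconclusive instead of dropping them,
        # so a human reviewer can still inspect the step.
        return {"error": None, "reason": "judge reply was not valid JSON"}

def detect_errors(log: list[AgentStep],
                  call_llm: Callable[[str], str]) -> list[tuple[int, dict]]:
    """Run the judge over every step; return (step index, verdict) for any
    step the judge flagged as an error or could not assess."""
    verdicts = [(i, judge_step(s, call_llm)) for i, s in enumerate(log)]
    return [(i, v) for i, v in verdicts if v["error"] is not False]
```

Returning step indices alongside verdicts mirrors the attribution goal described above: flagged results can be mapped back onto the visualized log so users can see which agent and which step a detected failure belongs to.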