Large language model (LLM)-based multi-agent systems have demonstrated impressive capabilities in handling complex tasks. However, the complexity of agentic behaviors makes these systems difficult to understand. When failures occur, developers often struggle to identify root causes and to determine actionable paths for improvement. Traditional methods that rely on inspecting raw log records are inefficient, given the large volume and complexity of the data. To address this challenge, we propose a framework and an interactive system, DiLLS, designed to reveal and structure the behaviors of multi-agent systems. The key idea is to organize information across three levels of query completion: activities, actions, and operations. By probing the multi-agent system through natural language, DiLLS derives and organizes information about planning and execution into a structured, multi-layered summary. Through a user study, we show that DiLLS significantly improves developers' effectiveness and efficiency in identifying, diagnosing, and understanding failures in LLM-based multi-agent systems.
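To make the three-level organization concrete, the following is a minimal, hypothetical sketch of how a multi-layered summary of one query's execution might be represented and rendered. It is not DiLLS's data model. The class names, fields, the nesting (activities contain actions, which contain operations), and the rule that a parent is marked as failed whenever any child fails are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical schema: the nesting activity > action > operation and the
# failure-propagation rule are assumptions for illustration only.


@dataclass
class Operation:
    """Lowest level (assumed): one concrete step, e.g. a single tool or LLM call."""
    description: str
    succeeded: bool = True


@dataclass
class Action:
    """Middle level (assumed): a goal-directed step grouping several operations."""
    description: str
    operations: List[Operation] = field(default_factory=list)

    @property
    def succeeded(self) -> bool:
        # Assumption: an action fails if any of its operations failed.
        return all(op.succeeded for op in self.operations)


@dataclass
class Activity:
    """Top level (assumed): a plan-level unit of work toward the user query."""
    description: str
    actions: List[Action] = field(default_factory=list)

    @property
    def succeeded(self) -> bool:
        # Assumption: an activity fails if any of its actions failed.
        return all(a.succeeded for a in self.actions)


def _mark(ok: bool) -> str:
    return "[ok]" if ok else "[FAIL]"


def summarize(activities: List[Activity]) -> str:
    """Render the hierarchy as an indented outline, flagging failed nodes."""
    lines = []
    for act in activities:
        lines.append(f"{_mark(act.succeeded)} {act.description}")
        for a in act.actions:
            lines.append(f"  {_mark(a.succeeded)} {a.description}")
            for op in a.operations:
                lines.append(f"    {_mark(op.succeeded)} {op.description}")
    return "\n".join(lines)


def failing_paths(activities: List[Activity]) -> List[Tuple[str, str, str]]:
    """Return (activity, action, operation) paths that end in a failed operation."""
    return [
        (act.description, a.description, op.description)
        for act in activities
        for a in act.actions
        for op in a.operations
        if not op.succeeded
    ]
```

For example, a trace with one activity whose second operation failed would render as an outline in which the failure is visible at every level, so a developer can start from the top-level activity and drill down to the failing operation instead of scanning raw logs.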