Fully autonomous teams of LLM-powered AI agents are emerging that collaborate to perform complex tasks for users. What challenges do developers face when trying to build and debug these AI agent teams? In formative interviews with five AI agent developers, we identify core challenges: difficulty reviewing long agent conversations to localize errors, lack of support in current tools for interactive debugging, and the need for tool support to iterate on agent configuration. Based on these needs, we developed an interactive multi-agent debugging tool, AGDebugger, with a UI for browsing and sending messages, the ability to edit and reset prior agent messages, and an overview visualization for navigating complex message histories. In a two-part user study with 14 participants, we identify common user strategies for steering agents and highlight the importance of interactive message resets for debugging. Our studies deepen understanding of interfaces for debugging increasingly important agentic workflows.