LLM agents have been widely adopted in real-world applications, relying on agent frameworks for workflow execution and multi-agent coordination. As these systems scale, understanding bugs in the underlying agent frameworks becomes critical. However, existing work mainly focuses on agent-level failures, overlooking framework-level bugs. To address this gap, we conduct an empirical study of 998 bug reports from CrewAI and LangChain, constructing a taxonomy of 15 root causes and 7 observable symptoms across five agent lifecycle stages: 'Agent Initialization', 'Perception', 'Self-Action', 'Mutual Interaction', and 'Evolution'. Our findings show that agent framework bugs mainly arise from 'API Misuse', 'API Incompatibility', and 'Documentation Desync', and are largely concentrated in the 'Self-Action' stage. Symptoms typically manifest as 'Functional Error', 'Crash', and 'Build Failure', reflecting disruptions to task progression and control flow.