Despite significant advances in general-purpose AI agents, several challenges still hinder their practical application in real-world scenarios. First, the limited planning capabilities of Large Language Models (LLMs) prevent AI agents from effectively solving complex tasks that require long-horizon planning. Second, general-purpose AI agents struggle to make efficient use of domain-specific knowledge and human expertise. In this paper, we introduce the Standard Operational Procedure-guided Agent (SOP-agent), a novel framework for constructing domain-specific agents through pseudocode-style Standard Operational Procedures (SOPs) written in natural language. Formally, we represent an SOP as a decision graph, which is traversed to guide the agent in completing the tasks specified by the SOP. We conduct extensive experiments across tasks in multiple domains, including decision-making, search and reasoning, code generation, data cleaning, and grounded customer service. The SOP-agent demonstrates excellent versatility, achieving performance superior to general-purpose agent frameworks and comparable to domain-specific agent systems. Additionally, we introduce the Grounded Customer Service Benchmark, the first benchmark designed to evaluate the grounded decision-making capabilities of AI agents in customer service scenarios based on SOPs.
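To make the decision-graph formulation concrete, the following is a minimal sketch of how an SOP could be encoded and traversed. It is an illustration under stated assumptions, not the SOP-agent implementation: the names `SOPNode`, `Branch`, and `traverse`, the refund scenario, and the 30-day threshold are all hypothetical, and branch conditions are plain Python predicates here, whereas an LLM-based agent would presumably decide which branch applies. The sketch also assumes the graph is acyclic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

State = Dict[str, object]


@dataclass
class SOPNode:
    """One step of a hypothetical SOP: an optional action plus conditional branches."""
    name: str
    action: Optional[Callable[[State], None]] = None
    branches: List["Branch"] = field(default_factory=list)


@dataclass
class Branch:
    """A guarded edge: follow `child` only when `condition` holds on the state."""
    condition: Callable[[State], bool]
    child: SOPNode


def traverse(root: SOPNode, state: State) -> List[str]:
    """Run each visited node's action, then follow the first branch whose
    condition holds. Stops when no branch applies. Assumes an acyclic graph."""
    visited: List[str] = []
    node: Optional[SOPNode] = root
    while node is not None:
        visited.append(node.name)
        if node.action is not None:
            node.action(state)
        node = next((b.child for b in node.branches if b.condition(state)), None)
    return visited


# Hypothetical customer-service SOP: refund only within 30 days of purchase.
def check_order(state: State) -> None:
    state["eligible"] = state["days_since_purchase"] <= 30


refund = SOPNode("issue_refund", action=lambda s: s.update(outcome="refunded"))
deny = SOPNode("explain_policy", action=lambda s: s.update(outcome="denied"))
root = SOPNode(
    "check_order",
    action=check_order,
    branches=[
        Branch(lambda s: bool(s["eligible"]), refund),
        Branch(lambda s: not s["eligible"], deny),
    ],
)

state: State = {"days_since_purchase": 12}
path = traverse(root, state)
```

In this sketch the traversal order is fixed by the SOP, so the agent never has to plan the whole task up front; it only has to evaluate the conditions at the current node, which is one way to read the motivation given above for guiding agents with SOPs.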