The emergence of foundation models (FMs) has enabled the development of highly capable and autonomous agents, unlocking new application opportunities across a wide range of domains. Evaluating the architecture of these agents is particularly important because architectural decisions significantly impact their quality attributes, given their unique characteristics: compound architecture, autonomous and non-deterministic behaviour, and continuous evolution. However, traditional architecture evaluation methods fall short in addressing the evaluation needs of agent architecture because of these characteristics. Therefore, in this paper, we present AgentArcEval, a novel agent architecture evaluation method designed specifically to address the complexities of FM-based agent architecture and its evaluation. Moreover, we present a catalogue of agent-specific general scenarios, which serves as a guide for generating concrete scenarios to design and evaluate the agent architecture. We demonstrate the usefulness of AgentArcEval and the catalogue through a case study on the architecture evaluation of a real-world tax copilot named Luna.