Role-Playing Agents (RPAs) are an increasingly popular type of LLM agent that simulate human-like behaviors across a variety of tasks. However, evaluating RPAs is challenging because task requirements and agent designs vary widely. By systematically reviewing 1,676 papers published between January 2021 and December 2024, this paper proposes an evidence-based, actionable, and generalizable evaluation design guideline for LLM-based RPAs. Our analysis identifies six agent attributes, seven task attributes, and seven evaluation metrics in the existing literature. Building on these findings, the guideline is intended to help researchers develop more systematic and consistent evaluation methods.