Role-playing language agents (RPLAs) have emerged as promising applications of large language models (LLMs). However, simulating established characters remains challenging for RPLAs, owing to the lack of authentic character datasets and of nuanced evaluation methods built on such data. In this paper, we present CoSER, comprising a high-quality dataset, open models, and an evaluation protocol for effective RPLAs of established characters. The CoSER dataset covers 17,966 characters from 771 renowned books. It provides authentic dialogues with real-world intricacies, along with diverse data types such as conversation setups, character experiences, and internal thoughts. Drawing on acting methodology, we introduce given-circumstance acting (GCA) for training and evaluating role-playing LLMs, in which an LLM sequentially portrays multiple characters within book scenes. Using our dataset, we develop CoSER 8B and CoSER 70B, advanced open role-playing LLMs built on LLaMA-3.1. Extensive experiments demonstrate the value of the CoSER dataset for RPLA training, evaluation, and retrieval. Moreover, CoSER 70B achieves state-of-the-art performance, surpassing or matching GPT-4o on our evaluation and three existing benchmarks, e.g., reaching 75.80% accuracy on InCharacter and 93.47% on LifeChoice.
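The given-circumstance acting setup described above can be illustrated with a minimal sketch: given a scene setup and a cast of characters, a single LLM portrays each character in turn, conditioning on the transcript so far. The function names and prompt format here are illustrative assumptions, not the paper's actual implementation; `call_llm` is a hypothetical stand-in for any chat-completion API.

```python
# Minimal sketch of a given-circumstance acting (GCA) rollout, under the
# assumption that one LLM sequentially voices every character in a scene.

def call_llm(prompt: str) -> str:
    # Hypothetical stub; in practice, replace with a real LLM call
    # (e.g., a CoSER model or any chat-completion API).
    return f"[utterance conditioned on: {prompt[-40:]}]"

def given_circumstance_acting(setup: str, characters: list[str],
                              turns: int = 2) -> list[tuple[str, str]]:
    """Roll out a multi-character scene, one character at a time."""
    transcript: list[tuple[str, str]] = []
    for _ in range(turns):
        for character in characters:
            history = "\n".join(f"{c}: {u}" for c, u in transcript)
            prompt = (
                f"Scene: {setup}\n{history}\n"
                f"You are {character}. Stay in character and reply."
            )
            transcript.append((character, call_llm(prompt)))
    return transcript

# Example rollout with an invented scene and cast.
scene = given_circumstance_acting(
    "A tense dinner at the Capulet estate.", ["Juliet", "Nurse"], turns=1
)
for speaker, line in scene:
    print(f"{speaker}: {line}")
```

For evaluation, the resulting transcript would then be scored against the book's original dialogue, e.g., by a judge model or retrieval-based comparison.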