Role-playing language agents (RPLAs) have emerged as promising applications of large language models (LLMs). However, simulating established characters remains challenging for RPLAs, owing to the lack of authentic character datasets and of nuanced evaluation methods built on such data. In this paper, we present CoSER, comprising a high-quality dataset, open models, and an evaluation protocol for effective RPLAs of established characters. The CoSER dataset covers 17,966 characters from 771 renowned books. It provides authentic dialogues with real-world intricacies, along with diverse data types such as conversation setups, character experiences, and internal thoughts. Drawing on acting methodology, we introduce given-circumstance acting (GCA) for training and evaluating role-playing LLMs, in which an LLM sequentially portrays multiple characters within book scenes. Using our dataset, we develop CoSER 8B and CoSER 70B, advanced open role-playing LLMs built on LLaMA-3.1. Extensive experiments demonstrate the value of the CoSER dataset for RPLA training, evaluation, and retrieval. Moreover, CoSER 70B achieves state-of-the-art performance, surpassing or matching GPT-4o on our evaluation and three existing benchmarks, e.g., reaching 75.80% accuracy on InCharacter and 93.47% on LifeChoice.
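The given-circumstance acting setup described above can be illustrated with a minimal sketch: given a scene setup and a cast of characters, a single LLM portrays each character in turn, conditioning on the transcript so far. The function names and prompt format here are illustrative assumptions, not the paper's actual implementation; `call_llm` is a hypothetical stand-in for any chat-completion API.

```python
# Minimal sketch of a given-circumstance acting (GCA) rollout, under the
# assumption that one LLM sequentially voices every character in a scene.

def call_llm(prompt: str) -> str:
    # Hypothetical stub; in practice, replace with a real LLM call
    # (e.g., a CoSER model or any chat-completion API).
    return f"[utterance conditioned on: {prompt[-40:]}]"

def given_circumstance_acting(setup: str, characters: list[str],
                              turns: int = 2) -> list[tuple[str, str]]:
    """Roll out a multi-character scene, one character at a time."""
    transcript: list[tuple[str, str]] = []
    for _ in range(turns):
        for character in characters:
            history = "\n".join(f"{c}: {u}" for c, u in transcript)
            prompt = (
                f"Scene: {setup}\n{history}\n"
                f"You are {character}. Stay in character and reply."
            )
            transcript.append((character, call_llm(prompt)))
    return transcript

# Example rollout with an invented scene and cast.
scene = given_circumstance_acting(
    "A tense dinner at the Capulet estate.", ["Juliet", "Nurse"], turns=1
)
for speaker, line in scene:
    print(f"{speaker}: {line}")
```

For evaluation, the resulting transcript would then be scored against the book's original dialogue, e.g., by a judge model or retrieval-based comparison.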