Large language models (LLMs) have demonstrated impressive performance and spurred numerous AI applications, among which role-playing agents (RPAs) are particularly popular, especially those for fictional characters. These RPAs depend on the capability of LLMs to understand characters from fictional works. Previous efforts have evaluated this capability via basic classification tasks or characteristic imitation, which fail to capture the nuanced character understanding of LLMs. In this paper, we propose evaluating LLMs' character understanding capability via the character profiling task, i.e., summarizing character profiles from the corresponding materials, a widely adopted yet understudied practice in RPA development. Specifically, we construct the CroSS dataset from literature experts and assess the generated profiles in two ways: by comparing them with ground-truth references and by measuring their applicability in downstream tasks. Our experiments, which cover various summarization methods and LLMs, have yielded promising results that strongly validate the character understanding capability of LLMs. We believe our constructed resource will promote further research in this field. Resources are available at https://github.com/Joanna0123/character_profiling.