Large language models (LLMs) have demonstrated impressive performance and spurred numerous AI applications, among which role-playing agents (RPAs) are particularly popular, especially for fictional characters. A prerequisite for these RPAs is the capability of LLMs to understand characters from fictional works. Previous efforts have evaluated this capability via basic classification tasks or characteristic imitation, which fail to capture LLMs' nuanced understanding of characters. In this paper, we propose evaluating LLMs' character understanding capability via the character profiling task, i.e., summarizing character profiles from corresponding materials, a widely adopted yet understudied practice for RPA development. Specifically, we construct the CroSS dataset from literature experts and assess the generated profiles by comparing them with ground-truth references and by measuring their applicability in downstream tasks. Our experiments, which cover various summarization methods and LLMs, yield promising results that strongly validate the character understanding capability of LLMs. Resources are available at https://github.com/Joanna0123/character_profiling.