Large language models (LLMs) have demonstrated significant potential for building Role-Playing Agents (RPAs). However, current research primarily evaluates RPAs on famous fictional characters, which allows models to rely on memorized knowledge associated with the character names. This dependency biases evaluation and limits the generalization of RPAs to unseen personas. To address this issue, we propose an anonymous evaluation method. Experiments across multiple benchmarks show that anonymization significantly degrades role-playing performance, confirming that an exposed name carries implicit information. Furthermore, we investigate personality augmentation as a way to improve role fidelity in the anonymous setting. We systematically compare the efficacy of personality traits derived from human annotations with that of traits self-generated by the model. Our results demonstrate that incorporating personality information consistently improves RPA performance. Crucially, self-generated personalities achieve performance comparable to human-annotated ones. This work establishes a fairer evaluation protocol and validates a scalable, personality-enhanced framework for constructing robust RPAs.
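To make the anonymous setting concrete, the following is a minimal sketch of how a character profile might be anonymized and optionally augmented with a personality description before it is given to a model. It is an illustration under stated assumptions, not the procedure used in this work: the function names `anonymize` and `build_prompt`, the placeholder "Character A", the "Personality:" prefix, and the example profile are all hypothetical.

```python
import re
from typing import Optional, Sequence


def anonymize(text: str, names: Sequence[str], placeholder: str = "Character A") -> str:
    """Replace every listed name or alias in `text` with `placeholder`.

    Hypothetical helper: the actual anonymization procedure may differ.
    """
    # Try longer names first so "Sherlock Holmes" wins over "Holmes".
    ordered = sorted((n for n in names if n), key=len, reverse=True)
    if not ordered:
        return text
    alternatives = "|".join(re.escape(n) for n in ordered)
    # Lookarounds keep matches on whole names and avoid hits inside longer words.
    pattern = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", flags=re.IGNORECASE)
    return pattern.sub(placeholder, text)


def build_prompt(
    profile: str,
    names: Sequence[str],
    personality: Optional[str] = None,
    anonymous: bool = True,
) -> str:
    """Assemble a role-play prompt, optionally anonymized and personality-augmented."""
    parts = [profile]
    if personality:
        # The personality text could be human-annotated or model-generated.
        parts.append("Personality: " + personality)
    prompt = "\n".join(parts)
    # Anonymize after assembly so names inside the personality text are also masked.
    return anonymize(prompt, names) if anonymous else prompt


profile = "Sherlock Holmes is a consulting detective. Holmes's method is deduction."
print(build_prompt(profile, ["Holmes", "Sherlock Holmes"],
                   personality="Holmes is analytical and aloof."))
```

Comparing model outputs for `anonymous=False` and `anonymous=True`, with and without the `personality` argument, mirrors the comparisons summarized above. A real pipeline would also need to handle nicknames, titles, and names of closely associated characters, which this sketch does not attempt.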