The advent of Large Language Models (LLMs) has paved the way for complex tasks such as role-playing, which enhances user interactions by enabling models to imitate various characters. However, the closed-source nature of state-of-the-art LLMs and their general-purpose training limit how well they can be optimized for role-playing. In this paper, we introduce RoleLLM, a framework to benchmark, elicit, and enhance role-playing abilities in LLMs. RoleLLM comprises four stages: (1) Role Profile Construction for 100 roles; (2) Context-Based Instruction Generation (Context-Instruct) for role-specific knowledge extraction; (3) Role Prompting using GPT (RoleGPT) for speaking style imitation; and (4) Role-Conditioned Instruction Tuning (RoCIT) for fine-tuning open-source models along with role customization. Using Context-Instruct and RoleGPT, we create RoleBench, the first systematic and fine-grained character-level benchmark dataset for role-playing, with 168,093 samples. Moreover, applying RoCIT to RoleBench yields RoleLLaMA (English) and RoleGLM (Chinese), which significantly improve role-playing abilities and even achieve results comparable to RoleGPT (using GPT-4).
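To make the idea of conditioning on a role more concrete, the following is a minimal sketch of how a role profile and an instruction might be combined into a prompt and paired with a response as a fine-tuning sample. The abstract does not specify any prompt template or data schema, so everything here is an assumption made for illustration: the `RoleProfile` class, its fields, the functions `build_role_prompt` and `to_training_sample`, and the template wording are all hypothetical and are not RoleLLM's actual format.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class RoleProfile:
    """Hypothetical container for a role profile (name, description, catchphrases)."""
    name: str
    description: str
    catchphrases: tuple[str, ...] = ()


def build_role_prompt(profile: RoleProfile, instruction: str) -> str:
    """Assemble a role-conditioned prompt.

    The template below is an illustrative assumption, not the paper's prompt.
    """
    lines = [f"You are {profile.name}. {profile.description}"]
    if profile.catchphrases:
        # Catchphrases hint at speaking style; only included when available.
        lines.append("Typical phrases: " + "; ".join(profile.catchphrases))
    lines.append(f"Instruction: {instruction}")
    lines.append("Response:")
    return "\n".join(lines)


def to_training_sample(
    profile: RoleProfile, instruction: str, response: str
) -> dict[str, str]:
    """Pair a role-conditioned prompt with a response (hypothetical schema)."""
    return {
        "role": profile.name,
        "prompt": build_role_prompt(profile, instruction),
        "completion": response,
    }


if __name__ == "__main__":
    # Example role chosen purely for illustration.
    holmes = RoleProfile(
        name="Sherlock Holmes",
        description="A brilliant consulting detective.",
        catchphrases=("Elementary.",),
    )
    sample = to_training_sample(
        holmes, "Introduce yourself.", "I am Sherlock Holmes."
    )
    print(sample["prompt"])
```

Keeping the role name and description in the prompt itself, rather than in model weights alone, is one simple way to let a single fine-tuned model serve many roles; whether RoCIT uses this exact arrangement is not stated in the abstract.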