Role-playing chatbots built on large language models have drawn interest, but better techniques are needed to mimic specific fictional characters. We propose an algorithm that controls language models via an improved prompt and memories of the character extracted from scripts. We construct ChatHaruhi, a dataset covering 32 Chinese and English TV and anime characters with over 54k simulated dialogues. Both automatic and human evaluations show that our approach improves role-playing ability over baselines. Code and data are available at https://github.com/LC1332/Chat-Haruhi-Suzumiya.