With the emergence of increasingly powerful large language models, there is burgeoning interest in leveraging these models for casual conversation and role-play applications. However, existing conversational and role-playing datasets often fail to capture the diverse and nuanced interactions typically exhibited by real-world role-play participants. To address this limitation and contribute to this rapidly growing field, we introduce a partially synthetic dataset named PIPPA (Personal Interaction Pairs between People and AI). PIPPA is the result of a community-driven crowdsourcing effort involving a group of role-play enthusiasts. The dataset comprises over 1 million utterances distributed across 26,000 conversation sessions and provides a rich resource for researchers and AI developers to explore and refine conversational AI systems in the context of role-play scenarios.