Understanding rich dialogues often requires NLP systems to access relevant commonsense persona knowledge, but retrieving this knowledge is challenging due to complex contexts and the implicit nature of commonsense. This paper presents our approach to the Commonsense Persona Knowledge Linking (CPKL) challenge, addressing the critical need to integrate persona and commonsense knowledge in open-domain dialogue systems. We introduce the SynCPKL Pipeline, which leverages Large Language Models to generate high-quality synthetic datasets for training commonsense persona knowledge linkers. To demonstrate the efficacy of our approach, we present SynCPKL, a new dataset specifically designed for this task. Our experiments validate the effectiveness of SynCPKL for training commonsense persona knowledge linkers. Additionally, our top-performing model, Derberta-SynCPKL, secured first place in the CPKL challenge with a 16% improvement in F1 score. We release both SynCPKL and Derberta-SynCPKL at https://github.com/irislin1006/CPKL.