Language models trained on large-scale corpora can generate remarkably fluent results in open-domain dialogue. However, for the persona-based dialogue generation task, consistency and coherence are also key factors, and they remain great challenges for language models. Existing works mainly focus on filtering valuable data, modifying model structures, or designing objective functions, but their improvements are limited and hard to generalize to all types of pre-trained language models. We find, however, that language models can produce consistent and coherent responses if enough generations are considered. The problem therefore lies in generating responses at large scale and selecting the target response. In this work, we propose SimOAP, a simple but effective two-stage strategy consisting of over-sampling and post-evaluation. The over-sampling stage efficiently draws a large number of responses from existing trained models via off-the-shelf distillation and compression methods, and the post-evaluation stage selects a good response from the large candidate pool based on multiple well-designed evaluation metrics. Experimental results show that the proposed plug-in SimOAP strategy improves the backbone models and outperforms the baseline strategies in both automatic and human evaluations.
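The two-stage idea can be illustrated with a minimal sketch. It assumes a stochastic generator exposed as a plain callable and uses word-overlap (Jaccard) scores as toy stand-ins for the learned evaluation metrics, which are not specified here. The coherence-then-consistency filtering order and all names (`over_sample`, `post_evaluate`, `token_overlap`, `top_k`) are hypothetical illustrations, not the authors' implementation.

```python
import itertools
from typing import Callable, List, Sequence


def over_sample(generate: Callable[[str], str], context: str, n: int) -> List[str]:
    """Stage 1 (over-sampling): draw n candidate responses from a generator.

    Assumption: `generate` is any callable that returns one sampled response
    per call, e.g. a distilled or compressed model with sampling enabled.
    """
    return [generate(context) for _ in range(n)]


def token_overlap(a: str, b: str) -> float:
    """Toy similarity: Jaccard overlap of lowercased word sets.

    Hypothetical stand-in for the real evaluation metrics.
    """
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def post_evaluate(candidates: Sequence[str], context: str, persona: str, top_k: int) -> str:
    """Stage 2 (post-evaluation): filter by coherence, then select by consistency."""
    # Drop duplicate samples while preserving their original order.
    unique = list(dict.fromkeys(candidates))
    # Keep the top_k candidates that overlap most with the dialogue context.
    coherent = sorted(unique, key=lambda r: token_overlap(r, context), reverse=True)[:top_k]
    # Among those, return the one that overlaps most with the persona.
    return max(coherent, key=lambda r: token_overlap(r, persona))


if __name__ == "__main__":
    context = "do you like hiking on weekends"
    persona = "i am a nurse and i love hiking"
    # Deterministic stub generator that cycles through fixed replies.
    replies = itertools.cycle([
        "i love hiking on weekends",
        "what is your favorite food",
        "yes i like hiking on weekends too",
        "i work night shifts",
    ])
    candidates = over_sample(lambda ctx: next(replies), context, n=8)
    print(post_evaluate(candidates, context, persona, top_k=2))
```

In this toy run, "yes i like hiking on weekends too" scores highest on context overlap, but the persona step selects "i love hiking on weekends" from the two coherent survivors. This illustrates why several complementary metrics are applied to a large candidate pool rather than taking a single sample.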