Preventing models from generating contradictory responses is a substantial challenge in dialogue response generation. The quality and quantity of available contradictory response data play a vital role in suppressing these contradictions, and a large collection of such data offers two significant benefits. First, it enables a comprehensive examination of the characteristics of contradictions. Second, it may strengthen data-driven contradiction mitigation methods by serving as large-scale training data. Nevertheless, no prior work has attempted to build an extensive collection of model-generated contradictory responses. In this paper, we build, for the first time, a large dataset of contradictions produced by response generation models. We then gain insights into the characteristics of model-generated contradictions through an extensive analysis of the collected responses. Finally, we demonstrate how this dataset substantially improves the performance of data-driven contradiction suppression methods.