We introduce a novel data generation method for contradiction detection that combines the generative power of large language models with linguistic rules. Our vision is to provide a condensed corpus of prototypical contradictions, allowing for in-depth linguistic analysis as well as efficient language model fine-tuning. To this end, we instruct the generative models to create contradictory statements that match descriptions of specific contradiction types. We also instruct the models to propose entirely new contradiction typologies. As an auxiliary approach, we use linguistic rules to construct simple contradictions such as those arising from negation, antonymy, and numeric mismatch. We find that our methods yield promising results in terms of the coherence and variety of the data. Further study and manual refinement are necessary before this data can be used in a machine learning setup.
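To illustrate the rule-based auxiliary approach, the following is a minimal Python sketch of how contradictions from negation, antonymy, and numeric mismatch might be constructed. It is not the implementation used in this work: the antonym lexicon, the auxiliary list, and all function names (`negate`, `swap_antonym`, `change_number`, `generate_contradictions`) are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, tiny antonym lexicon; a real system would need a far larger resource.
ANTONYMS = {"open": "closed", "hot": "cold", "alive": "dead", "full": "empty"}

# Auxiliary and copula forms after which "not" can be inserted (illustrative subset).
AUXILIARIES = ("is", "are", "was", "were", "can", "will")


def negate(sentence):
    """Insert 'not' after the first non-negated auxiliary; None if no rule applies."""
    pattern = r"\b(" + "|".join(AUXILIARIES) + r")\b(?! not\b)"
    result, count = re.subn(pattern, r"\1 not", sentence, count=1)
    return result if count else None


def swap_antonym(sentence):
    """Replace the first lexicon word found in the sentence with its antonym."""
    for word, opposite in ANTONYMS.items():
        pattern = r"\b" + re.escape(word) + r"\b"
        result, count = re.subn(pattern, opposite, sentence, count=1)
        if count:
            return result
    return None


def change_number(sentence, offset=1):
    """Shift the first integer by a nonzero offset to create a numeric mismatch."""
    match = re.search(r"\d+", sentence)
    if match is None:
        return None
    new_value = int(match.group()) + offset
    return sentence[:match.start()] + str(new_value) + sentence[match.end():]


# Rule names mirror the three contradiction sources named in the text.
RULES = {
    "negation": negate,
    "antonymy": swap_antonym,
    "numeric_mismatch": change_number,
}


def generate_contradictions(premise):
    """Apply every rule and collect (rule name, premise, contradicting statement)."""
    pairs = []
    for name, rule in RULES.items():
        hypothesis = rule(premise)
        if hypothesis is not None:
            pairs.append((name, premise, hypothesis))
    return pairs


if __name__ == "__main__":
    for premise in ("The door is open.", "She bought 3 apples."):
        for name, _, hypothesis in generate_contradictions(premise):
            print(f"{name}: {premise} -> {hypothesis}")
```

Surface rules of this kind are cheap to apply but brittle: they ignore scope, quantifiers, and context, which is consistent with the need for manual refinement noted above.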