Comparative reasoning, the process of comparing objects, concepts, or entities to draw conclusions, is a fundamental cognitive ability. In this paper, we propose a novel framework for pre-training language models to enhance their comparative reasoning over text. Existing approaches to NLP tasks that require comparative reasoning rely on costly manual data labeling and generalize poorly across tasks. Our approach introduces a scalable method for collecting data for text-based entity comparison that leverages both structured and unstructured data. We further present a framework for pre-training language models with three novel comparative reasoning objectives. Evaluation on downstream tasks, including comparative question answering, question generation, and summarization, shows that our pre-training framework significantly improves the comparative reasoning abilities of language models, especially under low-resource conditions. We also release the first integrated benchmark for comparative reasoning.