Comparative reasoning, the process of comparing objects, concepts, or entities to draw conclusions, is a fundamental cognitive ability. In this paper, we propose a novel framework for pre-training language models to enhance their comparative reasoning over text. Existing approaches to NLP tasks that require comparative reasoning suffer from costly manual data labeling and generalize poorly across tasks. Our approach introduces a scalable method for collecting data for text-based entity comparison that leverages both structured and unstructured data. We further present a framework that pre-trains language models with three novel objectives for comparative reasoning. Evaluation on downstream tasks, including comparative question answering, question generation, and summarization, shows that our pre-training framework significantly improves the comparative reasoning abilities of language models, especially under low-resource conditions. We also release the first integrated benchmark for comparative reasoning.