Entailment graphs (EGs) are constructed from extracted corpora and serve as a strong, explainable representation of context-independent entailment relations in natural language. However, EGs built by previous methods often suffer from severe sparsity, owing to the limited corpora available and the long-tail distribution of predicates. In this paper, we propose a multi-stage method, Typed Predicate-Entailment Graph Generator (TP-EGG), to tackle this problem. Given several seed predicates, TP-EGG builds the graphs by generating new predicates and detecting entailment relations among them. The generative nature of TP-EGG lets us leverage recent advances in large pretrained language models (PLMs) while avoiding reliance on carefully prepared corpora. Experiments on benchmark datasets show that TP-EGG generates high-quality, scale-controllable entailment graphs, achieving significant in-domain improvement over state-of-the-art EGs and boosting performance on downstream inference tasks.
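The seed-then-expand idea described above (generate new predicates from seeds, then detect entailment among them) can be illustrated with a minimal sketch. This is not the TP-EGG implementation: `build_entailment_graph`, `toy_generate`, `toy_score`, the `threshold` and `max_predicates` parameters, and the example predicates are all hypothetical names chosen for illustration. In a real system the two callables would presumably wrap PLM-based components; here they are lookup tables so the example runs on its own.

```python
from collections import deque
from typing import Callable, Iterable, List, Tuple

Edge = Tuple[str, str, float]  # (premise, hypothesis, score)


def build_entailment_graph(
    seeds: Iterable[str],
    generate: Callable[[str], Iterable[str]],
    score: Callable[[str, str], float],
    threshold: float = 0.5,
    max_predicates: int = 50,
) -> Tuple[List[str], List[Edge]]:
    """Expand seeds into a predicate set, then keep high-scoring directed edges."""
    predicates = list(dict.fromkeys(seeds))  # de-duplicate, preserve order
    seen = set(predicates)
    queue = deque(predicates)

    # Stage 1 (assumed): breadth-first expansion, capped by max_predicates
    # so that the size of the graph stays controllable.
    while queue and len(predicates) < max_predicates:
        current = queue.popleft()
        for candidate in generate(current):
            if candidate in seen:
                continue
            if len(predicates) >= max_predicates:
                break
            seen.add(candidate)
            predicates.append(candidate)
            queue.append(candidate)

    # Stage 2 (assumed): score every ordered pair and keep edges above threshold.
    edges: List[Edge] = []
    for premise in predicates:
        for hypothesis in predicates:
            if premise == hypothesis:
                continue
            s = score(premise, hypothesis)
            if s >= threshold:
                edges.append((premise, hypothesis, s))
    return predicates, edges


# Toy stand-ins for the generator and the entailment detector (hypothetical data).
BUY = "(person, buy, thing)"
PURCHASE = "(person, purchase, thing)"
OWN = "(person, own, thing)"

TOY_NEIGHBORS = {BUY: [PURCHASE, OWN]}
TOY_SCORES = {
    (BUY, PURCHASE): 0.95,
    (PURCHASE, BUY): 0.95,
    (BUY, OWN): 0.8,
    (PURCHASE, OWN): 0.8,
    (OWN, BUY): 0.1,
}


def toy_generate(predicate: str) -> List[str]:
    return TOY_NEIGHBORS.get(predicate, [])


def toy_score(premise: str, hypothesis: str) -> float:
    return TOY_SCORES.get((premise, hypothesis), 0.0)
```

Calling `build_entailment_graph([BUY], toy_generate, toy_score)` expands the single seed into three predicates and keeps only the directed edges whose score reaches the threshold, so "buy" entails "own" but not the reverse. Lowering `max_predicates` shrinks the graph, which is one simple way to picture the scale control mentioned in the abstract.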