Neural models for Factual Inconsistency Classification with Explanations

Factual consistency is one of the most important requirements when editing high-quality documents, and it is especially important for automatic text generation systems such as summarization, question answering, dialog modeling, and language modeling. Still, automated factual inconsistency detection remains under-studied. Existing work has focused on (a) detecting fake news with a knowledge base in context, or (b) detecting broad contradiction (as part of the natural language inference literature). However, there has been no work on detecting and explaining types of factual inconsistencies in text without any knowledge base in context. In this paper, we leverage existing work in linguistics to formally define five types of factual inconsistencies. Based on this categorization, we contribute a novel dataset, FICLE (Factual Inconsistency CLassification with Explanation), with ~8K samples, where each sample consists of two sentences (claim and context) annotated with the type and span of the inconsistency. When the inconsistency relates to an entity type, that entity type is also labeled at two levels (coarse and fine-grained). Further, we use this dataset to train a pipeline of four neural models that predict the inconsistency type with explanations, given a (claim, context) sentence pair. Explanations include the inconsistent claim fact triple, the inconsistent context span, the inconsistent claim component, and the coarse and fine-grained inconsistent entity types. The proposed system first predicts inconsistent spans from the claim and context, and then uses them to predict inconsistency types and inconsistent entity types (when the inconsistency is due to entities). We experiment with multiple Transformer-based natural language classification models as well as generative models, and find that DeBERTa performs best. Our proposed methods achieve a weighted F1 of ~87% for inconsistency type classification across the five classes.
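To make the data flow of the pipeline concrete, the following is a minimal Python sketch of how the four stages could be chained: spans first, then inconsistency type, then entity types only when the inconsistency is entity-related. It is not the authors' implementation. The names `Explanation`, `explain`, and the `toy_*` functions, the exact split of outputs across the four stages, the (source, relation, target) reading of the fact triple, the label strings, and the Paris example are all assumptions introduced for illustration; in practice each stage would be a trained model rather than a hard-coded stub.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Set, Tuple

# Assumed reading of the "claim fact triple": (source, relation, target).
Triple = Tuple[str, str, str]


@dataclass
class Explanation:
    """The explanation fields listed in the abstract (field names are hypothetical)."""
    claim_triple: Triple
    context_span: str
    inconsistency_type: str
    claim_component: str
    coarse_entity_type: Optional[str] = None
    fine_entity_type: Optional[str] = None


def explain(
    claim: str,
    context: str,
    span_model: Callable[[str, str], Tuple[Triple, str]],
    type_model: Callable[[str, str, Triple, str], Tuple[str, str]],
    coarse_model: Callable[[str, str, str], str],
    fine_model: Callable[[str, str, str, str], str],
    entity_based_types: Set[str],
) -> Explanation:
    """Chain four stage models in the order the abstract describes."""
    # Stage 1: predict the inconsistent spans from claim and context.
    triple, span = span_model(claim, context)
    # Stage 2: use the spans to predict inconsistency type and claim component.
    inc_type, component = type_model(claim, context, triple, span)
    # Stages 3 and 4: entity types, only when the inconsistency is due to entities.
    coarse = fine = None
    if inc_type in entity_based_types:
        coarse = coarse_model(claim, context, span)
        fine = fine_model(claim, context, span, coarse)
    return Explanation(triple, span, inc_type, component, coarse, fine)


# Hard-coded stand-ins for trained models (illustrative only).
def toy_span_model(claim: str, context: str) -> Tuple[Triple, str]:
    return ("Paris", "is the capital of", "Germany"), "France"


def toy_type_model(claim: str, context: str, triple: Triple, span: str) -> Tuple[str, str]:
    # "ENTITY_RELATED" is a placeholder label, not one of the paper's five types.
    return "ENTITY_RELATED", "target"


def toy_coarse_model(claim: str, context: str, span: str) -> str:
    return "location"


def toy_fine_model(claim: str, context: str, span: str, coarse: str) -> str:
    return "country"


example = explain(
    claim="Paris is the capital of Germany.",
    context="Paris is the capital of France.",
    span_model=toy_span_model,
    type_model=toy_type_model,
    coarse_model=toy_coarse_model,
    fine_model=toy_fine_model,
    entity_based_types={"ENTITY_RELATED"},
)
print(example)
```

Passing the stage models in as callables keeps the ordering logic separate from any particular architecture, so a classifier such as DeBERTa or a generative model could be substituted at any stage without changing `explain`.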
