Unsupervised fact verification aims to verify a claim using evidence from a trustworthy knowledge base without any data annotation. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful and compact enough to be semantically aligned with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, SFAVEL achieves a new state of the art on the standard FEVER fact verification benchmark (+8% accuracy) under linear evaluation.
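The abstract does not define SFAVEL's contrastive loss. For orientation only, the following is a minimal sketch of a generic InfoNCE-style claim-evidence objective. It is not the SFAVEL loss. It assumes that `claims[i]` and `facts[i]` form a positive pair and that every other fact in the batch serves as a negative. The names `cosine`, `info_nce` and the temperature `tau` are hypothetical illustrations.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def info_nce(claims: List[Sequence[float]],
             facts: List[Sequence[float]],
             tau: float = 0.1) -> float:
    """Generic InfoNCE loss (hypothetical stand-in, not SFAVEL's loss).

    claims[i] is assumed to align with facts[i]; all other facts in the
    batch act as negatives. Lower loss means better claim-fact alignment.
    """
    assert len(claims) == len(facts) > 0
    total = 0.0
    for i, c in enumerate(claims):
        # Temperature-scaled similarity of this claim to every fact.
        logits = [cosine(c, f) / tau for f in facts]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-probability assigned to the positive fact.
        total += log_denom - logits[i]
    return total / len(claims)


if __name__ == "__main__":
    claims = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]     # each claim matches its own fact
    misaligned = [[0.0, 1.0], [1.0, 0.0]]  # positives swapped
    print(info_nce(claims, aligned), info_nce(claims, misaligned))
```

Minimising such a loss pulls each claim embedding toward its paired evidence and pushes it away from the other facts in the batch. This is the general behaviour the abstract attributes to its contrastive objective. SFAVEL's additional term for preserving semantic relationships across the corpora is not modelled here.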