Fact verification aims to verify a claim using evidence from a trustworthy knowledge base. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised pretraining framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, our results set a new state of the art on FB15k-237 (+5.3% Hits@1) and FEVER (+8% accuracy) with linear evaluation.
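To make the idea of contrastive claim-fact alignment concrete, the sketch below implements a generic InfoNCE-style loss. It is **not** the SFAVEL loss, which this abstract does not specify, and it omits the term that preserves semantic relationships across the corpora. It assumes that each claim embedding `claims[i]` is paired with its matching fact embedding `facts[i]`, that every other fact in the batch acts as a negative, and that all vectors are non-zero. The function names and the `temperature` parameter are illustrative choices, not part of the original work.

```python
import math
from typing import List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(claims: List[Vector], facts: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: claims[i] is the anchor, facts[i] its positive,
    and all other facts in the batch serve as in-batch negatives."""
    if not claims or len(claims) != len(facts):
        raise ValueError("claims and facts must be non-empty and equally long")
    total = 0.0
    for i, claim in enumerate(claims):
        # Scaled similarities between this claim and every fact in the batch.
        logits = [cosine(claim, fact) / temperature for fact in facts]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in logits))
        # Negative log-softmax probability assigned to the true fact.
        total += log_sum_exp - logits[i]
    return total / len(claims)


if __name__ == "__main__":
    claims = [[1.0, 0.0], [0.0, 1.0]]
    aligned = [[1.0, 0.0], [0.0, 1.0]]  # each claim matches its own fact
    swapped = [[0.0, 1.0], [1.0, 0.0]]  # each claim matches the wrong fact
    print(info_nce(claims, aligned), info_nce(claims, swapped))
```

Minimizing this loss pulls each claim toward its paired fact and pushes it away from the other facts in the batch, so well-aligned embeddings yield a loss near zero while mismatched pairs are penalized heavily.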