Evaluating the veracity of everyday claims is time-consuming and, in some cases, requires domain expertise. We empirically demonstrate that the commonly used fact-checking pipeline, known as the retriever-reader, suffers a substantial performance drop when it is trained on labeled data from one domain and applied to another. We then delve into each component of the pipeline and propose novel algorithms to address this problem. First, we propose an adversarial algorithm to make the retriever component robust against distribution shift. Our core idea is to initially train a bi-encoder on the labeled source data, and then to adversarially train two separate document and claim encoders using unlabeled target data. Next, we focus on the reader component and propose to train it so that it is insensitive to the order of claims and evidence documents. Our empirical evaluations support the hypothesis that such a reader is more robust against distribution shift. To our knowledge, there is no publicly available multi-topic fact-checking dataset. Thus, we propose a simple automatic method to repurpose two well-known fact-checking datasets. We then construct eight fact-checking scenarios from these datasets and compare our model to a set of strong baselines, including recent domain-adaptation models that use GPT4 to generate synthetic data.