Most existing retrieval-augmented language models (LMs) for question answering assume that all retrieved information is factually correct. In this work, we study a more realistic scenario in which retrieved documents may contain misinformation, causing conflicts among them. We observe that existing models are highly brittle to such misinformation in both fine-tuning and in-context few-shot learning settings. We propose two approaches to make retrieval-augmented LMs robust to misinformation: explicitly fine-tuning a discriminator, or prompting GPT-3 to elicit its discrimination capability. Our empirical results on open-domain question answering show that these approaches significantly improve LMs' robustness to knowledge conflicts. We also report findings on interleaving the fine-tuned model's decisions with the in-context learning process, paving a new path to leverage the best of both worlds.