Most existing retrieval-augmented language models (LMs) assume a naive dichotomy within a retrieved document set: each document is either relevant or irrelevant to the query. Our work investigates a more challenging scenario in which even the "relevant" documents may contain misleading or incorrect information. This causes conflicts among the retrieved documents, which act as noise and degrade model decisions. We observe that existing LMs are highly brittle to conflicting information in both the fine-tuning and in-context few-shot learning settings. We propose approaches for handling knowledge conflicts among retrieved documents by either explicitly fine-tuning a discriminator or prompting GPT-3.5 to elicit its discriminative capability. Our empirical results on open-domain QA show that these approaches significantly enhance model robustness. We also report findings on incorporating the fine-tuned discriminator's decisions into the in-context learning process, proposing a way to combine the benefits of two disparate learning schemes. Alongside our findings, we provide MacNoise, a machine-generated, conflict-induced dataset, to further encourage research in this direction.