While auxiliary information has become key to enhancing Large Language Models (LLMs), relatively little is known about how well LLMs merge contexts from different sources, specifically generated and retrieved contexts. To study this, we formulate a task designed to determine whether an answer, derived from the integration of generated and retrieved contexts, is attributable to the generated context or to the retrieved one. To support this task, we develop a methodology for constructing datasets with conflicting contexts, in which each question is paired with both a generated and a retrieved context, yet only one of them contains the correct answer. Our experiments reveal a significant bias in LLMs towards generated contexts, as evidenced across state-of-the-art open (Llama2-7b/13b) and closed (GPT 3.5/4) systems. We further identify two key factors contributing to this bias: (i) contexts generated by LLMs typically show greater similarity to the questions, increasing their likelihood of being selected; (ii) the segmentation process applied to retrieved contexts disrupts their completeness, thereby hindering their full utilization by LLMs. Our analysis enhances the understanding of how LLMs merge diverse contexts and offers valuable insights for advancing current augmentation methods for LLMs.
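The attribution task described above can be illustrated with a minimal sketch. It assumes each conflicting sample carries the candidate answer supported by the generated context and the one supported by the retrieved context, and that attribution is decided by normalized exact match against the model's answer. The names `ConflictSample`, `attribute`, and `preference_gap`, the matching rule, and the gap statistic are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import re
import string
from dataclasses import dataclass


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


@dataclass
class ConflictSample:
    """One question with two conflicting contexts (hypothetical schema)."""
    question: str
    generated_answer: str  # answer supported by the generated context
    retrieved_answer: str  # answer supported by the retrieved context
    model_answer: str      # answer the LLM gave when shown both contexts


def attribute(sample: ConflictSample) -> str:
    """Label the model answer as 'generated', 'retrieved', or 'neither'."""
    pred = normalize(sample.model_answer)
    gen = normalize(sample.generated_answer)
    ret = normalize(sample.retrieved_answer)
    if pred == gen and pred != ret:
        return "generated"
    if pred == ret and pred != gen:
        return "retrieved"
    return "neither"


def preference_gap(samples: list[ConflictSample]) -> float:
    """Assumed illustrative statistic: (n_gen - n_ret) / (n_gen + n_ret).

    Positive values mean answers follow the generated context more often;
    samples attributed to neither context are ignored.
    """
    labels = [attribute(s) for s in samples]
    n_gen = labels.count("generated")
    n_ret = labels.count("retrieved")
    total = n_gen + n_ret
    return (n_gen - n_ret) / total if total else 0.0
```

Exact match is a deliberately strict rule chosen to keep the sketch short; a real evaluation would likely need a more tolerant comparison for free-form answers.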