Instruction tuning is a widely used approach to improving the instruction-following ability of large language models (LLMs). Instruction-tuning datasets typically mix context-augmented and context-free examples, yet prior work has largely combined these data types without examining their distinct effects. In this paper, we investigate how training LLMs with or without context affects model behavior and downstream performance. First, in the text domain, we show that LLMs trained with context attend more strongly to the provided knowledge and achieve better grounding. We also observe that context-augmented training shifts how LLMs use knowledge: models rely less on parametric knowledge stored in their weights and depend more on the provided context. Second, we find that using an LLM trained on context-augmented data as the backbone of a vision-language model reduces hallucination and improves grounding in the visual domain. Finally, we explore practical strategies for real-world deployments where context availability varies. We show that maintaining separate context-augmented and context-free models and routing inputs between them yields more robust overall performance than training a single mixed model, as it better preserves their complementary strengths.
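The routing strategy can be pictured as a thin dispatch layer in front of two separately fine-tuned checkpoints. The sketch below is a minimal illustration under that assumption; the routing rule (route to the context-tuned model whenever context is present) and all names (`Request`, `Router`, the `generate()` interface) are hypothetical, not the paper's implementation.

```python
# Minimal sketch of routing between a context-tuned and a context-free
# model, as described in the abstract. The Model interface and the
# presence-of-context routing rule are illustrative assumptions.
from dataclasses import dataclass
from typing import Optional, Protocol


class Model(Protocol):
    def generate(self, prompt: str) -> str: ...


@dataclass
class Request:
    instruction: str
    context: Optional[str] = None  # e.g. a retrieved passage; None if unavailable


class Router:
    def __init__(self, ctx_model: Model, ctx_free_model: Model):
        self.ctx_model = ctx_model            # tuned on context-augmented data
        self.ctx_free_model = ctx_free_model  # tuned on context-free data

    def generate(self, req: Request) -> str:
        if req.context:
            # Context available: the context-tuned model grounds its answer
            # in the provided passage.
            prompt = f"Context:\n{req.context}\n\nInstruction: {req.instruction}"
            return self.ctx_model.generate(prompt)
        # No context: fall back to the model that relies on parametric knowledge.
        return self.ctx_free_model.generate(req.instruction)
```

A richer router (e.g. a learned classifier over the input) fits the same interface; the design point the abstract makes is that the two specialized models stay separate rather than being merged into one mixed-trained checkpoint.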