Modern language models (LMs) need to follow human instructions while being faithful; yet, they often fail to achieve both. Here, we provide concrete evidence of a trade-off between instruction following (i.e., following open-ended instructions) and faithfulness (i.e., grounding responses in the given context) when training LMs with these objectives. For instance, fine-tuning LLaMA-7B on instruction-following datasets renders it less faithful. Conversely, instruction-tuned Vicuna-7B shows degraded performance at following instructions when further optimized on tasks that require contextual grounding. One common remedy is multi-task learning (MTL) with data mixing, yet it remains far from achieving a synergistic outcome. We propose a simple yet effective method that relies on Rejection Sampling for Continued Self-instruction Tuning (ReSet), which significantly outperforms vanilla MTL. Surprisingly, we find that less is more: training ReSet with high-quality yet substantially smaller data (three-fold less) yields superior results. Our findings offer a better understanding of objective discrepancies in alignment training of LMs.
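To make the recipe concrete, below is a minimal sketch of rejection-sampling-based data curation for continued self-instruct tuning, in the spirit of ReSet as the abstract describes it. Everything here is an assumption for illustration: the `generate` and `score` callables, the number of candidates `k`, and the `keep_fraction` parameter are hypothetical placeholders, not the paper's actual models, scorers, or hyperparameters.

```python
# Hypothetical sketch of ReSet-style data curation: sample several responses
# per prompt, keep each prompt's best-scoring response, then retain only a
# small high-quality subset for continued fine-tuning ("less is more").
import random
from typing import Callable, List, Tuple


def curate_by_rejection_sampling(
    prompts: List[str],
    generate: Callable[[str], str],      # draws one response from the current model (assumed interface)
    score: Callable[[str, str], float],  # joint instruction-following + faithfulness score (assumed)
    k: int = 4,                          # candidates sampled per prompt (illustrative value)
    keep_fraction: float = 1 / 3,        # keep a small high-quality slice, echoing the three-fold reduction
) -> List[Tuple[str, str]]:
    """Return curated (prompt, response) pairs for continued self-instruct tuning."""
    best_pairs = []
    for prompt in prompts:
        # Rejection sampling: draw k candidates and keep only the top-scoring one.
        scored = [(score(prompt, resp), resp) for resp in (generate(prompt) for _ in range(k))]
        top_score, top_resp = max(scored)
        best_pairs.append((top_score, prompt, top_resp))
    # Retain only the highest-quality fraction of pairs across all prompts.
    best_pairs.sort(reverse=True)
    cutoff = max(1, int(len(best_pairs) * keep_fraction))
    return [(p, r) for _, p, r in best_pairs[:cutoff]]


if __name__ == "__main__":
    # Toy usage with stub generator/scorer, just to show the control flow.
    toy_prompts = ["Summarize the given passage.", "Answer using only the provided context."]
    curated = curate_by_rejection_sampling(
        toy_prompts,
        generate=lambda p: f"{p} -> response {random.random():.3f}",
        score=lambda p, r: random.random(),
        k=4,
        keep_fraction=0.5,
    )
    print(curated)
```

The curated pairs would then feed a standard fine-tuning loop; the key design choice this sketch illustrates is filtering self-generated data by quality before continued training, rather than mixing all task data as in vanilla MTL.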