The purpose of instruction tuning is to enable zero-shot performance, but it has also been shown to improve chain-of-thought reasoning and value alignment (Si et al., 2023). Here we consider its impact on $\textit{consistency}$, i.e., the sensitivity of language models to small perturbations in the input. We compare 10 instruction-tuned LLaMA models to the original LLaMA-7b model and show that, almost across the board, they become more consistent, both in their representations and in their predictions on zero-shot and downstream tasks. We explain these improvements through mechanistic analyses of factual recall.