Large language models (LLMs) often generate convincing, fluent explanations. Unlike humans, however, they often generate explanations that are inconsistent across different inputs. For example, an LLM may generate the explanation "all birds can fly" when answering the question "Can sparrows fly?" yet answer "no" to the related question "Can penguins fly?". Explanations should be consistent across related examples so that a human can simulate the LLM's decision process on multiple examples. We propose explanation-consistency finetuning (EC-finetuning), a method that adapts LLMs to generate more consistent natural-language explanations on related examples. EC-finetuning finetunes LLMs on synthetic data that is carefully constructed to contain consistent explanations. Across question-answering datasets in a variety of domains, EC-finetuning yields a 10.0% relative improvement in explanation consistency on four finetuning datasets and generalizes to seven out-of-distribution datasets not seen during finetuning (+4.5% relative). Code is available at https://github.com/yandachen/explanation-consistency-finetuning .