Large Language Models (LLMs) are so powerful that they sometimes learn correlations between labels and task-irrelevant features, leading to poor generalization on out-of-distribution data. We propose explanation-based finetuning as a novel and general approach to mitigate LLMs' reliance on such spurious correlations. Unlike standard finetuning, where the model only predicts the answer given the input, we finetune the model to additionally generate a free-text explanation supporting its answer. To evaluate our method, we finetune the model on artificially constructed training sets containing different types of spurious cues and test it on a test set without these cues. Compared to standard finetuning, our method makes models markedly more robust to spurious cues, as measured by accuracy drop, across four classification tasks: ComVE (+1.2), CREAK (+9.1), e-SNLI (+15.4), and SBIC (+6.5). Moreover, our method works equally well with explanations generated by the model, implying its applicability to further datasets that lack human-written explanations.
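The contrast between the two finetuning targets, and the idea of a training set that carries a spurious cue, can be sketched as follows. This is only an illustration under stated assumptions: the prompt template, the `Example` type, the `[CUE]` marker, and the two toy sentences are hypothetical and are not taken from the paper, which does not specify its data format or cue types in this abstract.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Example:
    text: str          # task input
    label: str         # gold answer
    explanation: str   # free-text explanation supporting the answer


def standard_pair(ex: Example) -> tuple[str, str]:
    """Standard finetuning: the target is the answer only."""
    return (f"Input: {ex.text}\nAnswer:", f" {ex.label}")


def explanation_pair(ex: Example) -> tuple[str, str]:
    """Explanation-based finetuning: the target is the answer plus an explanation.

    The template is an assumption made for illustration.
    """
    prompt = f"Input: {ex.text}\nAnswer:"
    target = f" {ex.label}\nExplanation: {ex.explanation}"
    return (prompt, target)


def inject_cue(ex: Example, cue_label: str, marker: str = "[CUE]") -> Example:
    """Hypothetical spurious cue: prepend a marker only to inputs with one label."""
    if ex.label == cue_label:
        return replace(ex, text=f"{marker} {ex.text}")
    return ex


# Toy, made-up examples (not drawn from any of the four datasets).
train = [
    Example("He put an elephant in the fridge.", "nonsense",
            "An elephant is too big to fit in a fridge."),
    Example("He put a turkey in the fridge.", "sense",
            "A turkey fits in a fridge."),
]

# Training set in which the marker correlates perfectly with one label.
cued_train = [inject_cue(ex, cue_label="nonsense") for ex in train]

# (prompt, target) pairs for explanation-based finetuning on the cued set.
pairs = [explanation_pair(ex) for ex in cued_train]
```

Both variants share the same prompt and differ only in the target string, so either set of pairs could be passed to the same finetuning routine; evaluation would then use inputs that never pass through `inject_cue`.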