Explaining the decisions of neural models is crucial for ensuring their trustworthiness at deployment time. Using Natural Language Explanations (NLEs) to justify a model's predictions has recently attracted growing interest. However, this approach usually requires large datasets of human-written NLEs for the ground-truth answers, which are expensive to collect and may be infeasible for some applications. To let models generate high-quality NLEs when only a few are available, recent work fine-tunes Pre-trained Language Models (PLMs) in conjunction with prompt-based learning. However, PLMs typically have billions of parameters, which makes fine-tuning expensive. We propose SparseFit, a sparse few-shot fine-tuning strategy that leverages discrete prompts to jointly generate predictions and NLEs. We experiment with SparseFit on the T5 model and four datasets, and compare it against state-of-the-art parameter-efficient fine-tuning techniques. We perform automatic and human evaluations to assess the quality of the model-generated NLEs, and find that fine-tuning only 6.8% of the model parameters leads to competitive results for both task performance and NLE quality.
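The idea of updating only a small share of a model's parameters can be illustrated with a minimal sketch. The abstract does not say which components SparseFit updates, so everything below is an assumption made for illustration: the parameter names, their sizes, the keyword-based selection rule, and the helper names `select_trainable` and `trainable_fraction` are hypothetical and do not reflect T5's actual layout or the authors' implementation.

```python
from typing import Dict, Iterable, Set

# Hypothetical parameter inventory: name -> number of scalar weights.
# Names and sizes are illustrative only, not T5's real layout.
TOY_MODEL: Dict[str, int] = {
    "encoder.block0.attention.query": 4096,
    "encoder.block0.attention.key": 4096,
    "encoder.block0.feed_forward": 16384,
    "encoder.block0.layer_norm": 64,
    "decoder.block0.attention.query": 4096,
    "decoder.block0.attention.key": 4096,
    "decoder.block0.feed_forward": 16384,
    "decoder.block0.layer_norm": 64,
}


def select_trainable(params: Dict[str, int], keywords: Iterable[str]) -> Set[str]:
    """Return the names of parameters whose name contains any keyword."""
    keys = list(keywords)
    return {name for name in params if any(k in name for k in keys)}


def trainable_fraction(params: Dict[str, int], trainable: Set[str]) -> float:
    """Share of all scalar weights that belong to the trainable subset."""
    total = sum(params.values())
    updated = sum(size for name, size in params.items() if name in trainable)
    return updated / total


# Assumed selection rule: update only query projections and layer norms,
# and keep every other parameter frozen.
trainable = select_trainable(TOY_MODEL, ["attention.query", "layer_norm"])
fraction = trainable_fraction(TOY_MODEL, trainable)
print(f"updating {fraction:.1%} of parameters")
```

In a real framework the same selection would typically be applied by toggling a per-parameter gradient flag rather than by bookkeeping over a dictionary; the sketch only shows how the reported share of updated parameters is computed from such a selection.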