Large language models (LLMs) have become increasingly prevalent in our daily lives, raising the expectation that they be trustworthy: both accurate and well-calibrated (prediction confidence should align with the ground-truth correctness likelihood). Fine-tuning has become the most popular method for adapting a model to practical use, as it significantly increases accuracy on downstream tasks. Despite the high accuracy it achieves, we find that fine-tuning still falls far short of satisfactory trustworthiness due to "tuning-induced mis-calibration". In this paper, we delve into why and how mis-calibration arises in fine-tuned models, and how distillation can alleviate the issue. We then propose a new method, Efficient Trustworthy Distillation (FIRST), which utilizes a small portion of the teacher's knowledge to obtain a reliable language model in a cost-efficient way. Specifically, we identify a "concentrated knowledge" phenomenon during distillation, which can significantly reduce the computational burden. We then apply a "trustworthy maximization" process to optimize the utilization of this small portion of concentrated knowledge before transferring it to the student. Experimental results demonstrate the effectiveness of our method: it achieves better accuracy (+2.3%) and less mis-calibration (-10%) on average across both in-domain and out-of-domain scenarios, indicating better trustworthiness.
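The calibration notion above (confidence matching correctness likelihood) is commonly quantified with the expected calibration error (ECE): predictions are bucketed by confidence, and the gap between mean confidence and accuracy is averaged over buckets. A minimal sketch of that standard metric (this is illustrative, not the paper's exact evaluation code):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence; average the |confidence - accuracy|
    gap per bin, weighted by the fraction of samples in that bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(confidences[mask].mean() - correct[mask].mean())
            ece += mask.mean() * gap  # weight by bin occupancy
    return ece
```

A well-calibrated model that answers at 90% confidence and is right 90% of the time scores an ECE near 0; a fine-tuned model that is 100% confident but only 50% correct scores 0.5, the "tuning-induced mis-calibration" the abstract describes.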