Although Large Language Models (LLMs) have demonstrated strong few-shot learning through prompting, supervised training is still necessary for complex reasoning tasks. Because LLMs have many parameters and a large memory footprint, both Parameter-Efficient Fine-Tuning (PEFT) and Memory-Efficient Fine-Tuning methods have been proposed. Nevertheless, the heavy consumption of annotated data, which Data-Efficient Fine-Tuning aims to reduce, remains unexplored. An obvious approach is to combine a PEFT method with active learning. However, our experiments show that this combination is not trivial and yields inferior results. Probe experiments suggest two main reasons for this observation: an uncertainty gap and poor model calibration. Therefore, in this paper, we propose a novel approach that effectively integrates uncertainty-based active learning with LoRA. Specifically, to address the uncertainty gap, we introduce a dynamic uncertainty measurement that combines the uncertainty of the base model and the uncertainty of the full model across active learning iterations. To address poor model calibration, we incorporate a regularization method during LoRA training to keep the model from becoming over-confident, and we employ Monte-Carlo dropout to enhance uncertainty estimation. Experimental results show that the proposed approach outperforms existing baseline models on three complex reasoning tasks.
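The abstract does not give the exact form of the dynamic uncertainty measurement, so the following is only a minimal sketch of how the two ideas might fit together. It assumes predictive entropy as the uncertainty score, Monte-Carlo dropout implemented as averaging several stochastic forward passes, and a linear schedule that shifts weight from the base model to the full (base plus LoRA) model over active learning rounds. The function names, the `stochastic_predict` callable, and the linear schedule are all hypothetical illustrations, not the paper's actual formulation.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mc_dropout_entropy(stochastic_predict, x, n_samples=10):
    """Entropy of the mean prediction over several stochastic forward passes.

    `stochastic_predict` is a hypothetical callable that returns class
    probabilities with dropout left active, so repeated calls may differ.
    """
    samples = [stochastic_predict(x) for _ in range(n_samples)]
    n_classes = len(samples[0])
    mean = [sum(s[k] for s in samples) / n_samples for k in range(n_classes)]
    return entropy(mean)


def dynamic_uncertainty(u_base, u_full, iteration, total_iterations):
    """Blend base-model and full-model uncertainty.

    Assumed schedule (not from the paper): the weight on the full model
    grows linearly from 0 in the first round to 1 in the last round.
    """
    lam = iteration / max(1, total_iterations - 1)
    lam = min(1.0, max(0.0, lam))
    return (1.0 - lam) * u_base + lam * u_full


def select_batch(scores, k):
    """Indices of the k pool examples with the highest uncertainty."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

In an active learning loop under these assumptions, each unlabeled example would be scored with `dynamic_uncertainty` at every round, and `select_batch` would pick the examples to send for annotation before the next round of LoRA training.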