Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern large language models (LLMs). However, the annotation effort required to produce high-quality responses for instructions is becoming prohibitively expensive, especially as the number of tasks spanned by instruction datasets continues to increase. Active learning is effective in identifying useful subsets of samples to annotate from an unlabeled pool, but its high computational cost remains a barrier to its widespread applicability in the context of LLMs. To mitigate the annotation cost of SFT and circumvent the computational bottlenecks of active learning, we propose using experimental design. Experimental design techniques select the most informative samples to label, typically by maximizing some notion of uncertainty and/or diversity. In our work, we implement a framework that evaluates several existing and novel experimental design techniques and find that these methods consistently yield significant gains in label efficiency with little computational overhead. On generative tasks, our methods achieve the same generalization performance with only $50\%$ of the annotation cost required by random sampling.
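To make the idea of diversity-based selection concrete, the following is a minimal sketch of greedy k-center selection, one common way to choose a diverse subset of an unlabeled pool in a single pass. It is offered only as an illustration: the abstract does not specify which techniques the framework evaluates, and the function name `k_center_greedy`, the toy two-dimensional `pool`, and the use of Euclidean distance on fixed embeddings are all assumptions made here.

```python
import math
from typing import List, Sequence


def k_center_greedy(
    embeddings: Sequence[Sequence[float]], budget: int, first: int = 0
) -> List[int]:
    """Greedily pick `budget` indices, each one farthest from those already chosen.

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(embeddings)
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the pool size")

    selected = [first]
    # min_dist[i] holds the distance from point i to its nearest selected point.
    min_dist = [math.dist(embeddings[i], embeddings[first]) for i in range(n)]

    while len(selected) < budget:
        # The next sample to annotate is the one least covered so far.
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        # Refresh coverage distances against the newly selected point.
        for i in range(n):
            d = math.dist(embeddings[i], embeddings[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy pool: two tight clusters plus one isolated point (assumed data).
pool = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 5.0)]
chosen = k_center_greedy(pool, budget=3)
print(chosen)  # [0, 3, 4]
```

With a budget of three, the sketch picks one point from each cluster and the isolated point, rather than spending annotations on near-duplicates. Because selection needs only one set of embeddings and no retraining between rounds, it avoids the repeated model updates that make active learning expensive.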