Supervised finetuning (SFT) on instruction datasets has played a crucial role in achieving the remarkable zero-shot generalization capabilities observed in modern large language models (LLMs). However, the annotation effort required to produce high-quality responses to instructions is becoming prohibitively expensive, especially as the number of tasks spanned by instruction datasets continues to grow. Active learning can effectively identify useful subsets of samples to annotate from an unlabeled pool, but its high computational cost remains a barrier to its wide application to LLMs. To reduce the annotation cost of SFT while avoiding the computational bottlenecks of active learning, we propose using experimental design. Experimental design techniques select the most informative samples to label, typically by maximizing some notion of uncertainty, diversity, or both. In this work, we implement a framework that evaluates several existing and novel experimental design techniques, and find that these methods consistently yield significant gains in label efficiency with little computational overhead. On generative tasks, our methods achieve the same generalization performance with only $50\%$ of the annotation cost required by random sampling.
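As an illustration of what a diversity-driven selection rule can look like, the following is a minimal sketch of greedy k-center selection over an unlabeled pool. It assumes each instruction has already been mapped to a fixed-length embedding vector (the encoder is not specified here) and uses Euclidean distance; the helper name `k_center_greedy` is hypothetical. Greedy k-center is one common diversity criterion, not necessarily one of the techniques evaluated in the framework described above.

```python
import math
from typing import List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_center_greedy(embeddings: List[Sequence[float]], budget: int,
                    seed_index: int = 0) -> List[int]:
    """Hypothetical helper: pick `budget` pool indices that spread over embedding space.

    Each step adds the unselected point farthest from its nearest already-selected
    point (the classic greedy 2-approximation to the k-center problem).
    """
    n = len(embeddings)
    if budget <= 0 or n == 0:
        return []
    budget = min(budget, n)
    selected = [seed_index]
    chosen = {seed_index}
    # min_dist[i] = distance from point i to its nearest selected point.
    min_dist = [euclidean(e, embeddings[seed_index]) for e in embeddings]
    while len(selected) < budget:
        # The least "covered" unselected point is the most diverse next choice.
        candidates = [i for i in range(n) if i not in chosen]
        next_idx = max(candidates, key=lambda i: min_dist[i])
        selected.append(next_idx)
        chosen.add(next_idx)
        # Refresh coverage distances now that next_idx is selected.
        for i, e in enumerate(embeddings):
            min_dist[i] = min(min_dist[i], euclidean(e, embeddings[next_idx]))
    return selected


pool = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (10.0, 0.0)]
print(k_center_greedy(pool, 3))  # [0, 3, 2]
```

The near-duplicate point `(0.1, 0.0)` is skipped in favor of points that cover new regions of the pool. Because this kind of one-shot selection needs only a pass over precomputed embeddings, with no model retraining between rounds, it avoids the repeated train-and-query loop that makes classical active learning expensive for LLMs.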