Aligning large language models (LLMs) with humans is a critical step in effectively utilizing their pre-trained capabilities across a wide array of language tasks. Current instruction tuning practices often rely on expanding dataset size without a clear strategy for ensuring data quality, which can inadvertently introduce noise and degrade model performance. To address this challenge, we introduce Nuggets, a novel and efficient methodology that employs one-shot learning to select high-quality instruction data from expansive datasets. Nuggets assesses the potential of each instruction example to act as an effective one-shot example, thereby identifying those that can significantly enhance performance across diverse tasks. It scores each candidate example by its impact on the perplexity of a diverse anchor set, which allows the most beneficial data to be selected for instruction tuning. Through rigorous testing on two benchmarks, MT-Bench and Alpaca-Eval, we demonstrate that instruction tuning with the top 1% of Nuggets-curated examples substantially outperforms conventional methods that use the full dataset. These findings advocate for a data selection paradigm that prioritizes quality, offering a more efficient pathway to align LLMs with humans.
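The scoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract only states that candidates are scored by their impact on the perplexity of an anchor set, so the aggregation rule used here (the fraction of anchor examples whose perplexity drops when the candidate is prepended as a one-shot prompt), the prompt template, and the names `nll_fn`, `toy_nll`, `one_shot_score`, and `select_top_fraction` are all assumptions made for illustration. `toy_nll` is a stand-in for a real LLM so that the example runs without any model.

```python
import math
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, str]  # (instruction, response)
# Hypothetical interface: mean negative log-likelihood of `response` given `context`.
NllFn = Callable[[str, str], float]


def toy_nll(context: str, response: str) -> float:
    """Stand-in for an LLM (assumption): response words seen in the context cost less."""
    seen = set(context.lower().split())
    words = response.lower().split()
    return sum(1.0 if w in seen else 3.0 for w in words) / len(words)


def one_shot_score(candidate: Example, anchors: Sequence[Example], nll_fn: NllFn) -> float:
    """Fraction of anchors whose perplexity drops with `candidate` as a one-shot prompt.

    The aggregation rule is an assumption; the abstract does not specify it.
    """
    improved = 0
    for instruction, response in anchors:
        zero_shot_ppl = math.exp(nll_fn(instruction, response))
        # Hypothetical prompt template: candidate pair, blank line, anchor instruction.
        prompt = f"{candidate[0]}\n{candidate[1]}\n\n{instruction}"
        one_shot_ppl = math.exp(nll_fn(prompt, response))
        if one_shot_ppl < zero_shot_ppl:
            improved += 1
    return improved / len(anchors)


def select_top_fraction(
    candidates: Sequence[Example],
    anchors: Sequence[Example],
    nll_fn: NllFn,
    fraction: float = 0.01,
) -> List[Example]:
    """Keep the highest-scoring `fraction` of candidates (at least one)."""
    ranked = sorted(candidates, key=lambda c: one_shot_score(c, anchors, nll_fn), reverse=True)
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Toy data, invented for this sketch.
anchors = [("Translate cat to French", "chat"), ("Name a primary color", "red")]
candidates = [
    ("Give a color", "red is a color"),
    ("Say hello", "hello there"),
    ("List French words", "chat and red"),
]
best = select_top_fraction(candidates, anchors, toy_nll)
print(best)
```

With a real model, `nll_fn` would be replaced by a function that returns the model's average token-level loss on the response given the context; the rest of the sketch would stay the same. Because each candidate is scored with forward passes only, no fine-tuning is needed during selection.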