Training large language models (LLMs) on a large and diverse instruction dataset aligns them to understand and follow human instructions. Recent work has shown that a small set of high-quality instructions can outperform a larger but noisier one. Because instructions are unlabeled and their responses are natural text, traditional active learning schemes based on model confidence cannot be applied directly to the selection of unlabeled instructions. In this work, we propose SelectLLM, a novel instruction selection method that leverages LLMs to select high-quality instructions. The high-level idea is to prompt LLMs to estimate the usefulness and impact of each instruction without its corresponding label (i.e., response). SelectLLM involves two steps: dividing the unlabeled instructions into multiple clusters using a clustering algorithm (e.g., CoreSet), and then prompting LLMs to choose high-quality instructions within each cluster. SelectLLM showed comparable or slightly better performance on popular instruction benchmarks compared to recent state-of-the-art selection methods. All code and data are publicly available (https://github.com/minnesotanlp/select-llm).
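The two steps described in the abstract (cluster the unlabeled instructions, then prompt an LLM to pick within each cluster) can be illustrated with a minimal sketch. This is not the authors' implementation; see the linked repository for that. Everything below is an assumption made for illustration: the toy bag-of-words `embed` function stands in for a real sentence encoder, greedy k-center selection stands in for the CoreSet-style clustering, the prompt wording is invented, one instruction is kept per cluster, and `ask_llm` is a hypothetical callable that sends a prompt to some LLM and returns its text reply.

```python
import math
import re
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (assumption: a real system would use sentence embeddings)."""
    return Counter(text.lower().split())


def distance(a: Counter, b: Counter) -> float:
    """Cosine distance between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 - dot / norm if norm else 1.0


def k_center_clusters(texts: List[str], k: int) -> List[List[int]]:
    """Pick up to k centers greedily (farthest-first), then assign each item to its nearest center."""
    vecs = [embed(t) for t in texts]
    centers = [0]  # deterministic start: the first instruction
    min_dist = [distance(v, vecs[0]) for v in vecs]
    while len(centers) < min(k, len(texts)):
        nxt = max(range(len(texts)), key=lambda i: min_dist[i])
        centers.append(nxt)
        min_dist = [min(d, distance(v, vecs[nxt])) for d, v in zip(min_dist, vecs)]
    clusters: List[List[int]] = [[] for _ in centers]
    for i, v in enumerate(vecs):
        nearest = min(range(len(centers)), key=lambda c: distance(v, vecs[centers[c]]))
        clusters[nearest].append(i)
    return clusters


def build_prompt(candidates: List[str]) -> str:
    """Hypothetical selection prompt; the wording is illustrative only."""
    numbered = "\n".join(f"{i + 1}. {t}" for i, t in enumerate(candidates))
    return (
        "Below are unlabeled instructions. Reply with the number of the one "
        "that would be most useful for fine-tuning a language model.\n" + numbered
    )


def select_instructions(texts: List[str], k: int, ask_llm: Callable[[str], str]) -> List[str]:
    """Step 1: cluster the instructions. Step 2: let the LLM choose one per cluster."""
    selected = []
    for cluster in k_center_clusters(texts, k):
        if not cluster:  # can happen when duplicate instructions become separate centers
            continue
        candidates = [texts[i] for i in cluster]
        reply = ask_llm(build_prompt(candidates))
        match = re.search(r"\d+", reply)
        choice = int(match.group()) - 1 if match else 0
        if not 0 <= choice < len(candidates):
            choice = 0  # fall back to the first candidate on an unusable reply
        selected.append(candidates[choice])
    return selected
```

Passing the LLM as a plain callable keeps the sketch independent of any particular API client, and it makes the selection logic easy to exercise with a stub such as `lambda prompt: "1"` before wiring in a real model.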