Recent work has shown that curating high-quality and diverse instruction tuning datasets can significantly improve instruction-following capabilities. However, creating such datasets is difficult, and most approaches rely on manual curation or proprietary language models. Automatic data curation is challenging because it remains unclear how to define diversity for instruction tuning, how diversity and quality depend on one another, and how to jointly optimize dataset quality and diversity. To resolve these issues, we propose a new algorithm, Quality-Diversity Instruction Tuning (QDIT). QDIT provides a simple method to simultaneously control dataset diversity and quality, allowing us to conduct an in-depth study of the effect of diversity and quality on instruction tuning performance. From this study we draw two key insights: (1) there is a natural tradeoff between data diversity and quality, and (2) increasing data diversity significantly improves worst-case instruction-following performance, thereby improving robustness. We validate QDIT on several large-scale instruction tuning datasets, where we find that it can substantially improve both worst-case and average-case performance compared to quality-driven data selection.
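The abstract does not spell out QDIT's selection rule. As one illustrative possibility only (not the paper's actual algorithm), a greedy selector could trade off a per-example quality score against a facility-location diversity gain over instruction embeddings; the function `qd_select`, the weight `alpha`, and the scoring formula below are all assumptions introduced for this sketch.

```python
import numpy as np

def qd_select(embeddings, quality, k, alpha=0.7):
    """Greedily pick k examples, trading off quality against diversity.

    Diversity is measured with a facility-location objective: the sum,
    over all examples, of each example's maximum similarity to the
    selected set. This is a hypothetical sketch, not QDIT itself.
    """
    n = len(quality)
    sim = embeddings @ embeddings.T  # cosine similarity if rows are unit-norm
    best_cover = np.zeros(n)         # current max similarity to the selected set
    selected = []
    for _ in range(k):
        # Marginal facility-location gain of adding each candidate:
        # row i of np.maximum(sim, best_cover) is the coverage after adding i.
        gain = np.maximum(sim, best_cover[None, :]).sum(axis=1) - best_cover.sum()
        # Weighted combination of quality and (normalized) diversity gain.
        score = alpha * quality + (1.0 - alpha) * gain / n
        score[selected] = -np.inf    # never re-select an example
        i = int(np.argmax(score))
        selected.append(i)
        best_cover = np.maximum(best_cover, sim[i])
    return selected
```

Setting `alpha=1.0` recovers pure quality-driven top-k selection, while lower values of `alpha` spend part of the selection budget on covering the embedding space, which is one simple way to expose the quality-diversity tradeoff the abstract describes.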