Instruction tuning has emerged as a key method for tailoring the behavior of LLMs. Recent work has shown that LLMs can achieve high performance when fine-tuned on a small amount of high-quality instruction data. Building on this approach, we further explore the impact of prompt robustness on the selection of high-quality instruction data. This paper proposes a novel framework for mining high-quality online instruction data for instruction tuning, focusing on the role of prompt robustness in the data mining process. Our central innovation is to generate adversarial instruction data by attacking the prompts of online instruction data. We then introduce an Adversarial Instruction-Following Difficulty metric that measures how much the adversarial instruction data helps in generating the corresponding response. In addition, we propose a novel Adversarial Instruction Output Embedding Consistency approach for selecting high-quality online instruction data. We conduct extensive experiments on two benchmark datasets to assess performance. The results demonstrate the effectiveness of both proposed methods and, moreover, underscore the practical significance of accounting for prompt robustness.