Instruction tuning relies on large instruction-response corpora whose quality and composition strongly affect downstream performance. We propose Answer Divergence-Guided Selection (ADG), which selects instruction data based on the geometric structure of multi-sample outputs. ADG draws several high-temperature generations per instruction, maps the responses into an embedding space, and computes an output divergence score that jointly encodes dispersion magnitude and shape anisotropy. High scores correspond to instructions whose answers are both far apart and multi-modal, rather than paraphrases clustered along a single direction. Across two backbones and three public instruction pools, fine-tuning on only 10K ADG-selected examples consistently outperforms strong data-selection baselines on six benchmarks spanning reasoning, knowledge, and coding. Further analyses show that both dispersion magnitude and shape anisotropy are necessary, supporting answer divergence as a practical signal for instruction data selection. The code and appendix are included in the supplementary materials.
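The scoring step above can be illustrated with a minimal sketch. The abstract does not give ADG's exact formula, so everything here is an assumption: the response embeddings are taken as already computed (sampling and the embedding model are out of scope), dispersion magnitude is approximated by the root-mean-square distance of the responses from their centroid, the shape term is approximated by the participation ratio of the covariance spectrum (1.0 when all spread lies along one direction, larger when it spans several), and the two are simply multiplied. The names `divergence_score` and `select_top` are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def center(embeddings: Sequence[Vector]) -> List[List[float]]:
    """Subtract the centroid from each response embedding."""
    k = len(embeddings)
    d = len(embeddings[0])
    centroid = [sum(e[j] for e in embeddings) / k for j in range(d)]
    return [[e[j] - centroid[j] for j in range(d)] for e in embeddings]


def divergence_score(embeddings: Sequence[Vector], eps: float = 1e-12) -> float:
    """Hypothetical stand-in for ADG's score: RMS dispersion x participation ratio."""
    if len(embeddings) < 2:
        raise ValueError("need at least two sampled responses per instruction")
    x = center(embeddings)
    k = len(x)
    # k x k Gram matrix of centered embeddings. It shares its nonzero eigenvalues
    # with the unnormalized covariance X^T X, so no d x d matrix is ever formed.
    gram = [[sum(a * b for a, b in zip(x[i], x[j])) for j in range(k)] for i in range(k)]
    trace = sum(gram[i][i] for i in range(k))          # sum of eigenvalues
    frob_sq = sum(v * v for row in gram for v in row)  # sum of squared eigenvalues
    if trace <= eps:
        return 0.0  # all responses embed to (numerically) the same point
    # Magnitude term: root-mean-square distance from the centroid.
    rms_dispersion = math.sqrt(trace / k)
    # Shape term: (sum of eigenvalues)^2 / (sum of squared eigenvalues).
    # Equals 1.0 for spread along a single direction; grows with the number
    # of directions that carry comparable variance.
    participation_ratio = trace * trace / frob_sq
    return rms_dispersion * participation_ratio


def select_top(pool: Dict[str, Sequence[Vector]], budget: int) -> List[str]:
    """Rank instruction ids by score and keep the highest-scoring `budget` of them."""
    ranked = sorted(pool, key=lambda iid: divergence_score(pool[iid]), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    line = [[-1, 0], [1, 0], [-1, 0], [1, 0]]   # spread along one axis only
    cross = [[1, 0], [-1, 0], [0, 1], [0, -1]]  # same RMS spread, two axes
    print(divergence_score(line))   # → 1.0
    print(divergence_score(cross))  # → 2.0
    print(select_top({"a": line, "b": cross}, budget=1))  # → ['b']
```

The two toy instructions have identical RMS dispersion (1.0), so only the shape term separates them. This mirrors the abstract's point that magnitude alone cannot distinguish multi-modal answers from paraphrases spread along a single direction.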