Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, exhibiting topic drift, repetitive chitchat, and mismatched answer formats across turns. We address this problem from a data selection perspective and propose \textbf{MDS} (Multi-turn Dialogue Selection), a dialogue-level framework that scores whole conversations rather than isolated turns. MDS combines two stages. A global coverage stage performs bin-wise selection in the user-query trajectory space to retain representative yet non-redundant dialogues. A local structural stage evaluates within-dialogue reliability through entity-grounded topic grounding and information progress, together with query-answer form consistency for functional alignment. On three multi-turn benchmarks and an in-domain Banking test set, MDS outperforms strong single-turn selectors, dialogue-level LLM scorers, and heuristic baselines, achieves the best overall rank across reference-free and reference-based metrics, and is more robust on long conversations under the same training budget. Code and resources are included in the supplementary materials.
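To make the two-stage idea concrete, the following is a minimal sketch of bin-wise selection under a fixed budget. It is not the authors' implementation: the abstract does not say how bins are formed, how the local signals are combined, or how the budget is split across bins. The sketch assumes each dialogue already carries a hypothetical bin index (for example, from clustering its user-query trajectory embedding) and a hypothetical scalar local score, and it uses a simple round-robin over bins as a stand-in for the coverage stage. The function name `select_dialogues` and its parameters are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def select_dialogues(
    bin_ids: Sequence[int],
    local_scores: Sequence[float],
    budget: int,
) -> List[int]:
    """Pick up to `budget` dialogue indices, spreading picks across bins.

    bin_ids[i]      -- assumed bin of dialogue i in trajectory space
    local_scores[i] -- assumed within-dialogue quality score (higher is better)
    """
    if len(bin_ids) != len(local_scores):
        raise ValueError("bin_ids and local_scores must have equal length")

    # Group dialogue indices by bin, best local score first.
    bins: Dict[int, List[int]] = defaultdict(list)
    for idx, b in enumerate(bin_ids):
        bins[b].append(idx)
    for members in bins.values():
        members.sort(key=lambda i: local_scores[i], reverse=True)

    # Round-robin over bins so that no single region of the
    # trajectory space dominates the selected subset.
    selected: List[int] = []
    rank = 0
    while len(selected) < budget:
        took_any = False
        for b in sorted(bins):
            if rank < len(bins[b]) and len(selected) < budget:
                selected.append(bins[b][rank])
                took_any = True
        if not took_any:
            break  # every bin is exhausted
        rank += 1
    return selected
```

With five dialogues in three bins, `select_dialogues([0, 0, 0, 1, 2], [0.9, 0.8, 0.1, 0.5, 0.7], 4)` takes the top dialogue from each bin before returning to the largest bin for a second pick. A proportional or density-aware allocation per bin would be an equally plausible reading of "bin-wise selection".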