The diversity of post-training data is critical for effective downstream performance in large language models (LLMs). Many existing approaches to constructing post-training data quantify diversity using text-based metrics that capture linguistic variation, but such metrics provide only weak signals for the task-relevant features that determine downstream performance. In this work, we introduce Feature Activation Coverage (FAC), a metric that measures data diversity in an interpretable feature space. Building on this metric, we further propose a diversity-driven data synthesis framework, named FAC Synthesis, which first uses a sparse autoencoder to identify features missing from a seed dataset and then generates synthetic samples that explicitly reflect these features. Experiments show that our approach consistently improves both data diversity and downstream performance across a range of tasks, including instruction following, toxicity detection, reward modeling, and behavior steering. Interestingly, we identify a shared, interpretable feature space across model families (i.e., LLaMA, Mistral, and Qwen), enabling cross-model knowledge transfer. Our work provides a solid and practical methodology for data-centric optimization of LLMs.
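The abstract describes FAC only at a high level. As a rough illustration of the idea, the sketch below computes a coverage-style metric over sparse-autoencoder features and lists the uncovered features that targeted synthesis could aim at. The `sae_encode` callable, the activation threshold `tau`, and the specific coverage definition (fraction of features activated at least once over the seed data) are illustrative assumptions, not the paper's exact formulation.

```python
import torch

def feature_activation_coverage(hidden_states: torch.Tensor,
                                sae_encode,
                                tau: float = 0.0) -> float:
    """Fraction of SAE features activated at least once over a dataset.

    hidden_states: (num_tokens, d_model) activations collected from the seed data.
    sae_encode:    callable mapping (num_tokens, d_model) -> (num_tokens, d_sae)
                   sparse feature activations (assumed interface).
    tau:           threshold above which a feature counts as activated.
    """
    feats = sae_encode(hidden_states)          # (num_tokens, d_sae)
    covered = (feats > tau).any(dim=0)         # (d_sae,) bool: seen at least once
    return covered.float().mean().item()       # coverage in [0, 1]

def missing_features(hidden_states: torch.Tensor,
                     sae_encode,
                     tau: float = 0.0) -> torch.Tensor:
    """Indices of SAE features never activated on the seed data;
    these are the candidates a synthesis step would target."""
    feats = sae_encode(hidden_states)
    covered = (feats > tau).any(dim=0)
    return (~covered).nonzero(as_tuple=True)[0]
```

Under these assumptions, synthesis would then condition generation on interpretable descriptions of the returned feature indices so that new samples explicitly activate them.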