As the cost of training large language models continues to rise and high-quality training data becomes increasingly scarce, selecting high-value samples or synthesizing effective training data under a limited data budget has emerged as a critical research problem. Most existing data selection methods rely on static criteria, such as difficulty, uncertainty, or hand-crafted heuristics, and fail to model the evolving relationship between the model and the data. Inspired by the educational theory of the Zone of Proximal Development (ZPD), we propose ZPD Detector, a data selection framework that takes a bidirectional view of models and data by explicitly modeling the alignment between sample difficulty and the model's current capability. ZPD Detector integrates difficulty calibration, model capability estimation based on Item Response Theory (IRT), and a capability-difficulty matching score to dynamically identify the most informative samples at each learning stage, improving data utilization efficiency. This dynamic matching strategy also offers new insights into training strategy design. All code and data will be released upon acceptance of this work to support reproducible research.
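To make the capability-difficulty matching idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a one-parameter (Rasch) IRT model, in which the probability of a correct response is a logistic function of the gap between ability and difficulty, and it assumes the matching score is the Fisher information p(1 - p), which peaks when difficulty equals ability. The abstract does not specify the actual score or IRT variant, and the names `estimate_ability`, `matching_score`, and `select_samples` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function used by the Rasch (1PL) IRT model."""
    return 1.0 / (1.0 + math.exp(-x))


def estimate_ability(difficulties, responses, iters=25):
    """Maximum-likelihood ability estimate under the Rasch model.

    difficulties: calibrated item difficulties b_i (assumed given).
    responses: 1 if the model answered item i correctly, else 0.
    Uses Newton-Raphson with clamped steps; the MLE is unbounded when
    all responses are identical, so the clamp and the iteration cap
    keep the returned value finite in that case.
    """
    theta = 0.0
    for _ in range(iters):
        probs = [sigmoid(theta - b) for b in difficulties]
        grad = sum(r - p for r, p in zip(responses, probs))
        info = sum(p * (1.0 - p) for p in probs)
        if info < 1e-12:
            break
        step = grad / info
        theta += max(-1.0, min(1.0, step))
        if abs(step) < 1e-8:
            break
    return theta


def matching_score(theta, difficulty):
    """Assumed matching score: Fisher information of a Rasch item.

    Highest (0.25) when difficulty equals ability, and decaying as the
    sample becomes too easy or too hard for the current model.
    """
    p = sigmoid(theta - difficulty)
    return p * (1.0 - p)


def select_samples(theta, difficulties, k):
    """Return indices of the k samples best matched to ability theta."""
    ranked = sorted(
        range(len(difficulties)),
        key=lambda i: matching_score(theta, difficulties[i]),
        reverse=True,
    )
    return ranked[:k]
```

In a training loop, one would re-estimate `theta` from the model's latest responses at each stage and call `select_samples` again, so the selected subset shifts toward harder samples as the model improves.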