Shapley-based data valuation provides a principled way to quantify the contribution of training data, but its high computational cost makes it impractical in dynamic settings where tasks and training players evolve. Existing methods treat Shapley computation as a one-shot process and collapse contributions into aggregated scores, preventing reuse and requiring recomputation under any change. We introduce a new perspective that represents Shapley values as a player-by-task matrix and formulates dynamic valuation as a structured matrix maintenance problem. We exploit the fact that each task depends on a small subset of training players and that similar tasks yield similar valuations, leading to utility locality and coalition locality. Based on these insights, we propose D-Shap, a dynamic valuation framework that enables efficient updates by modifying only a small portion of the matrix: new task valuations are inferred via structure-aware interpolation, while updates induced by new players are confined to affected local matrix blocks. To eliminate the need for pre-specified evaluation tasks, we introduce self-valuation, which constructs the initial matrix directly from training data, supported by scalable subset reuse and coverage-aware anchor selection. Experiments across diverse models show that D-Shap performs task updates in milliseconds and reduces the cost of player updates by up to three orders of magnitude, while achieving valuation quality competitive with full recomputation.
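To make the matrix view concrete, the following is a minimal sketch, not the D-Shap algorithm itself. It computes exact Shapley values for a toy problem and keeps them as a player-by-task matrix instead of collapsing them into one score per player, then estimates the column for a new task from its nearest existing tasks. The toy utility, the feature vectors, and the names `shapley_matrix`, `interpolate_task`, and `toy_utility` are all assumptions made for illustration; the distance-weighted average is a simple stand-in for the structure-aware interpolation described above.

```python
import itertools
import numpy as np

# Hypothetical toy data: 4 training players and 3 tasks, each a 1-D feature.
players = np.array([[0.0], [1.0], [2.0], [3.0]])
tasks = np.array([[0.0], [1.0], [3.0]])

def toy_utility(coalition, task):
    """Assumed utility: similarity of the closest coalition member to the task."""
    if not coalition:
        return 0.0
    d2 = np.sum((players[coalition] - task) ** 2, axis=1)
    return float(np.exp(-d2).max())

def shapley_matrix(n_players, task_feats, utility):
    """Exact Shapley values per task: phi[i, t] is player i's value on task t."""
    perms = list(itertools.permutations(range(n_players)))
    phi = np.zeros((n_players, len(task_feats)))
    for t, task in enumerate(task_feats):
        for perm in perms:
            coalition = []
            prev = utility(coalition, task)
            for i in perm:
                coalition.append(i)
                cur = utility(coalition, task)
                phi[i, t] += cur - prev  # marginal contribution of player i
                prev = cur
    return phi / len(perms)

def interpolate_task(phi, task_feats, new_feat, k=2):
    """Estimate a new task's column as a distance-weighted mean of its
    k nearest existing task columns (an illustrative stand-in)."""
    dist = np.linalg.norm(task_feats - new_feat, axis=1)
    nearest = np.argsort(dist)[:k]
    w = 1.0 / (dist[nearest] + 1e-9)
    w /= w.sum()
    return phi[:, nearest] @ w

phi = shapley_matrix(len(players), tasks, toy_utility)

# A new task that coincides with an existing one reuses that column
# without evaluating any new coalitions.
est = interpolate_task(phi, tasks, np.array([1.0]))
```

Because the per-task columns are retained, adding a task touches only one new column. Exact enumeration over permutations is feasible only for a handful of players and is used here purely to keep the example verifiable.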