With the rapid development of machine learning applications on time-series data, accurately assessing the value of training samples has become essential for data selection, noise detection, and model optimization. However, traditional data valuation methods usually assume that samples are independent and identically distributed, and therefore ignore the fact that a sample's value changes over time in time-series data. This paper proposes an improved temporal Shapley data valuation method that values time-series samples more accurately by combining a temporal decay mechanism with a multi-scale fusion strategy. Specifically, we introduce three progressively enhanced variants. Temporal-Decay Shapley (TDS) incorporates temporal information into Shapley value computation through exponential decay weights; the improved TDS adopts power-exponential decay to better adapt to nonlinear temporal drift; and Multi-Scale Temporal-Decay Shapley (MS-TDS) balances the value of short-term hotspot samples against that of long-term foundational samples through parallel valuation at multiple time scales followed by sample-level adaptive fusion. Experimental results show that the proposed methods generally outperform traditional methods in noise detection and high-value data identification tasks, with larger gains in most settings with strong temporal dependence, improving both the accuracy and the robustness of data valuation.
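To make the three variants concrete, the following is a minimal Python sketch under stated assumptions, not the paper's implementation. The abstract does not give the formulas, so the decay forms `exp(-lam * age)` and `exp(-(lam * age) ** beta)`, the choice to multiply each sample's Shapley value by its decay weight, and the normalized per-sample fusion weights are all illustrative guesses. Every function and parameter name (`exp_decay`, `power_exp_decay`, `temporal_decay_shapley`, `fuse_scales`, `lam`, `beta`) is hypothetical.

```python
import itertools
import math


def exp_decay(age, lam):
    """Assumed TDS weight: exponential decay in the sample's age (age >= 0)."""
    return math.exp(-lam * max(age, 0.0))


def power_exp_decay(age, lam, beta):
    """Assumed improved-TDS weight: power-exponential decay.

    beta = 1 recovers plain exponential decay; other values bend the curve.
    """
    return math.exp(-((lam * max(age, 0.0)) ** beta))


def exact_shapley(n, utility):
    """Exact Shapley values, averaging marginal gains over all n! orderings.

    Only feasible for very small n; `utility` maps a frozenset of sample
    indices to a number.
    """
    values = [0.0] * n
    perms = list(itertools.permutations(range(n)))
    for perm in perms:
        coalition = set()
        prev = utility(frozenset(coalition))
        for i in perm:
            coalition.add(i)
            cur = utility(frozenset(coalition))
            values[i] += cur - prev
            prev = cur
    return [v / len(perms) for v in values]


def temporal_decay_shapley(timestamps, utility, t_now, decay):
    """Assumed combination: scale each Shapley value by a decay weight on age."""
    base = exact_shapley(len(timestamps), utility)
    return [decay(t_now - t) * v for t, v in zip(timestamps, base)]


def fuse_scales(per_scale_values, fusion_weights):
    """Assumed sample-level fusion: a normalized weighted sum across scales.

    per_scale_values[k][i] is sample i's value at scale k;
    fusion_weights[i][k] is sample i's (non-negative) weight for scale k.
    """
    fused = []
    for i, weights in enumerate(fusion_weights):
        total = sum(weights)
        fused.append(
            sum(w * per_scale_values[k][i] for k, w in enumerate(weights)) / total
        )
    return fused


# Toy usage: three samples with an additive utility, valued at two time scales.
contributions = [1.0, 2.0, 3.0]
timestamps = [0.0, 1.0, 2.0]


def toy_utility(subset):
    return sum(contributions[i] for i in subset)


short_scale = temporal_decay_shapley(
    timestamps, toy_utility, 2.0, lambda age: exp_decay(age, lam=1.0)
)
long_scale = temporal_decay_shapley(
    timestamps, toy_utility, 2.0, lambda age: power_exp_decay(age, lam=0.1, beta=0.5)
)
fused = fuse_scales([short_scale, long_scale], [[0.5, 0.5]] * 3)
```

In this toy setup the short scale uses a fast decay that favors recent samples, while the long scale uses a slow decay that keeps older samples relevant; the equal fusion weights are a placeholder for whatever adaptive rule the method actually uses.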