Evaluating the contribution of individual data points to a model's prediction is critical for interpreting model predictions and improving model performance. Existing data contribution methods have been applied to various data types, including tabular data, images, and text; however, their primary focus has been on i.i.d. settings. Despite the pressing need for principled approaches tailored to time-series datasets, the problem of estimating data contribution in such settings remains unexplored, possibly due to the challenges of handling inherent temporal dependencies. This paper introduces TimeInf, a data contribution estimation method for time-series datasets. TimeInf uses influence functions to attribute model predictions to individual time points while preserving temporal structure. Our extensive empirical results demonstrate that TimeInf outperforms state-of-the-art methods in identifying harmful anomalies and helpful time points for forecasting. Additionally, TimeInf offers intuitive and interpretable attributions of data values, allowing us to easily distinguish diverse anomaly patterns through visualizations.
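To make the idea of influence-function attribution over time points concrete, the following is a minimal sketch and not the authors' implementation. It assumes a ridge-regularized linear autoregressive model, the standard influence-function formula (the negated product of the test-loss gradient, the inverse Hessian, and the training-example gradient), and a simple averaging rule that assigns each time point the mean influence of the overlapping blocks that contain it. The function names `make_blocks`, `block_influences`, and `time_point_influences`, the lag `order`, and the `ridge` value are all hypothetical choices made for illustration.

```python
import numpy as np


def make_blocks(series, order):
    """Slice a 1-D series into overlapping (lag window, next value) pairs."""
    X = np.stack([series[i:i + order] for i in range(len(series) - order)])
    y = series[order:]
    return X, y


def block_influences(X_train, y_train, X_test, y_test, ridge=1e-3):
    """Influence of each training block on the mean test loss.

    Assumed model: linear AR fit by ridge-regularized least squares.
    A positive score means upweighting the block increases test loss.
    """
    n, d = X_train.shape
    # Hessian of (1/n) * sum 0.5 * (x @ theta - y)^2 + 0.5 * ridge * |theta|^2
    H = X_train.T @ X_train / n + ridge * np.eye(d)
    theta = np.linalg.solve(H, X_train.T @ y_train / n)
    # Per-example gradients of the squared-error term (ridge term omitted)
    g_train = (X_train @ theta - y_train)[:, None] * X_train
    g_test = ((X_test @ theta - y_test)[:, None] * X_test).mean(axis=0)
    # -g_test^T H^{-1} g_i for every training block i (H is symmetric)
    return -g_train @ np.linalg.solve(H, g_test)


def time_point_influences(block_scores, series_len, order):
    """Average block scores over every block that contains each time point."""
    totals = np.zeros(series_len)
    counts = np.zeros(series_len)
    for i, score in enumerate(block_scores):
        # Block i spans its lag window plus its target: indices i .. i + order
        totals[i:i + order + 1] += score
        counts[i:i + order + 1] += 1
    return totals / counts


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    series = np.sin(np.linspace(0, 20, 200)) + 0.01 * rng.standard_normal(200)
    series[60] += 3.0  # synthetic spike in the training segment (illustrative)
    order = 4
    X, y = make_blocks(series, order)
    scores = block_influences(X[:150], y[:150], X[150:], y[150:])
    per_point = time_point_influences(scores, 150 + order, order)
    print(per_point.shape)
```

Scoring overlapping blocks rather than isolated values is one way to keep each contribution tied to its temporal context; the averaging step then maps block-level scores back to individual time points. Other aggregation rules or model classes would work equally well in this sketch.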