As machine learning models become increasingly prevalent in time series applications, Explainable Artificial Intelligence (XAI) methods are essential for understanding their predictions. Within XAI, feature attribution methods aim to identify which input features contribute most to a model's prediction; their evaluation typically relies on perturbation-based metrics. Through systematic empirical analysis across multiple datasets, model architectures, and perturbation strategies, we reveal previously overlooked class-dependent effects in these metrics: their effectiveness varies across classes, yielding strong results for some classes while remaining less sensitive to others. In particular, we find that the most effective perturbation strategies often exhibit the most pronounced class differences. Our analysis suggests that these effects arise from the learned biases of classifiers, indicating that perturbation-based evaluation may reflect specific model behaviors rather than intrinsic attribution quality. We propose an evaluation framework with a class-aware penalty term that helps assess and account for these effects when evaluating feature attributions, which is particularly valuable for class-imbalanced datasets. Although our analysis focuses on time series classification, these class-dependent effects likely extend to other structured data domains where perturbation-based evaluation is common.
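To make the class-dependent effect concrete, the following is a minimal sketch of a per-class perturbation metric, assuming a fitted classifier with a scikit-learn-style `predict_proba` interface and single-channel series. The function name, the top-k masking with a constant baseline, and the per-class aggregation are illustrative choices, not the paper's exact protocol.

```python
import numpy as np

def perturbation_score_per_class(model, X, y, attributions, k=10, baseline=0.0):
    """Illustrative per-class perturbation metric (hypothetical, not the
    paper's exact protocol): mask the k highest-attributed time steps of
    each series and record the drop in the true-class probability.

    model        -- fitted classifier with predict_proba(X) -> (n, n_classes)
    X            -- series of shape (n_samples, n_timesteps)
    y            -- integer class labels, shape (n_samples,)
    attributions -- per-timestep relevances, same shape as X
    k            -- number of top-attributed time steps to perturb
    baseline     -- replacement value for perturbed steps (one simple strategy)
    """
    probs_before = model.predict_proba(X)

    X_pert = X.copy()
    for i in range(len(X)):
        top = np.argsort(attributions[i])[-k:]  # k most relevant time steps
        X_pert[i, top] = baseline               # perturb them

    probs_after = model.predict_proba(X_pert)

    # Drop in the probability of the true class; a larger drop means the
    # attribution pointed at features the model actually relied on.
    idx = np.arange(len(X))
    drops = probs_before[idx, y] - probs_after[idx, y]

    # Aggregate per class to expose class-dependent sensitivity.
    return {int(c): float(drops[y == c].mean()) for c in np.unique(y)}
```

A spread in the returned per-class means, with the same attribution method and perturbation strategy, is the kind of disparity the analysis highlights.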
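The abstract does not specify the form of the class-aware penalty term. One plausible illustration, under the assumption that the framework trades mean faithfulness against cross-class disparity, subtracts a weighted standard deviation of the per-class scores; the name `class_aware_score` and the weight `lam` are hypothetical.

```python
import numpy as np

def class_aware_score(per_class_scores, lam=1.0):
    """Aggregate per-class perturbation scores into a single value that
    rewards high mean faithfulness but penalizes disparity across classes.
    lam > 0 sets the penalty strength (hypothetical formulation)."""
    vals = np.array(list(per_class_scores.values()))
    return float(vals.mean() - lam * vals.std())

# Example wiring with the per-class metric sketched above:
# scores = perturbation_score_per_class(model, X_test, y_test, attr, k=10)
# print(class_aware_score(scores, lam=0.5))
```

Under this formulation, an attribution method that scores well only on the majority class is penalized relative to one with comparable mean faithfulness spread evenly across classes, which is why such a term is most useful on class-imbalanced datasets.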