We investigate the problem of explainability in machine learning. To address this problem, Feature Attribution Methods (FAMs) measure the contribution of each feature through a perturbation test, in which the difference in prediction is compared under different perturbations. However, such perturbation tests may fail to accurately distinguish the contributions of different features when perturbing each of them produces the same change in prediction. To enhance the ability of FAMs to distinguish different features' contributions in this challenging setting, we propose using the probability that perturbing a feature is a necessary and sufficient cause for the prediction to change, known as the Probability of Necessity and Sufficiency (PNS), as a measure of feature importance. Our approach, Feature Attribution with Necessity and Sufficiency (FANS), computes the PNS via a perturbation test comprising two stages (factual and interventional). In practice, to generate counterfactual samples, we apply a resampling-based approach to the observed samples to approximate the required conditional distribution. Finally, we combine FANS with gradient-based optimization to extract the subset of features with the largest PNS. We demonstrate that FANS outperforms existing feature attribution methods on six benchmarks.
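To make the idea of a PNS-style perturbation test concrete, the following is a minimal Monte Carlo sketch, not the authors' actual FANS algorithm. It assumes a hypothetical `model` returning a discrete prediction and a user-supplied `perturb` function standing in for the paper's resampling-based approximation of the conditional distribution; the score combines a sufficiency check (perturbing the subset flips the prediction) with a necessity check (perturbing only the remaining features leaves it unchanged).

```python
import numpy as np

def estimate_pns(model, x, subset, perturb, n_samples=1000, rng=None):
    """Monte Carlo sketch of a PNS-style score for a feature subset.

    Sufficiency: resampling the features in `subset` changes the prediction.
    Necessity: resampling only the complementary features leaves it unchanged.
    The returned product of the two rates is an illustrative proxy, not the
    exact PNS estimator used by FANS.
    """
    rng = np.random.default_rng(rng)
    base = model(x)                      # factual prediction
    comp = np.setdiff1d(np.arange(x.shape[0]), subset)
    suff = nec = 0
    for _ in range(n_samples):
        # Sufficiency test: resample the subset, keep the rest fixed.
        x_s = x.copy()
        x_s[subset] = perturb(x, subset, rng)
        suff += model(x_s) != base
        # Necessity test: resample the complement, keep the subset fixed.
        x_c = x.copy()
        if comp.size:
            x_c[comp] = perturb(x, comp, rng)
        nec += model(x_c) == base
    return (suff / n_samples) * (nec / n_samples)
```

On a toy model that depends only on the first feature, this score is high for the subset containing that feature and zero for an irrelevant one, illustrating how the two-stage test separates features that a single perturbation test could conflate.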