The ability to explain a model's decisions has become a central requirement for the development, deployment, and adoption of machine learning models. However, we have yet to understand what explanation methods can and cannot do. How do upstream factors such as data, model prediction, hyperparameters, and random initialization influence downstream explanations? While previous work has raised concerns that explanations (E) may have little relationship with the prediction (Y), no conclusive study has quantified this relationship. Our work borrows tools from causal inference to systematically assay it. More specifically, we study the relationship between E and Y by measuring the treatment effect when intervening on their causal ancestors, i.e., on the hyperparameters and inputs used to generate saliency-based Es or Ys. Our results suggest that the relationship between E and Y is far from ideal. In fact, the gap from the 'ideal' case only increases in higher-performing models, that is, in the models that are likely to be deployed. Our work is a promising first step towards a quantitative measure of the relationship between E and Y, which could also inform the future development of methods for E by supplying a quantitative metric.