Previous work has shown that existing post-hoc explanation methods exhibit disparities in explanation fidelity across the sensitive attributes 'race' and 'gender'. While a large body of work focuses on mitigating these issues at the level of the explanation metric, the role of the data-generating process and the black-box model in explanation disparities remains largely unexplored. Accordingly, through both simulations and experiments on a real-world dataset, we assess how explanation disparities are affected by properties of the data (limited sample size, covariate shift, concept shift, and omitted variable bias) and by properties of the model (inclusion of the sensitive attribute and appropriate functional form). In controlled simulation analyses, we find that increased covariate shift, increased concept shift, and the omission of covariates all increase explanation disparities, and that the effect is more pronounced for neural network models, which capture the underlying functional form better than linear models do. We observe consistent findings for the effect of concept shift and omitted variable bias on explanation disparities in the Adult income dataset. Overall, the results indicate that disparities in model explanations can also depend on data and model properties. Based on this systematic investigation, we provide recommendations for the design of explanation methods that mitigate undesirable disparities.
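To make the notion of a fidelity disparity under covariate shift concrete, the following is a minimal sketch and not the simulation design used in this work. It rests on several assumptions made only for illustration: a single feature, a hypothetical quadratic black box, a global linear surrogate standing in for the post-hoc explanation, fidelity measured as mean squared error between surrogate and black box, and a minority group whose feature mean is offset by `shift`. The function name `fidelity_gap` and all parameter values are likewise hypothetical.

```python
import numpy as np


def fidelity_gap(shift, n_major=2000, n_minor=200, seed=0):
    """Absolute difference in surrogate fidelity (MSE) between two groups.

    Illustrative assumptions only: the minority group's feature mean is
    offset by `shift`, the black box is quadratic, and the explanation is
    one linear surrogate fitted to the pooled data.
    """
    rng = np.random.default_rng(seed)

    # Covariate shift: the two groups differ only in the feature mean.
    x_major = rng.normal(0.0, 1.0, n_major)
    x_minor = rng.normal(shift, 1.0, n_minor)
    x = np.concatenate([x_major, x_minor])
    group = np.concatenate(
        [np.zeros(n_major, dtype=int), np.ones(n_minor, dtype=int)]
    )

    # Hypothetical black-box output (stand-in for a trained nonlinear model).
    black_box = x ** 2

    # Global linear surrogate fitted by ordinary least squares.
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, black_box, rcond=None)
    surrogate = design @ coef

    # Per-group fidelity: lower squared error means a more faithful surrogate.
    sq_err = (surrogate - black_box) ** 2
    mse_major = sq_err[group == 0].mean()
    mse_minor = sq_err[group == 1].mean()
    return abs(mse_minor - mse_major)


if __name__ == "__main__":
    for shift in (0.0, 1.0, 3.0):
        print(f"shift={shift:.1f}  fidelity gap={fidelity_gap(shift):.3f}")
```

In this toy setup the surrogate is dominated by the larger group, so moving the smaller group's feature distribution away from it tends to widen the gap in fidelity between the groups. Other fidelity measures or local explanation methods could behave differently.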