Machine learning (ML) algorithms often differ in performance across domains. Understanding $\textit{why}$ their performance differs is crucial for determining which types of interventions (e.g., algorithmic or operational) are most effective at closing the performance gaps. Existing methods focus on $\textit{aggregate decompositions}$ of the total performance gap into the impact of a shift in the distribution of features $p(X)$ versus the impact of a shift in the conditional distribution of the outcome $p(Y|X)$; however, such coarse explanations offer only a few options for how one can close the performance gap. $\textit{Detailed variable-level decompositions}$ that quantify the importance of each variable to each term in the aggregate decomposition can provide a much deeper understanding and suggest much more targeted interventions. However, existing methods assume knowledge of the full causal graph or make strong parametric assumptions. We introduce a nonparametric hierarchical framework that provides both aggregate and detailed decompositions for explaining why the performance of an ML algorithm differs across domains, without requiring causal knowledge. We derive debiased, computationally efficient estimators and statistical inference procedures for asymptotically valid confidence intervals.
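To make the aggregate decomposition concrete, here is a minimal plug-in sketch (not the paper's debiased estimator, and all names are illustrative): the source-to-target performance gap is split into a $p(X)$-shift term and a $p(Y|X)$-shift term via an intermediate counterfactual loss that reweights source samples by an estimated density ratio $w(x) = q(x)/p(x)$, obtained from a domain classifier.

```python
# Hedged sketch of an aggregate performance-gap decomposition.
# Assumes a fixed model evaluated on a source domain P and a target
# domain Q that differ in both p(X) and p(Y|X). Illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000

# Source domain P: X ~ N(0, 1); Y depends on X one way.
Xp = rng.normal(0.0, 1.0, size=(n, 1))
Yp = (Xp[:, 0] + rng.normal(0, 1, n) > 0).astype(int)
# Target domain Q: shifted p(X) AND shifted p(Y|X).
Xq = rng.normal(0.5, 1.0, size=(n, 1))
Yq = (0.5 * Xq[:, 0] + rng.normal(0, 1, n) > 0).astype(int)

# Fixed model trained on the source; performance = mean 0-1 loss.
clf = LogisticRegression().fit(Xp, Yp)
loss_p = (clf.predict(Xp) != Yp).astype(float)
loss_q = (clf.predict(Xq) != Yq).astype(float)

# Density ratio w(x) = q(x)/p(x) via a domain classifier:
# with balanced samples, the odds of "target" approximate q(x)/p(x).
Z = np.vstack([Xp, Xq])
d = np.r_[np.zeros(n), np.ones(n)]
dom = LogisticRegression().fit(Z, d)
pq = dom.predict_proba(Xp)[:, 1]
w = pq / (1.0 - pq)

L_P = loss_p.mean()                    # loss under source p(X), p(Y|X)
L_Q = loss_q.mean()                    # loss under target p(X), p(Y|X)
L_mix = np.average(loss_p, weights=w)  # target p(X), source p(Y|X)

cov_shift_term = L_mix - L_P   # gap attributable to the shift in p(X)
cond_shift_term = L_Q - L_mix  # gap attributable to the shift in p(Y|X)
total_gap = L_Q - L_P          # the two terms sum to this by construction
```

The counterfactual term `L_mix` cancels when the two terms are added, so the decomposition is exact at the population level; the statistical work in the paper concerns estimating such terms efficiently with debiased, nonparametric estimators.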