We address the task of probabilistic anomaly attribution in the black-box regression setting, where the goal is to compute the probability distribution of the attribution score of each input variable, given an observed anomaly. The training dataset is assumed to be unavailable. This task differs from the standard XAI (explainable AI) scenario, since we wish to explain the anomalous deviation from a black-box prediction rather than the black-box model itself. We begin by showing that mainstream model-agnostic explanation methods, such as Shapley values, are not suitable for this task because of their "deviation-agnostic property": their attribution scores do not depend on how far the observed output deviates from the prediction. We then propose a novel framework for probabilistic anomaly attribution that not only computes attribution scores as the predictive mean but also quantifies the uncertainty of those scores. This is done by considering a generative process for perturbations that counterfactually bring the anomalous observation back to normalcy. We introduce a variational Bayes algorithm for deriving the distributions of per-variable attribution scores. To the best of our knowledge, this is the first probabilistic anomaly attribution framework that is not deviation-agnostic.
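To make the deviation-agnostic property concrete, the following minimal sketch computes exact baseline Shapley values for a toy model. It is an illustration under stated assumptions, not the construction used in this work: the function `f`, the input `x`, the zero baseline, and the two observed outputs are all hypothetical. The point it shows is that the observed output never enters the Shapley computation, so a small deviation and a large one receive identical attributions.

```python
from itertools import combinations
from math import factorial


def f(x):
    # Hypothetical black-box regression function (an assumption for illustration).
    return 2.0 * x[0] + x[1] * x[2]


def baseline_shapley(model, x, baseline):
    """Exact baseline Shapley values of `model` at `x` relative to `baseline`."""
    n = len(x)

    def value(subset):
        # Variables in `subset` keep their observed values; the rest take the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            # Shapley weight for coalitions of size k that exclude variable i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]  # assumed reference point
phi = baseline_shapley(f, x, baseline)

# Two hypothetical observed outputs for the same input: one near f(x), one far from it.
for y_observed in (8.1, 20.0):
    deviation = y_observed - f(x)
    # `phi` was computed without y_observed, so it is the same in both cases.
    print(f"y = {y_observed}, deviation = {deviation:+.1f}, attributions = {phi}")
```

The attributions sum to `f(x) - f(baseline)`, which explains the model's prediction rather than the gap between the observation and that prediction. An attribution method suited to the task described above would need the observed output as an input.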