Additive feature explanations rely primarily on game-theoretic notions such as the Shapley value, which views features as cooperating players. The Shapley value's popularity in and outside of explainable AI stems from its axiomatic uniqueness. However, its computational complexity severely limits its practicability. Most works investigate the uniform approximation of all features' Shapley values, needlessly consuming samples on insignificant features. In contrast, identifying the $k$ most important features can already provide sufficient insight and offers the potential to leverage algorithmic opportunities from the field of multi-armed bandits. We propose Comparable Marginal Contributions Sampling (CMCS), a method for the top-$k$ identification problem that utilizes a new sampling scheme exploiting correlated observations. We conduct experiments to showcase the efficacy of our method compared to competitive baselines. Our empirical findings reveal that estimation quality for the approximate-all problem does not necessarily transfer to top-$k$ identification, and vice versa.
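As a brief reminder (the notation $N$ for the feature set and $\nu$ for the value function is introduced here for illustration and not fixed by the abstract), the Shapley value assigns to each feature $i$ its weighted average marginal contribution over all coalitions $S$:
\[
  \phi_i(\nu) \;=\; \sum_{S \subseteq N \setminus \{i\}}
    \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}
    \Bigl( \nu\bigl(S \cup \{i\}\bigr) - \nu(S) \Bigr).
\]
The exponential number of coalitions in this sum is the source of the computational burden noted above; sampling-based estimators target the marginal contributions $\nu(S \cup \{i\}) - \nu(S)$ directly, which is the quantity the proposed top-$k$ sampling scheme exploits.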