Exact computation of various machine learning explanations requires numerous model evaluations and, in extreme cases, becomes impractical. The computational cost of approximation grows with the ever-increasing size of data and model parameters. Many heuristics have been proposed to approximate post-hoc explanations efficiently. This paper shows that the standard i.i.d. sampling used in a broad spectrum of algorithms for explanation estimation leads to an approximation error that leaves room for improvement. To this end, we introduce Compress Then Explain (CTE), a new paradigm for more efficient and accurate explanation estimation. CTE uses distribution compression through kernel thinning to obtain a data sample that best approximates the marginal distribution. We show that CTE improves the estimation of removal-based local and global explanations with negligible computational overhead. It often achieves an on-par explanation approximation error using 2-3x fewer samples, i.e. requiring 2-3x fewer model evaluations. CTE is a simple yet powerful plug-in for any explanation method that currently relies on i.i.d. sampling.
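To make the paradigm concrete, below is a minimal, self-contained sketch of distribution compression as a drop-in replacement for i.i.d. subsampling. It uses kernel herding, a simpler greedy MMD-minimizing method, as a stand-in for the kernel thinning algorithm referenced above; the function names (rbf_kernel, kernel_herding, mmd2), the RBF bandwidth gamma, and the toy data are illustrative assumptions, not the paper's implementation.

```python
# A minimal sketch of the compress-then-explain idea, NOT the paper's exact
# algorithm: kernel herding (greedy MMD minimization) stands in for kernel
# thinning, and the RBF bandwidth, sample sizes, and toy data are assumptions.
import numpy as np


def rbf_kernel(X, Y, gamma=0.5):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)


def kernel_herding(X, m, gamma=0.5):
    """Greedily select m rows of X whose empirical distribution tracks X's
    kernel mean embedding, i.e. has small maximum mean discrepancy to X."""
    K = rbf_kernel(X, X, gamma)
    mu = K.mean(axis=1)            # mean embedding of X evaluated at each point
    running = np.zeros(len(X))     # sum of k(., x_s) over points selected so far
    selected = []
    for t in range(m):
        obj = mu - running / (t + 1)
        obj[selected] = -np.inf    # select without replacement
        i = int(np.argmax(obj))
        selected.append(i)
        running += K[:, i]
    return np.asarray(selected)


def mmd2(S, X, gamma=0.5):
    """Squared MMD between a compressed sample S and the full dataset X."""
    return (rbf_kernel(S, S, gamma).mean()
            - 2.0 * rbf_kernel(S, X, gamma).mean()
            + rbf_kernel(X, X, gamma).mean())


rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))     # toy stand-in for the marginal distribution
m = 64                             # compressed sample size

compressed = X[kernel_herding(X, m)]
iid = X[rng.choice(len(X), size=m, replace=False)]

print(f"MMD^2 to full data -- compressed: {mmd2(compressed, X):.5f}, "
      f"i.i.d.: {mmd2(iid, X):.5f}")
```

In use, the compressed sample would simply replace the i.i.d. background sample fed to a removal-based explainer, e.g. passed as the background dataset of shap.KernelExplainer; CTE itself employs kernel thinning in place of the simpler herding heuristic sketched here.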