Despite the popularity of Shapley Values in explaining neural text classification models, computing them is prohibitive for large pretrained models because of the large number of model evaluations required. In practice, Shapley Values are often estimated with a small number of stochastic model evaluations. However, we show that the estimated Shapley Values are sensitive to the choice of random seed: the top-ranked features often have little overlap across different seeds, especially on examples with longer input texts. This can only be mitigated by aggregating thousands of model evaluations, which in turn induces substantial computational overhead. To mitigate the trade-off between stability and efficiency, we develop an amortized model that directly predicts each input feature's Shapley Value without additional model evaluations. It is trained on a set of examples whose Shapley Values are estimated from a large number of model evaluations to ensure stability. Experimental results on two text classification datasets demonstrate that our amortized model estimates Shapley Values accurately with up to a 60-fold speedup compared to traditional methods. Furthermore, the estimated values are stable because inference is deterministic. We release our code at https://github.com/yangalan123/Amortized-Interpretability.
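To make the seed sensitivity concrete, the following is a minimal sketch of permutation-sampling Shapley estimation on a toy value function. It is not the authors' implementation and is not taken from the released repository: `toy_value`, its weights, and every function name here are hypothetical stand-ins, and a real setting would replace `toy_value` with a forward pass of the classifier on a masked input. The sketch only illustrates why a small sampling budget makes the estimate depend on the seed, while an exact (or very large-budget) computation does not.

```python
import random
from itertools import permutations
from math import factorial


def toy_value(subset):
    """Hypothetical stand-in for a model score when only `subset` tokens are kept."""
    weights = [0.5, -0.2, 0.1, 0.4, 0.3, -0.1]  # assumed per-token effects
    score = sum(weights[i] for i in subset)
    # Assumed interaction between tokens 0 and 3, so the game is not additive.
    if 0 in subset and 3 in subset:
        score += 0.6
    return score


def sampled_shapley(value_fn, n_features, n_permutations, seed):
    """Estimate Shapley values by averaging marginal gains over random orderings."""
    rng = random.Random(seed)
    phi = [0.0] * n_features
    for _ in range(n_permutations):
        order = list(range(n_features))
        rng.shuffle(order)
        present = set()
        prev = value_fn(present)
        for i in order:
            present.add(i)
            cur = value_fn(present)
            phi[i] += cur - prev  # marginal contribution of feature i
            prev = cur
    return [p / n_permutations for p in phi]


def exact_shapley(value_fn, n_features):
    """Exact Shapley values by enumerating all orderings (feasible only for tiny n)."""
    phi = [0.0] * n_features
    for order in permutations(range(n_features)):
        present = set()
        prev = value_fn(present)
        for i in order:
            present.add(i)
            cur = value_fn(present)
            phi[i] += cur - prev
            prev = cur
    return [p / factorial(n_features) for p in phi]


def top_k(phi, k):
    """Indices of the k features with the largest absolute attribution."""
    ranked = sorted(range(len(phi)), key=lambda i: abs(phi[i]), reverse=True)
    return set(ranked[:k])


if __name__ == "__main__":
    n = 6
    print("exact top-2:", top_k(exact_shapley(toy_value, n), 2))
    for budget in (2, 2000):
        tops = [top_k(sampled_shapley(toy_value, n, budget, seed), 2)
                for seed in range(5)]
        print(f"{budget} permutations, top-2 per seed:", tops)
```

Each sampled permutation costs one value-function call per feature, so the budget needed for stable rankings grows with input length. An amortized predictor avoids this cost at inference time by being trained once on high-budget estimates such as those from the large-budget setting above.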