Shapley values have become a cornerstone of explainable AI, but they are computationally expensive to use, especially when features are dependent. Evaluating them requires approximating a large number of conditional expectations, either via Monte Carlo integration or regression. Until recently it has not been possible to fully exploit deep learning for the regression approach, because retraining for each conditional expectation takes too long. Tabular foundation models such as TabPFN overcome this computational hurdle by leveraging in-context learning, so each conditional expectation can be approximated without any re-training. In this paper, we compute Shapley values with multiple variants of TabPFN and compare their performance with state-of-the-art methods on both simulated and real datasets. In most cases, TabPFN yields the best performance; where it does not, it is only marginally worse than the best method, at a fraction of the runtime. We discuss further improvements and how tabular foundation models can be better adapted specifically for conditional Shapley value estimation.
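The regression approach mentioned above can be sketched for a single coalition S: approximate v(S) = E[f(x) | x_S = x*_S] by regressing the model's outputs on the coalition's features alone. In the paper this regression is performed by TabPFN in a single in-context forward pass, with no retraining per coalition; the sketch below uses a simple k-nearest-neighbour regressor as a stand-in so it runs without the TabPFN package. All function names and data here are illustrative, not from the paper.

```python
# Illustrative sketch of the regression approach to conditional Shapley values
# for ONE coalition S. A kNN regressor stands in for TabPFN's in-context fit.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))           # toy training data with dependent-free features

def f(X):
    # Stand-in for the black-box model being explained.
    return X[:, 0] + 2.0 * X[:, 1] * X[:, 2]

def cond_expectation(X, y, S, x_star, k=25):
    """kNN estimate of E[y | x_S = x_star[S]] -- the role TabPFN plays in the paper."""
    d = np.linalg.norm(X[:, S] - x_star[S], axis=1)  # distance in coalition features only
    return y[np.argsort(d)[:k]].mean()               # average f over nearest neighbours

x_star = X[0]                            # the instance to explain
v_S = cond_expectation(X, f(X), S=[0, 1], x_star=x_star)
v_full = f(x_star[None, :])[0]           # v(all features) is just the model prediction
print(v_S, v_full)
```

A Shapley value for feature j is then a weighted average of the contributions v(S ∪ {j}) − v(S) over coalitions S, so each coalition requires one such conditional-expectation estimate; this is the step that becomes cheap when TabPFN's in-context learning replaces per-coalition retraining.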