Shapley and Banzhaf interactions capture the complex dynamics inherent in modern machine learning applications. However, current estimators for these higher-order interactions trade off speed against accuracy. To overcome this limitation, we introduce ProxySHAP, which reconciles the high sample efficiency of tree-based proxy models with a principled path to consistency via residual correction. On the theoretical side, we derive a polynomial-time generalization of interventional TreeSHAP that computes exact interaction indices for tree ensembles, avoiding the exponential dependence on tree depth of prior methods. Furthermore, we formally analyze the residual adjustment strategy, characterizing the specific conditions under which Maximum Sample Reuse (MSR) corrects proxy bias without its variance scaling exponentially with interaction size. Extensive benchmarking demonstrates that ProxySHAP sets a new state of the art in approximation quality, including in large-scale applications with thousands of features. By achieving the lowest error in both small- and large-budget regimes, ProxySHAP significantly outperforms the prior best estimators, ProxySPEX and KernelSHAP-IQ, while also delivering superior performance on downstream explainability tasks.
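The proxy-plus-residual idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several simplifying assumptions: it handles first-order Banzhaf values rather than higher-order interactions, it computes the proxy's exact values by brute-force enumeration (a stand-in for the polynomial-time tree algorithm the abstract describes), and it uses a basic form of the MSR estimator, namely the mean payoff over sampled coalitions containing a player minus the mean over those that do not. All names (`toy_game`, `toy_proxy`, `proxy_corrected_banzhaf`, and so on) are hypothetical. The sketch relies only on linearity: the Banzhaf value of the game equals that of the proxy plus that of the residual.

```python
import itertools
import random


def all_subsets(n):
    """Yield every subset of {0, ..., n-1} as a frozenset."""
    for size in range(n + 1):
        for combo in itertools.combinations(range(n), size):
            yield frozenset(combo)


def exact_banzhaf(game, n):
    """Brute-force Banzhaf values. Exponential in n, so toy sizes only."""
    values = []
    for i in range(n):
        total = 0.0
        for s in all_subsets(n):
            if i not in s:
                total += game(s | {i}) - game(s)
        values.append(total / 2 ** (n - 1))
    return values


def msr_banzhaf(game, n, samples):
    """Basic MSR estimate: every sampled coalition is reused for every player."""
    evaluated = [(s, game(s)) for s in samples]  # one game call per sample
    estimates = []
    for i in range(n):
        with_i = [v for s, v in evaluated if i in s]
        without_i = [v for s, v in evaluated if i not in s]
        if not with_i or not without_i:
            estimates.append(0.0)  # assumption: fall back to no correction
            continue
        estimates.append(sum(with_i) / len(with_i) - sum(without_i) / len(without_i))
    return estimates


def proxy_corrected_banzhaf(game, proxy, n, samples):
    """Exact values of the proxy plus a sampled estimate for the residual."""
    proxy_values = exact_banzhaf(proxy, n)

    def residual(s):
        return game(s) - proxy(s)

    correction = msr_banzhaf(residual, n, samples)
    return [p + c for p, c in zip(proxy_values, correction)]


def sample_subsets(n, budget, seed=0):
    """Draw coalitions uniformly: each player is included with probability 1/2."""
    rng = random.Random(seed)
    return [frozenset(i for i in range(n) if rng.random() < 0.5) for _ in range(budget)]


def toy_game(s):
    """Hypothetical 4-player game with two pairwise interaction terms."""
    return 2.0 * (0 in s) + 1.0 * (1 in s) + 3.0 * (0 in s and 1 in s) + 0.5 * (2 in s and 3 in s)


def toy_proxy(s):
    """Hypothetical imperfect proxy: misses the interaction between players 2 and 3."""
    return 2.0 * (0 in s) + 1.0 * (1 in s) + 3.0 * (0 in s and 1 in s)


if __name__ == "__main__":
    samples = sample_subsets(4, budget=200)
    print(proxy_corrected_banzhaf(toy_game, toy_proxy, 4, samples))
    print(exact_banzhaf(toy_game, 4))
```

In this toy setting the proxy already explains most of the game, so the sampled estimator only has to recover the small residual term. How the variance of such a correction behaves for higher-order interactions is precisely what the abstract says the paper analyzes, and this sketch makes no claim about it.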