TreeGrad-Ranker: Feature Ranking via $O(L)$-Time Gradients for Decision Trees

We revisit the use of probabilistic values, which include the well-known Shapley and Banzhaf values, to rank features for explaining the local predicted values of decision trees. The quality of feature rankings is typically assessed with the insertion and deletion metrics. Empirically, we observe that co-optimizing these two metrics is closely related to a joint optimization that selects a subset of features to maximize the local predicted value while minimizing it for the complement. However, we theoretically show that probabilistic values are generally unreliable for solving this joint optimization. Therefore, we explore deriving feature rankings by directly optimizing the joint objective. As the backbone, we propose TreeGrad, which computes the gradients of the multilinear extension of the joint objective in $O(L)$ time for decision trees with $L$ leaves; these gradients include weighted Banzhaf values. Building upon TreeGrad, we introduce TreeGrad-Ranker, which aggregates the gradients while optimizing the joint objective to produce feature rankings, and TreeGrad-Shap, a numerically stable algorithm for computing Beta Shapley values with integral parameters. In particular, the feature scores computed by TreeGrad-Ranker satisfy all the axioms uniquely characterizing probabilistic values, except for linearity, which itself leads to the established unreliability. Empirically, we demonstrate that the numerical error of Linear TreeShap can be up to $10^{15}$ times larger than that of TreeGrad-Shap when computing the Shapley value. As a by-product, we also develop TreeProb, which generalizes Linear TreeShap to support all probabilistic values. In our experiments, TreeGrad-Ranker performs significantly better on both insertion and deletion metrics. Our code is available at https://github.com/watml/TreeGrad.
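To make the link between gradients and weighted Banzhaf values concrete, here is a minimal brute-force sketch. It is not the TreeGrad algorithm: it enumerates all subsets and costs $O(n 2^n)$ for $n$ features, whereas TreeGrad is stated to run in $O(L)$ time on a tree with $L$ leaves. The sketch relies on the standard definition of the multilinear extension, $F(x) = \sum_{S} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i)$, whose partial derivative with respect to $x_i$ is the expected marginal contribution of feature $i$ when every other feature $j$ is included independently with probability $x_j$. Evaluated at $x = (w, \dots, w)$, this is the weighted Banzhaf value with weight $w$, and $w = 1/2$ gives the Banzhaf value. The names `multilinear_gradient` and `toy_value` are hypothetical and do not come from the paper or its repository.

```python
from itertools import combinations


def multilinear_gradient(f, n, x):
    """Brute-force gradient of the multilinear extension of f at point x.

    f maps a frozenset of feature indices (a subset of range(n)) to a float.
    Illustrative only: O(n * 2^n) time, not the O(L) TreeGrad algorithm.
    """
    grad = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(len(others) + 1):
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Probability of drawing s when each j != i is kept with prob x[j].
                weight = 1.0
                for j in others:
                    weight *= x[j] if j in s else 1.0 - x[j]
                # Marginal contribution of feature i to the subset s.
                total += weight * (f(s | {i}) - f(s))
        grad.append(total)
    return grad


def toy_value(s):
    """Hypothetical set function: additive terms plus one interaction (1, 2)."""
    return 2.0 * (0 in s) + 1.0 * (1 in s) + 4.0 * (1 in s and 2 in s)


# Gradient at the centre of the cube: the Banzhaf value of each feature.
banzhaf = multilinear_gradient(toy_value, 3, [0.5, 0.5, 0.5])

# Gradient at (w, ..., w) with w = 0.25: a weighted Banzhaf value.
weighted = multilinear_gradient(toy_value, 3, [0.25, 0.25, 0.25])

# Rank features by descending score (ties broken by index).
ranking = sorted(range(3), key=lambda i: (-banzhaf[i], i))
```

In this toy game, feature 1 receives the largest score at $w = 1/2$ because it carries both its own additive term and half of the interaction with feature 2. Changing $w$ changes how much of the interaction is credited, which is one way to see why a single fixed weighting may rank features differently from a direct optimization of the joint objective.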

