The Importance of Parameters in Ranking Functions

How important is the weight of a given column in determining the ranking of tuples in a table? To address such explanation questions about a ranking function, we investigate the computation of SHAP scores for column weights, adopting a recent framework by Grohe et al. [ICDT'24]. The exact definition of this score depends on three key components: (1) the ranking function in use, (2) an effect function that quantifies the impact of using alternative weights on the ranking, and (3) an underlying weight distribution. We analyze the computational complexity of different instantiations of this framework for a range of fundamental ranking and effect functions, focusing on probabilistically independent finite distributions for individual columns. For the ranking functions, we examine lexicographic orders and score-based orders defined by the summation, minimum, and maximum functions. For the effect functions, we consider global, top-k, and local perspectives: global measures quantify the divergence between the perturbed and original rankings, top-k measures inspect the change in the set of top-k answers, and local measures capture the impact on an individual tuple of interest. While every case admits an additive fully polynomial-time randomized approximation scheme (FPRAS), we determine the complexity of exact computation, identifying which cases are solvable in polynomial time and which are #P-hard. We further show that all complexity results, both lower and upper bounds, extend to the related task of computing the Shapley value of whole columns (regardless of their weight).
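To make the three components concrete, here is a minimal sketch of one instantiation: a sum-based ranking function, a local effect function (the position of one tuple of interest), and independent finite weight distributions per column. It assumes the standard SHAP-score construction: for a set S of columns, fix their weights to the original values, draw the remaining weights independently from their distributions, take the expected effect, and average the marginal changes with Shapley coefficients. We assume, rather than confirm, that this matches every detail of the Grohe et al. definition. All names (`local_rank`, `value`, `shap`, `shap_sample`) and the toy table are hypothetical. The exact routine enumerates all subsets and all weight combinations, so it takes exponential time. The sampler only illustrates why an additive approximation is possible (each sample is an unbiased estimate of a bounded quantity); it is not presented as the paper's FPRAS.

```python
import random
from fractions import Fraction
from itertools import combinations, product
from math import factorial


def sum_score(row, weights):
    # Score-based ranking: weighted sum of the tuple's column values.
    return sum(w * x for w, x in zip(weights, row))


def local_rank(table, weights, target):
    # Hypothetical local effect: 1-based position of table[target],
    # counting only tuples with a strictly higher score (ties favor target).
    s = sum_score(table[target], weights)
    return 1 + sum(1 for row in table if sum_score(row, weights) > s)


def hybrid(w0, drawn, fixed):
    # Columns in `fixed` keep their original weight; the rest use `drawn`.
    return [w0[j] if j in fixed else drawn[j] for j in range(len(w0))]


def value(table, w0, dists, fixed, target):
    # Expected effect when columns in `fixed` keep w0[j] and every other
    # column j draws its weight independently from dists[j], a finite list
    # of (weight, probability) pairs.
    free = [j for j in range(len(w0)) if j not in fixed]
    total = Fraction(0)
    for choice in product(*(dists[j] for j in free)):
        weights = list(w0)
        prob = Fraction(1)
        for j, (wj, pj) in zip(free, choice):
            weights[j] = wj
            prob *= pj
        total += prob * local_rank(table, weights, target)
    return total


def shap(table, w0, dists, target, col):
    # Exact SHAP score of column `col`: Shapley-weighted sum of marginal
    # contributions over all subsets of the other columns (exponential time).
    n = len(w0)
    others = [j for j in range(n) if j != col]
    result = Fraction(0)
    for k in range(n):
        coeff = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
        for s in combinations(others, k):
            with_col = value(table, w0, dists, set(s) | {col}, target)
            without = value(table, w0, dists, set(s), target)
            result += coeff * (with_col - without)
    return result


def shap_sample(table, w0, dists, target, col, samples, rng):
    # Monte Carlo estimate: per sample, draw one weight vector from the
    # product distribution and one random column order, then record the
    # change in effect from fixing `col` after its predecessors in that order.
    n = len(w0)
    total = 0.0
    for _ in range(samples):
        drawn = [rng.choices([w for w, _ in d], weights=[float(p) for _, p in d])[0]
                 for d in dists]
        order = list(range(n))
        rng.shuffle(order)
        before = set(order[:order.index(col)])
        total += (local_rank(table, hybrid(w0, drawn, before | {col}), target)
                  - local_rank(table, hybrid(w0, drawn, before), target))
    return total / samples


# Toy input: three tuples over two columns; the tuple of interest is index 1.
table = [(3, 1), (1, 3), (2, 2)]
w0 = [2, 1]
dists = [[(0, Fraction(1, 2)), (2, Fraction(1, 2))],
         [(0, Fraction(1, 2)), (1, Fraction(1, 2))]]
print([shap(table, w0, dists, target=1, col=j) for j in range(2)])
# → [Fraction(1, 1), Fraction(0, 1)]
print(shap_sample(table, w0, dists, 1, 0, 20000, random.Random(0)))  # close to 1
```

In this toy input, the tuple of interest falls behind both other tuples exactly when the first weight exceeds the second. Under these distributions the second column's weight never flips that comparison, so it scores 0. The first column receives the full gap between the original rank 3 and the expected rank 2, as the efficiency property of Shapley values requires.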
