Subset selection tasks arise in recommendation systems and search engines and require selecting a subset of items that maximizes value for the user. The values of subsets often display diminishing returns, and hence submodular functions have been used to model them. If the inputs defining the submodular function are known, then existing algorithms can be used. In many applications, however, inputs have been observed to carry social biases that reduce the utility of the output subset. Hence, interventions to improve the utility are desired. Prior works focus on maximizing linear functions (a special case of submodular functions) and show that fairness constraint-based interventions can not only ensure proportional representation but also achieve near-optimal utility in the presence of biases. We study the maximization of a family of submodular functions that captures functions arising in the aforementioned applications. Our first result is that, unlike for linear functions, constraint-based interventions cannot guarantee any constant fraction of the optimal utility for this family of submodular functions. Our second result is an algorithm for submodular maximization. Under mild assumptions, the algorithm provably outputs subsets that have near-optimal utility for this family and that proportionally represent items from each group. In empirical evaluation with both synthetic and real-world data, we observe that this algorithm improves the utility of the output subset for this family of submodular functions over baselines.
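To make the terms above concrete, the following is a minimal sketch, not the algorithm proposed in this work. It assumes a simple coverage function as the submodular utility and per-group upper bounds as the constraint-based intervention. All names (`coverage`, `greedy_with_group_caps`, the toy items and groups) are hypothetical and chosen only for illustration.

```python
def coverage(selected, topics):
    """Number of distinct topics covered by the selected items.

    Coverage is a standard example of a submodular function; it is an
    assumption here, not the family of functions studied in the paper.
    """
    covered = set()
    for item in selected:
        covered |= topics[item]
    return len(covered)


def greedy_with_group_caps(topics, group, k, cap):
    """Greedily pick up to k items, taking at most cap[g] items from group g.

    This is a generic constraint-based baseline (hypothetical helper),
    not the method introduced in the paper.
    """
    selected = []
    counts = {g: 0 for g in cap}
    while len(selected) < k:
        base = coverage(selected, topics)
        best_item, best_gain = None, -1
        for item in sorted(topics):  # sorted for deterministic tie-breaking
            if item in selected or counts[group[item]] >= cap[group[item]]:
                continue
            gain = coverage(selected + [item], topics) - base
            if gain > best_gain:
                best_item, best_gain = item, gain
        if best_item is None:  # no feasible item remains
            break
        selected.append(best_item)
        counts[group[best_item]] += 1
    return selected


# Toy data (assumed): each item covers a set of topics and belongs to a group.
topics = {
    "a1": {1, 2, 3},
    "a2": {1, 2},
    "a3": {3, 4},
    "b1": {5},
    "b2": {5, 6},
}
group = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}

# Diminishing returns: adding "a3" helps less once "a1" is already selected.
gain_small = coverage(["a3"], topics) - coverage([], topics)
gain_large = coverage(["a1", "a3"], topics) - coverage(["a1"], topics)

# Select two items with at most one item per group.
picked = greedy_with_group_caps(topics, group, k=2, cap={"A": 1, "B": 1})
```

In this toy instance the marginal gain of `"a3"` drops from 2 to 1 once `"a1"` is present, which is the diminishing-returns property. The sketch uses the true item values; the setting studied here differs in that the observed inputs are biased, which is where constraint-based interventions of this kind can lose utility.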