Maximizing a submodular function has a wide range of applications in machine learning and data mining. One such application is data summarization, whose goal is to select a small set of representative and diverse data items from a large dataset. However, data items may have sensitive attributes such as race or gender. In this setting, it is important to design \emph{fairness-aware} algorithms that mitigate algorithmic bias, which may otherwise cause over- or under-representation of particular groups. Motivated by this, we propose and study the classic non-monotone submodular maximization problem subject to novel group fairness constraints. Our goal is to select a set of items that maximizes a non-monotone submodular function while ensuring that the number of selected items from each group is proportionate to the group's size, to the extent specified by the decision maker. We develop the first constant-factor approximation algorithms for this problem. We also extend the basic model to incorporate an additional global size constraint on the total number of selected items.
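To make the problem statement concrete, here is a minimal sketch built on several assumptions that the abstract does not confirm. First, the fairness requirement is encoded as hypothetical per-group bounds `lower[g] <= |S ∩ V_g| <= upper[g]`, which a decision maker could set in proportion to each group's size. Second, the non-monotone submodular objective is a graph cut function, a standard textbook example. Third, the instance is solved by exhaustive search. This is not the paper's constant-factor approximation algorithm. It only illustrates what a feasible, fair solution looks like on a toy instance. All function names are hypothetical.

```python
from itertools import combinations


def cut_value(S, edges):
    """Weighted cut function: total weight of edges with exactly one endpoint in S.
    Cut functions are submodular but not monotone (e.g. f(V) = 0)."""
    return sum(w for u, v, w in edges if (u in S) != (v in S))


def is_fair(S, groups, lower, upper, k=None):
    """Check hypothetical group fairness bounds lower[g] <= |S ∩ V_g| <= upper[g],
    plus an optional global size constraint |S| <= k."""
    if k is not None and len(S) > k:
        return False
    for g, members in groups.items():
        if not lower[g] <= len(S & members) <= upper[g]:
            return False
    return True


def brute_force_fair_max(ground, edges, groups, lower, upper, k=None):
    """Exhaustively search every subset (exponential time; toy instances only).
    Returns (None, -inf) if no subset satisfies the constraints."""
    best_set, best_val = None, float("-inf")
    items = sorted(ground)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            S = set(combo)
            if is_fair(S, groups, lower, upper, k):
                val = cut_value(S, edges)
                if val > best_val:
                    best_set, best_val = S, val
    return best_set, best_val


# Toy instance: a path 0-1-2-3 with unit weights, split into two equal-size groups.
ground = {0, 1, 2, 3}
edges = [(0, 1, 1), (1, 2, 1), (2, 3, 1)]
groups = {"A": {0, 1}, "B": {2, 3}}
lower = {"A": 1, "B": 1}  # equal-size groups get equal representation
upper = {"A": 1, "B": 1}

best, val = brute_force_fair_max(ground, edges, groups, lower, upper)
print(best, val)  # {0, 2} 3
```

In this instance, the unconstrained optimum and the fair optimum happen to coincide, but adding the global size constraint `k=1` makes the problem infeasible, because each group must contribute one item. Exhaustive search is used only because it is easy to verify; it scales exponentially and is no substitute for an approximation algorithm.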