Maximizing a submodular function has a wide range of applications in machine learning and data mining. One such application is data summarization, whose goal is to select a small set of representative and diverse data items from a large dataset. However, data items might have sensitive attributes such as race or gender. In this setting, it is important to design \emph{fairness-aware} algorithms that mitigate potential algorithmic bias, which may cause over- or under-representation of particular groups. Motivated by this, we propose and study the classic non-monotone submodular maximization problem subject to novel group fairness constraints. Our goal is to select a set of items that maximizes a non-monotone submodular function while ensuring that the number of selected items from each group is proportionate to the group's size, to the extent specified by the decision maker. We develop the first constant-factor approximation algorithms for this problem. We also extend the basic model to incorporate an additional global size constraint on the total number of selected items.
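As a rough illustration of the constraint described above, one plausible way to write the problem down is sketched here. The symbols $V$ (the ground set), $V_1, \dots, V_m$ (the groups), $\alpha$ and $\beta$ (lower and upper proportion bounds), and $c$ (the global budget) are assumptions introduced for illustration only; they do not appear in the text above, and the exact formulation studied may differ.

\[
\max_{S \subseteq V} f(S)
\quad \text{subject to} \quad
\lfloor \alpha |V_i| \rfloor \;\le\; |S \cap V_i| \;\le\; \lfloor \beta |V_i| \rfloor
\quad \text{for all } i \in \{1, \dots, m\}.
\]

Here $f$ is assumed to be a non-monotone submodular set function, the groups $V_1, \dots, V_m$ are assumed to partition $V$, and $0 \le \alpha \le \beta \le 1$ stand for the tolerances chosen by the decision maker. Under the same assumptions, the extended model would add the global size constraint $|S| \le c$.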