Submodular function optimization has numerous applications in machine learning and data analysis, including data summarization, which aims to select a concise and diverse subset of data points from a large dataset. When data items carry sensitive attributes such as race or gender, fairness-aware algorithms are needed to prevent biases that could lead to unequal representation of different groups. With this in mind, we investigate the problem of maximizing a monotone submodular function subject to group fairness constraints. Unlike previous studies in this area, we allow randomized solutions: the goal is to compute a distribution over feasible sets such that the expected number of items selected from each group lies between given lower and upper thresholds, ensuring that each group's representation remains balanced in the long term. Here a set is feasible if its size is at most a constant $b$. We develop a series of approximation algorithms for this problem.
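To make the randomized fairness constraint concrete, the following minimal Python sketch checks whether an explicitly listed distribution over sets satisfies it and computes the distribution's expected objective value. It is not the approximation algorithm referred to above. It assumes the distribution is given as (set, probability) pairs and each group as a set of items (groups may overlap). The function names `is_feasible_distribution` and `expected_value`, the threshold maps `lower`/`upper`, and the toy coverage instance are all hypothetical illustrations.

```python
from collections.abc import Callable, Hashable, Mapping

# Hypothetical representation: a distribution is a list of (set, probability) pairs.
Distribution = list[tuple[frozenset, float]]


def is_feasible_distribution(
    dist: Distribution,
    groups: Mapping[Hashable, set],
    lower: Mapping[Hashable, float],
    upper: Mapping[Hashable, float],
    b: int,
    tol: float = 1e-9,
) -> bool:
    """Check the budget and expected-group-count constraints for an explicit distribution."""
    # Probabilities must be non-negative and sum to 1.
    if any(p < -tol for _, p in dist) or abs(sum(p for _, p in dist) - 1.0) > tol:
        return False
    # Every set in the support must respect the cardinality budget b.
    if any(len(S) > b for S, _ in dist):
        return False
    # The expected number of selected items from each group must lie in [lower, upper].
    for g, members in groups.items():
        expected = sum(p * len(S & members) for S, p in dist)
        if not (lower[g] - tol <= expected <= upper[g] + tol):
            return False
    return True


def expected_value(dist: Distribution, f: Callable[[frozenset], float]) -> float:
    """Expected objective E_{S ~ dist}[f(S)]."""
    return sum(p * f(S) for S, p in dist)


# Toy coverage instance (hypothetical): f(S) = number of elements covered,
# which is a monotone submodular function.
covers = {"x1": {1, 2}, "x2": {2}, "y1": {3, 4, 5}, "y2": {3}}


def coverage(S: frozenset) -> int:
    return len(set().union(*(covers[i] for i in S)))


groups = {"G1": {"x1", "x2"}, "G2": {"y1", "y2"}}
lower = {"G1": 0.5, "G2": 0.5}
upper = {"G1": 1.0, "G2": 1.0}

# With b = 1, no single set meets both lower bounds, but a 50/50 mixture does.
mixed = [(frozenset({"x1"}), 0.5), (frozenset({"y1"}), 0.5)]
single = [(frozenset({"y1"}), 1.0)]

print(is_feasible_distribution(mixed, groups, lower, upper, b=1))   # True
print(is_feasible_distribution(single, groups, lower, upper, b=1))  # False
print(expected_value(mixed, coverage))                              # 2.5
```

The toy instance illustrates why allowing randomization matters: under a budget of $b = 1$, every deterministic feasible set selects zero items from at least one group, so it cannot meet a positive lower threshold for both groups, while a distribution can satisfy both thresholds in expectation.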