Streaming submodular maximization is a natural model for the task of selecting a representative subset from a large-scale dataset. If datapoints have sensitive attributes such as gender or race, it becomes important to enforce fairness to avoid bias and discrimination. This has spurred significant interest in developing fair machine learning algorithms. Recently, such algorithms have been developed for monotone submodular maximization under a cardinality constraint. In this paper, we study the natural generalization of this problem to a matroid constraint. We give streaming algorithms as well as impossibility results that provide trade-offs between efficiency, quality and fairness. We validate our findings empirically on a range of well-known real-world applications: exemplar-based clustering, movie recommendation, and maximum coverage in social networks.
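To make the setting concrete, the following is a minimal sketch of the three ingredients named above: a monotone submodular objective (maximum coverage), a matroid constraint (here a partition matroid), and a fairness constraint. The fairness notion used here, lower and upper bounds on how many elements are chosen from each group, is an assumption for illustration, as are all names and the toy data. The one-pass rule at the end is a naive baseline with no approximation guarantee and is not the algorithm from the paper; it only shows how the constraints interact on a stream.

```python
from collections import Counter


def coverage(selected, sets):
    """Monotone submodular objective: number of items covered by the chosen sets."""
    covered = set()
    for name in selected:
        covered |= sets[name]
    return len(covered)


def is_independent(selected, category, capacity):
    """Partition matroid oracle: at most capacity[c] elements from each category c."""
    counts = Counter(category[e] for e in selected)
    return all(counts[c] <= capacity[c] for c in counts)


def within_upper_bounds(selected, group, upper):
    """Check only the per-group upper bounds (assumed fairness model)."""
    counts = Counter(group[e] for e in selected)
    return all(counts[g] <= upper[g] for g in counts)


def is_fair(selected, group, lower, upper):
    """Check both per-group lower and upper bounds (assumed fairness model)."""
    counts = Counter(group[e] for e in selected)
    return all(lower[g] <= counts[g] <= upper[g] for g in lower)


def naive_one_pass(stream, sets, category, capacity, group, upper):
    """Naive baseline: keep an arriving element if it adds coverage and the
    solution stays independent and within the group upper bounds.
    Lower bounds are NOT enforced here, so the result may be unfair."""
    solution = []
    for e in stream:
        candidate = solution + [e]
        gain = coverage(candidate, sets) - coverage(solution, sets)
        if (gain > 0
                and is_independent(candidate, category, capacity)
                and within_upper_bounds(candidate, group, upper)):
            solution = candidate
    return solution


# Hypothetical toy instance: four sets over the ground items 1..6.
sets = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
category = {"a": "x", "b": "x", "c": "y", "d": "y"}   # matroid parts
capacity = {"x": 1, "y": 1}
group = {"a": "red", "b": "blue", "c": "red", "d": "blue"}  # sensitive attribute
lower = {"red": 1, "blue": 1}
upper = {"red": 1, "blue": 1}

solution = naive_one_pass(["a", "b", "c", "d"], sets, category, capacity, group, upper)
print(solution, coverage(solution, sets), is_fair(solution, group, lower, upper))
```

On this toy stream the baseline rejects "b" because of the matroid capacity and "c" because of the group upper bound, and it happens to end with a fair solution. In general a rule like this can fail to meet the lower bounds, which is one reason the interplay between efficiency, quality and fairness is nontrivial.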