Streaming submodular maximization is a natural model for the task of selecting a representative subset from a large-scale dataset. If data points have sensitive attributes such as gender or race, it becomes important to enforce fairness to avoid bias and discrimination. This has spurred significant interest in developing fair machine learning algorithms. Recently, such algorithms have been developed for monotone submodular maximization under a cardinality constraint. In this paper, we study the natural generalization of this problem to a matroid constraint. We give streaming algorithms as well as impossibility results that provide trade-offs between efficiency, quality, and fairness. We validate our findings empirically on a range of well-known real-world applications: exemplar-based clustering, movie recommendation, and maximum coverage in social networks.
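To make the constraint concrete, the sketch below assumes the formulation common in the fair submodular maximization literature. Each element carries one sensitive attribute value (a "color" c), and a feasible solution S must be independent in the matroid while containing between ℓ_c and u_c elements of each color. The objective is a coverage function, one of the monotone submodular objectives behind the maximum-coverage application. The function names are hypothetical, a partition matroid stands in for a general matroid, and the exhaustive search is only an offline reference for tiny inputs. It is not the streaming algorithm studied in this paper.

```python
from itertools import combinations


def coverage(solution, covers):
    """Maximum-coverage objective: number of distinct items covered by the
    chosen elements. Coverage functions are monotone and submodular."""
    covered = set()
    for e in solution:
        covered |= covers[e]
    return len(covered)


def is_independent(solution, block_of, capacity):
    """Independence oracle for a partition matroid (hypothetical example
    matroid): at most capacity[b] chosen elements from each block b."""
    counts = {}
    for e in solution:
        b = block_of[e]
        counts[b] = counts.get(b, 0) + 1
        if counts[b] > capacity[b]:
            return False
    return True


def is_fair(solution, color_of, bounds):
    """Fairness: lower <= |S ∩ V_c| <= upper for every color c.
    Assumes every color appearing in color_of has an entry in bounds."""
    counts = {c: 0 for c in bounds}
    for e in solution:
        counts[color_of[e]] += 1
    return all(lo <= counts[c] <= hi for c, (lo, hi) in bounds.items())


def best_feasible(ground, covers, color_of, bounds, block_of, capacity):
    """Exhaustive offline baseline (exponential time): best subset that is
    both matroid-independent and fair. Returns (None, -1) if none exists."""
    best, best_val = None, -1
    for k in range(len(ground) + 1):
        for s in combinations(ground, k):
            if is_independent(s, block_of, capacity) and is_fair(s, color_of, bounds):
                val = coverage(s, covers)
                if val > best_val:
                    best, best_val = s, val
    return best, best_val


# Toy instance: four elements, two colors, two matroid blocks of capacity 1.
ground = ["a", "b", "c", "d"]
covers = {"a": {1, 2}, "b": {2, 3}, "c": {4}, "d": {1, 4, 5}}
color_of = {"a": "red", "b": "blue", "c": "red", "d": "blue"}
bounds = {"red": (1, 1), "blue": (1, 1)}
block_of = {"a": 0, "b": 0, "c": 1, "d": 1}
capacity = {0: 1, 1: 1}

print(best_feasible(ground, covers, color_of, bounds, block_of, capacity))
# prints (('a', 'd'), 4)
```

On this toy instance, fairness rules out {a, c} and {b, d}, and the capacity of one element per block rules out every set of three or more elements, so the best feasible set is {a, d} with coverage 4. A streaming algorithm would instead see the elements one at a time while keeping only a small amount of memory, which is where the trade-offs between efficiency, quality, and fairness come into play.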