Submodular optimization is a fundamental problem with many applications in machine learning, often involving decision-making over datasets with sensitive attributes such as gender or age. In such settings, it is often desirable to produce a diverse solution set that is fairly distributed with respect to these attributes. Motivated by this, we initiate the study of Fair Submodular Cover (FSC): given a ground set $U$, a monotone submodular function $f:2^U\to\mathbb{R}_{\ge 0}$, and a threshold $\tau$, the goal is to find a balanced subset $S\subseteq U$ of minimum cardinality such that $f(S)\ge\tau$. We first introduce discrete algorithms for FSC that achieve a bicriteria approximation ratio of $(\frac{1}{\epsilon}, 1-O(\epsilon))$. We then present a continuous algorithm that achieves a $(\ln\frac{1}{\epsilon}, 1-O(\epsilon))$-bicriteria approximation ratio, which matches the best approximation guarantee for submodular cover without a fairness constraint. Finally, we complement our theoretical results with empirical evaluations that demonstrate the effectiveness of our algorithms on instances of maximum coverage.
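To make the problem statement concrete, the following is a minimal sketch of the classical greedy heuristic for submodular cover on a maximum coverage instance, where $f(S)$ is the number of elements covered by the chosen sets. It is not the discrete or continuous algorithm of this paper. The abstract does not define "balanced", so the per-group cap used here (`cap`, with a hypothetical `group` labelling) is only an assumed stand-in for a fairness constraint, and the function name `greedy_cover` is likewise illustrative.

```python
from typing import Dict, Hashable, List, Optional, Set


def greedy_cover(
    sets: Dict[Hashable, Set[int]],
    tau: int,
    group: Optional[Dict[Hashable, Hashable]] = None,
    cap: Optional[int] = None,
) -> List[Hashable]:
    """Greedily pick sets until the coverage f(S) = |union of chosen sets| >= tau.

    ASSUMPTION: if `group` and `cap` are given, at most `cap` sets may be
    chosen from each group. This is an illustrative notion of balance,
    not the fairness definition used in the paper.
    """
    chosen: List[Hashable] = []
    covered: Set[int] = set()
    counts: Dict[Hashable, int] = {}

    while len(covered) < tau:
        best, best_gain = None, 0
        for key, members in sets.items():
            if key in chosen:
                continue
            # Skip candidates whose group has already reached the assumed cap.
            if group is not None and cap is not None:
                if counts.get(group[key], 0) >= cap:
                    continue
            gain = len(members - covered)  # marginal gain of adding this set
            if gain > best_gain:  # strict comparison: ties go to the first key
                best, best_gain = key, gain
        if best is None:
            raise ValueError("threshold tau is unreachable under the constraints")
        chosen.append(best)
        covered |= sets[best]
        if group is not None:
            counts[group[best]] = counts.get(group[best], 0) + 1

    return chosen


# Toy instance: four candidate sets split into two groups.
toy_sets = {"a": {1, 2, 3}, "b": {4, 5, 6}, "c": {1, 4, 7}, "d": {7, 8}}
toy_group = {"a": 0, "b": 0, "c": 1, "d": 1}

unconstrained = greedy_cover(toy_sets, tau=5)
capped = greedy_cover(toy_sets, tau=5, group=toy_group, cap=1)
```

On this toy instance the unconstrained greedy picks both sets from group 0, while the capped variant is forced to take one set from each group and still reaches the threshold. A bicriteria guarantee of the kind stated above relaxes both sides: the solution may be larger than optimal by the first factor and may reach only a $1-O(\epsilon)$ fraction of $\tau$.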