Popular centroid-based clustering methods are typically optimized for global objectives and may fail to adequately represent large groups of datapoints. To address this concern, recent work puts forward clustering analogs of social choice proportionality concepts, such as Proportionally Representative Fairness (also known as mPJR). For proportionality guarantees to be useful in practice, they must be (a) achievable and (b) efficiently auditable, so that one can check whether standard approaches such as $k$-means, which are not guaranteed to provide proportional representation in general, nevertheless output proportional solutions on specific inputs. In this work, we study the computational complexity of verifying proportional representation in clustering. We first show that verifying mPJR is coNP-hard. Inspired by PJR+ -- a strengthening of PJR that is polynomial-time verifiable in the committee voting setting -- we introduce mPJR+ as its metric analog. However, verifying mPJR+ relies on repeated submodular minimization, rendering it impractical at scale. Hence, we introduce Default Coalitions mPJR+ (DC-mPJR+): a new proportionality concept that offers representation guarantees to a restricted set of coalitions around unselected centers and, as a result, admits an $O(mn \log n + mnk)$ verification algorithm. DC-mPJR+ is satisfied by SEAR and remains a meaningful proxy for global fairness: any solution satisfying $\gamma$-DC-mPJR+ also satisfies $(\gamma + 2)$-mPJR+. Together, our results identify a practical and theoretically grounded path for auditing proportional representation in clustering.