The advent of ML-driven decision-making and policy formation has led to an increasing focus on algorithmic fairness. As clustering is one of the most commonly used unsupervised machine learning approaches, there has naturally been a proliferation of literature on {\em fair clustering}. A popular notion of fairness in clustering requires the clusters to be {\em balanced}, i.e., each level of a protected attribute must be approximately equally represented in each cluster. Building on this original framework, the literature has expanded rapidly in several directions. In this article, we offer a novel model-based formulation of fair clustering, complementing the existing literature, which is almost exclusively based on optimizing appropriate objective functions.
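
One way to make the balance requirement concrete is sketched below; it is an illustrative formalization, not necessarily the definition adopted in any particular paper, and all notation is assumed for the sketch. Suppose observations $i = 1, \dots, n$ carry a protected attribute $a_i \in \{1, \dots, r\}$ and are partitioned into non-empty clusters $C_1, \dots, C_K$. The share of level $j$ in cluster $k$ is
\[
  p_{kj} = \frac{\left|\{\, i \in C_k : a_i = j \,\}\right|}{|C_k|},
  \qquad \sum_{j=1}^{r} p_{kj} = 1 ,
\]
and the clustering may be called balanced when $p_{kj} \approx 1/r$ for every $k$ and $j$, for instance when $\min_{j, j'} p_{kj} / p_{kj'}$ is close to $1$ in each cluster. With $r = 2$ and a cluster containing $6$ points of level $1$ and $4$ points of level $2$, the shares are $0.6$ and $0.4$, and the ratio is $4/6 \approx 0.67$.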