Research in fair machine learning, and in fair clustering in particular, has become crucial in recent years given the many ethical controversies raised by modern intelligent systems. Ahmadian et al. [2020] established the study of fairness in \textit{hierarchical} clustering, a stronger, more structured variant of its well-known flat counterpart, though their proposed algorithm for optimizing Dasgupta's [2016] well-known cost function was highly theoretical. Knittel et al. [2023] then proposed the first practical fair approximation for cost; however, they were unable to break the polynomial-approximate barrier that they posed as a hurdle of interest. We break this barrier, proposing the first truly polylogarithmic-approximate low-cost fair hierarchical clustering, thus greatly narrowing the gap between the best fair and vanilla hierarchical clustering approximations.