Individual fairness guarantees are often desirable, but they become hard to formalize when the dataset contains outliers. We study the problem of designing an individually fair $k$-means clustering algorithm for datasets that contain outliers. Given $n$ points and a budget of $k$ centers, we require that every point that is not an outlier has a center among its $\frac{n}{k}$ nearest neighbours. Although a few recent works have studied individually fair clustering, this is the first to explore the problem in the presence of outliers for $k$-means clustering. To this end, we define and solve a linear program (LP) that identifies the outliers. We then exclude these outliers from the dataset and apply a rounding algorithm that computes the $k$ centers so that the fairness constraint is satisfied for the remaining points. We provide theoretical guarantees on how well our method approximates both the fair radius and the clustering cost, and we demonstrate our techniques empirically on real-world datasets.
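The fairness constraint stated in the abstract can be made concrete with a small check. The following is a minimal sketch, not the LP or the rounding algorithm of the paper: it only computes, for each point, the distance to its $\frac{n}{k}$-th nearest neighbour (the point's fair radius) and reports which non-outlier points have no center within that radius. The function names `fair_radii` and `violators`, the relaxation factor `alpha`, the use of `ceil` when $\frac{n}{k}$ is not an integer, and the choice to count a point as its own nearest neighbour are all assumptions made for illustration.

```python
import math

def fair_radii(points, k):
    """Distance from each point to its ceil(n/k)-th nearest neighbour.

    Assumption: the point itself counts as its first neighbour (distance 0).
    """
    n = len(points)
    m = math.ceil(n / k)  # neighbourhood size (rounding up is an assumption)
    radii = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        radii.append(dists[m - 1])
    return radii

def violators(points, centers, k, outliers=frozenset(), alpha=1.0):
    """Indices of non-outlier points with no center within alpha * fair radius."""
    radii = fair_radii(points, k)
    bad = []
    for i, p in enumerate(points):
        if i in outliers:
            continue  # outliers are exempt from the fairness constraint
        nearest = min(math.dist(p, c) for c in centers)
        if nearest > alpha * radii[i]:
            bad.append(i)
    return bad

# Toy data (hypothetical): two tight groups and one far-away point at index 6.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11), (50, 50)]
centers = [(0, 0), (10, 10), (1, 0)]

print(violators(points, centers, k=3))                # far point is unserved
print(violators(points, centers, k=3, outliers={6}))  # exempting it fixes that
```

In this toy instance the isolated point cannot be served without spending a center on it, which illustrates why the constraint is applied only to points that are not marked as outliers.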