We consider the problem of deep fair clustering, which partitions data into clusters using representations extracted by deep neural networks while hiding sensitive attributes. To achieve fairness, existing methods propose a variety of fairness-related objective functions based on the group fairness criterion. However, these works typically assume that the sensitive attributes are discrete, and therefore do not apply to continuous sensitive variables such as the proportion of the female population in an area. Moreover, existing works ignore the potential of representations learned from clustering tasks to improve performance on other tasks. In light of these limitations, we propose a flexible deep fair clustering method that can handle discrete and continuous sensitive attributes simultaneously. Specifically, we design an information-bottleneck-style objective function to learn fair and clustering-friendly representations. Furthermore, we explore, for the first time, the transferability of the extracted representations to other downstream tasks. Unlike existing works, we impose fairness at the representation level, which could guarantee fairness for the transferred task regardless of the clustering results. To verify the effectiveness of the proposed method, we perform extensive experiments on datasets with discrete and continuous sensitive attributes, demonstrating the advantage of our method over state-of-the-art methods.