Fair Data Pre-Processing with Imperfect Attribute Space

Fair data pre-processing is a widely used strategy for mitigating bias in machine learning. A promising line of research focuses on calibrating datasets to satisfy a designed fairness policy so that sensitive attributes influence outcomes only through clearly specified legitimate causal pathways. While effective on clean and information-rich data, these methods often break down in real-world scenarios with imperfect attribute spaces, where decision-relevant factors may be deemed unusable or even missing. To address this gap, we propose LatentPre, a novel framework that enables principled and robust fair data processing in practical settings. Instead of relying solely on observed attributes, LatentPre augments the fairness policy with latent attributes that capture essential but subtle signals, enabling the framework to operate as if the attribute space were perfect. These latent attributes are strategically introduced to guarantee identifiability and are estimated using a tailored expectation-maximization paradigm. The raw data is then carefully refined to conform to this latent-augmented policy, effectively removing biased patterns while preserving justifiable ones. Extensive experiments demonstrate that LatentPre consistently achieves strong fairness-utility trade-offs across diverse scenarios, advancing practical fairness-aware data management.
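The abstract states that the latent attributes are estimated with a tailored expectation-maximization paradigm but gives no details. As a rough illustration of the general EM idea only, and not of LatentPre's actual estimator, the sketch below fits a one-dimensional two-component Gaussian mixture, treating the component membership of each record as an unobserved binary attribute. The function name `em_two_gaussians`, the Gaussian model, and the initialisation are all assumptions made for this example.

```python
import math
import random


def em_two_gaussians(xs, iters=50):
    """Generic EM for a 1-D two-component Gaussian mixture (illustrative only).

    Returns (weights, means, variances, responsibilities), where
    responsibilities[i][k] is the posterior probability that point i
    belongs to latent group k.
    """
    n = len(xs)
    # Crude initialisation (an assumption): means at the data extremes,
    # both variances set to the overall variance, equal weights.
    mu = [min(xs), max(xs)]
    mean_all = sum(xs) / n
    var_all = sum((x - mean_all) ** 2 for x in xs) / n
    var = [var_all, var_all]
    w = [0.5, 0.5]
    resp = []
    for _ in range(iters):
        # E-step: posterior over the latent group for every point.
        resp = []
        for x in xs:
            p = [
                w[k]
                * math.exp(-((x - mu[k]) ** 2) / (2 * var[k]))
                / math.sqrt(2 * math.pi * var[k])
                for k in range(2)
            ]
            total = p[0] + p[1]
            resp.append([p[0] / total, p[1] / total])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            w[k] = nk / n
            mu[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            # Floor the variance so a component cannot collapse to a point.
            var[k] = max(
                sum(r[k] * (x - mu[k]) ** 2 for r, x in zip(resp, xs)) / nk,
                1e-6,
            )
    return w, mu, var, resp


if __name__ == "__main__":
    # Synthetic data: two hidden groups with different means.
    random.seed(0)
    data = [random.gauss(0, 1) for _ in range(200)]
    data += [random.gauss(5, 1) for _ in range(200)]
    weights, means, variances, _ = em_two_gaussians(data)
    print("weights:", weights)
    print("means:", means)
    print("variances:", variances)
```

The E-step computes a soft assignment of each record to a latent group, and the M-step re-fits the group parameters from those assignments. How LatentPre chooses its latent attributes to guarantee identifiability, and how it ties them to the fairness policy, is not described in the abstract and is not modelled here.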

