Despite their impressive predictive ability, machine learning models show discrimination against certain demographic groups and exhibit unfair prediction behavior. To alleviate this discrimination, extensive studies focus on eliminating the unequal distribution of sensitive attributes through a variety of approaches. However, due to privacy concerns, sensitive attributes are often unavailable or missing in real-world scenarios. Several existing works therefore mitigate bias without sensitive attributes. These studies face one of two challenges: either they rely on inaccurate predictions of sensitive attributes, or they must mitigate the unequal distribution of manually defined non-sensitive attributes related to bias. The latter requires strong assumptions about the correlation between sensitive and non-sensitive attributes. Because data distributions and task goals vary, such assumptions may not hold, and formulating them requires domain expertise. In this work, we propose an assumption-free framework that automatically detects the related attributes by modeling feature interactions for bias mitigation. The framework aims to mitigate the unfair impact of the identified biased feature interactions. Experimental results on four real-world datasets demonstrate that the proposed framework can significantly alleviate unfair prediction behavior by considering biased feature interactions.