Model fairness is an essential element of Trustworthy AI. While many techniques for model fairness have been proposed, most assume that the training and deployment data distributions are identical, which is often not true in practice. In particular, when the bias between labels and sensitive groups changes, the fairness of the trained model is directly affected and can worsen. We make two contributions toward solving this problem. First, we analytically show that existing in-processing fair algorithms have fundamental limits in accuracy and group fairness. We introduce the notion of correlation shifts, which explicitly captures the change in the above bias. Second, we propose a novel pre-processing step that samples the input data to reduce correlation shifts and thus enables in-processing approaches to overcome their limitations. We formulate an optimization problem for adjusting the data ratio among labels and sensitive groups to reflect the shifted correlation. A key benefit of our approach lies in decoupling the roles of the pre- and in-processing approaches: pre-processing adjusts the correlation, and in-processing mitigates unfairness on the processed data. Experiments show that our framework effectively improves existing in-processing fair algorithms with respect to accuracy and fairness on both synthetic and real datasets.
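To make the idea of adjusting the data ratio among labels and sensitive groups concrete, the following is a minimal sketch and not the optimization problem formulated in the paper. It assumes binary labels `y` and a binary sensitive attribute `z`, measures their bias as a Pearson correlation, and uses a simple closed-form target joint distribution that keeps both marginals fixed. The function names (`label_group_correlation`, `target_joint`, `resample_to_correlation`) and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np

def label_group_correlation(y, z):
    """Pearson correlation between binary labels y and binary groups z."""
    return float(np.corrcoef(np.asarray(y, float), np.asarray(z, float))[0, 1])

def target_joint(p_y, p_z, rho):
    """Joint P(y, z) with marginals p_y, p_z and Pearson correlation rho.

    For binary variables, cov(y, z) = P(y=1, z=1) - p_y * p_z, so the
    (1, 1) cell is fixed by rho and the other cells follow from the marginals.
    """
    scale = np.sqrt(p_y * (1 - p_y) * p_z * (1 - p_z))
    p11 = p_y * p_z + rho * scale
    joint = np.array([[1 - p_y - p_z + p11, p_z - p11],
                      [p_y - p11, p11]])  # indexed as joint[y, z]
    if np.any(joint < 0):
        raise ValueError("rho is not attainable with these marginals")
    return joint

def resample_to_correlation(y, z, rho, rng):
    """Return row indices resampled so that corr(y, z) is approximately rho."""
    y = np.asarray(y, dtype=int)
    z = np.asarray(z, dtype=int)
    n = len(y)
    empirical = np.zeros((2, 2))
    np.add.at(empirical, (y, z), 1.0)  # count each (label, group) cell
    empirical /= n
    if np.any(empirical == 0):
        raise ValueError("every (label, group) cell must be non-empty")
    joint = target_joint(y.mean(), z.mean(), rho)
    # Each example is weighted by target mass / empirical mass of its cell.
    weights = joint[y, z] / empirical[y, z]
    return rng.choice(n, size=n, replace=True, p=weights / weights.sum())

# Synthetic data (an assumption for illustration): y agrees with z 80% of the time.
rng = np.random.default_rng(0)
z = rng.integers(0, 2, size=20000)
y = np.where(rng.random(20000) < 0.8, z, 1 - z)

idx = resample_to_correlation(y, z, rho=0.0, rng=rng)
print("before:", label_group_correlation(y, z))
print("after: ", label_group_correlation(y[idx], z[idx]))
```

In this sketch the resampled indices would be used to build the training set that is then passed to an in-processing fair algorithm, mirroring the decoupling described above. How the target correlation `rho` is chosen for the deployment distribution is left open here.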