FAIRER: Fairness as Decision Rationale Alignment

Deep neural networks (DNNs) have made significant progress but often suffer from fairness issues: deep models typically show distinct accuracy differences among certain subgroups (e.g., males and females). Existing research addresses this issue by employing fairness-aware loss functions that constrain the last-layer outputs and directly regularize DNNs. Although this improves the fairness of DNNs, it remains unclear how the trained network arrives at a fair prediction, which limits further fairness improvements. In this paper, we investigate fairness from the perspective of decision rationale and define the parameter parity score, which characterizes the fair decision process of a network by analyzing neuron influence across subgroups. Extensive empirical studies show that unfairness can arise from unaligned decision rationales across subgroups. Existing fairness regularization terms fail to achieve decision rationale alignment because they constrain only the last-layer outputs and ignore the alignment of intermediate neurons. To address this, we formulate fairness as a new task, decision rationale alignment, which requires a DNN's neurons to respond consistently across subgroups in both the intermediate processes and the final prediction. To make this idea practical during optimization, we relax the naive objective function and propose gradient-guided parity alignment, which encourages gradient-weighted consistency of neurons across subgroups. Extensive experiments on a variety of datasets show that our method significantly enhances fairness while sustaining a high level of accuracy, outperforming other approaches by a wide margin.
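As a rough illustration of what "gradient-weighted consistency of neurons across subgroups" could look like, the sketch below estimates each neuron's influence as the mean product of its activation and the loss gradient with respect to that activation, then penalizes misalignment between two subgroups with one minus the cosine similarity. This is a minimal sketch under stated assumptions, not the paper's exact formulation: the function names `neuron_influence` and `alignment_penalty`, the first-order influence estimate, the cosine-based penalty, and the toy numbers are all hypothetical choices made for illustration.

```python
import math


def neuron_influence(activations, gradients):
    """Per-neuron influence estimate: mean of activation * gradient over samples.

    Assumption: `activations` and `gradients` are lists of samples, each a list
    of per-neuron values for one layer (hypothetical layout for this sketch).
    """
    n_samples = len(activations)
    n_neurons = len(activations[0])
    return [
        sum(activations[i][j] * gradients[i][j] for i in range(n_samples)) / n_samples
        for j in range(n_neurons)
    ]


def alignment_penalty(infl_a, infl_b, eps=1e-12):
    """One minus cosine similarity between two subgroup influence vectors.

    Near 0 when the subgroups rely on neurons in the same proportions,
    larger when their influence patterns diverge. Illustrative choice only.
    """
    dot = sum(x * y for x, y in zip(infl_a, infl_b))
    norm_a = math.sqrt(sum(x * x for x in infl_a))
    norm_b = math.sqrt(sum(y * y for y in infl_b))
    return 1.0 - dot / (norm_a * norm_b + eps)


# Toy data for two subgroups: 2 samples each, 2 neurons in one layer.
act_a = [[1.0, 0.5], [0.8, 0.4]]
grad_a = [[0.2, 0.1], [0.2, 0.1]]
act_b = [[0.2, 1.0], [0.1, 0.9]]
grad_b = [[0.1, 0.3], [0.1, 0.3]]

infl_a = neuron_influence(act_a, grad_a)
infl_b = neuron_influence(act_b, grad_b)
penalty = alignment_penalty(infl_a, infl_b)
print("influence (subgroup a):", infl_a)
print("influence (subgroup b):", infl_b)
print("alignment penalty:", penalty)
```

In a training loop, a term of this kind would be added to the task loss with a weighting coefficient, so that intermediate layers, and not only the last-layer outputs, are pushed toward consistent behavior across subgroups.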
