Machine learning models often make predictions based on biased features such as gender, race, and other social attributes, posing significant fairness risks, especially in societal applications such as hiring, banking, and criminal justice. Traditional approaches to this issue retrain or fine-tune neural networks with fairness-aware optimization objectives. However, these methods can be impractical because of the substantial computational resources they require, complex industrial testing, and the associated CO2 footprint. Additionally, ordinary users who want to use fair models often lack access to model parameters. In this paper, we introduce Inference-Time Rule Eraser (Eraser), a novel method that removes biased decision-making rules at inference time, addressing fairness concerns without modifying model weights. We first establish, through Bayesian analysis, a theoretical foundation for modifying model outputs to eliminate biased rules. We then present a concrete implementation of Eraser with two stages: (1) querying the model to distill its biased rules into a patch model, and (2) excluding these biased rules during inference. Extensive experiments validate the effectiveness of our approach and demonstrate its superior performance in addressing fairness concerns in AI systems.
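The abstract does not spell out Eraser's formulas, so the following is only a rough illustration of the general idea of correcting outputs at inference time without touching weights. It assumes that an input carries core evidence c and a bias attribute b that are conditionally independent given the label y. Under that assumption, p(y | c, b) / p(y | b) ∝ p(c | y), so dividing the model's posterior by a bias-only posterior removes the influence of b (leaving a uniform class prior). The helper names `distill_patch` and `erase_predict`, the averaging-based "patch model", and the assumption that b is observed at query and inference time are all hypothetical choices made for this sketch, not the paper's actual implementation.

```python
import math
from collections import defaultdict


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def distill_patch(frozen_model, queries):
    """Stage 1 (hypothetical): query the frozen model and, for each value of the
    bias attribute b, average the log-probabilities it returns. The resulting
    per-b table is a crude bias-only "patch model" standing in for log p(y | b)."""
    sums = {}
    counts = defaultdict(int)
    for x, b in queries:
        logp = log_softmax(frozen_model(x))
        if b not in sums:
            sums[b] = [0.0] * len(logp)
        sums[b] = [s + v for s, v in zip(sums[b], logp)]
        counts[b] += 1
    return {b: [s / counts[b] for s in sums[b]] for b in sums}


def erase_predict(frozen_model, patch, x, b):
    """Stage 2 (hypothetical): subtract the patch's bias-only log-probabilities
    from the frozen model's log-probabilities and renormalize, i.e. divide
    p(y | x) by the bias-only estimate. The model's weights are never modified."""
    model_logp = log_softmax(frozen_model(x))
    corrected = [m - p for m, p in zip(model_logp, patch[b])]
    return [math.exp(v) for v in log_softmax(corrected)]


def toy_model(x):
    """Toy frozen binary classifier: the class-1 logit mixes a core feature c
    with a spurious bias attribute b (inputs are x = (c, b))."""
    c, b = x
    return [0.0, c + 2.0 * b]


# Query the toy model on both bias groups, then predict with the bias erased.
queries = [((c, b), b) for b in (0, 1) for c in (-1.0, 1.0)]
patch = distill_patch(toy_model, queries)
print(erase_predict(toy_model, patch, (1.0, 0), 0))
print(erase_predict(toy_model, patch, (1.0, 1), 1))
```

Without the correction, the toy model assigns class 1 a probability of about 0.731 when b = 0 but about 0.953 when b = 1, even though the core feature c = 1.0 is identical. After the subtraction, both calls return approximately [0.269, 0.731], so the prediction no longer depends on b in this toy setting.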