Machine learning models often make predictions based on socially sensitive attributes such as gender and race, posing significant fairness risks, especially in high-stakes applications such as hiring, banking, and criminal justice. Traditional approaches to this problem retrain or fine-tune neural networks with fairness-aware optimization objectives. However, these methods can be impractical due to the substantial computational resources required, complex industrial testing pipelines, and the associated CO2 footprint. Moreover, ordinary users typically cannot fine-tune models at all because they lack access to the model parameters. In this paper, we introduce the Inference-Time Rule Eraser (Eraser), a novel method that addresses fairness concerns by removing biased decision-making rules from deployed models at inference time, without altering model weights. We first establish, through Bayesian analysis, a theoretical foundation for eliminating biased rules by modifying model outputs. We then present a concrete implementation of Eraser that proceeds in two stages: (1) distilling the biased rules from the deployed model into an additional patch model, and (2) removing these biased rules from the deployed model's output during inference. Extensive experiments validate the effectiveness of our approach and demonstrate its superior performance in addressing fairness concerns in AI systems.
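For intuition only, here is a minimal sketch of what stage (2) could look like if the Bayesian correction reduces to subtracting the patch model's log-probabilities (the distilled biased rule) from the deployed model's log-probabilities. The function name `erase_biased_rule` and both logit arguments are illustrative assumptions, not the paper's API, and the exact correction derived in the paper may differ.

```python
import numpy as np

def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def erase_biased_rule(deployed_logits, patch_logits):
    """Hypothetical inference-time correction (a sketch, not the paper's formula):
    subtract the patch model's log-probabilities, which capture the distilled
    biased rule, from the deployed model's log-probabilities, then take the
    argmax over the corrected scores. No model weights are modified."""
    corrected = log_softmax(deployed_logits) - log_softmax(patch_logits)
    return corrected.argmax(axis=-1)
```

Under this reading, the deployed model stays frozen; only its output scores are adjusted by the patch model at inference time.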