We propose selective debiasing, an inference-time safety mechanism designed to improve overall model quality in terms of both prediction performance and fairness, especially in scenarios where retraining the model is impractical. The method draws inspiration from selective classification, where predictions deemed low-quality according to their uncertainty scores are discarded at inference time. In our approach, we identify potentially biased model predictions and, instead of discarding them, remove the bias from these predictions using LEACE, a post-processing debiasing method. To select problematic predictions, we propose a bias quantification approach based on KL divergence, which achieves better results than standard uncertainty quantification methods. Experiments on text classification datasets with encoder-based classification models demonstrate that selective debiasing helps reduce the performance gap between post-processing methods and debiasing techniques from the at-training and pre-processing categories.
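The selection step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes that both the original and the LEACE-debiased probability distributions are already available for each example, scores each example by the KL divergence between the two, and swaps in the debiased prediction only for the most divergent (most likely biased) fraction. The `budget` parameter and the `selective_debias` helper are hypothetical names introduced for this sketch; LEACE itself is not reproduced here.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Row-wise KL(p || q) between two batches of categorical distributions."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return np.sum(p * np.log(p / q), axis=-1)

def selective_debias(orig_probs, debiased_probs, budget=0.2):
    """Replace the `budget` fraction of predictions whose original and
    debiased distributions diverge the most (highest KL); keep the rest.

    orig_probs, debiased_probs: arrays of shape (n_examples, n_classes).
    Returns the mixed predictions and the indices that were replaced.
    """
    scores = kl_divergence(orig_probs, debiased_probs)
    k = int(np.ceil(budget * len(scores)))
    # Indices of the k most-divergent predictions, flagged as potentially biased.
    flagged = np.argsort(scores)[::-1][:k]
    out = orig_probs.copy()
    out[flagged] = debiased_probs[flagged]
    return out, flagged
```

Discarding nothing and editing only the flagged subset is what distinguishes this from plain selective classification, where flagged predictions would simply be abstained on.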