Existing debiasing methods inevitably make unreasonable or undesired predictions because they are designed and evaluated to achieve parity across different social groups while disregarding individual facts, thereby modifying existing knowledge. In this paper, we first establish a new bias-mitigation benchmark, BiasKE, which leverages existing and newly constructed datasets to systematically assess debiasing performance with complementary metrics on fairness, specificity, and generalization. We further propose a novel debiasing method, Fairness Stamp (FAST), which enables editable fairness through fine-grained calibration of individual biased knowledge. Comprehensive experiments demonstrate that FAST surpasses state-of-the-art baselines with remarkable debiasing performance while preserving overall model capability and existing knowledge, highlighting the promise of fine-grained debiasing strategies for editable fairness in LLMs.