Fairness in language models (LMs) remains a longstanding challenge: biases inherent in training data can be perpetuated by models and carry over into downstream tasks. Recent methods either rely on expensive retraining or attempt debiasing at inference time by constraining model outputs to contrast with a reference set of biased templates or exemplars. Neither approach, however, addresses the primary goal of fairness, which is to maintain equitability across different demographic groups. In this work, we posit that generating unbiased output for one demographic in a given context requires the LM to be aware of its outputs for other demographics in the same context. To this end, we propose Counterfactually Aware Fair InferencE (CAFIE), a framework that dynamically compares the model's understanding of diverse demographics to generate more equitable sentences. We conduct an extensive empirical evaluation using base LMs of varying sizes across three diverse datasets and find that CAFIE outperforms strong baselines. CAFIE produces fairer text and strikes the best balance between fairness and language modeling capability.