Embeddings play a pivotal role in the efficacy of Large Language Models (LLMs). They are the foundation through which these models capture contextual relationships and develop a nuanced understanding of language. As a result, these models perform remarkably well on a wide range of complex tasks that require a fundamental understanding of human language. Because the embeddings themselves often reflect or exhibit bias, the models built on them may inadvertently learn that bias as well. In this work, we build on seminal prior work and propose DeepSoftDebias, an algorithm that uses a neural network to perform 'soft debiasing'. We thoroughly evaluate this algorithm across a variety of state-of-the-art (SOTA) datasets, accuracy metrics, and challenging NLP tasks. We find that DeepSoftDebias outperforms current state-of-the-art methods at reducing bias across gender, race, and religion.