Embeddings play a pivotal role in the efficacy of Large Language Models. They are the bedrock on which these models grasp contextual relationships, develop a more nuanced understanding of language, and consequently perform remarkably well on a plethora of complex tasks that require a fundamental understanding of human language. Given that these embeddings often reflect or exhibit bias, these models may inadvertently learn that bias as well. In this work, we build on seminal previous work and propose DeepSoftDebias, an algorithm that uses a neural network to perform 'soft debiasing'. We exhaustively evaluate this algorithm across a variety of state-of-the-art datasets, accuracy metrics, and challenging NLP tasks, and find that DeepSoftDebias outperforms current state-of-the-art methods at reducing bias across gender, race, and religion.