Embeddings play a pivotal role in the efficacy of Large Language Models. They are the bedrock on which these models grasp contextual relationships, develop a more nuanced understanding of language, and consequently perform remarkably well on a plethora of complex tasks that require a fundamental understanding of human language. Given that these embeddings themselves often reflect or exhibit bias, it stands to reason that models trained on them may also inadvertently learn this bias. In this work, we build on seminal prior work and propose DeepSoftDebias, an algorithm that uses a neural network to perform 'soft debiasing'. We exhaustively evaluate this algorithm across a variety of state-of-the-art datasets, accuracy metrics, and challenging NLP tasks, and find that DeepSoftDebias outperforms current state-of-the-art methods at reducing bias across gender, race, and religion.