This research surveys the current literature on bias in Natural Language Processing (NLP) models and the techniques proposed to mitigate it, including why tackling bias matters in the first place. These techniques are then analysed in light of newly developed models that are far larger than their predecessors. To this end, the authors conducted their research on GPT3 by OpenAI, the largest NLP model available to consumers today. With 175 billion parameters, in contrast to BERT's 340 million, GPT3 is an ideal model for testing the common pitfalls of NLP models. Tests were conducted by developing an Applicant Tracking System built on GPT3. For reasons of feasibility and time, the tests focused primarily on gender bias rather than on all or multiple types of bias. Finally, current mitigation techniques are considered and tested to measure how well they work.