With the increasing role of Natural Language Processing (NLP) in various applications, challenges concerning bias and stereotype perpetuation are accentuated, often leading to hate speech and harm. While sexism and misogyny have been studied, research on homophobia and transphobia remains underexplored and often adopts binary perspectives, putting the safety of LGBTQIA+ individuals at high risk in online spaces. In this paper, we assess the potential harm caused by sentence completions generated by English large language models (LLMs) concerning LGBTQIA+ individuals. This is achieved using QueerBench, our new assessment framework, which employs a template-based approach and a Masked Language Modeling (MLM) task. The analysis indicates that large language models exhibit discriminatory behaviour more frequently towards individuals within the LGBTQIA+ community, with a gap of up to 7.2% in the QueerBench harmfulness score.
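To illustrate the template-based probing the abstract describes, the following is a minimal sketch: subject terms are slotted into mask-containing templates, and the predicted fillers are scored for harmfulness. The template strings, subject lists, harmfulness lexicon, and scoring rule here are hypothetical simplifications for illustration, not the exact QueerBench setup.

```python
# Hypothetical sketch of a template-based MLM harmfulness probe.
# In a real run, the [MASK] token would be filled by an MLM
# (e.g. via a fill-mask pipeline); here we only show the
# template construction and scoring logic.

TEMPLATES = [
    "The {subject} is [MASK].",
    "Everyone knows the {subject} is [MASK].",
]

SUBJECTS = {
    "queer": ["queer person", "trans person"],
    "non-queer": ["straight person", "cisgender person"],
}

# Toy lexicon standing in for a harmfulness classifier.
HARMFUL_WORDS = {"dangerous", "disgusting", "wrong"}


def build_prompts(group: str) -> list[str]:
    """Instantiate every template with every subject term of a group."""
    return [t.format(subject=s) for t in TEMPLATES for s in SUBJECTS[group]]


def harm_score(completions: list[str]) -> float:
    """Percentage of predicted mask fillers judged harmful."""
    flagged = sum(1 for w in completions if w.lower() in HARMFUL_WORDS)
    return 100.0 * flagged / len(completions)
```

Comparing `harm_score` over completions for the "queer" and "non-queer" prompt sets would then yield a group-level gap analogous in spirit to the 7.2% difference reported above.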