"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation

Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life. Given the recent popularity and adoption of language generation technologies, the potential to further marginalize this population only grows. Although much of the NLP fairness literature focuses on illuminating and addressing gender biases, assessing gender harms for TGNB identities requires understanding how such identities uniquely interact with societal gender norms and how they differ from gender binary-centric perspectives. Such measurement frameworks inherently require centering TGNB voices to help guide the alignment between gender-inclusive NLP and the people it is intended to serve. Towards this goal, we ground our work in the TGNB community and existing interdisciplinary literature to assess how the social reality surrounding the marginalization experienced by TGNB persons contributes to and persists within Open Language Generation (OLG). This social knowledge serves as a guide for evaluating popular large language models (LLMs) on two key aspects: (1) misgendering and (2) harmful responses to gender disclosure. To do this, we introduce TANGO, a dataset of template-based real-world text curated from a TGNB-oriented community. We find that the models reflect a dominance of binary gender norms: LLMs misgendered subjects least in generated text when prompted with subjects who used binary pronouns. Meanwhile, misgendering was most prevalent when generation was prompted with singular they and neopronouns. When prompted with gender disclosures, TGNB disclosures elicited the most stigmatizing language and scored most toxic, on average. Our findings warrant further research on how TGNB harms manifest in LLMs and serve as a broader case study toward concretely grounding the design of gender-inclusive AI in community voices and interdisciplinary literature.
