"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation

Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life. Given the recent popularity and adoption of language generation technologies, the potential to further marginalize this population only grows. Although a large body of NLP fairness literature focuses on illuminating and addressing gender biases, assessing gender harms for TGNB identities requires understanding how such identities uniquely interact with societal gender norms and how they differ from gender binary-centric perspectives. Such measurement frameworks inherently require centering TGNB voices to help guide the alignment between gender-inclusive NLP and those it is intended to serve. Towards this goal, we ground our work in the TGNB community and existing interdisciplinary literature to assess how the social reality surrounding the marginalization experienced by TGNB persons contributes to and persists within Open Language Generation (OLG). By first understanding their marginalization stressors, we evaluate (1) misgendering and (2) harmful responses to gender disclosure. To do this, we introduce the TANGO dataset, comprising template-based text curated from real-world text within a TGNB-oriented community. We discover a dominance of binary gender norms within the models: large language models (LLMs) misgendered subjects least often in generated text when triggered by prompts whose subjects used binary pronouns. Meanwhile, misgendering was most prevalent when triggering generation with singular they and neopronouns. When prompted with gender disclosures, LLM text contained stigmatizing language and scored most toxic when triggered by TGNB gender disclosure. Our findings warrant further research on how TGNB harms manifest in LLMs and serve as a broader case study toward concretely grounding the design of gender-inclusive AI in community voices and interdisciplinary literature.
