"I'm fully who I am": Towards Centering Transgender and Non-Binary Voices to Measure Biases in Open Language Generation

Transgender and non-binary (TGNB) individuals disproportionately experience discrimination and exclusion from daily life. Given the recent popularity and adoption of language generation technologies, the potential to further marginalize this population only grows. Although a large body of NLP fairness literature focuses on illuminating and addressing gender biases, assessing gender harms for TGNB identities requires understanding how such identities uniquely interact with societal gender norms and how they differ from gender binary-centric perspectives. Such measurement frameworks inherently require centering TGNB voices to help align gender-inclusive NLP with those it is intended to serve. Towards this goal, we ground our work in the TGNB community and existing interdisciplinary literature to assess how the social reality surrounding the marginalization experienced by TGNB persons contributes to and persists within Open Language Generation (OLG). By first understanding their marginalization stressors, we evaluate (1) misgendering and (2) harmful responses to gender disclosure. To do this, we introduce the TANGO dataset, comprising template-based text curated from real-world text within a TGNB-oriented community. We discover a dominance of binary gender norms within the models: LLMs misgendered subjects in generated text least often when prompted with subjects who used binary pronouns, and most often when prompted with singular they and neopronouns. When prompted with gender disclosures, LLM-generated text contained stigmatizing language and scored most toxic when the disclosure was of a TGNB gender. Our findings warrant further research on how TGNB harms manifest in LLMs and serve as a broader case study toward concretely grounding the design of gender-inclusive AI in community voices and interdisciplinary literature.
