We present WinoQueer, a benchmark specifically designed to measure whether large language models (LLMs) encode biases that are harmful to the LGBTQ+ community. The benchmark is community-sourced: it is built with a novel method that generates a bias benchmark from a community survey. We apply the benchmark to several popular LLMs and find that off-the-shelf models generally exhibit considerable anti-queer bias. Finally, we show that LLM bias against a marginalized community can be partially mitigated by finetuning on data written about that community or by its members, and that social media text written by community members is more effective than news text written about the community by non-members. Our method for community-in-the-loop benchmark development provides a blueprint for future researchers to develop community-driven, harms-grounded LLM benchmarks for other marginalized communities.