Large Language Models (LLMs) can generate biased and toxic responses. Yet most prior work on evaluating gender bias in LLMs requires predefined gender-related phrases or gender stereotypes, which are difficult to collect comprehensively and restrict evaluation to explicit bias. In addition, we believe that inputs devoid of gender-related language or explicit stereotypes can still induce gender bias in LLMs. Thus, in this work, we propose a conditional text generation mechanism that does not need predefined gender phrases or stereotypes. This approach probes LLMs with three types of inputs generated through three distinct strategies, aiming to show evidence of explicit and implicit gender biases in LLMs. We also use explicit and implicit evaluation metrics to measure gender bias under each strategy. Our experiments demonstrate that increased model size does not consistently lead to enhanced fairness, and that all tested LLMs exhibit explicit and/or implicit gender bias, even when explicit gender stereotypes are absent from the inputs.