As Large Language Models (LLMs) continue to evolve, they are increasingly employed in studies that simulate societies and carry out diverse social tasks. However, LLMs are susceptible to societal biases due to their exposure to human-generated data. Given that LLMs are being used to gain insights into various societal aspects, it is essential to mitigate these biases. To that end, our study investigates the presence of implicit gender biases in multi-agent LLM interactions and proposes two strategies to mitigate them. We begin by creating a dataset of scenarios where implicit gender biases might arise, and subsequently develop a metric to assess the presence of such biases. Our empirical analysis reveals that LLMs generate outputs characterized by strong implicit bias associations ($\geq 50\%$ of the time). Furthermore, these biases tend to escalate following multi-agent interactions. To mitigate them, we propose two strategies: self-reflection with in-context examples (ICE) and supervised fine-tuning. Our research demonstrates that both methods effectively mitigate implicit biases, with the combination of fine-tuning and self-reflection proving the most successful.