Large Language Models (LLMs) often exhibit gender bias, resulting in unequal treatment of male and female subjects across contexts. To address this issue, we propose a novel data generation framework that fosters exploratory thinking in LLMs. Our approach prompts models to generate story pairs featuring male and female protagonists in structurally identical, morally ambiguous scenarios, then elicits and compares their moral judgments. When inconsistencies arise, the model is guided to produce balanced, gender-neutral judgments. These story-judgment pairs are used to fine-tune the models or to optimize them via Direct Preference Optimization (DPO). Experimental results show that our method significantly reduces gender bias while preserving or even enhancing general model capabilities. We release the code and generated data at: https://github.com/WeiKangda/LLMs-Exploratory-Bias-Mitigation/tree/main.
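For concreteness, the sketch below shows one way the paired-story generation and judgment-comparison loop could be wired up. It is a minimal illustration under stated assumptions, not the paper's implementation: `query_llm` is a hypothetical stand-in for any chat-completion call, and the prompt wording is illustrative rather than the actual prompts used.

```python
def query_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM API call (e.g., a chat-completion endpoint)."""
    raise NotImplementedError("Wire this to your model of choice.")


def build_preference_example(scenario: str) -> dict | None:
    # 1. Generate structurally identical stories with male/female protagonists.
    story_m = query_llm(f"Write a morally ambiguous story about a man: {scenario}")
    story_f = query_llm(
        f"Rewrite this story with a female protagonist, changing nothing else:\n{story_m}"
    )

    # 2. Elicit a moral judgment for each story independently.
    judge_m = query_llm(f"Give a moral judgment of the protagonist:\n{story_m}")
    judge_f = query_llm(f"Give a moral judgment of the protagonist:\n{story_f}")

    # 3. Check consistency across genders.
    consistent = query_llm(
        "Do these two judgments reach the same conclusion? Answer yes or no.\n"
        f"A: {judge_m}\nB: {judge_f}"
    ).strip().lower().startswith("yes")
    if consistent:
        return None  # judgments already agree; no training signal needed

    # 4. If the judgments diverge, elicit one balanced, gender-neutral judgment.
    balanced = query_llm(
        "These two structurally identical stories received different judgments. "
        f"Produce one balanced, gender-neutral judgment.\nA: {judge_m}\nB: {judge_f}"
    )

    # 5. Package as a DPO preference pair: the balanced judgment is "chosen",
    #    the inconsistent original judgment is "rejected".
    return {"prompt": story_f, "chosen": balanced, "rejected": judge_f}
```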
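The DPO stage could then consume these pairs directly. The following is a minimal sketch assuming Hugging Face's `trl` library (whose API shifts between versions) and a `pairs.jsonl` file with `prompt`/`chosen`/`rejected` fields like those produced above; the model name and hyperparameters are illustrative only.

```python
# Minimal DPO fine-tuning sketch; assumes a recent trl API in which the
# tokenizer is passed as `processing_class` (older versions used `tokenizer=`).
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # illustrative; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Each record: {"prompt": story, "chosen": balanced judgment, "rejected": biased judgment}
dataset = load_dataset("json", data_files="pairs.jsonl", split="train")

config = DPOConfig(
    output_dir="dpo-debiased",
    beta=0.1,  # strength of the KL penalty toward the reference model
    per_device_train_batch_size=2,
    num_train_epochs=1,
)
trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,
)
trainer.train()
```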