Pre-training large language models (LLMs) on vast text corpora enhances natural language processing capabilities but risks encoding social biases, particularly gender bias. While parameter-modification methods like fine-tuning mitigate bias, they are resource-intensive, unsuitable for closed-source models, and lack adaptability to evolving societal norms. Instruction-based approaches offer flexibility but often compromise task performance. To address these limitations, we propose $\textbf{FaIRMaker}$, an automated and model-independent framework that employs an $\textbf{auto-search and refinement}$ paradigm to adaptively generate Fairwords, which act as instructions integrated into input queries to reduce gender bias and enhance response quality. Extensive experiments demonstrate that FaIRMaker automatically searches for and dynamically refines Fairwords, effectively mitigating gender bias while preserving task integrity and ensuring compatibility with both API-based and open-source LLMs.
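To make the abstract's mechanism concrete, below is a minimal sketch of the Fairwords-as-instruction idea: a Fairword is selected, refined, and integrated into the input query before the query is sent to any LLM, API-based or open-source. The function names `search_fairword`, `refine_fairword`, and `build_prompt` are hypothetical placeholders standing in for FaIRMaker's auto-search and refinement steps, not a released API.

```python
# Minimal sketch of the Fairwords-as-instruction paradigm described above.
# All function names are hypothetical placeholders, not FaIRMaker's actual API.

def search_fairword(query: str, candidates: list[str]) -> str:
    """Pick a candidate Fairword for the query (placeholder heuristic)."""
    # A real system would score candidates against a bias/quality objective;
    # here we simply take the first candidate as a stand-in.
    return candidates[0]

def refine_fairword(fairword: str, query: str) -> str:
    """Adapt the selected Fairword to the specific query (placeholder)."""
    return fairword  # no-op stand-in for the dynamic refinement step

def build_prompt(query: str, candidates: list[str]) -> str:
    """Integrate the refined Fairword into the input query as an instruction."""
    fairword = refine_fairword(search_fairword(query, candidates), query)
    # Only the input text is modified, so the combined prompt works with both
    # API-based and open-source LLMs without touching model parameters.
    return f"{fairword}\n\n{query}"

if __name__ == "__main__":
    candidates = ["Answer without relying on gender stereotypes."]
    print(build_prompt("Describe a typical nurse's workday.", candidates))
```

Because the intervention lives entirely in the prompt, it requires no parameter updates and can be revised as norms or downstream tasks change.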