Despite their promising performance, recent analyses show that current generative large language models (LLMs) may still capture dataset biases and exploit them during generation, harming the generalizability of LLMs and making their outputs potentially harmful. However, owing to the diversity of dataset biases and the over-optimization problem, previous debiasing methods based on prior knowledge or fine-tuning may not be suitable for current LLMs. To address this issue, we explore combining active learning with causal mechanisms and propose a causal-guided active learning (CAL) framework, which uses the LLM itself to automatically and autonomously identify informative biased samples and induce the bias patterns. A cost-effective and efficient in-context-learning-based method is then employed to prevent LLMs from exploiting dataset biases during generation. Experimental results show that CAL can effectively recognize typical biased instances and induce various bias patterns for debiasing LLMs.
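The in-context-learning debiasing step mentioned above might be sketched as follows: induced bias patterns are injected into the prompt as explicit counter-bias instructions before the actual query. This is a minimal illustrative sketch, not the paper's implementation; the function name, prompt wording, and example bias patterns are all hypothetical.

```python
# Hypothetical sketch of in-context-learning-based debiasing: induced bias
# patterns are prepended to the prompt as explicit warnings, steering the
# LLM away from relying on them. All names and strings are illustrative.

def build_debiasing_prompt(question: str, bias_patterns: list[str]) -> str:
    """Compose a prompt that lists induced bias patterns as things to avoid."""
    warnings = "\n".join(
        f"- Do not rely on this spurious pattern: {p}" for p in bias_patterns
    )
    return (
        "Known dataset bias patterns (avoid using them):\n"
        f"{warnings}\n\n"
        f"Question: {question}\n"
        "Answer based only on the actual content, not the patterns above."
    )

prompt = build_debiasing_prompt(
    "Does the premise entail the hypothesis?",
    ["high word overlap implies entailment",
     "negation words imply contradiction"],
)
print(prompt)
```

The prompt itself carries the debiasing signal, so no parameter updates are needed, which is what makes the approach cost-effective relative to fine-tuning.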