As the breadth and depth of language model applications continue to expand rapidly, it is increasingly important to build efficient frameworks for measuring and mitigating the social biases these models learn or inherit. In this paper, we evaluate the ability of instruction fine-tuned language models to identify bias through zero-shot prompting, including Chain-of-Thought (CoT) prompts. Among LLaMA and its two instruction fine-tuned versions, Alpaca 7B performs best on the bias identification task, with an accuracy of 56.7%. We also show that scaling up LLM size and data diversity could lead to further performance gains. This is a work in progress that presents the first component of our bias mitigation framework. We will update this work as we obtain more results.