Instruction-tuned Large Language Models (LLMs) have recently showcased remarkable ability to generate fitting responses to natural language instructions. However, an open research question concerns the inherent biases of trained models and their responses. For instance, if the data used to tune an LLM is dominantly written by persons with a specific political bias, we might expect generated answers to share this bias. Current research work seeks to de-bias such models, or suppress potentially biased answers. With this demonstration, we take a different view on biases in instruction-tuning: Rather than aiming to suppress them, we aim to make them explicit and transparent. To this end, we present OpinionGPT, a web demo in which users can ask questions and select all biases they wish to investigate. The demo will answer this question using a model fine-tuned on text representing each of the selected biases, allowing side-by-side comparison. To train the underlying model, we identified 11 different biases (political, geographic, gender, age) and derived an instruction-tuning corpus in which each answer was written by members of one of these demographics. This paper presents OpinionGPT, illustrates how we trained the bias-aware model and showcases the web application (available at https://opiniongpt.informatik.hu-berlin.de).
翻译:指令微调大型语言模型近期展现出根据自然语言指令生成适当响应的卓越能力。然而,一个开放的研究问题涉及训练模型及其响应的固有偏见。例如,如果用于微调语言模型的数据主要由具有特定政治偏见的作者撰写,那么生成的答案可能共享这种偏见。当前研究致力于去偏见化或抑制潜在偏见答案。通过本演示,我们对指令微调中的偏见采取不同视角:我们旨在使偏见显式化和透明化,而非抑制它们。为此,我们提出OpinionGPT,这是一个网络演示工具,用户可提出疑问并选择所有希望调查的偏见类别。该演示使用针对每种选定偏见文本微调的模型回答问题,支持并排比较。为训练底层模型,我们识别了11种不同偏见(政治、地理、性别、年龄),并构建了一个指令微调语料库,其中每个答案均由属于这些人口统计特征之一的成员撰写。本文介绍OpinionGPT,阐述如何训练感知偏见的模型,并展示该网络应用程序。