Existing debiasing techniques are typically training-based or require access to a model's internals and output distributions, making them inaccessible to end-users who want to adapt LLM outputs to their particular needs. In this study, we examine whether structured prompting techniques can enable fair text generation. We evaluate a comprehensive, end-user-focused iterative debiasing framework that applies System 2 thinking processes in prompts to induce logical, reflective, and critical text generation, with single-step, multi-step, instruction-based, and role-based variants. By systematically evaluating multiple LLMs across several datasets and prompting strategies, we show that the more complex System 2-based Implicative Prompts significantly outperform other techniques, yielding lower mean bias in outputs while maintaining competitive performance on downstream tasks. Our work offers research directions for the design and potential of end-user-focused evaluative frameworks for LLM use.