Generating a Chain of Thought (CoT) has been shown to consistently improve large language model (LLM) performance on a wide range of NLP tasks. However, prior work has mainly focused on logical reasoning tasks (e.g., arithmetic, commonsense QA); it remains unclear whether these improvements hold for more diverse types of reasoning, especially in socially situated contexts. Concretely, we perform a controlled evaluation of zero-shot CoT across two socially sensitive domains: harmful questions and stereotype benchmarks. We find that zero-shot CoT reasoning in sensitive domains significantly increases a model's likelihood of producing harmful or undesirable output, with trends holding across different prompt formats and model variants. Furthermore, we show that harmful CoTs increase with model size, but decrease with improved instruction following. Our work suggests that zero-shot CoT should be used with caution on socially important tasks, especially when marginalized groups or sensitive topics are involved.