The conformity effect describes the tendency of individuals to align their responses with the majority. Studying this bias in large language models (LLMs) is crucial, as LLMs are increasingly used as conversation partners in information-seeking and decision-making tasks to improve productivity. Conformity to incorrect responses can therefore compromise their effectiveness. In this paper, we adapt psychological experiments to examine the extent of conformity in state-of-the-art LLMs. Our findings reveal that all models tested exhibit varying levels of conformity toward the majority, regardless of their initial choice or its correctness, across different knowledge domains. Notably, we are the first to show that LLMs are more likely to conform when they are more uncertain about their own predictions. We further explore factors that influence conformity, such as training paradigms and input characteristics, finding that instruction-tuned models are less susceptible to conformity, while phrasing the majority opinion in a more natural tone amplifies it. Finally, we propose two interventions--Devil's Advocate and Question Distillation--to mitigate conformity, providing insights into building more robust language models.
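As a concrete illustration of the kind of protocol the abstract describes, the sketch below probes conformity in the style of a text-based Asch experiment: the model first answers a multiple-choice question alone, then answers again after seeing fabricated "participant" votes for a wrong option, and we check whether its answer flips. This is a minimal sketch under stated assumptions, not the paper's actual setup: `query_model` is a hypothetical stand-in for whatever chat-completion API is used, and the question, options, and majority framing are illustrative.

```python
# Minimal sketch of a text-based Asch-style conformity probe.
# `query_model` is a hypothetical stand-in, not a real SDK call.

def query_model(prompt: str) -> str:
    """Hypothetical wrapper around an LLM chat-completion call.

    Replace the body with a real API call; here it returns a canned
    answer so the sketch runs standalone.
    """
    return "A"

def ask(question: str, options: dict[str, str], majority: str | None = None) -> str:
    """Ask a multiple-choice question, optionally preceded by fake majority votes."""
    lines = [question]
    lines += [f"{label}. {text}" for label, text in options.items()]
    if majority is not None:
        # Fabricated social pressure: several "participants" pick the same option.
        for i in (1, 2, 3):
            lines.append(f"Participant {i}: {majority}")
    lines.append("Answer with a single option letter.")
    return query_model("\n".join(lines)).strip()

# Illustrative item (an assumption, not one of the paper's test questions).
question = "Which planet is closest to the Sun?"
options = {"A": "Mercury", "B": "Venus", "C": "Mars"}

solo = ask(question, options)                      # baseline answer, no social context
pressured = ask(question, options, majority="C")   # majority votes for a wrong option

conformed = pressured != solo and pressured == "C"
print(f"solo={solo}, under majority pressure={pressured}, conformed={conformed}")
```

Run over many items, the fraction of flips toward the planted majority gives a simple conformity rate that can be compared across models, prompt phrasings, and (per the abstract's uncertainty finding) the model's confidence on each item.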