This study explores the sycophantic tendencies of Large Language Models (LLMs), whereby these models tend to provide answers that match what users want to hear, even when those answers are not entirely correct. Our motivation stems from a common behavior observed when individuals search the internet for facts while holding partial or misleading knowledge. Just as with web search engines, users may recall fragments of misleading keywords and submit them to an LLM, hoping for a comprehensive response. Our empirical analysis of several LLMs shows the potential danger of these models amplifying misinformation when presented with misleading keywords. Additionally, we thoroughly assess four existing hallucination-mitigation strategies for reducing the sycophantic behavior of LLMs. Our experiments demonstrate the effectiveness of these strategies in generating factually correct statements. Furthermore, our analyses include knowledge-probing experiments on factual keywords and on different categories of sycophancy mitigation.