Large language models (LLMs) have achieved strong performance across a wide range of tasks, but they are also prone to sycophancy, the tendency to agree with user statements regardless of their validity. Previous research has characterized both the extent and the underlying causes of sycophancy in earlier models, such as ChatGPT-3.5 and Davinci. Newer models have since been subjected to multiple mitigation strategies, yet their behavior still needs to be tested systematically. In particular, the effect of prompt language on sycophancy has not been explored. In this work, we investigate how the language of the prompt influences sycophantic responses. We evaluate three state-of-the-art models, GPT-4o mini, Gemini 1.5 Flash, and Claude 3.5 Haiku, using a set of tweet-like opinion prompts translated into five additional languages: Arabic, Chinese, French, Spanish, and Portuguese. Our results show that although newer models exhibit significantly less sycophancy overall than earlier generations, the extent of sycophancy still depends on the prompt language. We further provide a granular analysis of how language shapes model agreeableness across sensitive topics, revealing systematic cultural and linguistic patterns. These findings highlight both the progress of mitigation efforts and the need for broader multilingual audits to ensure trustworthy and bias-aware deployment of LLMs.