We investigate how the presence and type of interaction context shape sycophancy in LLMs. While real-world interactions allow models to mirror a user's values, preferences, and self-image, prior work often studies sycophancy in zero-shot settings devoid of context. Using two weeks of interaction context from 38 users, we evaluate two forms of sycophancy: (1) agreement sycophancy -- the tendency of models to produce overly affirmative responses, and (2) perspective sycophancy -- the extent to which models reflect a user's viewpoint. Agreement sycophancy tends to increase with the presence of user context, though model behavior varies by context type. User memory profiles are associated with the largest increases in agreement sycophancy (e.g., $+$45\% for Gemini 2.5 Pro), and some models become more sycophantic even with non-user synthetic contexts (e.g., $+$15\% for Llama 4 Scout). Perspective sycophancy increases only when models can accurately infer user viewpoints from interaction context. Overall, context shapes sycophancy in heterogeneous ways, underscoring the need for evaluations grounded in real-world interactions and raising questions for system design around alignment, memory, and personalization.