Large language models exhibit sycophantic tendencies: they validate incorrect user beliefs in order to appear agreeable. We investigate whether this behavior varies systematically with perceived user demographics, testing whether combinations of race, age, gender, and expressed confidence level produce different false-validation rates. Inspired by the legal concept of intersectionality, we conduct 768 multi-turn adversarial conversations using Anthropic's Petri evaluation framework, probing GPT-5-nano and Claude Haiku 4.5 across 128 persona combinations in three domains: mathematics, philosophy, and conspiracy theories. GPT-5-nano is significantly more sycophantic than Claude Haiku 4.5 overall (mean sycophancy score $\bar{x}=2.96$ vs. $1.74$, $p < 10^{-32}$, Wilcoxon signed-rank test). For GPT-5-nano, philosophy elicits 41% more sycophancy than mathematics, and Hispanic personas receive the highest sycophancy of any racial group. The worst-scoring persona, a confident 23-year-old Hispanic woman, averages 5.33/10 on sycophancy. Claude Haiku 4.5 exhibits uniformly low sycophancy with no significant demographic variation. These results show that sycophancy is not uniformly distributed across users and that safety evaluations should incorporate identity-aware testing.