Large language models (LLMs) are increasingly used as conversational partners for companionship, emotional disclosure, and interpersonal advice, but the social dynamics of these interactions can create harms that are not captured by capability-oriented or traditional safety evaluations. We introduce the Social AI Design Code, a framework for evaluating whether LLMs align with user welfare in social interactions, including whether they encourage harmful intimacy, dependence, or prolonged engagement. To evaluate these risks in natural and diverse user-LLM interactions, we operationalize the code with EUDAIMONIA, a benchmark of 969 user inputs and 3,147 design-requirement violation checks built from WildChat through weak-to-strong filtration, multi-model relabeling, and controlled rewriting. Evaluating 22 recent LLMs, we find that even the strongest models, Claude-Opus-4.7 and GPT-5.5, violate 30.7% and 27.2% of checks, respectively. Extended thinking does not reduce violation rates, suggesting that these failures are persistent social-alignment problems rather than deficits solvable through test-time reasoning alone.