Large language models are increasingly used to mediate everyday interpersonal dilemmas, yet how their advisory defaults interact with the concentrated moral orders of specific communities remains poorly understood. This article compares four assistant-style LLMs with community-endorsed advice on 11,565 posts from r/relationship_advice, using the subreddit as a concentrated, vote-ratified moral formation whose prescriptive clarity makes divergence measurable. Across models, the LLMs identify many of the same dynamics as human commenters but are markedly less likely to convert that recognition into directive authorization for action. The gap is sharpest where community consensus is strongest: on high-consensus posts involving abuse or safety threats, models recommend exit at roughly half the human rate while maintaining elevated levels of hedging, validation, and therapeutic framing. The article describes this pattern as recognition without authorization: the capacity to register harm while withholding socially ratified permission for consequential action. This divergence is not incidental but structural: it reflects a portable advisory style that remains validating, risk-averse, and weakly directive across contexts. Safety alignment is one plausible contributor to this pattern, alongside training-data averaging and broader assistant design. The article argues that model divergence can be reframed from a technical error into a way of seeing what standardized assistant norms flatten when they encounter situated moral worlds.