Alignment of large language models (LLMs) ensures that model behaviors reflect human values. Existing alignment strategies primarily follow two paths: one assumes a universal value set serving a unified goal (i.e., one-size-fits-all), while the other treats every individual as unique and customizes a model accordingly (i.e., individual-level). However, assuming a monolithic value space marginalizes minority norms, while tailoring a model to each individual is prohibitively expensive. Recognizing that human society is organized into social clusters with high intra-group value alignment, we propose community-level alignment as a "middle ground". Concretely, we introduce CommunityBench, the first large-scale benchmark for evaluating community-level alignment, featuring four tasks grounded in Common Identity and Common Bond theory. With CommunityBench, we conduct a comprehensive evaluation of various foundation models, revealing that current LLMs exhibit limited capacity to model community-specific preferences. Furthermore, we investigate the potential of community-level alignment to facilitate individual modeling, providing a promising direction for scalable and pluralistic alignment.