As Large Language Models (LLMs) serve a global audience, alignment must transition from enforcing universal consensus to respecting cultural pluralism. We demonstrate that dense models, when forced to fit conflicting value distributions, suffer from \textbf{Mean Collapse}, converging to a generic average that fails to represent diverse groups. We attribute this to \textbf{Cultural Sparsity}, where gradient interference prevents dense parameters from spanning distinct cultural modes. To resolve this, we propose \textbf{\textsc{CuMA}} (\textbf{Cu}ltural \textbf{M}ixture of \textbf{A}dapters), a framework that frames alignment as a \textbf{conditional capacity separation} problem. By incorporating demographic-aware routing, \textsc{CuMA} internalizes a \textit{Latent Cultural Topology} to explicitly disentangle conflicting gradients into specialized expert subspaces. Extensive evaluations on WorldValuesBench, Community Alignment, and PRISM demonstrate that \textsc{CuMA} achieves state-of-the-art performance, significantly outperforming both dense baselines and semantic-only MoEs. Crucially, our analysis confirms that \textsc{CuMA} effectively mitigates mean collapse, preserving cultural diversity. Our code is available at https://github.com/Throll/CuMA.
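As a toy illustration of mean collapse (a minimal sketch under our own simplifying assumptions, not the formal setup of \textsc{CuMA}), consider a single shared scalar parameter $\theta$ fit by squared error to two groups $A$ and $B$ with conflicting targets $y_A \neq y_B$ and hypothetical mixture weights $\pi_A + \pi_B = 1$:
\[
\mathcal{L}(\theta) = \pi_A (\theta - y_A)^2 + \pi_B (\theta - y_B)^2,
\qquad
\frac{\partial \mathcal{L}}{\partial \theta} = 0
\;\Longrightarrow\;
\theta^\star = \pi_A y_A + \pi_B y_B ,
\]
\[
\mathcal{L}(\theta^\star) = \pi_A \pi_B \,(y_A - y_B)^2 > 0 .
\]
The shared parameter settles on the weighted average, which matches neither group, and the residual loss grows with the squared disagreement between the groups. If the parameter is instead conditioned on the group, $\theta_g = y_g$ for $g \in \{A, B\}$, the loss is zero for both. This is the intuition behind treating alignment as conditional capacity separation; how well it transfers to high-dimensional models with learned routing is an empirical question.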