As Large Language Models (LLMs) serve a global audience, alignment must transition from enforcing universal consensus to respecting cultural pluralism. We demonstrate that dense models, when forced to fit conflicting value distributions, suffer from \textbf{Mean Collapse}, converging to a generic average that fails to represent diverse groups. We attribute this to \textbf{Cultural Sparsity}, where gradient interference prevents dense parameters from spanning distinct cultural modes. To resolve this, we propose \textbf{\textsc{CuMA}} (\textbf{Cu}ltural \textbf{M}ixture of \textbf{A}dapters), a framework that frames alignment as a \textbf{conditional capacity separation} problem. By incorporating demographic-aware routing, \textsc{CuMA} internalizes a \textit{Latent Cultural Topology} to explicitly disentangle conflicting gradients into specialized expert subspaces. Extensive evaluations on WorldValuesBench, Community Alignment, and PRISM demonstrate that \textsc{CuMA} achieves state-of-the-art performance, significantly outperforming both dense baselines and semantic-only MoEs. Crucially, our analysis confirms that \textsc{CuMA} effectively mitigates mean collapse, preserving cultural diversity. Our code is available at \url{https://github.com/Throll/CuMA}.
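To make the core mechanism concrete, the following is a minimal sketch of demographic-aware routing over low-rank adapter experts, in the spirit of the conditional capacity separation described above. All names, dimensions, and the one-hot demographic encoding are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

HIDDEN, N_EXPERTS, RANK, N_GROUPS = 16, 4, 2, 3

# Hypothetical parameters: the router sees [hidden state; demographic signal],
# so routing can separate conflicting cultural gradients into distinct experts.
W_gate = rng.normal(size=(HIDDEN + N_GROUPS, N_EXPERTS))
# LoRA-style low-rank adapter per expert: delta_i(h) = B_i @ (A_i @ h)
A = rng.normal(size=(N_EXPERTS, RANK, HIDDEN)) * 0.1
B = rng.normal(size=(N_EXPERTS, HIDDEN, RANK)) * 0.1

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cuma_layer(h, demo_onehot):
    """Route a hidden state through demographically gated adapter experts."""
    gate_in = np.concatenate([h, demo_onehot])
    gates = softmax(gate_in @ W_gate)  # mixture weights over experts
    delta = sum(g * (B[i] @ (A[i] @ h)) for i, g in enumerate(gates))
    return h + delta, gates           # residual adapter update

h = rng.normal(size=HIDDEN)
out_a, gates_a = cuma_layer(h, np.eye(N_GROUPS)[0])
out_b, gates_b = cuma_layer(h, np.eye(N_GROUPS)[1])
# The same hidden state is routed differently under different demographic
# signals, so each group's values land in specialized expert subspaces.
```

The key design point this sketch isolates is that the gate input concatenates a demographic signal with the hidden state; a semantic-only MoE would gate on `h` alone and could not separate groups that ask semantically similar questions.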