Despite their global prevalence, many Large Language Models (LLMs) are aligned to a monolithic, often Western-centric set of values. This paper investigates the more challenging task of fine-grained value alignment: whether LLMs can emulate the distinct cultural values of demographic subgroups. Using Singapore as a case study and drawing on the World Values Survey (WVS), we map the value landscape and show that even state-of-the-art models such as GPT-4.1 achieve only 57.4% accuracy in predicting subgroups' modal preferences. We construct a dataset of over 20,000 samples to train and evaluate a range of models. We demonstrate that simple fine-tuning on structured numerical preferences yields substantial gains, improving accuracy on unseen, out-of-distribution subgroups by an average of 17.4%. These gains partially transfer to open-ended generation. However, we find significant pre-existing performance biases: models better emulate young, male, Chinese, and Christian personas. Furthermore, while fine-tuning improves average performance, it widens the disparity between subgroups when measured by distance-aware metrics. Our work offers insights into the limits and fairness implications of subgroup-level cultural alignment.