Large language models (LLMs) are used worldwide, yet exhibit Western cultural tendencies. Many countries are now building ``regional'' or ``sovereign'' LLMs, but it remains unclear whether they reflect local values and practices or merely speak local languages. Using India as a case study, we evaluate six Indic and six global LLMs on two dimensions -- values and practices -- grounded in nationally representative surveys and community-sourced QA datasets. Across tasks, Indic models do not align better with Indian norms than global models; in fact, a U.S. respondent is a closer proxy for Indian values than any Indic model. We further run a user study with 115 Indian users and find that writing suggestions from both global and Indic LLMs introduce Westernized or exoticized writing. Prompting and regional fine-tuning fail to recover alignment and can even degrade existing knowledge. We attribute this to scarce culturally grounded data, especially for pretraining. We position cultural evaluation as a first-class requirement alongside multilingual benchmarks and offer a reusable, community-grounded methodology. We call for native, community-authored corpora and thick-and-wide evaluations to build truly sovereign LLMs.