Multilingual NLP is often treated as a route to global inclusion, but linguistic coverage and cultural competence frequently diverge. This paper synthesizes more than 50 studies spanning multilingual performance inequality, cross-lingual transfer, culture-aware evaluation, cultural alignment, multimodal benchmarks, benchmark-design critique, and community-grounded data practices. Across this literature, training-data coverage remains important, but outcomes are also shaped by tokenization, prompt language, the design of translated benchmarks, culturally grounded supervision, modality, and who authors or validates evaluation data. We argue that culturally grounded NLP should move beyond treating languages as isolated rows in benchmark tables and instead model communicative ecologies: the institutions, scripts, domains, modalities, and communities through which language is used. We propose a layered evaluation and reporting agenda centered on representation audits, mixed elicitation, ecological validity, community validation, adaptation provenance, within-language variation, and the maintenance of living cultural resources.