Characterizing Cultural Localization in AI-Generated Stories

The global use of artificial intelligence has increased interest in assessing its ability to generate culturally localized content, including stories. Cultural localization in stories typically takes one of two forms: templated localization, which inserts cultural markers (e.g., names, locations) into a generic narrative, or holistic localization, which varies plots, values, and themes in addition to cultural markers. We propose a method to measure the degree to which content was generated through templated localization. Specifically, we identify the lexical tokens that distinguish stories across nationalities and measure the similarity of the narratives that remain after those tokens are removed. In stories generated by five models on 125 topics for 193 nationalities, our method detects that only a small subset (9-17%) of the vocabulary accounts for the variation across nationalities, and that the narratives remaining after these tokens are removed contain repeated multi-word sequences, suggesting a shared, culturally agnostic narrative template. Finally, we characterize the cultural markers by their stereotypicality and offensiveness, finding that markers from 19 countries, mostly located in the Global South, are on average offensive.
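
The two steps named in the abstract (finding nationality-distinguishing tokens, then comparing what is left) can be illustrated with a toy sketch. This is not the authors' procedure: the abstract does not say how distinguishing tokens are identified or how similarity is scored, so the sketch assumes a simple document-frequency cutoff for markers and mean pairwise Jaccard overlap of word trigrams for similarity. All function names, the `max_share` parameter, and the example stories are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def marker_tokens(stories, max_share=0.5):
    """Assumed stand-in for marker detection: tokens that occur in at most
    `max_share` of the nationalities' stories are treated as cultural markers."""
    doc_freq = Counter()
    for text in stories.values():
        doc_freq.update(set(tokenize(text)))
    limit = max_share * len(stories)
    return {tok for tok, df in doc_freq.items() if df <= limit}


def ngrams(tokens, n):
    """Set of contiguous n-token sequences."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def residual_similarity(stories, markers, n=3):
    """Mean pairwise Jaccard overlap of n-grams after removing marker tokens.
    Values near 1.0 suggest the stories share one underlying template."""
    residual = {
        nat: ngrams([t for t in tokenize(text) if t not in markers], n)
        for nat, text in stories.items()
    }
    scores = []
    for a, b in combinations(residual.values(), 2):
        union = a | b
        scores.append(len(a & b) / len(union) if union else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy stories: same plot, only names and places differ.
stories = {
    "Japanese": "Yuki walked to the market in Kyoto and found a lost kitten",
    "Brazilian": "Lucas walked to the market in Salvador and found a lost kitten",
    "Kenyan": "Wanjiru walked to the market in Nairobi and found a lost kitten",
}
markers = marker_tokens(stories)
score = residual_similarity(stories, markers)
print(sorted(markers), score)
```

In this toy case the markers are exactly the names and city names, and the residual narratives are identical, which is the signature of templated localization. A holistically localized set of stories would leave residuals with little n-gram overlap.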
