Millions of users across the globe turn to AI chatbots for their creative needs, inviting widespread interest in understanding how they represent diverse cultures. However, evaluating cultural representations in open-ended tasks remains challenging and underexplored. In this work, we present TALES, an evaluation of cultural misrepresentations in LLM-generated stories for diverse Indian cultural identities. First, we develop TALES-Tax, a taxonomy of cultural misrepresentations by collating insights from participants with lived experiences in India through focus groups (N=9) and individual surveys (N=15). Using TALES-Tax, we evaluate 6 models through a large-scale annotation study spanning 2925 annotations from 108 annotators with lived experience and native language proficiency from across 71 regions in India and 14 languages. Concerningly, we find that 88% of the generated stories contain misrepresentations, and such errors are more prevalent in mid- and low-resourced languages and stories based in peri-urban regions in India. We also transform the annotations into TALES-QA, a standalone question bank to evaluate the cultural knowledge of models.