Generative AI model outputs have been increasingly evaluated for their (in)ability to represent non-Western cultures. We argue that these evaluations often operate through reductive ideals of representation, abstracted from how people define their own representation and neglecting the inherently interpretive and contextual nature of cultural representation. In contrast to these 'thin' evaluations, we introduce the idea of 'thick evaluations': a more granular, situated, and discursive measurement framework for evaluating representations of social worlds in AI outputs, steeped in communities' own understandings of representation. We develop this evaluation framework through workshops in South Asia, studying the 'thick' ways in which people interpret and assign meaning to AI-generated images of their own cultures. We introduce practices for thicker evaluations of representation that expand the understanding of representation underpinning AI evaluations and, by co-constructing metrics with communities, bring measurement in line with the experiences of communities on the ground.