Visual generative AI models are trained using a one-size-fits-all measure of aesthetic appeal. However, what is deemed "aesthetic" is inextricably linked to personal taste and cultural values, raising the question of whose taste is represented in these models. In this work, we study an aesthetic evaluation model, the LAION-Aesthetics Predictor (LAP), that is widely used both to curate training datasets for generative image models such as Stable Diffusion and to evaluate the quality of AI-generated images. To understand what LAP measures, we audited the model across three datasets. First, we examined the impact of aesthetic filtering on the LAION-Aesthetics Dataset (approximately 1.2B images), which was curated from LAION-5B using LAP. We find that LAP disproportionately filters in images with captions mentioning women, while filtering out images with captions mentioning men or LGBTQ+ people. Then, we used LAP to score approximately 330k images across two art datasets, finding that the model rates realistic images of landscapes, cityscapes, and portraits from western and Japanese artists most highly. In doing so, the algorithmic gaze of this aesthetic evaluation model reinforces the imperial and male gazes found within western art history. To understand where these biases may have originated, we performed a digital ethnography of public materials related to the creation of LAP. We find that the development of LAP reflects the biases identified in our audits: for example, the aesthetic scores used to train LAP came primarily from English-speaking photographers and western AI enthusiasts. In response, we discuss how aesthetic evaluation can perpetuate representational harms and call on AI developers to shift away from prescriptive measures of "aesthetics" toward more pluralistic evaluation.