Visual generative AI models are trained using a one-size-fits-all measure of aesthetic appeal. However, what is deemed "aesthetic" is inextricably linked to personal taste and cultural values, raising the question of whose taste is represented in these models. In this work, we study an aesthetic evaluation model, the LAION Aesthetic Predictor (LAP), that is widely used both to curate training datasets for visual generative models such as Stable Diffusion and to evaluate the quality of AI-generated images. To understand what LAP measures, we audited the model across three datasets. First, we examined the impact of aesthetic filtering on the LAION-Aesthetics Dataset (approximately 1.2B images), which was curated from LAION-5B using LAP. We find that LAP disproportionately filters in images with captions mentioning women, while filtering out images with captions mentioning men or LGBTQ+ people. Then, we used LAP to score approximately 330k images across two art datasets, finding that the model rates realistic images of landscapes, cityscapes, and portraits from western and Japanese artists most highly. In doing so, the algorithmic gaze of this aesthetic evaluation model reinforces the imperial and male gazes found within western art history. To understand where these biases may have originated, we performed a digital ethnography of public materials related to the creation of LAP. We find that the development of LAP reflects the biases we found in our audits; for example, the aesthetic scores used to train LAP came primarily from English-speaking photographers and western AI enthusiasts. In response, we discuss how aesthetic evaluation can perpetuate representational harms and call on AI developers to shift away from prescriptive measures of "aesthetics" toward more pluralistic evaluation.