In the fields of experimental and computational aesthetics, numerous image datasets have been created over the last two decades. In the present work, we provide a comparative overview of twelve image datasets that include aesthetic ratings (beauty, liking, or aesthetic quality) and investigate the reproducibility of results across datasets. Specifically, we examine how consistently the ratings can be predicted from either (A) a set of 20 previously studied statistical image properties or (B) the layers of a convolutional neural network developed for object recognition. Our findings reveal substantial variation in the predictability of aesthetic ratings across datasets. However, consistent similarities were found for datasets containing either photographs or paintings, suggesting that different features are relevant to the aesthetic evaluation of these two image genres. To our surprise, statistical image properties and the convolutional neural network predict aesthetic ratings with similar accuracy, highlighting a significant overlap in the image information captured by the two methods. Nevertheless, the discrepancies between datasets call into question the generalizability of previous findings obtained on single datasets. Our study underscores the importance of considering multiple datasets to improve the validity and generalizability of research results in experimental and computational aesthetics.