The notion of visual similarity is essential for computer vision and for applications and studies that rely on vector embeddings of images. However, the scarcity of benchmark datasets poses a significant hurdle in exploring how these models perceive similarity. Here we introduce Style Aligned Artwork Datasets (SALADs) and, as an example, fruit-SALAD, a dataset of 10,000 images of fruit depictions. This combined semantic category and style benchmark comprises 10 easy-to-recognize fruit categories rendered in 10 easily distinguishable styles, with 100 instances for each category-style combination. Generated with a systematic pipeline of generative image synthesis, this visually diverse yet balanced benchmark demonstrates salient differences in semantic category and style similarity weights across various computational models, including machine learning models, feature extraction algorithms, and complexity measures, as well as conceptual models for reference. This meticulously designed dataset offers a controlled and balanced platform for the comparative analysis of similarity perception. The SALAD framework allows a comparison of how these models perform semantic category and style recognition tasks that goes beyond anecdotal knowledge, making the comparison robustly quantifiable and qualitatively interpretable.
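To make the idea of category and style similarity weights concrete, the following is a minimal sketch, not the authors' pipeline. It assumes that each image has already been mapped to an embedding vector and carries a category label and a style label. The function names `toy_embeddings` and `similarity_weights`, the use of cosine similarity, and the synthetic vectors are all assumptions introduced for illustration; the toy data are not fruit-SALAD images or results.

```python
import numpy as np


def toy_embeddings(n_categories=10, n_styles=10, n_instances=3, dim=32,
                   category_weight=1.0, style_weight=2.0, seed=0):
    """Synthetic stand-in for model embeddings (hypothetical, for illustration).

    Each vector is a weighted sum of a random category direction and a random
    style direction, plus small per-instance noise.
    """
    rng = np.random.default_rng(seed)
    cat_dirs = rng.normal(size=(n_categories, dim))
    sty_dirs = rng.normal(size=(n_styles, dim))
    embeddings, categories, styles = [], [], []
    for c in range(n_categories):
        for s in range(n_styles):
            for _ in range(n_instances):
                noise = rng.normal(scale=0.1, size=dim)
                embeddings.append(category_weight * cat_dirs[c]
                                  + style_weight * sty_dirs[s] + noise)
                categories.append(c)
                styles.append(s)
    return np.array(embeddings), np.array(categories), np.array(styles)


def similarity_weights(embeddings, categories, styles):
    """Mean cosine similarity over three groups of image pairs."""
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit-length rows
    sim = x @ x.T                                     # pairwise cosine similarity
    cat = np.asarray(categories)
    sty = np.asarray(styles)
    same_cat = cat[:, None] == cat[None, :]
    same_sty = sty[:, None] == sty[None, :]
    return {
        # pairs sharing the category but not the style
        "category_only": sim[same_cat & ~same_sty].mean(),
        # pairs sharing the style but not the category
        "style_only": sim[~same_cat & same_sty].mean(),
        # pairs sharing neither
        "neither": sim[~same_cat & ~same_sty].mean(),
    }


if __name__ == "__main__":
    emb, cat, sty = toy_embeddings()
    print(similarity_weights(emb, cat, sty))
```

In this sketch, a model whose embeddings are dominated by style yields a higher `style_only` mean than `category_only` mean, and the reverse holds for a category-dominated model. The balanced layout of categories and styles is what makes the two means directly comparable.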