Classifying artworks is challenging because the style or genre of a work depends on a complex interplay of fine-grained details and abstract features. This paper systematically investigates how effective supervised and self-supervised backbones are as feature extractors for artwork classification and retrieval, with a particular focus on paintings. We conduct an extensive experimental evaluation using the DINO family and CLIP models, assessing multiple classification strategies and feature representations. Our results show that a self-supervised backbone consistently improves artwork classification performance. Our work also provides insights into how classification and retrieval modules can be applied in real-world settings, such as virtual reality (VR) applications that support museum navigation.