In recent years, Transformers, initially developed for language, have been successfully applied to visual tasks. Vision Transformers have been shown to push the state of the art in a wide range of tasks, including image classification, object detection, and semantic segmentation. While ample research has shown promising results in art attribution and art authentication tasks using Convolutional Neural Networks, this paper examines whether the superiority of Vision Transformers extends to art authentication, thereby improving the reliability of computer-based authentication of artworks. Using a carefully compiled dataset of authentic paintings by Vincent van Gogh and two contrast datasets, we compare the art authentication performance of Swin Transformers with that of EfficientNet. Using a standard contrast set containing imitations and proxies (works by painters with styles closely related to van Gogh's), we find that EfficientNet achieves the best performance overall. With a contrast set that consists only of imitations, we find the Swin Transformer to be superior to EfficientNet, achieving an authentication accuracy of over 85%. These results lead us to conclude that Vision Transformers represent a strong and promising contender in art authentication, particularly in enhancing the computer-based ability to detect artistic imitations.