In recent years, Transformers, initially developed for language, have been successfully applied to visual tasks. Vision Transformers have advanced the state of the art in a wide range of tasks, including image classification, object detection, and semantic segmentation. While ample research has shown promising results in art attribution and art authentication using Convolutional Neural Networks, this paper examines whether the superiority of Vision Transformers extends to art authentication, thereby improving the reliability of computer-based authentication of artworks. Using a carefully compiled dataset of authentic paintings by Vincent van Gogh and two contrast datasets, we compare the art authentication performance of Swin Transformers with that of EfficientNet. With a standard contrast set containing imitations and proxies (works by painters with styles closely related to van Gogh's), we find that EfficientNet achieves the best performance overall. With a contrast set consisting only of imitations, we find the Swin Transformer to be superior to EfficientNet, achieving an authentication accuracy of over 85%. These results lead us to conclude that Vision Transformers are a strong and promising contender in art authentication, particularly for improving the computer-based detection of artistic imitations.