The classification of AI-manipulated content, which aims to distinguish between different types of manipulation, is receiving increasing attention. Most methods developed so far fail in the open-set scenario, that is, when the algorithm used for the manipulation is not represented in the training set. In this paper, we focus on the classification of synthetic face generation and manipulation in open-set scenarios, and propose a classification method with a rejection option. The proposed method combines Vision Transformers (ViT) with a hybrid approach for simultaneous classification and localization. The ViT module exploits feature map correlation, while a localization branch acts as an attention mechanism that forces the model to learn per-class discriminative features associated with the forgery when the manipulation is applied locally in the image. For rejection, we consider several strategies based on an analysis of the model's output layers. The effectiveness of the proposed method is assessed on two tasks: facial attribute editing classification and GAN attribution.
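The abstract does not state which rejection strategies are used. As a minimal sketch, assuming two common baselines (thresholding the maximum softmax probability, or the maximum raw logit, of the classifier's output layer), classification with a rejection option can look like the following. The names `classify_with_rejection` and `REJECT`, and the threshold values, are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

REJECT = -1  # hypothetical label for samples rejected as "unknown" (open-set)


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_with_rejection(logits: Sequence[float], threshold: float,
                            strategy: str = "msp") -> int:
    """Return the predicted class index, or REJECT if confidence is too low.

    strategy="msp":      confidence is the maximum softmax probability.
    strategy="maxlogit": confidence is the maximum raw logit.
    """
    if strategy == "msp":
        scores = softmax(logits)
    elif strategy == "maxlogit":
        scores = list(logits)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Argmax over the chosen scores gives the closed-set prediction.
    best = max(range(len(scores)), key=scores.__getitem__)
    # Reject when the winning score falls below the threshold.
    return best if scores[best] >= threshold else REJECT


if __name__ == "__main__":
    confident = [4.0, 0.5, 0.1]  # one class clearly dominates
    flat = [1.0, 0.9, 0.8]       # no class dominates: likely an unseen generator
    print(classify_with_rejection(confident, 0.8))
    print(classify_with_rejection(flat, 0.8))
    print(classify_with_rejection(flat, 3.0, strategy="maxlogit"))
```

In practice the threshold would be calibrated on held-out data from known classes, for example so that a fixed fraction of known-class samples is accepted; a flat output distribution, as in `flat` above, is the typical signature of a manipulation algorithm absent from the training set.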