Following their success in visual recognition tasks, Vision Transformers (ViTs) are increasingly being employed for image restoration. Because several recent works claim that ViTs for image classification also have better robustness properties, we investigate whether the improved adversarial robustness of ViTs extends to image restoration. We consider the recently proposed Restormer model, as well as NAFNet and the "Baseline network", both of which are simplified versions of Restormer. For our robustness evaluation, we use Projected Gradient Descent (PGD) and CosPGD, a recently proposed adversarial attack tailored to pixel-wise prediction tasks. Our experiments are performed on real-world images from the GoPro dataset for image deblurring. Our analysis indicates that, contrary to what is advocated for ViTs in image classification works, these models are highly susceptible to adversarial attacks. We attempt to improve their robustness through adversarial training. While this yields a significant increase in robustness for Restormer, results on the other networks are less promising. Interestingly, the design choices in NAFNet and the Baseline network, which were based on i.i.d. performance rather than on robust generalization, seem to be at odds with model robustness. Thus, we investigate this further and find a fix.
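For readers unfamiliar with the attack named above, the following is a minimal sketch of an L-infinity PGD attack, written under strong simplifying assumptions and not taken from the paper's code. A toy linear map `W` stands in for a restoration network, the loss is the mean squared error against a target `y`, and the input gradient is computed analytically. The function names and the values of `eps`, `alpha`, and `steps` are illustrative assumptions. CosPGD is not shown.

```python
import numpy as np


def mse_loss(W, x, y):
    """Mean squared error between the toy model output W @ x and the target y."""
    residual = W @ x - y
    return float(np.mean(residual ** 2))


def mse_grad_wrt_input(W, x, y):
    """Analytic gradient of the MSE loss with respect to the input x."""
    residual = W @ x - y
    return (2.0 / residual.size) * (W.T @ residual)


def pgd_linf(W, x, y, eps=8 / 255, alpha=2 / 255, steps=10):
    """Illustrative L-infinity PGD: repeated signed gradient ascent on the loss,
    each step followed by projection onto the eps-ball around the clean input
    and onto the valid pixel range [0, 1]."""
    x_adv = x.copy()
    for _ in range(steps):
        grad = mse_grad_wrt_input(W, x_adv, y)
        x_adv = x_adv + alpha * np.sign(grad)      # ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)   # project onto the eps-ball
        x_adv = np.clip(x_adv, 0.0, 1.0)           # keep valid pixel values
    return x_adv


# Toy data (hypothetical): a random linear "network", input, and target.
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 16)) / 4.0
x = rng.uniform(0.2, 0.8, size=16)
y = rng.uniform(0.0, 1.0, size=16)

x_adv = pgd_linf(W, x, y)
clean_loss = mse_loss(W, x, y)
adv_loss = mse_loss(W, x_adv, y)
print(f"clean loss: {clean_loss:.4f}, adversarial loss: {adv_loss:.4f}")
```

In practice the gradient would come from automatic differentiation through the actual restoration network rather than from a closed-form expression, and a random start inside the eps-ball is a common variation that this sketch leaves out.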