SRTransGAN: Image Super-Resolution using Transformer based Generative Adversarial Network

Image super-resolution aims to synthesize a high-resolution image from a low-resolution image. It is an active research area that addresses resolution limitations in applications such as low-resolution object recognition and medical image enhancement. Generative adversarial network (GAN) based methods have been the state of the art for image super-resolution, using convolutional neural network (CNN) based generator and discriminator networks. However, CNNs do not exploit global information as effectively as transformers, a recent breakthrough in deep learning built on the self-attention mechanism. Motivated by the success of transformers in language and vision applications, we propose SRTransGAN, a transformer-based GAN for image super-resolution. Specifically, we propose a novel transformer-based encoder-decoder network as the generator to produce 2x and 4x images. We design the discriminator network using a vision transformer, which treats the image as a sequence of patches and is therefore useful for binary classification between synthesized and real high-resolution images. The proposed SRTransGAN outperforms existing methods by 4.38% on average across PSNR and SSIM scores. We also analyze the saliency map to understand the learning ability of the proposed method.
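As a rough illustration of what treating an image as a sequence of patches can mean for a vision transformer discriminator, the sketch below splits a small grayscale image into non-overlapping patches and flattens each one into a token. The function name `image_to_patch_sequence`, the patch size, and the nested-list image representation are assumptions made for this example; the abstract does not state the patch size, embedding, or any other detail of the actual SRTransGAN discriminator.

```python
def image_to_patch_sequence(image, patch_size):
    """Split a 2D grayscale image (list of rows) into flattened patches.

    Hypothetical helper for illustration only; not the paper's implementation.
    Patches are non-overlapping and returned in raster order
    (left to right, then top to bottom).
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size != 0 or width % patch_size != 0:
        raise ValueError("image height and width must be divisible by patch_size")

    sequence = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Flatten one patch_size x patch_size block into a single token.
            patch = [
                image[top + dy][left + dx]
                for dy in range(patch_size)
                for dx in range(patch_size)
            ]
            sequence.append(patch)
    return sequence


# A 4x4 image whose pixel at (row, col) holds the value 4 * row + col.
example_image = [[4 * row + col for col in range(4)] for row in range(4)]
patch_sequence = image_to_patch_sequence(example_image, patch_size=2)
```

Here the 4x4 image becomes a sequence of four tokens of length four. In a vision transformer, each such token would typically be linearly projected and combined with a positional encoding before self-attention is applied, and a classification head would then produce the real-versus-synthesized decision.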

