Generative Adversarial Networks (GANs) have been widely used to recover vivid textures in image super-resolution (SR) tasks. Typically, a single discriminator is used so that the SR network learns the distribution of real-world high-quality images through adversarial training. However, this distribution learning is overly coarse-grained, which makes it susceptible to spurious textures and leads to counter-intuitive generation results. To mitigate this, we propose a simple and effective Semantic-aware Discriminator (denoted as SeD), which encourages the SR network to learn fine-grained distributions by introducing the semantics of images as a condition. Concretely, we extract the semantics of images from a well-trained semantic extractor. Conditioned on different semantics, the discriminator can distinguish real from fake images individually and adaptively, which guides the SR network to learn more fine-grained, semantic-aware textures. To obtain accurate and abundant semantics, we take full advantage of recently popular pretrained vision models (PVMs) trained on extensive datasets, and incorporate their semantic features into the discriminator through a well-designed spatial cross-attention module. In this way, the proposed semantic-aware discriminator empowers the SR network to produce more photo-realistic and pleasing images. Extensive experiments on two typical tasks, i.e., SR and Real SR, demonstrate the effectiveness of the proposed method.
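The idea of injecting semantic features into a discriminator through spatial cross-attention can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the function names, the toy dimensions, the random projection matrices, and in particular the choice to take queries from the discriminator's image features and keys/values from the semantic features are all assumptions made for illustration. In practice the semantic features would come from a frozen pretrained vision model and the projections would be learned.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def project(weights, vec):
    """Apply a linear map (given as a list of rows) to a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def attention_weights(queries, keys):
    """Scaled dot-product attention weights, one row per query position."""
    scale = math.sqrt(len(keys[0]))
    return [
        softmax([sum(a * b for a, b in zip(q, k)) / scale for k in keys])
        for q in queries
    ]


def semantic_cross_attention(image_feats, semantic_feats, w_q, w_k, w_v):
    """Fuse semantic features into image features at every spatial position.

    image_feats:    N vectors from the discriminator (N = H * W, flattened).
    semantic_feats: M vectors from a (hypothetical) frozen semantic extractor.
    Assumption: queries come from the image stream, keys/values from semantics.
    """
    queries = [project(w_q, f) for f in image_feats]
    keys = [project(w_k, s) for s in semantic_feats]
    values = [project(w_v, s) for s in semantic_feats]
    weights = attention_weights(queries, keys)
    fused = []
    for feat, row in zip(image_feats, weights):
        # Weighted sum of semantic values for this spatial position.
        attended = [
            sum(a * v[d] for a, v in zip(row, values)) for d in range(len(feat))
        ]
        # Residual connection keeps the original image features.
        fused.append([f + c for f, c in zip(feat, attended)])
    return fused


def random_matrix(rows, cols, rng):
    """Toy stand-in for learned weights or extracted features."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
C, S, D = 4, 6, 3  # image channels, semantic channels, attention width
image_feats = random_matrix(2 * 2, C, rng)     # a 2x2 feature map, flattened
semantic_feats = random_matrix(2 * 2, S, rng)  # semantics at the same positions
fused = semantic_cross_attention(
    image_feats,
    semantic_feats,
    random_matrix(D, C, rng),  # query projection
    random_matrix(D, S, rng),  # key projection
    random_matrix(C, S, rng),  # value projection (maps back to C channels)
)
print(len(fused), len(fused[0]))
```

In this sketch the fused features keep the spatial layout and channel count of the image features, so a conventional real/fake head could be applied to them unchanged; the discriminator's decision at each position is then conditioned on the semantics attended to at that position.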