One-shot image generation (OSG) with generative adversarial networks that learn from the internal patches of a single given image has attracted worldwide attention. Recent studies have focused primarily on extracting image features from probabilistically distributed inputs using pure convolutional neural networks (CNNs). However, CNNs have limited receptive fields, which makes it difficult for them to extract and preserve global structural information. In this paper, we therefore propose TcGAN, a novel structure-preserving method with an individual vision transformer, to overcome the shortcomings of existing one-shot image generation methods. Specifically, TcGAN preserves the global structure of an image during training while remaining compatible with local details. It also maintains the integrity of semantic-aware information by exploiting the transformer's strong capability for modeling long-range dependencies. We further propose a new scaling formula that is scale-invariant during computation, which effectively improves the quality of images generated by the OSG model on image super-resolution tasks. We present the design of the TcGAN converter framework, together with comprehensive experiments and ablation studies demonstrating that TcGAN achieves arbitrary image generation with the fastest running time. Finally, when applied to other image processing tasks such as super-resolution and image harmonization, TcGAN achieves the best performance, further demonstrating its superiority.
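The receptive-field argument can be made concrete with a small calculation. The abstract does not give TcGAN's or any baseline's layer configuration, so the stack of 3×3, stride-1 convolutions below is a hypothetical example chosen for illustration. The sketch applies the standard receptive-field recurrence to show how slowly a plain convolutional stack grows its view of the image. By contrast, a single self-attention layer lets every position attend to every other position.

```python
import math


def receptive_field(layers):
    """Receptive field (in pixels, along one axis) of a stack of conv layers.

    `layers` is a list of (kernel_size, stride) pairs. Uses the standard
    recurrence r_out = r_in + (k - 1) * j_in, j_out = j_in * s, where j is
    the cumulative stride ("jump") between adjacent output positions.
    """
    r, j = 1, 1
    for k, s in layers:
        r += (k - 1) * j
        j *= s
    return r


def layers_to_cover(width, kernel_size=3):
    """Number of stride-1 conv layers needed for the receptive field to span `width` pixels."""
    return math.ceil((width - 1) / (kernel_size - 1))


if __name__ == "__main__":
    # Hypothetical stack: five 3x3 stride-1 convolutions.
    print(receptive_field([(3, 1)] * 5))  # 11
    # Hypothetical 256-pixel-wide image: stride-1 3x3 layers needed to see all of it.
    print(layers_to_cover(256))  # 128
```

Under these assumptions, each output pixel of the five-layer stack sees only an 11-pixel window. Covering a 256-pixel-wide image would take 128 such layers, which illustrates why global structure is hard for shallow CNNs to capture.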