Recent literature widely holds that Diffusion-based models outperform their GAN-based counterparts on the Image Super Resolution (ISR) problem. However, in most studies, the Diffusion-based ISR models were trained longer and used larger networks than the GAN baselines. This raises the question of whether the superiority of Diffusion models stems from the Diffusion paradigm being better suited to the ISR task or from the increased scale and computational resources used in contemporary studies. In our work, we compare Diffusion-based and GAN-based Super Resolution under controlled settings, where both approaches are matched in architecture, model size, dataset size, and computational budget. We show that a GAN-based model can achieve results comparable to those of a Diffusion-based model. Additionally, we explore the impact of design choices such as text conditioning and augmentation on the performance of ISR models, showcasing their effect on several downstream tasks. We will release the inference code and weights of our scaled GAN.