Recent deep-learning-based single image super-resolution (SISR) methods have shown impressive performance, and most of them train their networks by minimizing the pixel-wise distance to a given high-resolution (HR) image. However, although this basic training scheme is the predominant choice, its use in the context of ill-posed inverse problems has not been thoroughly investigated. In this work, we aim to provide a better understanding of the underlying components by decomposing each target HR image into two subcomponents: (1) the optimal centroid, which is the expectation over multiple potential HR images, and (2) the inherent noise, defined as the residual between the HR image and the centroid. Our findings show that the current training scheme cannot capture the ill-posed nature of SISR and is vulnerable to the inherent noise term, especially during early training steps. To tackle this issue, we propose a novel optimization method that can effectively remove the inherent noise term in the early steps of vanilla training by estimating the optimal centroid and directly optimizing toward that estimate. Experimental results show that the proposed method can effectively enhance the stability of vanilla training, leading to an overall performance gain. Code is available at github.com/2minkyulee/ECO.
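The decomposition described in the abstract can be written out as a short worked derivation. This is only a sketch under stated assumptions: the pixel-wise distance is taken to be the squared L2 distance, and the symbols x (low-resolution input), y (HR target), c(x) (centroid), n (inherent noise), and f_theta (network) are notation introduced here for illustration, not necessarily the paper's own.

```latex
% Assumed notation: x = LR input, y = one HR image consistent with x,
% f_\theta = SR network. Squared L2 is assumed as the pixel-wise distance.
\begin{align}
  c(x) &= \mathbb{E}\left[\, y \mid x \,\right]
       && \text{(optimal centroid)} \\
  n    &= y - c(x), \qquad \mathbb{E}\left[\, n \mid x \,\right] = 0
       && \text{(inherent noise)} \\
  \mathbb{E}\left[\, \lVert f_\theta(x) - y \rVert_2^2 \mid x \,\right]
       &= \lVert f_\theta(x) - c(x) \rVert_2^2
        + \mathbb{E}\left[\, \lVert n \rVert_2^2 \mid x \,\right]
       && \text{(cross term vanishes since } \mathbb{E}[n \mid x] = 0\text{)} \\
  \nabla_{f}\, \lVert f_\theta(x) - y \rVert_2^2
       &= 2\,\bigl(f_\theta(x) - c(x)\bigr) - 2\,n
       && \text{(gradient for a single observed } y\text{)}
\end{align}
```

Under these assumptions the second term of the expected loss does not depend on the network, so the minimizer is the centroid c(x). A single training pair, however, yields a gradient perturbed by the noise term n, which is consistent with the abstract's claim that vanilla training is sensitive to the inherent noise. Regressing toward an estimate of c(x) instead of y would drop that perturbation, to the extent the estimate is accurate.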