Most existing blind image super-resolution (SR) methods assume that blur kernels are space-invariant. However, the blur in real applications is usually space-variant because of object motion, out-of-focus regions, and similar factors, which causes a severe performance drop in advanced SR methods. To address this problem, we first introduce two new datasets with out-of-focus blur, NYUv2-BSR and Cityscapes-BSR, to support further research on blind SR with space-variant blur. Based on these datasets, we design a novel Cross-MOdal fuSion network (CMOS) that estimates blur and semantics simultaneously, which leads to improved SR results. It includes a feature Grouping Interactive Attention (GIA) module that makes the two modalities interact more effectively and avoids inconsistency. Because its structure is general, GIA can also be used for the interaction of other features. Qualitative and quantitative experiments against state-of-the-art methods on the above datasets and on real-world images demonstrate the superiority of our method, e.g., it improves PSNR/SSIM by +1.91/+0.0048 over MANet on NYUv2-BSR.
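To make the distinction between space-invariant and space-variant blur concrete, the following is a minimal sketch and not the degradation pipeline used to build NYUv2-BSR or Cityscapes-BSR. It assumes isotropic Gaussian kernels, a per-pixel width map `sigma_map`, and plain subsampling for downscaling; the function names `gaussian_kernel`, `space_variant_blur`, and `degrade` are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Return a normalized size x size isotropic Gaussian kernel."""
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    k = np.outer(g, g)
    return k / k.sum()


def space_variant_blur(img: np.ndarray, sigma_map: np.ndarray, ksize: int = 7) -> np.ndarray:
    """Blur a 2-D image with a different Gaussian kernel at every pixel.

    Assumption: ksize is odd and sigma_map has the same shape as img.
    """
    pad = ksize // 2
    padded = np.pad(img, pad, mode="reflect")
    out = np.empty(img.shape, dtype=np.float64)
    for y in range(img.shape[0]):
        for x in range(img.shape[1]):
            patch = padded[y:y + ksize, x:x + ksize]
            # The kernel is rebuilt per pixel; this is what makes the blur space-variant.
            out[y, x] = np.sum(patch * gaussian_kernel(ksize, sigma_map[y, x]))
    return out


def degrade(img: np.ndarray, sigma_map: np.ndarray, scale: int = 2, ksize: int = 7) -> np.ndarray:
    """Blur, then subsample by `scale` to obtain a low-resolution image."""
    return space_variant_blur(img, sigma_map, ksize)[::scale, ::scale]


rng = np.random.default_rng(0)
hr = rng.random((16, 16))

# Space-invariant case: the same kernel width everywhere.
uniform = np.full(hr.shape, 1.0)
# Space-variant case: the width grows from left to right, a crude stand-in
# for defocus that changes with scene depth.
varying = np.tile(np.linspace(0.5, 3.0, hr.shape[1]), (hr.shape[0], 1))

lr_uniform = degrade(hr, uniform)
lr_varying = degrade(hr, varying)
```

A method that assumes a single kernel can at best match `lr_uniform`; for `lr_varying` the appropriate kernel differs from pixel to pixel, which is the setting the abstract describes.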