Guided Image Restoration via Simultaneous Feature and Image Guided Fusion

Guided image restoration (GIR), such as guided depth map super-resolution and pan-sharpening, aims to enhance a target image using guidance information from another image of the same scene. Currently, deep learning methods inspired by joint image filtering represent the state of the art for GIR tasks. These methods either handle GIR end to end by carefully designing filtering-oriented deep neural network (DNN) modules, focusing on feature-level fusion of the inputs, or explicitly use the traditional joint filtering mechanism by parameterizing the filtering coefficients with DNNs, working on image-level fusion. The former are good at recovering contextual information but tend to lose fine-grained details, while the latter better retain textural information but may introduce content distortions. In this work, to inherit the advantages of both methodologies while mitigating their limitations, we propose a Simultaneous Feature and Image Guided Fusion (SFIGF) network that performs guided fusion at both the feature and image levels, following the guided filter (GF) mechanism. In the feature domain, we connect cross-attention (CA) with GF and propose a GF-inspired CA module for better feature-level fusion; in the image domain, we fully explore the GF mechanism and design a GF-like structure for better image-level fusion. Since guided fusion is implemented in both the feature and image domains, the proposed SFIGF is expected to faithfully reconstruct both contextual and textural information from the sources and thus lead to better GIR results. We apply SFIGF to four typical GIR tasks, and experimental results on these tasks demonstrate its effectiveness and generality.
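To make the GF mechanism referred to above concrete, the following is a minimal sketch of the classic guided filter in plain Python. It is not the SFIGF network and not the GF-inspired CA module. It only illustrates the local linear model that image-level guided fusion builds on: within each window the output is modeled as a * guide + b, with the coefficients a and b estimated from local statistics. The function names (`box_mean`, `combine`, `guided_filter`), the window radius `r`, and the regularizer `eps` are illustrative assumptions, not names taken from the work.

```python
def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, clipped at the image border."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            rows = range(max(0, y - r), min(h, y + r + 1))
            cols = range(max(0, x - r), min(w, x + r + 1))
            total = sum(img[j][i] for j in rows for i in cols)
            out[y][x] = total / (len(rows) * len(cols))
    return out


def combine(f, *imgs):
    """Apply f element-wise across images of identical size."""
    return [[f(*vals) for vals in zip(*rows)] for rows in zip(*imgs)]


def guided_filter(guide, target, r=1, eps=1e-4):
    """Classic guided filter: output = mean(a) * guide + mean(b)."""
    mean_g = box_mean(guide, r)
    mean_t = box_mean(target, r)
    mean_gt = box_mean(combine(lambda g, t: g * t, guide, target), r)
    mean_gg = box_mean(combine(lambda g: g * g, guide), r)

    # a = cov(guide, target) / (var(guide) + eps), computed per window
    a = combine(
        lambda gt, g, t, gg: (gt - g * t) / (gg - g * g + eps),
        mean_gt, mean_g, mean_t, mean_gg,
    )
    # b = mean(target) - a * mean(guide), computed per window
    b = combine(lambda t, a_, g: t - a_ * g, mean_t, a, mean_g)

    # Average the coefficients of all windows that cover each pixel
    mean_a = box_mean(a, r)
    mean_b = box_mean(b, r)
    return combine(lambda a_, g, b_: a_ * g + b_, mean_a, guide, mean_b)
```

In this sketch the filtering coefficients are computed in closed form from window statistics. The image-level methods described in the abstract instead parameterize such coefficients with DNNs, so the code should be read only as the hand-crafted baseline mechanism.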
