Completing corrupted images with complex semantic content and diverse hole patterns remains challenging, even for state-of-the-art learning-based inpainting methods trained on large-scale data. A reference image capturing the same scene as the corrupted image offers informative guidance for completion, since it shares texture and structure priors with the missing regions. In this work, we propose TransRef, a transformer-based encoder-decoder network for reference-guided image inpainting. Specifically, guidance is applied progressively through a reference embedding procedure, in which the reference features are subsequently aligned and fused with the features of the corrupted image. To make precise use of the reference features, we propose a reference-patch alignment (Ref-PA) module that aligns the patch features of the reference and corrupted images and harmonizes their style differences, and a reference-patch transformer (Ref-PT) module that refines the embedded reference features. Moreover, to facilitate research on reference-guided image restoration, we construct a publicly accessible benchmark dataset containing 50K pairs of input and reference images. Both quantitative and qualitative evaluations demonstrate the efficacy of the reference information and the advantage of the proposed method over state-of-the-art methods in completing complex holes. Code and dataset are available at https://github.com/Cameltr/TransRef.
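To make the align-harmonize-fuse idea concrete, the following is a minimal NumPy sketch on toy patch features. It is not the TransRef implementation: the abstract does not specify how Ref-PA or Ref-PT work internally, so the function names, the hard nearest-patch matching by cosine similarity, and the channel-statistics matching (an AdaIN-like stand-in for style harmonization) are all assumptions chosen for illustration.

```python
import numpy as np


def align_reference_patches(corrupted, reference):
    """Hypothetical alignment step: for each corrupted patch feature,
    select the reference patch with the highest cosine similarity.

    corrupted, reference: arrays of shape (num_patches, dim).
    Returns the gathered reference features and the chosen indices.
    """
    eps = 1e-8
    c = corrupted / (np.linalg.norm(corrupted, axis=1, keepdims=True) + eps)
    r = reference / (np.linalg.norm(reference, axis=1, keepdims=True) + eps)
    similarity = c @ r.T                  # (num_patches, num_patches)
    idx = similarity.argmax(axis=1)       # best reference patch per row
    return reference[idx], idx


def harmonize_style(aligned, corrupted, valid):
    """Hypothetical harmonization step: rescale the aligned reference
    features so their per-channel mean and standard deviation match
    those of the valid (non-hole) corrupted patches.
    """
    eps = 1e-5
    target = corrupted[valid]
    mu_t, sd_t = target.mean(axis=0), target.std(axis=0)
    mu_a, sd_a = aligned.mean(axis=0), aligned.std(axis=0)
    return (aligned - mu_a) / (sd_a + eps) * sd_t + mu_t


def fuse(corrupted, harmonized, valid):
    """Hypothetical fusion step: keep known patches as they are and fill
    hole patches with the harmonized reference features.
    """
    return np.where(valid[:, None], corrupted, harmonized)
```

In this toy setting, `valid` is a boolean vector marking patches outside the hole. A learned model would replace the hard `argmax` matching and the fixed statistics matching with trainable attention and normalization layers, and would repeat the embedding at several decoder stages, as the progressive guidance described above suggests.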