Completing corrupted images that contain complex semantic content and diverse hole patterns remains challenging, even for state-of-the-art learning-based inpainting methods trained on large-scale data. A reference image capturing the same scene as the corrupted image offers informative guidance for completion, because it shares similar texture and structure priors with the missing regions. In this work, we propose a transformer-based encoder-decoder network, named TransRef, for reference-guided image inpainting. Specifically, the guidance is applied progressively through a reference embedding procedure, in which the reference features are successively aligned and fused with the features of the corrupted image. To make precise use of the reference features, a reference-patch alignment (Ref-PA) module aligns the patch features of the reference and corrupted images and harmonizes their style differences, while a reference-patch transformer (Ref-PT) module refines the embedded reference features. Moreover, to facilitate research on reference-guided image restoration tasks, we construct a publicly accessible benchmark dataset containing 50K pairs of input and reference images. Both quantitative and qualitative evaluations demonstrate the benefit of the reference information and the advantage of the proposed method over state-of-the-art methods in completing complex holes. Code and dataset can be accessed at https://github.com/Cameltr/TransRef.
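As a rough illustration of the align-then-fuse idea described above, the sketch below matches each patch feature of the corrupted image to its most similar reference patch, harmonizes the style of the matched features, and fuses them into the hole region. It is not the Ref-PA or Ref-PT module from the TransRef repository. The function names (`align_patches`, `harmonize_style`, `fuse`), the flat `(patches, channels)` feature layout, the cosine-similarity matching, and the mean/std statistics matching are all assumptions chosen for illustration.

```python
import numpy as np

EPS = 1e-8  # guards against division by zero


def align_patches(corrupted, reference):
    """Pick, for each corrupted patch, the most similar reference patch.

    corrupted: (N, C) patch features of the corrupted image.
    reference: (M, C) patch features of the reference image.
    Returns an (N, C) array of reference features, one per corrupted patch.
    Assumption: similarity is plain cosine similarity.
    """
    c = corrupted / (np.linalg.norm(corrupted, axis=1, keepdims=True) + EPS)
    r = reference / (np.linalg.norm(reference, axis=1, keepdims=True) + EPS)
    similarity = c @ r.T                # (N, M) cosine similarities
    best = similarity.argmax(axis=1)    # best reference index per corrupted patch
    return reference[best]


def harmonize_style(aligned, corrupted):
    """Match per-channel mean and std of the aligned features to the corrupted ones.

    Assumption: style is summarized by first- and second-order channel
    statistics; a real system would likely use only the known (non-hole) region.
    """
    a_mean, a_std = aligned.mean(axis=0), aligned.std(axis=0)
    c_mean, c_std = corrupted.mean(axis=0), corrupted.std(axis=0)
    return (aligned - a_mean) / (a_std + EPS) * c_std + c_mean


def fuse(corrupted, harmonized, hole_mask):
    """Use reference-derived features inside holes, original features elsewhere.

    hole_mask: (N,) boolean array, True where a patch lies inside a hole.
    """
    return np.where(hole_mask[:, None], harmonized, corrupted)
```

A call sequence under these assumptions would be `fuse(x, harmonize_style(align_patches(x, ref), x), mask)`, where `x` and `ref` are patch-feature arrays and `mask` marks the hole patches. The hard arg-max matching is the simplest possible choice; an attention-based soft matching would be closer in spirit to a transformer, but is not shown here.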