Image inpainting aims to fill the missing regions of an input image. Solving this task efficiently is hard for high-resolution images for two reasons: (1) high-resolution inpainting requires a large receptive field; (2) a general encoder-decoder network synthesizes many background pixels together with the missing ones, because it operates on the full image matrix. In this paper, we attempt to overcome both limitations for the first time, building on recent developments in continuous implicit representation. Specifically, we down-sample and encode the degraded image to produce spatially adaptive parameters for each spatial patch via an attentional Fast Fourier Convolution (FFC)-based parameter generation network. We then use these parameters as the weights and biases of a series of multi-layer perceptrons (MLPs), whose input is the encoded continuous coordinates and whose output is the synthesized color value. With this structure, we encode the high-resolution image only at a relatively low resolution, which captures a larger receptive field. The continuous position encoding then helps synthesize photo-realistic high-frequency textures by re-sampling the coordinates at a higher resolution. Our framework also allows us to query only the coordinates of missing pixels, in parallel, yielding a more efficient solution than previous methods. Experiments show that the proposed method achieves real-time performance on 2048$\times$2048 images using a single RTX 2080 Ti GPU and can handle 4096$\times$4096 images, with much better performance than existing state-of-the-art methods both visually and numerically. The code is available at: https://github.com/NiFangBaAGe/CoordFill.
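The coordinate-querying idea described above can be illustrated with a minimal sketch. It is not the released CoordFill implementation: the per-patch parameters are random stand-ins for the output of the FFC-based parameter generation network, and the sinusoidal encoding, patch-local coordinates, layer sizes, sigmoid output, and all function names (`encode_coord`, `random_mlp_params`, `fill_missing`) are assumptions made for illustration. The sketch also loops over pixels sequentially for readability, whereas the method queries them in parallel.

```python
import math
import random


def encode_coord(u, v, num_freqs=4):
    """Sin/cos encoding of a continuous coordinate (u, v) in [0, 1)^2 (assumed form)."""
    feats = []
    for k in range(num_freqs):
        f = (2 ** k) * math.pi
        feats += [math.sin(f * u), math.cos(f * u), math.sin(f * v), math.cos(f * v)]
    return feats


def random_mlp_params(in_dim, hidden, out_dim, rng):
    """Hypothetical stand-in for the parameter generation network: random weights, zero biases."""
    def layer(n_in, n_out):
        w = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)] for _ in range(n_out)]
        return w, [0.0] * n_out
    return [layer(in_dim, hidden), layer(hidden, out_dim)]


def mlp(params, feats):
    """Apply a small MLP: ReLU between layers, sigmoid on the output (assumed activations)."""
    h = feats
    for i, (w, b) in enumerate(params):
        h = [sum(wi * hi for wi, hi in zip(row, h)) + bi for row, bi in zip(w, b)]
        if i < len(params) - 1:
            h = [max(0.0, x) for x in h]
    return tuple(1.0 / (1.0 + math.exp(-x)) for x in h)


def fill_missing(image, mask, patch_params, patch_size):
    """Synthesize colors only where mask is True; known pixels are copied unchanged."""
    out = [row[:] for row in image]
    queried = 0
    for y, mask_row in enumerate(mask):
        for x, missing in enumerate(mask_row):
            if not missing:
                continue  # background pixels are never sent through an MLP
            params = patch_params[y // patch_size][x // patch_size]
            # Pixel-centre coordinate inside its patch, normalised to [0, 1).
            u = ((x % patch_size) + 0.5) / patch_size
            v = ((y % patch_size) + 0.5) / patch_size
            out[y][x] = mlp(params, encode_coord(u, v))
            queried += 1
    return out, queried


rng = random.Random(0)
size, patch_size = 8, 4
# One parameter set per spatial patch (a 2 x 2 grid of patches here).
patch_params = [[random_mlp_params(16, 8, 3, rng) for _ in range(size // patch_size)]
                for _ in range(size // patch_size)]
image = [[(0.5, 0.5, 0.5)] * size for _ in range(size)]
mask = [[(2 <= y <= 4 and 3 <= x <= 5) for x in range(size)] for y in range(size)]
filled, queried = fill_missing(image, mask, patch_params, patch_size)
print(queried, "of", size * size, "pixels queried")
```

Because the parameters are attached to patches rather than pixels, the same parameter grid could be queried at a denser set of coordinates to produce a higher-resolution output, and the cost of the query step scales with the number of missing pixels rather than with the full image size.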