Restoring reasonable and realistic content for arbitrary missing regions in images is an important yet challenging task. Although recent image inpainting models have made significant progress in generating vivid visual details, they can still lead to texture blurring or structural distortions due to contextual ambiguity when dealing with more complex scenes. To address this issue, we propose the Semantic Pyramid Network (SPN) motivated by the idea that learning multi-scale semantic priors from specific pretext tasks can greatly benefit the recovery of locally missing content in images. SPN consists of two components. First, it distills semantic priors from a pretext model into a multi-scale feature pyramid, achieving a consistent understanding of the global context and local structures. Within the prior learner, we present an optional module for variational inference to realize probabilistic image inpainting driven by various learned priors. The second component of SPN is a fully context-aware image generator, which adaptively and progressively refines low-level visual representations at multiple scales with the (stochastic) prior pyramid. We train the prior learner and the image generator as a unified model without any post-processing. Our approach achieves the state of the art on multiple datasets, including Places2, Paris StreetView, CelebA, and CelebA-HQ, under both deterministic and probabilistic inpainting setups.
翻译:恢复图像中任意缺失区域的合理且真实内容是一项重要但具有挑战性的任务。尽管最近的图像修复模型在生成生动视觉细节方面取得了显著进展,但在处理更复杂场景时,由于上下文模糊性,仍可能导致纹理模糊或结构失真。为解决这一问题,我们提出了语义金字塔网络(SPN),其核心思想是从特定前置任务中学习多尺度语义先验,可极大促进图像局部缺失内容的恢复。SPN包含两个组件。首先,它将前置模型中的语义先验蒸馏为多尺度特征金字塔,实现全局上下文与局部结构的一致理解。在先验学习器中,我们引入了一个可选模块用于变分推断,从而基于多种学习先验实现概率性图像修复。SPN的第二个组件是一个全上下文感知的图像生成器,该生成器利用(随机)先验金字塔自适应地逐步优化多尺度低层视觉表示。我们将先验学习器与图像生成器作为统一模型进行训练,无需任何后处理。在确定性和概率性修复设置下,我们的方法在多个数据集(包括Places2、Paris StreetView、CelebA和CelebA-HQ)上均达到了最先进的性能。