Structure-guided image completion aims to inpaint a local region of an image according to an input guidance map from users. While such a task enables many practical applications for interactive editing, existing methods often struggle to hallucinate realistic object instances in complex natural scenes. This limitation is partially due to the lack of semantic-level constraints inside the hole region, as well as the lack of a mechanism to enforce realistic object generation. In this work, we propose a learning paradigm that consists of semantic discriminators and object-level discriminators for improving the generation of complex semantics and objects. Specifically, the semantic discriminators leverage pretrained visual features to improve the realism of the generated visual concepts, while the object-level discriminators take aligned instances as inputs to enforce the realism of individual objects. Our proposed scheme significantly improves generation quality and achieves state-of-the-art results on various tasks, including segmentation-guided completion, edge-guided manipulation, and panoptically-guided manipulation, on the Places2 dataset. Furthermore, our trained model is flexible and can support multiple editing use cases, such as object insertion, replacement, removal, and standard inpainting. In particular, our trained model, combined with a novel automatic image completion pipeline, achieves state-of-the-art results on the standard inpainting task.
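To make the two-discriminator training signal concrete, the following is a minimal sketch of how a semantic-discriminator loss (computed on features from a frozen pretrained extractor) and an object-level discriminator loss (computed on aligned object crops) could be combined into one adversarial objective. Everything here is illustrative: the hinge loss, the linear scoring heads, and the random-projection stand-in for a pretrained backbone are assumptions for demonstration, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def hinge_d_loss(real_scores, fake_scores):
    # Standard hinge adversarial loss for a discriminator:
    # push real scores above +1 and fake scores below -1.
    return (np.mean(np.maximum(0.0, 1.0 - real_scores))
            + np.mean(np.maximum(0.0, 1.0 + fake_scores)))

class LinearDiscriminator:
    # Toy stand-in for a learned discriminator network:
    # a single linear scoring head over flat feature vectors.
    def __init__(self, dim):
        self.w = rng.normal(size=dim) / np.sqrt(dim)

    def __call__(self, feats):  # feats: (batch, dim)
        return feats @ self.w

# Hypothetical frozen "pretrained" feature extractor; here just a
# fixed random projection of flattened image crops (an assumption).
PIXELS, FEAT_DIM = 48, 64
proj = rng.normal(size=(PIXELS, FEAT_DIM)) / np.sqrt(PIXELS)

def pretrained_features(crops):  # crops: (batch, PIXELS)
    return crops @ proj

semantic_D = LinearDiscriminator(FEAT_DIM)  # scores pretrained features
object_D = LinearDiscriminator(PIXELS)      # scores aligned object crops

# Fake "real" and "generated" aligned object crops, flattened.
real_crops = rng.normal(size=(4, PIXELS))
fake_crops = rng.normal(size=(4, PIXELS))

loss_semantic = hinge_d_loss(semantic_D(pretrained_features(real_crops)),
                             semantic_D(pretrained_features(fake_crops)))
loss_object = hinge_d_loss(object_D(real_crops), object_D(fake_crops))

# The combined discriminator objective sums both terms (weights omitted).
total_d_loss = loss_semantic + loss_object
print(float(total_d_loss))
```

In a real system the generator would minimize the corresponding adversarial terms while both discriminators are trained on this objective; the relative weighting of the two losses would be a tunable hyperparameter.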