Semantic image editing requires inpainting pixels so that they follow a semantic map. The task is challenging because the inpainted region must both harmonize with the context and comply strictly with the semantic map. Most previous methods proposed for this task try to encode all the information from the erased image. However, when an object such as a car is added to a scene, its style cannot be encoded from the context alone. On the other hand, models that can output diverse generations struggle to produce images with seamless boundaries between the generated and unerased parts. Additionally, previous methods have no mechanism for encoding the styles of visible and partially visible objects differently for better performance. In this work, we propose a framework that encodes visible and partially visible objects with a novel mechanism to achieve consistency in the style encoding and final generations. We extensively compare our method with previous conditional image generation and semantic image editing algorithms. Our experiments show that our method significantly improves over the state of the art: it not only achieves better quantitative results but also provides diverse results. Please refer to the project web page for the released code and demo: https://github.com/hakansivuk/DivSem.