MaGIC: Multi-modality Guided Image Completion

Vanilla image completion approaches are sensitive to large missing regions because little reference information is available for plausible generation. To mitigate this, existing methods incorporate an extra cue as guidance for image completion. Despite improvements, these approaches are often restricted to a single modality (e.g., segmentation or sketch maps) and do not scale to leveraging multiple modalities for more plausible completion. In this paper, we propose a novel, simple yet effective method for Multi-modality Guided Image Completion, dubbed MaGIC, which not only supports a wide range of single modalities as guidance (e.g., text, Canny edge, sketch, segmentation, depth, and pose), but also adapts to arbitrary customized combinations of these modalities (i.e., arbitrary multi-modality) for image completion. To build MaGIC, we first introduce a modality-specific conditional U-Net (MCU-Net) that injects a single-modal signal into a U-Net denoiser for single-modal guided image completion. Then, we devise a consistent modality blending (CMB) method that leverages the modality signals encoded in multiple learned MCU-Nets through gradient guidance in latent space. Our CMB is training-free and thus avoids the cumbersome joint re-training of different modalities, which is the key to MaGIC's exceptional flexibility in accommodating new modalities for completion. Experiments show the superiority of MaGIC over state-of-the-art methods and its generalization to various completion tasks. Our project, with code and models, is available at yeates.github.io/MaGIC-Page/.
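
The abstract does not spell out the CMB update rule, so the following is only a minimal sketch of what training-free gradient guidance in latent space could look like under toy assumptions. Fixed linear maps stand in for the frozen per-modality networks, fixed vectors stand in for their guidance signals, and the latent is nudged along the negative gradient of the summed feature mismatch. All names (`blend_guidance`, `total_loss`, `step_size`, `encoders`, `targets`) are hypothetical and are not taken from the MaGIC code.

```python
import numpy as np


def total_loss(z, encoders, targets):
    """Sum of squared feature mismatches over all modalities."""
    return sum(float(np.sum((W @ z - f) ** 2)) for W, f in zip(encoders, targets))


def blend_guidance(z, encoders, targets, step_size=0.1, n_steps=1):
    """Move latent z toward agreement with every modality signal.

    encoders: list of matrices W_m (assumed stand-ins for frozen networks)
    targets:  list of vectors f_m (assumed stand-ins for modality signals)
    No parameters are trained; only the latent z is updated.
    """
    z = z.copy()
    for _ in range(n_steps):
        grad = np.zeros_like(z)
        for W, f in zip(encoders, targets):
            residual = W @ z - f          # feature mismatch for this modality
            grad += 2.0 * W.T @ residual  # gradient of ||W z - f||^2 w.r.t. z
        z -= step_size * grad             # one guidance step in latent space
    return z


# Toy setup: two "modalities", each observing half of a 4-dim latent.
encoders = [np.eye(4)[:2], np.eye(4)[2:]]
targets = [np.array([1.0, 1.0]), np.array([-1.0, 2.0])]
z0 = np.zeros(4)

z1 = blend_guidance(z0, encoders, targets)
loss_before = total_loss(z0, encoders, targets)
loss_after = total_loss(z1, encoders, targets)
```

In this sketch, adding a modality amounts to appending one more entry to `encoders` and `targets`; nothing is retrained, which mirrors the flexibility the abstract attributes to a training-free blending step.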
