In recent years, text-guided image inpainting has attracted significant research attention. The task remains challenging, however, due to several constraints, such as ensuring alignment between the image and the text and maintaining distributional consistency between corrupted and uncorrupted regions. In this paper, we therefore propose a dual affine transformation generative adversarial network (DAFT-GAN) that maintains semantic consistency for text-guided inpainting. DAFT-GAN integrates two affine transformation networks to gradually combine text and image features in each decoding block. Moreover, we minimize information leakage from uncorrupted features for fine-grained image generation by encoding the corrupted and uncorrupted regions of the masked image separately. Our proposed model outperforms existing GAN-based models in both qualitative and quantitative assessments on three benchmark datasets (MS-COCO, CUB, and Oxford) for text-guided image inpainting.
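To make the modulation mechanism concrete, the following is a minimal sketch of a text-conditioned affine transformation block of the kind the abstract describes: a sentence embedding predicts a per-channel scale and shift that modulate the decoder's image features. The framework (PyTorch), module names, and dimensions are our own assumptions for illustration, not the authors' released implementation; DAFT-GAN applies two such affine networks per decoding block.

```python
# Minimal sketch (assumed PyTorch; names and shapes are illustrative,
# not the authors' code) of a text-conditioned affine transformation:
# a text embedding predicts per-channel scale (gamma) and shift (beta)
# that modulate the image feature map inside a decoding block.
import torch
import torch.nn as nn

class TextAffineBlock(nn.Module):
    def __init__(self, feat_channels: int, text_dim: int):
        super().__init__()
        # Two linear heads map the sentence embedding to per-channel
        # scale and shift parameters.
        self.to_gamma = nn.Linear(text_dim, feat_channels)
        self.to_beta = nn.Linear(text_dim, feat_channels)

    def forward(self, feat: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
        # feat: (B, C, H, W) image features from the decoder
        # text_emb: (B, text_dim) sentence embedding
        gamma = self.to_gamma(text_emb).unsqueeze(-1).unsqueeze(-1)  # (B, C, 1, 1)
        beta = self.to_beta(text_emb).unsqueeze(-1).unsqueeze(-1)    # (B, C, 1, 1)
        # Channel-wise affine modulation of the feature map.
        return feat * (1 + gamma) + beta

# Usage example (random tensors, purely illustrative):
block = TextAffineBlock(feat_channels=256, text_dim=128)
feat = torch.randn(2, 256, 32, 32)
text_emb = torch.randn(2, 128)
out = block(feat, text_emb)  # same shape as feat: (2, 256, 32, 32)
```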