Scene Text Editing (STE) aims to replace the text in an image with new text while preserving the background and the style of the original text. However, existing techniques struggle to generate edited text images that are clear and legible, mainly because of the diversity of text types and the intricate textures of complex backgrounds. To address this challenge, this paper introduces a three-stage framework for transferring text across text images. First, a text-swapping network substitutes the original text with the desired replacement. Second, a background inpainting network reconstructs the background image, filling the voids left after the original text is removed while preserving visual harmony and coherence. Finally, a fusion network combines the outputs of the text-swapping network and the background inpainting network to produce the final edited image. A demo video is included in the supplementary material.
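The data flow of the three stages can be illustrated with a minimal sketch. This is not the paper's implementation: each learned network is replaced by a trivial hand-written stand-in (mean-colour painting, mean-colour fill, mask compositing), and all function names, the mask inputs, and the array layout (float `H x W x 3` images, boolean `H x W` masks) are assumptions made only for illustration.

```python
import numpy as np


def swap_text(image, src_mask, tgt_mask):
    """Stand-in for the text-swapping network (hypothetical).

    Paints the target glyph region with the mean colour of the original
    text pixels, as a crude proxy for transferring the text style.
    """
    text_color = image[src_mask].mean(axis=0)  # shape (3,)
    foreground = np.zeros_like(image)
    foreground[tgt_mask] = text_color
    return foreground


def inpaint_background(image, src_mask):
    """Stand-in for the background inpainting network (hypothetical).

    Fills the pixels left empty by the removed text with the mean colour
    of the remaining background pixels.
    """
    background = image.copy()
    background[src_mask] = image[~src_mask].mean(axis=0)
    return background


def fuse(foreground, background, tgt_mask):
    """Stand-in for the fusion network (hypothetical).

    Composites the swapped text over the inpainted background using the
    target glyph mask as a hard alpha channel.
    """
    alpha = tgt_mask[..., None].astype(background.dtype)
    return alpha * foreground + (1.0 - alpha) * background


def edit_scene_text(image, src_mask, tgt_mask):
    """Run the three stages in the order described above."""
    foreground = swap_text(image, src_mask, tgt_mask)
    background = inpaint_background(image, src_mask)
    return fuse(foreground, background, tgt_mask)


if __name__ == "__main__":
    # Toy input: grey background with a bright two-pixel "word".
    image = np.full((4, 6, 3), 0.2)
    src_mask = np.zeros((4, 6), dtype=bool)
    src_mask[1, 1:3] = True
    image[src_mask] = 0.9

    # The replacement text occupies a different region.
    tgt_mask = np.zeros((4, 6), dtype=bool)
    tgt_mask[2, 3:5] = True

    edited = edit_scene_text(image, src_mask, tgt_mask)
    print(edited[..., 0].round(2))
```

In this sketch the first two stages run independently on the same input and only the last stage sees both outputs, which mirrors the abstract's description of the fusion network combining the two results. In a learned system each stand-in would be replaced by a trained model.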