Scene Text Editing (STE) is a challenging research problem that aims to modify existing text in an image while preserving the background and the font style of the original text. Because of its many real-world applications, researchers have explored several approaches to STE in recent years. However, most existing STE methods edit poorly in the presence of (1) complex image backgrounds, (2) diverse font styles, and (3) varying word lengths within the text. To address these limitations, we propose FAST, a novel font-agnostic scene text editing framework that simultaneously generates text in arbitrary styles and locations while preserving a natural and realistic appearance through combined mask generation and style transfer. Whereas existing methods directly modify all image pixels, the proposed method introduces a filtering mechanism that removes background distractions, allowing the network to focus solely on the text regions where editing is required. In addition, a text-style transfer module is designed to mitigate the challenges posed by varying word lengths. Extensive experiments and ablation studies demonstrate that the proposed method outperforms existing methods both qualitatively and quantitatively.
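To illustrate why restricting edits to text regions protects the background, the following is a minimal sketch of mask-based compositing. It is not the FAST implementation: the function name `composite_with_mask`, the array shapes, and the linear blending rule are all assumptions made for illustration, and the mask here is hand-built rather than predicted by a network.

```python
import numpy as np


def composite_with_mask(original, edited, text_mask):
    """Blend edited pixels into the original image using a text mask.

    Hypothetical helper, not taken from the paper.
    original, edited: float arrays of shape (H, W, 3) with values in [0, 1].
    text_mask: float array of shape (H, W), 1 inside the text region to edit.
    """
    # Add a channel axis so the mask broadcasts over the RGB channels.
    mask = np.clip(text_mask, 0.0, 1.0)[..., None]
    # Pixels outside the mask are copied from the original unchanged.
    return mask * edited + (1.0 - mask) * original


# Toy example: a black "scene", a white "edited" image, and a 2x2 text region.
original = np.zeros((4, 4, 3))
edited = np.ones((4, 4, 3))
mask = np.zeros((4, 4))
mask[1:3, 1:3] = 1.0

out = composite_with_mask(original, edited, mask)
```

Under this assumed blending rule, every pixel where the mask is zero is guaranteed to equal the original, so background fidelity depends only on the quality of the mask and not on the generator.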