Textual information in a captured scene plays an important role in scene interpretation and decision making. Although existing methods can successfully detect and interpret complex text regions in a scene, to the best of our knowledge there is no significant prior work that aims to modify the textual information in an image. The ability to edit text directly in images has several advantages, including error correction, text restoration, and image reusability. In this paper, we propose a method to modify text in an image at the character level. We approach the problem in two stages. First, the unobserved character (target) is generated from the observed character (source) that is being modified. For this stage we propose two neural network architectures: (a) FANnet, which achieves structural consistency with the source font, and (b) Colornet, which preserves the source color. Second, we replace the source character with the generated character while maintaining both geometric and visual consistency with neighboring characters. Our method works as a unified platform for modifying text in images. We demonstrate the effectiveness of our method on the COCO-Text and ICDAR datasets, both qualitatively and quantitatively.