Textual information in a captured scene plays an important role in scene interpretation and decision making. Although existing methods can successfully detect and interpret complex text regions in a scene, to the best of our knowledge there is no significant prior work that aims to modify the textual information in an image. The ability to edit text directly in images has several advantages, including error correction, text restoration, and image reusability. In this paper, we propose a method to modify text in an image at the character level. We approach the problem in two stages. First, an unobserved character (the target) is generated from the observed character (the source) that is to be modified. We propose two neural network architectures: (a) FANnet, which achieves structural consistency with the source font, and (b) Colornet, which preserves the source color. Next, we replace the source character with the generated character while maintaining both geometric and visual consistency with neighboring characters. Our method serves as a unified platform for modifying text in images. We demonstrate the effectiveness of our method, both qualitatively and quantitatively, on the COCO-Text and ICDAR datasets.
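To make the two-stage pipeline concrete, the sketch below outlines how a single character edit could be composed. It is a minimal illustration, not the paper's implementation: FANnet and Colornet are treated as opaque callables, and the bounding-box representation, helper names, and function signatures are assumptions introduced here.

    """A minimal sketch of the two-stage editing pipeline, assuming
    `fannet` and `colornet` are pretrained models exposed as callables.
    All names and signatures below are hypothetical placeholders."""
    import numpy as np

    def edit_character(image: np.ndarray,
                       box: tuple,            # (y, x, h, w) of the source character (assumed format)
                       target_char: str,
                       fannet,                # (source_patch, char) -> target glyph
                       colornet) -> np.ndarray:  # (source_patch, glyph) -> colored patch
        y, x, h, w = box
        source_patch = image[y:y + h, x:x + w]

        # Stage 1a: generate the unobserved target character from the
        # observed source character, preserving the source font structure.
        target_glyph = fannet(source_patch, target_char)

        # Stage 1b: transfer the source character's color onto the
        # generated glyph.
        colored = colornet(source_patch, target_glyph)

        # Stage 2: paste the generated character back into the source
        # bounding box so it remains geometrically and visually
        # consistent with neighboring characters.
        out = image.copy()
        out[y:y + h, x:x + w] = colored
        return out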