Scene text removal (STR) is a challenging task because text in scene images varies widely in font, color, and size and appears against complex background textures. Most previous methods learn text localization and background inpainting implicitly within a single network, which weakens text localization and degrades the reconstructed background. To tackle these problems, we propose a simple Progressive Segmentation-guided Scene Text Removal Network (PSSTRNet) that removes the text in an image iteratively. It contains a shared encoder and two decoder branches: a text segmentation branch and a text removal branch. The text segmentation branch generates text mask maps that guide region-wise removal in the text removal branch. In each iteration, the original image, the previous text removal result, and the text mask are fed into the network to extract the remaining text segments and produce a cleaner text removal result. To obtain a more accurate text mask map, an update module merges the mask maps of the current and previous stages. The final text removal result is obtained by adaptively fusing the results from all stages. Extensive experiments and ablation studies on real and synthetic public datasets demonstrate that the proposed method achieves state-of-the-art performance. The source code is available at \href{https://github.com/GuangtaoLyu/PSSTRNet}{https://github.com/GuangtaoLyu/PSSTRNet}.
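The iterative procedure described in the abstract can be sketched as a short loop. This is only an illustrative sketch under stated assumptions, not the released PSSTRNet code: `toy_network` is a hypothetical stand-in for the shared encoder and the two decoder branches, the mask update is assumed to be an element-wise maximum, and the adaptive fusion is assumed to be a softmax-weighted average with externally supplied scores (in the paper the update module and the fusion are learned).

```python
import numpy as np


def toy_network(image, prev_result, prev_mask):
    """Hypothetical stand-in for the encoder plus the two decoder branches.

    Treats any pixel brighter than 0.8 in the previous result as "text" and
    fills it with the mean colour of the remaining pixels.
    Returns (mask, removal).
    """
    gray = prev_result.mean(axis=-1)
    mask = (gray > 0.8).astype(np.float32)
    keep = mask == 0
    if keep.any():
        background = prev_result[keep].mean(axis=0)
    else:
        background = np.zeros(prev_result.shape[-1], dtype=prev_result.dtype)
    removal = prev_result * (1 - mask[..., None]) + background * mask[..., None]
    return mask, removal


def update_mask(prev_mask, cur_mask):
    # Assumption: merge the two stages with an element-wise maximum.
    return np.maximum(prev_mask, cur_mask)


def fuse(results, scores):
    # Assumption: softmax-weighted average of the per-stage results.
    weights = np.exp(scores - np.max(scores))
    weights = weights / weights.sum()
    return np.tensordot(weights, np.stack(results), axes=1)


def progressive_removal(image, network, num_stages=3, fusion_scores=None):
    """Run the network for num_stages iterations and fuse the stage outputs."""
    result = image.copy()
    mask = np.zeros(image.shape[:2], dtype=np.float32)
    results = []
    for _ in range(num_stages):
        # Inputs per iteration: original image, previous result, current mask.
        cur_mask, result = network(image, result, mask)
        mask = update_mask(mask, cur_mask)
        results.append(result)
    if fusion_scores is None:
        scores = np.zeros(num_stages, dtype=np.float32)
    else:
        scores = np.asarray(fusion_scores, dtype=np.float32)
    return fuse(results, scores), mask


# Usage: a dark 4x4 image with a single bright "text" pixel.
image = np.full((4, 4, 3), 0.2, dtype=np.float32)
image[1, 1] = 1.0
fused, final_mask = progressive_removal(image, toy_network)
```

Passing the network as an argument keeps the loop independent of any particular model, so a trained segmentation and removal model could be substituted for `toy_network` without changing the control flow.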