FETNet: Feature Erasing and Transferring Network for Scene Text Removal

The scene text removal (STR) task aims to remove text regions from images and smoothly recover the background in order to protect private information. Most existing STR methods adopt encoder-decoder CNNs that copy the encoded features directly through skip connections. However, the encoded features contain both text texture and text structure information, and insufficient utilization of these text features hampers background reconstruction in the text removal regions. To tackle these problems, this paper proposes a novel Feature Erasing and Transferring (FET) mechanism that reconfigures the encoded features for STR. In FET, a Feature Erasing Module (FEM) erases text features, an attention module generates feature similarity guidance, and a Feature Transferring Module (FTM) transfers the corresponding features across different layers based on this attention guidance. With this mechanism, we construct FETNet, a one-stage, end-to-end trainable network for scene text removal. In addition, to facilitate research on both scene text removal and scene text segmentation, we introduce a new dataset, Flickr-ST, with multi-category annotations. Extensive experiments and ablation studies are conducted on public datasets and on Flickr-ST. The proposed method achieves state-of-the-art performance on most metrics and produces scene text removal results of markedly higher quality. The source code is available at: https://github.com/GuangtaoLyu/FETNet
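The erase / attend / transfer pipeline described above can be illustrated with a small NumPy sketch. This is not the FETNet implementation. It is a hypothetical stand-in that assumes a text mask is already available (1 = text) and uses a contextual-attention-style cosine-similarity softmax as the "feature similarity guidance". All function names (`erase_features`, `similarity_attention`, `transfer_features`), the `temperature` parameter, and the choice of computing attention on one feature map (`guide`) and applying it to another (`target`) at the same resolution are assumptions made for illustration.

```python
import numpy as np

def erase_features(feat, text_mask):
    """Hypothetical FEM stand-in: zero out features inside the text mask.

    feat: (C, H, W) encoder feature map; text_mask: (H, W) in [0, 1], 1 = text.
    """
    return feat * (1.0 - text_mask)[None, :, :]

def similarity_attention(guide, text_mask, temperature=0.1):
    """Hypothetical attention module: scaled cosine similarity from every
    position to the background (non-text) positions, softmax-normalised per row."""
    c, h, w = guide.shape
    background = text_mask.reshape(-1) < 0.5          # positions allowed as sources
    if not background.any():
        raise ValueError("mask covers every position; no background to copy from")
    x = guide.reshape(c, h * w).T                     # (N, C), one row per position
    x = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)
    sim = x @ x.T / temperature                       # (N, N) scaled cosine similarity
    sim[:, ~background] = -np.inf                     # never copy from text regions
    sim -= sim.max(axis=1, keepdims=True)             # numerically stable softmax
    attn = np.exp(sim)
    attn /= attn.sum(axis=1, keepdims=True)
    return attn                                       # rows sum to 1

def transfer_features(target, attn, text_mask):
    """Hypothetical FTM stand-in: fill text positions of `target` with an
    attention-weighted mix of background features; keep background unchanged."""
    c, h, w = target.shape
    t = target.reshape(c, h * w)                      # (C, N)
    filled = t @ attn.T                               # column i = sum_j attn[i, j] * t[:, j]
    m = text_mask.reshape(1, -1)
    out = m * filled + (1.0 - m) * t
    return out.reshape(c, h, w)

# Usage with random features standing in for two encoder layers of equal size.
rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 6, 6))                     # layer whose text features are erased
guide = rng.normal(size=(16, 6, 6))                   # layer used to compute the guidance
mask = np.zeros((6, 6))
mask[2:4, 1:5] = 1.0                                  # hypothetical text region
erased = erase_features(feat, mask)
attn = similarity_attention(guide, mask)
out = transfer_features(erased, attn, mask)
```

Masking text positions out of the softmax keys ensures the text region is refilled only from background features. Computing similarity on a separate `guide` map avoids the degenerate case where erased (all-zero) text features carry no similarity information. In a trained network, the mask and all feature maps would be learned. Here they are random placeholders.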
