Focus Entirety and Perceive Environment for Arbitrary-Shaped Text Detection

Because scene text varies widely in font, color, shape, and size, detecting it accurately and efficiently remains a formidable challenge. Among the various detection approaches, segmentation-based methods have become prominent owing to their flexible pixel-level predictions. However, these methods typically model text instances in a bottom-up manner, which is highly susceptible to noise. In addition, each pixel is predicted in isolation, without any pixel-feature interaction, which also degrades detection performance. To alleviate these problems, we propose an arbitrary-shaped text detector that exploits multiple levels of information and consists of a focus entirety module (FEM) and a perceive environment module (PEM). The FEM extracts instance-level features and adopts a top-down scheme to model texts, reducing the influence of noise. Specifically, it assigns consistent entirety information to pixels within the same instance to improve their cohesion. It also emphasizes scale information, enabling the model to distinguish texts of varying scales effectively. The PEM extracts region-level information and encourages the model to focus on the distribution of positive samples in the vicinity of a pixel, thereby perceiving environment information. It treats the kernel pixels as positive samples and helps the model differentiate text features from kernel features. Extensive experiments demonstrate that the FEM efficiently supports the model in handling texts of different scales and confirm that the PEM helps the model perceive pixels more accurately by focusing on pixel vicinities. Comparisons show that the proposed model outperforms existing state-of-the-art approaches on four public datasets.
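
The idea of looking at the distribution of positive samples in the vicinity of a pixel can be illustrated with a small sketch. This is not the authors' PEM, whose architecture the abstract does not specify; it only assumes a binary kernel mask (1 for kernel pixels, 0 otherwise) and computes, for every pixel, the fraction of positive pixels inside a square window around it. The function name `local_positive_ratio` and the `window` parameter are hypothetical, chosen here for illustration.

```python
from typing import List


def local_positive_ratio(kernel_mask: List[List[int]], window: int = 3) -> List[List[float]]:
    """Return, for each pixel, the fraction of positive (kernel) pixels
    in the window x window neighborhood centered on it.

    Illustrative assumption: positions outside the image count as negatives,
    so the sum is always divided by the full window area.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")

    height = len(kernel_mask)
    width = len(kernel_mask[0]) if height else 0
    radius = window // 2
    area = window * window

    ratios = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            positives = 0
            # Visit only the in-bounds part of the neighborhood.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    positives += kernel_mask[ny][nx]
            ratios[y][x] = positives / area
    return ratios


# Toy mask: a 3x3 kernel region in the middle of a 5x5 image.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
toy_ratios = local_positive_ratio(toy_mask, window=3)
```

In this toy example the center pixel sees only kernel pixels in its neighborhood, while the corner pixels see a single one, so the ratio map separates the kernel interior from its surroundings. A learned module would presumably use such neighborhood statistics as a training signal or feature rather than as a fixed filter.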
