Representing irregular contours is one of the hardest challenges in scene text detection. Although segmentation-based methods have made significant progress thanks to flexible pixel-level prediction, spatially adjacent text instances tend to overlap in the predicted mask, which makes it hard to detect them separately. To alleviate this problem, some shrink-based methods predict text kernels and expand them to reconstruct the full text regions. However, the text kernel is an artificial object with incomplete semantic features, so it is prone to false or missed detections. In addition, unlike general objects, scene texts vary significantly in their geometric features (aspect ratio, scale, and shape), which makes them difficult to detect accurately. To address these problems, we propose an effective spotlight text detector (STD), which consists of a spotlight calibration module (SCM) and a multivariate information extraction module (MIEM). The SCM concentrates on the candidate kernel, like a camera focusing on its target: it obtains candidate features through a mapping filter and calibrates them precisely to eliminate some false positive samples. The MIEM applies different shape schemes to explore the multiple geometric features of scene texts, extracting various spatial relationships that improve the model's ability to recognize kernel regions. Ablation studies demonstrate the effectiveness of the designed SCM and MIEM. Extensive experiments verify that STD outperforms existing state-of-the-art methods on several datasets, including ICDAR2015, CTW1500, MSRA-TD500, and Total-Text.
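To make the notion of a "text kernel" concrete, the sketch below computes how far a text polygon is shrunk inward to obtain its kernel. It is only an illustration and not part of STD: the abstract does not say how kernels are generated, so the offset rule d = A(1 - r^2) / L (area A, perimeter L, shrink ratio r) is borrowed from earlier shrink-based detectors such as PSENet and DBNet, and the function names and the default ratio of 0.4 are assumptions chosen for the example.

```python
from math import hypot


def polygon_area(points):
    """Absolute area of a simple polygon, computed with the shoelace formula."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def polygon_perimeter(points):
    """Sum of the edge lengths of a closed polygon."""
    n = len(points)
    total = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += hypot(x2 - x1, y2 - y1)
    return total


def shrink_offset(points, ratio=0.4):
    """Inward offset d = A * (1 - r^2) / L used to shrink a text polygon.

    Assumption: this rule comes from earlier shrink-based detectors,
    not from the STD abstract; `ratio` is a hypothetical default.
    """
    area = polygon_area(points)
    perimeter = polygon_perimeter(points)
    return area * (1.0 - ratio ** 2) / perimeter


if __name__ == "__main__":
    # A 100 x 20 pixel horizontal text box, listed clockwise.
    box = [(0, 0), (100, 0), (100, 20), (0, 20)]
    print(f"shrink offset: {shrink_offset(box):.2f} px")
```

For the 100 x 20 box, the offset is about 7 pixels per side, so the kernel is only about 6 pixels tall. This shows why kernels of thin text lines carry little semantic evidence and why a calibration step on candidate kernels can help. An actual pipeline would apply the offset with a polygon clipping library rather than by hand.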