Inspired by deep convolutional segmentation algorithms, scene text detectors have steadily raised the performance ceiling on benchmark datasets. However, these methods often face a threshold-selection bottleneck and perform poorly on text instances with extreme aspect ratios. In this paper, we propose to automatically learn a discriminative segmentation threshold that separates text pixels from background pixels in segmentation-based scene text detectors, thereby reducing time-consuming manual parameter tuning. In addition, we design a Global-information Enhanced Feature Pyramid Network (GE-FPN) to capture text instances with large sizes and extreme aspect ratios. Following the GE-FPN, we introduce a cascade optimization structure to further refine the detected text instances. Combining the proposed threshold learning strategy with this text detection structure, we build an Adaptive Segmentation Network (ASNet) for scene text detection. Extensive experiments demonstrate that ASNet achieves state-of-the-art performance on four text detection benchmarks, i.e., ICDAR 2015, MSRA-TD500, ICDAR 2017 MLT, and CTW1500. Ablation experiments further verify the effectiveness of each contribution.
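To illustrate what "learning the segmentation threshold" can mean in practice, the sketch below replaces the hard step `prob > threshold` with a steep sigmoid, so the threshold becomes differentiable and can be fitted by gradient descent on a binary cross-entropy loss. This is a minimal sketch under stated assumptions, not ASNet's method: the abstract does not describe how ASNet learns its threshold. Here a single global scalar threshold is fitted to a toy list of pixel probabilities, whereas a real detector would typically predict thresholds from image features. The function names, the steepness `k`, the learning rate, and the toy data are all hypothetical.

```python
import math


def soft_binarize(prob: float, threshold: float, k: float = 50.0) -> float:
    """Differentiable surrogate for (prob > threshold): a steep sigmoid.

    Hypothetical helper; k controls how closely it approximates a hard step.
    """
    return 1.0 / (1.0 + math.exp(-k * (prob - threshold)))


def learn_threshold(probs, labels, threshold=0.5, k=50.0, lr=0.01, steps=500):
    """Fit one global threshold by gradient descent on binary cross-entropy.

    For b = sigmoid(k * (p - T)) and BCE(b, y):
      dBCE/db = (b - y) / (b * (1 - b)),  db/dT = -k * b * (1 - b)
    so dBCE/dT simplifies to -k * (b - y).
    """
    for _ in range(steps):
        grad = 0.0
        for p, y in zip(probs, labels):
            b = soft_binarize(p, threshold, k)
            grad += -k * (b - y)
        # Average the per-pixel gradients and take one descent step on T.
        threshold -= lr * grad / len(probs)
    return threshold


if __name__ == "__main__":
    # Toy "segmentation map": predicted text probabilities and ground-truth labels.
    probs = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
    labels = [0, 0, 0, 1, 1, 1]
    # A hand-picked threshold of 0.8 wrongly marks the 0.6 and 0.7 pixels as background.
    t = learn_threshold(probs, labels, threshold=0.8)
    print(f"learned threshold: {t:.3f}")
    print("binarized:", [int(p > t) for p in probs])
```

Starting from a poor manual choice of 0.8, the learned threshold moves into the gap between the background and text probabilities, so hard binarization at the learned value recovers the labels without hand tuning.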