Inspired by deep convolutional segmentation algorithms, scene text detectors have steadily raised the performance ceiling on benchmark datasets. However, these methods often suffer from a threshold selection bottleneck and perform poorly on text instances with extreme aspect ratios. In this paper, we propose to automatically learn a discriminative segmentation threshold, which distinguishes text pixels from background pixels for segmentation-based scene text detectors and thereby reduces time-consuming manual parameter tuning. In addition, we design a Global-information Enhanced Feature Pyramid Network (GE-FPN) to capture text instances with large sizes and extreme aspect ratios. Following the GE-FPN, we introduce a cascade optimization structure to further refine the detected text instances. Finally, combining the proposed threshold learning strategy and text detection structure, we design an Adaptive Segmentation Network (ASNet) for scene text detection. Extensive experiments demonstrate that the proposed ASNet achieves state-of-the-art performance on four text detection benchmarks, i.e., ICDAR 2015, MSRA-TD500, ICDAR 2017 MLT, and CTW1500. Ablation experiments further verify the effectiveness of each of our contributions.
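The abstract does not say how the segmentation threshold is learned, so the following is only an illustrative sketch of the general idea, not ASNet's method. It assumes a single scalar threshold `t` fitted by gradient descent, with the hard step `prob > t` relaxed into a steep sigmoid (steepness `k`) so that `t` receives gradients; this relaxation resembles the differentiable binarization used in DBNet. The helper names `soft_binarize` and `learn_threshold`, the values of `k`, `lr`, and `steps`, and the toy probability map are all hypothetical.

```python
import numpy as np


def soft_binarize(prob, t, k=50.0):
    """Approximate the hard step 1[prob > t] with a steep sigmoid so t is differentiable."""
    return 1.0 / (1.0 + np.exp(-k * (prob - t)))


def learn_threshold(prob, gt, t=0.5, k=50.0, lr=0.01, steps=500):
    """Fit a scalar threshold t by gradient descent on binary cross-entropy.

    prob: predicted text probability map with values in [0, 1] (hypothetical input)
    gt:   binary ground-truth mask (1 = text pixel, 0 = background pixel)
    """
    for _ in range(steps):
        b = soft_binarize(prob, t, k)
        # For BCE on a sigmoid output, dL/dlogit = b - gt; the logit is
        # k * (prob - t), so dlogit/dt = -k and dL/dt = -k * (b - gt).
        grad = np.mean(-k * (b - gt))
        t = float(np.clip(t - lr * grad, 0.0, 1.0))
    return t


# Toy example: background scores reach 0.55, text scores start at 0.7,
# so the fixed default of 0.5 would misclassify the 0.55 background pixel.
prob = np.array([0.1, 0.3, 0.5, 0.55, 0.7, 0.75, 0.85, 0.95])
gt = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

t = learn_threshold(prob, gt)
mask = prob > t  # hard binarization with the learned threshold at inference time
```

In this toy case the learned threshold moves from the initial 0.5 into the gap between the highest background score (0.55) and the lowest text score (0.7), so the hard mask matches the ground truth without any manual tuning. A real detector would more likely predict the threshold from image features (per image or per pixel) and train it jointly with the network.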