Kernel Proposal Network for Arbitrary Shape Text Detection

From arXiv. This paper was completed in November 2020. It was first submitted to CVPR 2021 and then to ICCV 2021, and was finally accepted by TNNLS in February 2022 after a major revision. I thank Dr. Hou for his important contributions.

Segmentation-based methods have achieved great success in arbitrary shape text detection. However, separating neighboring text instances remains one of the most challenging problems because of the complexity of text in scene images. In this paper, we propose an innovative Kernel Proposal Network (dubbed KPN) for arbitrary shape text detection. The proposed KPN separates neighboring text instances by classifying different texts into instance-independent feature maps, while avoiding the complex aggregation process found in segmentation-based arbitrary shape text detection methods. Concretely, KPN predicts a Gaussian center map for each text image, which is used to extract a series of candidate kernel proposals (i.e., dynamic convolution kernels) from the embedding feature maps according to their corresponding keypoint positions. To enforce independence between kernel proposals, we propose a novel orthogonal learning loss (OLL) based on orthogonal constraints. Specifically, our kernel proposals contain important self-information learned by the network and location information provided by position embedding. Finally, each kernel proposal individually convolves all embedding feature maps to generate an individual embedded map for its text instance. In this way, KPN can effectively separate neighboring text instances and improve robustness against unclear boundaries. To our knowledge, our work is the first to introduce the dynamic convolution kernel strategy to efficiently and effectively tackle the adhesion problem of neighboring text instances in text detection. Experimental results on challenging datasets verify the impressive performance and efficiency of our method. The code and model are available at https://github.com/GXYM/KPN.
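
The pipeline described in the abstract can be illustrated with a small NumPy sketch. This is not the code from the KPN repository. It assumes the keypoints have already been picked from the Gaussian center map, that each kernel proposal is simply the embedding vector at its keypoint applied as a 1x1 convolution, and that the orthogonal constraint can be approximated by penalizing cosine similarity between kernels. The function names (`extract_kernel_proposals`, `dynamic_conv`, `orthogonal_loss`) are hypothetical, and position embedding is omitted.

```python
import numpy as np

def extract_kernel_proposals(embedding, keypoints):
    """Gather one kernel per keypoint from the embedding maps.

    embedding: array of shape (C, H, W)
    keypoints: list of (y, x) positions, assumed to come from center-map peaks
    returns:   array of shape (K, C), one C-dimensional kernel per keypoint
    """
    ys, xs = zip(*keypoints)
    return embedding[:, list(ys), list(xs)].T

def dynamic_conv(embedding, kernels):
    """Apply every kernel as a 1x1 convolution over the embedding maps.

    (K, C) kernels and (C, H, W) embedding give (K, H, W) per-instance maps.
    """
    return np.einsum("kc,chw->khw", kernels, embedding)

def orthogonal_loss(kernels, eps=1e-8):
    """Illustrative stand-in for an orthogonality penalty (an assumption,
    not the paper's exact OLL): mean squared off-diagonal cosine similarity."""
    normed = kernels / (np.linalg.norm(kernels, axis=1, keepdims=True) + eps)
    gram = normed @ normed.T
    off_diag = gram - np.diag(np.diag(gram))
    k = kernels.shape[0]
    return float((off_diag ** 2).sum() / max(k * (k - 1), 1))

# Toy input: two adjacent "text instances" that occupy different channels.
embedding = np.zeros((2, 4, 6))
embedding[0, :, :3] = 1.0  # left instance
embedding[1, :, 3:] = 1.0  # right instance

keypoints = [(1, 1), (1, 4)]  # one assumed center per instance
kernels = extract_kernel_proposals(embedding, keypoints)
instance_maps = dynamic_conv(embedding, kernels)
masks = instance_maps > 0.5   # one binary mask per kernel proposal
```

In this toy case each kernel responds only to its own instance, so the two touching regions end up in separate maps without any pixel grouping step. The penalty is zero for these two kernels because they are already orthogonal, and it grows as kernels become more similar.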
