Recently, regression-based methods, which localize text by predicting parameterized text shapes, have gained popularity in scene text detection. However, existing parameterized text shape methods remain limited in modeling arbitrary-shaped text because they ignore text-specific shape information. Moreover, the time cost of the entire pipeline has been largely overlooked, resulting in suboptimal overall inference speed. To address these issues, we first propose a novel parameterized text shape method based on low-rank approximation. Unlike other shape representation methods that employ data-irrelevant parameterization, our approach uses singular value decomposition and reconstructs the text shape from a few eigenvectors learned from labeled text contours. By exploiting the shape correlation among different text contours, our method achieves consistency, compactness, simplicity, and robustness in shape representation. Next, we propose a dual assignment scheme for acceleration. It adopts a sparse assignment branch to speed up inference while providing ample supervision signals for training through a dense assignment branch. Building on these designs, we implement an accurate and efficient arbitrary-shaped text detector named LRANet. Extensive experiments on several challenging benchmarks demonstrate the superior accuracy and efficiency of LRANet compared with state-of-the-art methods. Code is available at: \url{https://github.com/ychensu/LRANet.git}
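The low-rank shape representation described above can be illustrated with a minimal sketch. It is not the LRANet implementation: the toy ellipse contours, the function names (`make_ellipse`, `learn_shape_basis`, `encode`, `decode`), the number of sampled points, and the chosen rank are all assumptions made for illustration, and the actual method may normalize or sample contours differently. The sketch only shows the general idea of learning a small basis from contours with singular value decomposition and representing each contour by a few coefficients.

```python
import numpy as np


def make_ellipse(cx, cy, a, b, num_points=14):
    """Toy contour (assumption): sample an axis-aligned ellipse and
    flatten it to the vector (x_1..x_n, y_1..y_n)."""
    t = np.linspace(0.0, 2.0 * np.pi, num_points, endpoint=False)
    return np.concatenate([cx + a * np.cos(t), cy + b * np.sin(t)])


def learn_shape_basis(contours, rank):
    """Return the top-`rank` left singular vectors of the contour matrix.

    contours: array of shape (num_contours, 2 * num_points).
    """
    data = np.asarray(contours, dtype=float).T  # one contour per column
    u, _, _ = np.linalg.svd(data, full_matrices=False)
    return u[:, :rank]  # orthonormal columns, shape (2 * num_points, rank)


def encode(contour, basis):
    """Project a flattened contour onto the basis: `rank` coefficients."""
    return basis.T @ contour


def decode(coeffs, basis):
    """Reconstruct a flattened contour from its coefficients."""
    return basis @ coeffs


# Build a toy "labeled" set of 50 random ellipses (hypothetical data).
rng = np.random.default_rng(0)
train = np.stack([
    make_ellipse(*rng.uniform(-1, 1, size=2), *rng.uniform(0.5, 2.0, size=2))
    for _ in range(50)
])

# Ellipses with a free center and free axes span a 4-dimensional subspace,
# so four basis vectors suffice for this toy data.
basis = learn_shape_basis(train, rank=4)

# Represent an unseen contour with 4 numbers instead of 28 coordinates.
target = make_ellipse(0.3, -0.2, 1.5, 0.7)
coeffs = encode(target, basis)
recon = decode(coeffs, basis)
print("coefficients per contour:", coeffs.shape[0])
print("max reconstruction error:", np.abs(target - recon).max())
```

In this toy setting the basis is learned from the data rather than fixed in advance, which is the contrast with data-irrelevant parameterizations drawn in the abstract. Real text contours would not lie exactly in a low-dimensional subspace, so the reconstruction would be approximate and the rank would trade compactness against fidelity.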