Document dewarping, which aims to remove geometric deformation from photographed documents so that text recognition improves, has made great progress in recent years but remains far from solved. State-of-the-art approaches typically use Cartesian coordinates to learn a set of deformation control points, but this representation is not efficient for a dewarping model to learn deformation information. In this work, we explore a Polar coordinate representation for each point in document dewarping, an approach we call Polar-Doc. In contrast to most current works, which typically adopt a two-stage pipeline, the Polar representation enables a unified point regression framework for both the segmentation and dewarping networks in a single stage. This unification makes the whole model more efficient to learn under end-to-end optimization and also yields a compact representation. Furthermore, we propose a novel multi-scope Polar-Doc-IOU loss that constrains the relationship among control points, acting as a grid-based regularization under the Polar representation. Visual comparisons and quantitative experiments on two benchmarks show that, with far fewer parameters than mainstream counterparts, our one-stage model with multi-scope constraints achieves new state-of-the-art performance on both pixel alignment metrics and OCR metrics. Source code will be available at \url{*****}.
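To make the idea of a Polar representation of control points concrete, the following is a minimal sketch and not the Polar-Doc implementation. It assumes that each control point is expressed as a radius and an angle about a reference center; the function names `to_polar` and `to_cartesian`, the choice of the grid centroid as the center, and the sample grid are all hypothetical and are not taken from the abstract.

```python
import math


def to_polar(points, center):
    """Convert (x, y) control points to (r, theta) pairs about `center`."""
    cx, cy = center
    polar = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        r = math.hypot(dx, dy)       # distance from the center
        theta = math.atan2(dy, dx)   # angle in radians, in [-pi, pi]
        polar.append((r, theta))
    return polar


def to_cartesian(polar, center):
    """Invert `to_polar`: map (r, theta) pairs back to (x, y) points."""
    cx, cy = center
    return [(cx + r * math.cos(t), cy + r * math.sin(t)) for r, t in polar]


# Hypothetical 2x2 grid of control points on a warped page (pixel units).
grid = [(10.0, 10.0), (30.0, 12.0), (9.0, 31.0), (32.0, 29.0)]

# Assumption: the centroid of the grid serves as the Polar origin.
center = (
    sum(x for x, _ in grid) / len(grid),
    sum(y for _, y in grid) / len(grid),
)

polar = to_polar(grid, center)
recovered = to_cartesian(polar, center)
```

The round trip through `to_cartesian` recovers the original points up to floating-point error, which shows that the two representations carry the same information; any efficiency gain claimed in the abstract would come from how a network regresses these values, not from the conversion itself.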