We present a new table structure recognition (TSR) approach, called TSRFormer, to robustly recognize the structures of complex tables with geometrical distortions from various table images. Unlike previous methods, we formulate table separation line prediction as a line regression problem instead of an image segmentation problem, and propose a new two-stage, dynamic-query-enhanced, DETR-based separation line regression approach, named DQ-DETR, to predict separation lines from table images directly. Compared to vanilla DETR, we propose three improvements in DQ-DETR to make the two-stage DETR framework work efficiently and effectively for the separation line prediction task: 1) a new query design, named Dynamic Query, which decouples a single line query into separable point queries and could intuitively improve the localization accuracy for regression tasks; 2) a dynamic-query-based progressive line regression approach, which progressively regresses points on the line and further enhances localization accuracy for distorted tables; 3) a prior-enhanced matching strategy to solve the slow convergence issue of DETR. After separation line prediction, a simple relation-network-based cell merging module is used to recover spanning cells. With these new techniques, TSRFormer achieves state-of-the-art performance on several benchmark datasets, including SciTSR, PubTabNet, WTW and FinTabNet. Furthermore, on a more challenging real-world in-house dataset, we have validated the robustness and high localization accuracy of our approach on tables with complex structures, borderless cells, large blank spaces, empty or spanning cells, and distorted or even curved shapes.
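To make the line regression formulation more concrete, the following minimal sketch represents a row separation line as a sequence of y-coordinates sampled at fixed x positions, so that a curved or distorted separator becomes a regression target rather than a segmentation mask. It illustrates the representation only and is not the DQ-DETR model. The evenly spaced sampling scheme, the L1 distance, and the names `sample_xs`, `line_l1_distance`, and `interpolate_y` are assumptions introduced here; the abstract does not specify them.

```python
from typing import List, Sequence


def sample_xs(width: float, num_points: int) -> List[float]:
    """Evenly spaced x positions across the image width (assumed sampling scheme)."""
    if num_points < 2:
        raise ValueError("need at least two sample points")
    step = width / (num_points - 1)
    return [i * step for i in range(num_points)]


def line_l1_distance(pred_ys: Sequence[float], gt_ys: Sequence[float]) -> float:
    """Mean absolute difference between two lines sampled at the same x positions."""
    if len(pred_ys) != len(gt_ys):
        raise ValueError("lines must have the same number of points")
    return sum(abs(p - g) for p, g in zip(pred_ys, gt_ys)) / len(gt_ys)


def interpolate_y(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Piecewise-linear y value of the sampled line at an arbitrary x."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the sampled range")
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return ys[i] + t * (ys[i + 1] - ys[i])
    raise ValueError("xs must be sorted in increasing order")


# A curved row separator and a straight reference line, each given by 5 points.
xs = sample_xs(width=100.0, num_points=5)
curved = [10, 12, 14, 12, 10]
straight = [10, 10, 10, 10, 10]
distance = line_l1_distance(curved, straight)
midpoint_y = interpolate_y(xs, curved, 37.5)
```

Here `distance` evaluates to 1.6 pixels and `midpoint_y` to 13.0. Because every point carries its own coordinate, a per-point error of this kind can express how far a predicted curved line deviates from the target along its whole length.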