Table structure recognition (TSR) aims to convert tables in images into machine-understandable formats. Recent methods solve this problem either by predicting the adjacency relations of detected cell boxes or by learning to generate the corresponding markup sequences from table images. However, they either rely on additional heuristic rules to recover the table structures, or require a large amount of training data and time-consuming sequential decoders. In this paper, we propose an alternative paradigm. We model TSR as a logical location regression problem and propose a new TSR framework called LORE, standing for LOgical location REgression network, which for the first time combines logical location regression with spatial location regression of table cells. Our proposed LORE is conceptually simpler, easier to train, and more accurate than previous TSR models of other paradigms. Experiments on standard benchmarks demonstrate that LORE consistently outperforms prior methods. Code is available at https://github.com/AlibabaResearch/AdvancedLiterateMachinery/tree/main/DocumentUnderstanding/LORE-TSR.
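To make the idea of a logical location concrete, here is a minimal illustrative sketch, not LORE's code. It assumes that each cell's logical location is given by four indices (starting row, ending row, starting column, ending column), that the regressor outputs them as real numbers, and that they are decoded by simple rounding. The `Cell` class and the helpers `to_grid_indices` and `cells_to_html` are hypothetical. The sketch shows why no adjacency-based heuristics are needed: once every cell has integer logical indices, the table markup (here HTML with `rowspan`/`colspan`) follows directly. Spatial location regression is not illustrated.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    text: str
    # Hypothetical regressed logical coordinates (real-valued predictions).
    row_start: float
    row_end: float
    col_start: float
    col_end: float


def to_grid_indices(cell):
    """Round regressed logical coordinates to integer row/column indices (assumed decoding)."""
    return (round(cell.row_start), round(cell.row_end),
            round(cell.col_start), round(cell.col_end))


def cells_to_html(cells):
    """Build an HTML table directly from per-cell logical locations."""
    placed = []
    for c in cells:
        r0, r1, c0, c1 = to_grid_indices(c)
        # Store (start row, start col, row span, col span, text).
        placed.append((r0, c0, r1 - r0 + 1, c1 - c0 + 1, c.text))
    n_rows = max(r0 + rspan for r0, _, rspan, _, _ in placed)
    rows = [[] for _ in range(n_rows)]
    # Each cell is emitted once, in the row where it starts, ordered by start column.
    for r0, c0, rspan, cspan, text in sorted(placed):
        attrs = ""
        if rspan > 1:
            attrs += f' rowspan="{rspan}"'
        if cspan > 1:
            attrs += f' colspan="{cspan}"'
        rows[r0].append(f"<td{attrs}>{text}</td>")
    body = "".join(f"<tr>{''.join(r)}</tr>" for r in rows)
    return f"<table>{body}</table>"


# Made-up predictions: "Name" spans two rows, "Score" spans two columns.
cells = [
    Cell("Name", 0.1, 0.9, -0.2, 0.1),
    Cell("Score", 0.05, -0.1, 0.9, 2.1),
    Cell("A", 1.0, 1.2, 1.1, 0.9),
    Cell("B", 0.95, 1.05, 2.0, 1.8),
]
html = cells_to_html(cells)
print(html)
# prints <table><tr><td rowspan="2">Name</td><td colspan="2">Score</td></tr><tr><td>A</td><td>B</td></tr></table>
```

Rounding is only the simplest possible decoding; a real system may need to reconcile inconsistent predictions (for example, overlapping cells), which this sketch does not attempt.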