All tables can be represented as grids. Based on this observation, we propose GridFormer, a novel approach for interpreting unconstrained table structures by predicting the vertices and edges of a grid. First, we propose a flexible table representation in the form of an M×N grid, in which the vertices and edges store the localization and adjacency information of the table, respectively. Then, we introduce a DETR-style table structure recognizer that efficiently predicts this multi-objective grid information in a single shot. Specifically, given a set of learned row and column queries, the recognizer directly outputs the vertex and edge information of the corresponding rows and columns. Extensive experiments on five challenging benchmarks, which include wired, wireless, multi-merge-cell, oriented, and distorted tables, demonstrate that our model performs competitively with other methods.
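To make the grid representation concrete, the following is a minimal sketch of one way an M×N grid could store localization in its vertices and adjacency in its edges. The abstract does not specify the exact encoding, so everything here is an assumption for illustration: the `TableGrid` class, the boolean edge convention (an absent edge means the two neighbouring grid cells belong to the same merged cell), and the helpers `merged_cells` and `cell_quad` are hypothetical names, and this is not the recognizer itself.

```python
from dataclasses import dataclass


@dataclass
class TableGrid:
    """Hypothetical M x N grid encoding (an assumption, not the paper's format)."""
    vertices: list  # (M+1) x (N+1) points (x, y): localization
    h_edges: list   # (M+1) x N booleans: horizontal edge present?
    v_edges: list   # M x (N+1) booleans: vertical edge present?

    @property
    def shape(self):
        # M rows of grid cells, N columns of grid cells.
        return len(self.v_edges), len(self.h_edges[0])


def merged_cells(grid):
    """Group grid cells separated by an absent edge; return sorted spans
    (row_start, col_start, row_end, col_end), all inclusive."""
    m, n = grid.shape
    parent = list(range(m * n))

    def find(a):
        # Union-find lookup with path halving.
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    for i in range(m):
        for j in range(n):
            # No horizontal edge below cell (i, j): merge with the cell below.
            if i + 1 < m and not grid.h_edges[i + 1][j]:
                union(i * n + j, (i + 1) * n + j)
            # No vertical edge right of cell (i, j): merge with the cell to the right.
            if j + 1 < n and not grid.v_edges[i][j + 1]:
                union(i * n + j, i * n + j + 1)

    groups = {}
    for i in range(m):
        for j in range(n):
            groups.setdefault(find(i * n + j), []).append((i, j))

    spans = []
    for cells in groups.values():
        rows = [r for r, _ in cells]
        cols = [c for _, c in cells]
        spans.append((min(rows), min(cols), max(rows), max(cols)))
    return sorted(spans)


def cell_quad(grid, span):
    """Corner points of a span: top-left, top-right, bottom-right, bottom-left."""
    r0, c0, r1, c1 = span
    v = grid.vertices
    return [v[r0][c0], v[r0][c1 + 1], v[r1 + 1][c1 + 1], v[r1 + 1][c0]]


# A 2 x 2 grid whose top row is one merged cell: the vertical edge
# between the two top grid cells is marked absent.
example = TableGrid(
    vertices=[[(100.0 * j, 40.0 * i) for j in range(3)] for i in range(3)],
    h_edges=[[True, True], [True, True], [True, True]],
    v_edges=[[True, False, True], [True, True, True]],
)
spans = merged_cells(example)
```

In this sketch, `spans` holds one entry per logical table cell, and `cell_quad(example, spans[0])` returns the four corner points of the merged top cell. Storing corner points per vertex rather than axis-aligned boxes is what would let such a grid describe oriented or distorted tables, since each cell becomes an arbitrary quadrilateral.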