A table is an object that captures structured, information-rich content within a document, and recognizing a table in an image is challenging due to the complexity and variety of table layouts. Many previous works adopt a two-stage approach: (1) Table Detection (TD) localizes the table region in an image, and (2) Table Structure Recognition (TSR) identifies row- and column-wise adjacency relations between the cells. A two-stage approach often suffers from error propagation between the modules and makes training and inference less efficient. In this work, we analyze the natural characteristics of a table: a table is composed of cells, and each cell is bounded by borders that consist of edges. We propose a novel method that reconstructs the table in a bottom-up manner. Through a simple process, the proposed method separates cell boundaries from low-level features, such as corners and edges, and localizes table positions by combining the cells. The simple design makes the model easier to train and requires less computation than previous two-stage methods. We achieve state-of-the-art performance on the ICDAR2013 table competition benchmark and the Wired Table in the Wild (WTW) dataset.
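To make the final bottom-up step concrete, the following is a minimal sketch of how table positions could be recovered by combining cells. It is an illustration under stated assumptions, not the method described in this work: it assumes each detected cell is already available as an axis-aligned box `(x0, y0, x1, y1)`, and the names `boxes_touch`, `group_cells_into_tables`, and the pixel tolerance `tol` are hypothetical. Cells that overlap or share a border (within `tol`) are merged with a union-find structure, and each resulting group yields one table bounding box.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), assumed axis-aligned


def boxes_touch(a: Box, b: Box, tol: float = 2.0) -> bool:
    """Return True if two boxes overlap or lie within `tol` pixels of each other."""
    return (a[0] <= b[2] + tol and b[0] <= a[2] + tol and
            a[1] <= b[3] + tol and b[1] <= a[3] + tol)


def group_cells_into_tables(cells: List[Box], tol: float = 2.0) -> List[Box]:
    """Merge adjacent cells into groups and return one bounding box per group."""
    parent = list(range(len(cells)))

    def find(i: int) -> int:
        # Follow parent links to the group root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Union every pair of cells that touch (quadratic, fine for a sketch).
    for i in range(len(cells)):
        for j in range(i + 1, len(cells)):
            if boxes_touch(cells[i], cells[j], tol):
                parent[find(i)] = find(j)

    # Collect the cells belonging to each group root.
    groups: Dict[int, List[Box]] = {}
    for i, cell in enumerate(cells):
        groups.setdefault(find(i), []).append(cell)

    # The table region is the bounding box enclosing all cells of a group.
    tables = [
        (min(c[0] for c in g), min(c[1] for c in g),
         max(c[2] for c in g), max(c[3] for c in g))
        for g in groups.values()
    ]
    return sorted(tables)
```

For example, three mutually adjacent cells and one distant cell would produce two table regions. The pairwise adjacency test is chosen for readability; a spatial index would be a natural replacement for pages with many cells.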