Image-based table recognition is challenging because table styles are diverse and table structures are complex. Most previous methods are not end-to-end: they divide the problem into two sub-problems, table structure recognition and cell-content recognition, and solve each independently with a separate system. In this paper, we propose an end-to-end multi-task learning model for image-based table recognition. The proposed model consists of one shared encoder, one shared decoder, and three separate decoders, which learn the three sub-tasks of table recognition: table structure recognition, cell detection, and cell-content recognition. The whole system can be trained end to end and used for inference in a single pass. We evaluate the proposed model on two large-scale datasets, FinTabNet and PubTabNet. The experimental results show that the proposed model outperforms state-of-the-art methods on both benchmark datasets.
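The architecture described in the abstract (one shared encoder, one shared decoder, three task-specific decoders) can be illustrated with a minimal data-flow sketch. This is not the authors' implementation: the class name `MultiTaskTableModel`, the `forward` method, and the toy stand-in functions are all hypothetical, and the assumption that each task decoder reads the shared decoder's output directly is ours. Real components would be neural networks.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = List[float]
Image = List[List[float]]


@dataclass
class MultiTaskTableModel:
    """Hypothetical wiring: one encoder, one shared decoder, three task decoders."""

    encoder: Callable[[Image], Features]
    shared_decoder: Callable[[Features], Features]
    structure_decoder: Callable[[Features], object]
    cell_decoder: Callable[[Features], object]
    content_decoder: Callable[[Features], object]

    def forward(self, image: Image) -> Dict[str, object]:
        features = self.encoder(image)          # computed once per image
        hidden = self.shared_decoder(features)  # shared by all three tasks
        # Assumption: each task decoder consumes the same shared representation.
        return {
            "structure": self.structure_decoder(hidden),
            "cells": self.cell_decoder(hidden),
            "content": self.content_decoder(hidden),
        }


# Toy stand-ins so the sketch runs; they carry no modelling meaning.
def toy_encoder(image: Image) -> Features:
    return [sum(row) / len(row) for row in image]  # one mean value per row


def toy_shared_decoder(features: Features) -> Features:
    return [2.0 * f for f in features]


model = MultiTaskTableModel(
    encoder=toy_encoder,
    shared_decoder=toy_shared_decoder,
    structure_decoder=lambda h: ["<tr>"] * len(h),
    cell_decoder=lambda h: [(0.0, float(i), 1.0, float(i + 1)) for i in range(len(h))],
    content_decoder=lambda h: [f"{v:.1f}" for v in h],
)

outputs = model.forward([[0.0, 1.0], [1.0, 1.0]])
```

The point of the sketch is only that a single forward pass yields all three outputs from one shared representation, in contrast to pipelines that run separate systems for structure and content.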