Latent Diffusion for Guided Document Table Generation

Obtaining annotated table structure data for complex tables is challenging because real-world document layouts are diverse and intricate. The scarcity of public datasets with comprehensive annotations for such table structures hinders both the development and the evaluation of models designed for these scenarios. This paper introduces an approach for generating annotated table structure images with latent diffusion models conditioned on mask images of rows and columns. The method aims to improve the quality of synthetic data used to train object detection models. Specifically, the conditioning mechanism guides the generation of complex document table images so that the generated layouts remain realistic. To evaluate the generated data, we train the widely used YOLOv5 object detection model on it; the generated table images serve as training samples that enrich the dataset with diverse table structures. The model is then tested on the PubTables-1M test set, a benchmark for table structure recognition in complex document layouts. Experimental results show that the approach significantly improves the quality of synthetic training data and yields YOLOv5 models with better performance. The mean Average Precision (mAP) values obtained on the PubTables-1M test set are close to those of state-of-the-art methods. In addition, the low Fréchet Inception Distance (FID) measured on the synthetic data supports the efficacy of the method in generating annotated images for table structure.
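The abstract does not say how the row and column mask images are encoded, so the following is only a minimal sketch under assumed conventions: two separate masks (one for rows, one for columns), alternating labels 1 and 2 so that adjacent bands stay distinguishable, and class ids 0 for rows and 1 for columns. The function name `make_table_mask` and the `margin` parameter are hypothetical. The sketch illustrates why mask conditioning gives annotations for free: the same layout that defines the mask also defines the bounding boxes.

```python
def make_table_mask(row_heights, col_widths, margin=4):
    """Build row/column masks and matching boxes from a grid layout.

    Hypothetical helper: the encoding (labels 1/2, class ids 0/1) is an
    assumption for illustration, not the paper's actual format.
    """
    height = sum(row_heights) + 2 * margin
    width = sum(col_widths) + 2 * margin
    row_mask = [[0] * width for _ in range(height)]
    col_mask = [[0] * width for _ in range(height)]
    boxes = []  # (class_id, x_center, y_center, w, h), normalised to [0, 1]

    # Rows: horizontal bands spanning the table width.
    y = margin
    for i, h in enumerate(row_heights):
        label = 1 + (i % 2)  # alternate labels between neighbouring rows
        for yy in range(y, y + h):
            for xx in range(margin, width - margin):
                row_mask[yy][xx] = label
        boxes.append((0, 0.5, (y + h / 2) / height,
                      (width - 2 * margin) / width, h / height))
        y += h

    # Columns: vertical bands spanning the table height.
    x = margin
    for j, w in enumerate(col_widths):
        label = 1 + (j % 2)  # alternate labels between neighbouring columns
        for yy in range(margin, height - margin):
            for xx in range(x, x + w):
                col_mask[yy][xx] = label
        boxes.append((1, (x + w / 2) / width, 0.5,
                      w / width, (height - 2 * margin) / height))
        x += w

    return row_mask, col_mask, boxes


if __name__ == "__main__":
    rows, cols, boxes = make_table_mask([10, 20], [30, 40, 50])
    # One line per box, in the normalised "class x y w h" label layout.
    for cls, xc, yc, w, h in boxes:
        print(f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}")
```

Each printed line follows the normalised `class x_center y_center width height` layout that YOLO-style label files use, so a generated image paired with its conditioning mask would need no manual annotation under these assumptions.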

