Obtaining annotated table structure data for complex tables is challenging because real-world document layouts are diverse and complex. The scarcity of publicly available datasets with comprehensive annotations for intricate table structures hinders the development and evaluation of models for such scenarios. This paper introduces an approach for generating annotated table structure images by conditioning latent diffusion models on mask images of rows and columns. The method aims to improve the quality of synthetic data used to train object detection models. Specifically, a conditioning mechanism guides the generation of complex document table images so that the resulting table layouts are realistic. To evaluate the generated data, we train the YOLOv5 object detection model on it; the generated table images enrich the training set with diverse table structures. The model is then tested on the PubTables-1M test set, a benchmark for table structure recognition in complex document layouts. Experimental results show that the approach significantly improves the quality of synthetic training data and yields YOLOv5 models with better performance. The mean Average Precision (mAP) values obtained on the PubTables-1M test set are close to those of state-of-the-art methods. In addition, low Fréchet Inception Distance (FID) scores on the synthetic data support the effectiveness of the method for generating annotated table structure images.
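
To make the link between conditioning masks and detection annotations concrete, the following is a minimal sketch, not the authors' implementation. It assumes that row and column regions are given as pixel bounding boxes, rasterizes them into a single-channel mask of the kind that could condition a generator, and writes the same boxes as label lines in the normalized `class x_center y_center width height` text format used for YOLOv5 training. The mask encoding, the class ids (0 for rows, 1 for columns), and the function names `render_mask` and `to_yolo_labels` are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# (x_min, y_min, x_max, y_max) in pixels; max edges are exclusive.
Box = Tuple[int, int, int, int]


def render_mask(width: int, height: int,
                rows: List[Box], cols: List[Box]) -> List[List[int]]:
    """Rasterize row and column boxes into a single-channel mask.

    Assumed encoding (illustrative only): 0 = background, 1 = row only,
    2 = column only, 3 = row and column overlap (a cell region).
    """
    mask = [[0] * width for _ in range(height)]
    for value, boxes in ((1, rows), (2, cols)):
        for x0, y0, x1, y1 in boxes:
            # Clip each box to the image bounds before filling.
            for y in range(max(0, y0), min(height, y1)):
                for x in range(max(0, x0), min(width, x1)):
                    mask[y][x] |= value
    return mask


def to_yolo_labels(width: int, height: int,
                   rows: List[Box], cols: List[Box]) -> List[str]:
    """Convert the same boxes to normalized YOLO-style label lines.

    Class ids are an assumption: 0 = table row, 1 = table column.
    """
    lines = []
    for class_id, boxes in ((0, rows), (1, cols)):
        for x0, y0, x1, y1 in boxes:
            x_center = (x0 + x1) / 2 / width
            y_center = (y0 + y1) / 2 / height
            box_w = (x1 - x0) / width
            box_h = (y1 - y0) / height
            lines.append(
                f"{class_id} {x_center:.6f} {y_center:.6f} "
                f"{box_w:.6f} {box_h:.6f}"
            )
    return lines


if __name__ == "__main__":
    # A toy 2x2 table inside a 10x6 image with a 1-pixel margin.
    rows = [(1, 1, 9, 3), (1, 3, 9, 5)]
    cols = [(1, 1, 5, 5), (5, 1, 9, 5)]
    for line in render_mask(10, 6, rows, cols):
        print("".join(str(v) for v in line))
    print("\n".join(to_yolo_labels(10, 6, rows, cols)))
```

Because the mask and the labels are derived from the same boxes, every image generated from a given mask comes with its annotation for free, which is the property that makes mask-conditioned generation useful for building detection training sets.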