Table Detection (TD) is a fundamental task in visually rich document understanding. Current studies usually formulate TD as an object detection problem, evaluating models with Intersection over Union (IoU) based metrics and optimizing them with IoU-based loss functions. TD applications usually require a prediction to cover all of a table's contents so that no information is lost. However, IoU and IoU-based loss functions do not directly reflect how much information a prediction loses. We therefore propose to decouple IoU into a ground truth coverage term and a prediction coverage term, the former of which measures the information loss of a prediction. In addition, tables in documents are usually large, sparsely distributed, and non-overlapping, because they are designed to summarize essential information in a form that is easy for human readers to read and interpret. For this reason, we use SparseR-CNN as the base model and improve it further with Gaussian Noise Augmented Image Size region proposals and many-to-one label assignments. To demonstrate the effectiveness of the proposed method and to compare fairly with state-of-the-art methods, we first conduct experiments using IoU-based evaluation metrics. The results show that the proposed method consistently outperforms state-of-the-art methods under different IoU-based metrics on a variety of datasets. We then conduct further experiments in which the IoU-based loss functions and evaluation metrics are replaced with their proposed decoupled IoU counterparts, to show the advantage of decoupled IoU for TD applications. The results show that the proposed decoupled IoU loss encourages the model to reduce information loss.
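To make the decoupling idea concrete, here is a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions: ground truth coverage is taken as the intersection area divided by the ground truth area, and prediction coverage as the intersection area divided by the predicted area. The names `Box`, `box_area`, `intersection_area`, and `decoupled_iou` are hypothetical and are not taken from the paper's code.

```python
from typing import Tuple

# Axis-aligned box as (x1, y1, x2, y2); this layout is an assumption.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def intersection_area(a: Box, b: Box) -> float:
    """Area of the overlap between two boxes (zero if they are disjoint)."""
    overlap = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    return box_area(overlap)


def decoupled_iou(pred: Box, gt: Box) -> Tuple[float, float, float]:
    """Return (iou, gt_coverage, pred_coverage) under the assumed definitions.

    gt_coverage   = |pred ∩ gt| / |gt|    (fraction of the table that is kept)
    pred_coverage = |pred ∩ gt| / |pred|  (fraction of the prediction that is table)
    """
    inter = intersection_area(pred, gt)
    pred_a, gt_a = box_area(pred), box_area(gt)
    union = pred_a + gt_a - inter
    iou = inter / union if union > 0 else 0.0
    gt_cov = inter / gt_a if gt_a > 0 else 0.0
    pred_cov = inter / pred_a if pred_a > 0 else 0.0
    return iou, gt_cov, pred_cov


gt = (0.0, 0.0, 100.0, 100.0)
too_small = (10.0, 10.0, 90.0, 90.0)        # crops the table on every side
too_large = (-12.5, -12.5, 112.5, 112.5)    # contains the whole table

print(decoupled_iou(too_small, gt))
print(decoupled_iou(too_large, gt))
```

Both predictions have the same IoU against the ground truth, but only the first one cuts off part of the table. Under these assumed definitions, the ground truth coverage term separates the two cases while plain IoU does not. When all three quantities are positive, they are related by 1/IoU = 1/gt_coverage + 1/pred_coverage - 1.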