Table Detection (TD) is a fundamental task for visually rich document understanding, in which a model must extract table content without losing information. However, the popular Intersection over Union (IoU) based evaluation metrics and loss functions for detection models do not directly represent the degree of information loss in a prediction. We therefore propose to decouple IoU into a ground truth coverage term and a prediction coverage term, of which the former measures the information loss of a prediction. In addition, because tables are sparsely distributed in document images, we use SparseR-CNN as the base model and improve it with Gaussian Noise Augmented Image Size region proposals and many-to-one label assignment. Comprehensive experiments show that the proposed method consistently outperforms state-of-the-art methods under different IoU-based metrics on various datasets, and that the proposed decoupled IoU loss helps the model alleviate information loss.
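To illustrate why IoU alone cannot indicate information loss, the following minimal sketch computes IoU together with two coverage terms for axis-aligned boxes. The abstract does not state the exact definitions, so this sketch assumes that ground truth coverage is the intersection area divided by the ground truth area and that prediction coverage is the intersection area divided by the predicted area. The names `decoupled_iou`, `DecoupledIoU`, and `box_area` are hypothetical and are not taken from the paper. Under these assumed definitions, whenever the boxes overlap, 1/IoU = 1/gt_coverage + 1/pred_coverage - 1.

```python
from typing import NamedTuple, Tuple

# Boxes are (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
Box = Tuple[float, float, float, float]


class DecoupledIoU(NamedTuple):
    iou: float
    gt_coverage: float    # assumed definition: |P ∩ G| / |G|
    pred_coverage: float  # assumed definition: |P ∩ G| / |P|


def box_area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def decoupled_iou(pred: Box, gt: Box) -> DecoupledIoU:
    """Return IoU and the two assumed coverage terms for one box pair."""
    # Intersection rectangle; box_area clamps it to zero if there is no overlap.
    inter = box_area((
        max(pred[0], gt[0]),
        max(pred[1], gt[1]),
        min(pred[2], gt[2]),
        min(pred[3], gt[3]),
    ))
    pred_area = box_area(pred)
    gt_area = box_area(gt)
    union = pred_area + gt_area - inter

    # Guard against zero-area inputs instead of dividing by zero.
    iou = inter / union if union > 0 else 0.0
    gt_cov = inter / gt_area if gt_area > 0 else 0.0
    pred_cov = inter / pred_area if pred_area > 0 else 0.0
    return DecoupledIoU(iou, gt_cov, pred_cov)


if __name__ == "__main__":
    gt = (0.0, 0.0, 10.0, 10.0)
    too_small = (0.0, 0.0, 10.0, 5.0)   # misses half of the table
    too_large = (0.0, 0.0, 10.0, 20.0)  # contains the whole table plus background
    print(decoupled_iou(too_small, gt))
    print(decoupled_iou(too_large, gt))
```

In this example both predictions have an IoU of 0.5, but only the smaller box loses table content: its ground truth coverage is 0.5, while the larger box has a ground truth coverage of 1.0 and a prediction coverage of 0.5.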