Table detection, the identification and localization of tables within document images, is a crucial task in document processing. Recent advances in deep learning have substantially improved accuracy on this task, but effective training still relies heavily on large labeled datasets. Several semi-supervised approaches have emerged to reduce this dependence, typically employing CNN-based detectors with anchor proposals and post-processing steps such as non-maximum suppression (NMS). More recently, the focus has shifted towards transformer-based detectors, which eliminate the need for NMS and rely instead on object queries and attention mechanisms. Previous research on improving transformer-based detectors has concentrated on two areas: refining the quality of object queries and optimizing the attention mechanism. However, increasing the number of object queries can introduce redundancy, while modifying the attention mechanism can increase complexity. To address these challenges, we introduce a semi-supervised approach built on SAM-DETR, which precisely aligns object queries with target features. Our approach substantially reduces false positives and improves table detection performance, particularly on complex documents with diverse table structures. This work thus provides more efficient and accurate table detection in semi-supervised settings.