Before deploying a Document Layout Analysis (DLA) model in real-world applications, comprehensive robustness testing is essential. However, the robustness of DLA models remains underexplored in the literature. To address this, we are the first to introduce a robustness benchmark for DLA models, comprising 450K document images drawn from three datasets. To cover realistic corruptions, we propose a perturbation taxonomy of 36 common document perturbations inspired by real-world document processing. Additionally, to better understand the impact of document perturbations, we propose two metrics: Mean Perturbation Effect (mPE) for perturbation assessment and Mean Robustness Degradation (mRD) for robustness evaluation. Furthermore, we introduce the Robust Document Layout Analyzer (RoDLA), which improves the attention mechanism to better extract robust features. Experiments on the proposed benchmarks (PubLayNet-P, DocLayNet-P, and M$^6$Doc-P) demonstrate that RoDLA obtains state-of-the-art mRD scores of 115.7, 135.4, and 150.4, respectively. Compared to previous methods, RoDLA achieves notable mAP improvements of +3.8%, +7.1%, and +12.1%, respectively.
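The abstract does not spell out how Mean Robustness Degradation (mRD) is computed. As a hedged illustration only, the sketch below shows one plausible degradation-style robustness metric in the spirit of mCE from the ImageNet-C benchmark: the model's mAP drop under each perturbation is normalized by a reference baseline's drop and averaged. The function name, its signature, and the normalization scheme are assumptions for illustration, not the paper's actual definition.

```python
# Hypothetical sketch of a mean-degradation robustness metric (NOT the
# exact RoDLA mRD formula). Assumption: degradation is the mAP drop from
# clean to perturbed images, normalized by a baseline model's drop.

def mean_robustness_degradation(model_map, baseline_map,
                                model_clean, baseline_clean):
    """model_map / baseline_map: {perturbation_name: mAP under perturbation}.
    model_clean / baseline_clean: mAP on unperturbed images.
    A score of 100 means the model degrades exactly as much as the baseline;
    lower indicates better robustness."""
    ratios = []
    for p in model_map:
        model_drop = model_clean - model_map[p]        # model's absolute mAP drop
        baseline_drop = baseline_clean - baseline_map[p]
        ratios.append(100.0 * model_drop / baseline_drop)
    return sum(ratios) / len(ratios)

# Toy usage with made-up numbers:
model = {"blur": 80.0, "noise": 75.0}
baseline = {"blur": 70.0, "noise": 60.0}
score = mean_robustness_degradation(model, baseline, 90.0, 92.0)
```

Under this assumed formulation, a model that loses less mAP than the baseline under the same perturbations scores below 100, which matches the abstract's reading of mRD as a degradation measure.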