Object detection in documents is a key step in automating the identification of structural elements in digital or scanned documents, a task that relies on understanding the hierarchical structure of a document and the relationships between its elements. Large, complex models achieve high accuracy but are computationally expensive and memory-intensive, which makes them impractical to deploy on resource-constrained devices. Knowledge distillation lets us build smaller, more efficient models that retain much of the performance of their larger counterparts. We present a graph-based knowledge distillation framework that identifies and localizes document objects in a document image. We design a structured graph whose nodes contain proposal-level features and whose edges represent the relationships between proposal regions. To reduce text bias, we also design an adaptive node sampling strategy that prunes the weight distribution and places more weight on non-text nodes. We encode the complete graph as a knowledge representation and transfer it from the teacher to the student through the proposed distillation loss, which captures both local and global information concurrently. Extensive experiments on competitive benchmarks demonstrate that the proposed framework outperforms current state-of-the-art approaches. The code will be available at: https://github.com/ayanban011/GraphKD.
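The abstract does not give the form of the distillation loss. The following is therefore only a minimal Python sketch under stated assumptions, not the GraphKD implementation. It assumes the teacher and student produce the same number of matched proposals with features of equal dimension. It builds a fully connected graph whose edge weights are cosine similarities between proposal features, then adds a weighted node term (local information) to an edge term (global, relational information). A fixed down-weighting of text nodes, controlled by the hypothetical `text_weight` parameter, stands in for the adaptive node sampling strategy. The names `node_weights`, `edge_matrix`, `graph_kd_loss`, and `alpha` are illustrative only.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def edge_matrix(nodes: Sequence[Vector]) -> List[List[float]]:
    """Fully connected graph: edge (i, j) is the cosine similarity of proposals i and j."""
    return [[cosine(a, b) for b in nodes] for a in nodes]


def node_weights(is_text: Sequence[bool], text_weight: float = 0.5) -> List[float]:
    """Assumption: a fixed down-weighting of text nodes replaces adaptive node sampling.

    Text nodes get raw weight `text_weight`, non-text nodes get 1.0, and the
    weights are normalised to sum to 1.
    """
    raw = [text_weight if t else 1.0 for t in is_text]
    total = sum(raw)
    return [w / total for w in raw]


def graph_kd_loss(
    teacher_nodes: Sequence[Vector],
    student_nodes: Sequence[Vector],
    is_text: Sequence[bool],
    alpha: float = 1.0,
    text_weight: float = 0.5,
) -> float:
    """Hypothetical graph distillation loss: weighted node term + alpha * edge term."""
    if len(teacher_nodes) != len(student_nodes):
        raise ValueError("teacher and student must have the same number of proposals")
    n = len(teacher_nodes)
    if n == 0:
        return 0.0

    # Node term (local): weighted squared error between matched proposal features.
    # Assumption: student features are already projected to the teacher's dimension.
    w = node_weights(is_text, text_weight)
    node_term = sum(
        wi * sum((t - s) ** 2 for t, s in zip(tn, sn))
        for wi, tn, sn in zip(w, teacher_nodes, student_nodes)
    )

    # Edge term (global): mean squared error between the two similarity matrices.
    et, es = edge_matrix(teacher_nodes), edge_matrix(student_nodes)
    edge_term = sum((et[i][j] - es[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)

    return node_term + alpha * edge_term


# Usage: two proposals, the first labelled as text, the second as non-text.
teacher = [[1.0, 0.0], [0.0, 1.0]]
flags = [True, False]
print(graph_kd_loss(teacher, teacher, flags))
print(graph_kd_loss(teacher, [[1.1, 0.0], [0.0, 1.0]], flags))
print(graph_kd_loss(teacher, [[1.0, 0.0], [0.0, 1.1]], flags))
```

With identical teacher and student graphs the loss is 0.0. Because of the down-weighting, a student that deviates on the non-text node is penalized more than one that deviates by the same amount on the text node. This mirrors the stated goal of reducing text bias, although the actual framework prunes the weight distribution adaptively rather than using a fixed factor.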