DONUT-hole: DONUT Sparsification by Harnessing Knowledge and Optimizing Learning Efficiency

This paper introduces DONUT-hole, a sparse OCR-free visual document understanding (VDU) model that addresses the limitations of its predecessor model, dubbed DONUT. The DONUT model, leveraging a transformer architecture, overcoming the challenges of separate optical character recognition (OCR) and visual semantic understanding (VSU) components. However, its deployment in production environments and edge devices is hindered by high memory and computational demands, particularly in large-scale request services. To overcome these challenges, we propose an optimization strategy based on knowledge distillation and model pruning. Our paradigm to produce DONUT-hole, reduces the model denisty by 54\% while preserving performance. We also achieve a global representational similarity index between DONUT and DONUT-hole based on centered kernel alignment (CKA) metric of 0.79. Moreover, we evaluate the effectiveness of DONUT-hole in the document image key information extraction (KIE) task, highlighting its potential for developing more efficient VDU systems for logistic companies.

翻译：本文提出DONUT-hole，一种稀疏无OCR视觉文档理解（VDU）模型，旨在解决其前身模型DONUT的局限性。DONUT模型基于Transformer架构，克服了独立光学字符识别（OCR）与视觉语义理解（VSU）组件的挑战。然而，由于对内存和计算资源的高需求，特别是在大规模请求服务中，该模型在生产环境与边缘设备上的部署受到阻碍。为应对这些挑战，我们提出一种基于知识蒸馏与模型剪枝的优化策略。通过该范式生成的DONUT-hole，在保持性能的同时将模型密度降低54%。我们基于中心核对齐（CKA）指标，实现了DONUT与DONUT-hole之间0.79的全局表征相似性指数。此外，我们在文档图像关键信息提取（KIE）任务中评估了DONUT-hole的有效性，突显了其为物流公司开发更高效VDU系统的潜力。

相关内容

MoDELS

关注 45

ACM/IEEE第23届模型驱动工程语言和系统国际会议，是模型驱动软件和系统工程的首要会议系列，由ACM-SIGSOFT和IEEE-TCSE支持组织。自1998年以来，模型涵盖了建模的各个方面，从语言和方法到工具和应用程序。模特的参加者来自不同的背景，包括研究人员、学者、工程师和工业专业人士。MODELS 2019是一个论坛，参与者可以围绕建模和模型驱动的软件和系统交流前沿研究成果和创新实践经验。今年的版本将为建模社区提供进一步推进建模基础的机会，并在网络物理系统、嵌入式系统、社会技术系统、云计算、大数据、机器学习、安全、开源等新兴领域提出建模的创新应用以及可持续性。官网链接：http://www.modelsconference.org/

Linux导论，Introduction to Linux，96页ppt

专知会员服务

82+阅读 · 2020年7月26日

【WSDM2020】超越统计关系：将知识关系整合到多标签音乐风格分类的风格关联中（附pdf）

专知会员服务

18+阅读 · 2019年11月23日

FlowQA: Grasping Flow in History for Conversational Machine Comprehension

专知会员服务

34+阅读 · 2019年10月18日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日