Understanding documents is central to many real-world tasks but remains a challenging topic. Unfortunately, there is no well-established consensus on how to comprehensively evaluate document understanding abilities, which significantly hinders fair comparison and the measurement of progress in the field. To benchmark document understanding research, this paper summarizes four representative abilities: document classification, document structural analysis, document information extraction, and document transcription. Under this new evaluation framework, we propose \textbf{Document Language Understanding Evaluation} (\textbf{DLUE}), a new task suite that covers a wide range of tasks in various forms, domains, and document genres. We also systematically evaluate six well-established transformer models on DLUE and find that, due to lengthy content, complicated underlying structure, and dispersed knowledge, document understanding is still far from being solved. Currently, no neural architecture dominates all tasks, which calls for a universal document understanding architecture.