We call on the Document AI (DocAI) community to reevaluate current methodologies and embrace the challenge of creating more practically oriented benchmarks. Document Understanding Dataset and Evaluation (DUDE) seeks to remedy the stalled research progress in understanding visually-rich documents (VRDs). We present a new dataset that introduces novel types of questions, answers, and document layouts, built from multi-industry, multi-domain, and multi-page VRDs of various origins and dates. Moreover, we push the boundaries of current methods by creating multi-task and multi-domain evaluation setups that more accurately simulate real-world situations, where powerful generalization and adaptation under low-resource settings are desired. DUDE aims to set a new standard as a more practical, long-standing benchmark for the community, and we hope it will lead to future extensions and contributions that address real-world challenges. Finally, our work illustrates the importance of finding more efficient ways to model language, images, and layout in DocAI.