We call on the Document AI (DocAI) community to reevaluate current methodologies and embrace the challenge of creating more practically oriented benchmarks. Document Understanding Dataset and Evaluation (DUDE) seeks to remediate the halted research progress in understanding visually-rich documents (VRDs). We present a new dataset with novelties related to types of questions, answers, and document layouts, based on multi-industry, multi-domain, and multi-page VRDs of various origins and dates. Moreover, we push the boundaries of current methods by creating multi-task and multi-domain evaluation setups that more accurately simulate real-world situations where powerful generalization and adaptation under low-resource settings are desired. DUDE aims to set a new standard as a more practical, long-standing benchmark for the community, and we hope that it will lead to future extensions and contributions that address real-world challenges. Finally, our work illustrates the importance of finding more efficient ways to model language, images, and layout in DocAI.