Table question answering over business documents poses several challenges: it requires understanding tabular structure, resolving cross-document references, and performing numeric computations that go beyond simple search queries. This paper introduces TabIQA, a novel pipeline for answering questions about business document images. TabIQA combines state-of-the-art deep learning techniques to 1) extract table content and structural information from images and 2) answer questions over the resulting structured tables, covering numerical data, text-based information, and complex queries. Evaluation results on the VQAonBD 2023 dataset show that TabIQA achieves promising performance in answering table-related questions. The TabIQA repository is available at https://github.com/phucty/itabqa.