Retrieval-augmented generation (RAG) traditionally incorporates external knowledge bases by parsing documents and then querying a language model with the parsed information via in-context learning. While effective for text-based documents, this approach often fails to generate plausible responses when answering questions over tabular documents, because standard parsing techniques lose the two-dimensional structural semantics critical for interpreting cells. In this work, we present TabRAG, a parsing-based RAG framework designed to improve tabular document question answering via structured representations. Our framework first applies layout segmentation, which decomposes the input document into a series of components and enables fine-grained extraction. A vision language model then parses the document tables and extracts them into a hierarchically structured representation. To accommodate various table styles and formats, we integrate a self-generated in-context learning module that guides the table extraction process. Experimental results demonstrate that TabRAG outperforms existing popular parsing techniques across a broad suite of evaluation and ablation benchmarks. Code is available at: https://github.com/jacobyhsi/TabRAG.
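To make the claim about lost two-dimensional structure concrete, the following is a minimal sketch, not TabRAG's actual representation or code. The toy table, the `flatten` and `to_structured` helpers, and the record schema (`row`, `column`, `value`) are all hypothetical and chosen only for illustration. It contrasts a naive reading-order parse with a representation in which each cell keeps its full header path.

```python
from typing import Dict, List

# Hypothetical toy table with a two-level column header (year, then metric).
header_groups: List[str] = ["2023", "2023", "2024", "2024"]
header_leaves: List[str] = ["Revenue", "Cost", "Revenue", "Cost"]
rows: Dict[str, List[int]] = {
    "North": [10, 4, 12, 5],
    "South": [7, 3, 9, 4],
}


def flatten(rows: Dict[str, List[int]]) -> str:
    """Naive parse: emit cell values in reading order, dropping all header links."""
    return " ".join(str(v) for values in rows.values() for v in values)


def to_structured(
    groups: List[str], leaves: List[str], rows: Dict[str, List[int]]
) -> List[dict]:
    """Attach every cell to its row label and its full column header path."""
    cells = []
    for label, values in rows.items():
        for group, leaf, value in zip(groups, leaves, values):
            cells.append({"row": label, "column": [group, leaf], "value": value})
    return cells


flat = flatten(rows)
cells = to_structured(header_groups, header_leaves, rows)
```

In the flattened string, nothing indicates which number is the 2024 revenue for North, whereas the structured records carry that association explicitly, which is the kind of information a question answering model needs to interpret a cell.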