Recent advances in tabular question answering (QA) with large language models are constrained in their coverage and only answer questions over a single table. However, real-world queries are complex in nature and often span multiple tables in a relational database or web page. Single-table questions do not involve common table operations such as set operations, Cartesian products (joins), or nested queries. Furthermore, multi-table operations often result in a tabular output, which requires tabular QA models to be capable of generating tables. To fill this gap, we propose a new task of answering questions over multiple tables. Our model, MultiTabQA, not only answers questions over multiple tables, but also generalizes to generate tabular answers. To enable effective training, we build a pre-training dataset comprising 132,645 SQL queries and tabular answers. Further, we evaluate the generated tables by introducing table-specific metrics of varying strictness that assess the table structure at several levels of granularity. MultiTabQA outperforms state-of-the-art single-table QA models adapted to a multi-table QA setting by fine-tuning on three datasets: Spider, Atis, and GeoQuery.
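To make the task concrete, the following minimal sketch shows a question whose answer requires a join over two tables and is itself a table, and then scores a predicted table against the gold table at three levels of strictness. It is an illustration only: the toy schema and values, the hypothetical prediction, and the helper names `run_query`, `multiset_prf`, and `table_scores` are assumptions and not part of MultiTabQA, and the scoring functions are one plausible reading of "metrics of varying strictness" rather than the exact metric definitions used in our evaluation.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Execute SQL and return (header, rows) so the answer is kept as a table."""
    cur = conn.execute(sql)
    header = [d[0] for d in cur.description]
    rows = [tuple(r) for r in cur.fetchall()]
    return header, rows


def multiset_prf(pred, gold):
    """Precision, recall, and F1 between two multisets of hashable items."""
    overlap = sum((Counter(pred) & Counter(gold)).values())
    p = overlap / len(pred) if pred else 0.0
    r = overlap / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1


def table_scores(pred_rows, gold_rows):
    """Score a predicted table from strictest (exact) to most lenient (cell)."""
    pred_cells = [c for row in pred_rows for c in row]
    gold_cells = [c for row in gold_rows for c in row]
    return {
        # 1.0 only if both tables contain the same rows (order ignored).
        "exact": float(Counter(pred_rows) == Counter(gold_rows)),
        # Partial credit for every fully correct row.
        "row": multiset_prf(pred_rows, gold_rows),
        # Partial credit for every correct cell value.
        "cell": multiset_prf(pred_cells, gold_cells),
    }


# Toy two-table database (values are made up for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE city (name TEXT, state TEXT, population INTEGER);
    CREATE TABLE state (name TEXT, capital TEXT);
    INSERT INTO city VALUES
        ('Austin', 'Texas', 960000),
        ('Dallas', 'Texas', 1300000),
        ('Boston', 'Massachusetts', 650000);
    INSERT INTO state VALUES
        ('Texas', 'Austin'),
        ('Massachusetts', 'Boston');
    """
)

# "For each city with more than 900,000 people, what is its state's capital?"
header, gold_rows = run_query(
    conn,
    """
    SELECT city.name, state.capital
    FROM city JOIN state ON city.state = state.name
    WHERE city.population > 900000
    ORDER BY city.name
    """,
)

# Hypothetical model output with one wrong cell in the second row.
pred_rows = [("Austin", "Austin"), ("Dallas", "Dallas")]
scores = table_scores(pred_rows, gold_rows)
print(header, gold_rows)
print(scores)
```

In this sketch the prediction fails the exact-match check because of a single wrong cell, yet still earns partial credit at the row and cell levels, which is the motivation for reporting metrics at more than one granularity.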