Recent approaches to tabular question answering (QA) with large language models are limited in coverage: they only answer questions over a single table. Real-world queries, however, are often complex and span multiple tables in a relational database or web page. Single-table questions do not involve common table operations such as set operations, Cartesian products (joins), or nested queries. Furthermore, multi-table operations often produce a table as output, so tabular QA models must also be able to generate tables. To fill this gap, we propose the new task of answering questions over multiple tables. Our model, MultiTabQA, not only answers questions over multiple tables but also generates tabular answers. To enable effective training, we build a pre-training dataset comprising 132,645 SQL queries and their tabular answers. Further, we evaluate the generated tables with new table-specific metrics of varying strictness that assess the table structure at different levels of granularity. On three datasets (Spider, Atis, and GeoQuery), MultiTabQA outperforms state-of-the-art single-table QA models that are adapted to the multi-table QA setting by finetuning.
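The abstract names set operations, joins, and nested queries as multi-table operations whose answers are themselves tables rather than single values. The sketch below illustrates this under stated assumptions. It uses an in-memory SQLite database with two toy tables (`singer` and `concert`, invented for illustration), and `run_query` is a hypothetical helper. Each query returns a header plus a list of rows.

```python
import sqlite3


def run_query(conn, sql):
    """Hypothetical helper: execute `sql` and return (header, rows), i.e. a table."""
    cur = conn.execute(sql)
    header = [d[0] for d in cur.description]
    return header, cur.fetchall()


# Toy multi-table database, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE concert (concert_id INTEGER PRIMARY KEY, singer_id INTEGER, year INTEGER);
INSERT INTO singer VALUES (1, 'Ana', 'Spain'), (2, 'Bo', 'Sweden'), (3, 'Chen', 'China');
INSERT INTO concert VALUES (10, 1, 2014), (11, 1, 2015), (12, 3, 2015);
""")

# Join: a Cartesian product filtered on the shared key (who performed each concert?).
join_sql = """
SELECT s.name AS name, c.year AS year
FROM singer AS s JOIN concert AS c ON s.singer_id = c.singer_id
ORDER BY c.concert_id
"""

# Set operation: singers minus singers who have at least one concert.
except_sql = """
SELECT name FROM singer
EXCEPT
SELECT s.name FROM singer AS s JOIN concert AS c ON s.singer_id = c.singer_id
"""

# Nested query: the inner SELECT over `concert` filters rows of `singer`.
nested_sql = """
SELECT name FROM singer
WHERE singer_id IN (SELECT singer_id FROM concert WHERE year = 2015)
ORDER BY name
"""

for sql in (join_sql, except_sql, nested_sql):
    print(run_query(conn, sql))
```

Even in this toy setting, the join answer has several rows and two columns. A QA model that must produce such answers directly therefore needs to generate a whole table, not a single cell.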
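To illustrate what "metrics of varying strictness" can mean for a generated table, the sketch below compares a predicted table with a gold table at three granularities. The definitions are assumptions chosen for illustration and may differ from the paper's exact metrics. Tables are represented as `(header, rows)`. The function names `table_exact_match`, `row_f1`, and `cell_f1` are hypothetical.

```python
from collections import Counter


def table_exact_match(pred, gold):
    """Strictest (assumed definition): same header and same rows in the same order."""
    return pred[0] == gold[0] and pred[1] == gold[1]


def _f1(pred_items, gold_items):
    """Precision, recall, and F1 over multisets (duplicate items are counted)."""
    p, g = Counter(pred_items), Counter(gold_items)
    overlap = sum((p & g).values())  # multiset intersection size
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / sum(p.values())
    recall = overlap / sum(g.values())
    return precision, recall, 2 * precision * recall / (precision + recall)


def row_f1(pred, gold):
    """Intermediate (assumed): whole rows must match, but row order is ignored."""
    return _f1(pred[1], gold[1])


def cell_f1(pred, gold):
    """Most lenient (assumed): bag of individual cell values, position ignored."""
    pred_cells = [c for row in pred[1] for c in row]
    gold_cells = [c for row in gold[1] for c in row]
    return _f1(pred_cells, gold_cells)


# Hypothetical gold and predicted tables for a join-style question.
gold = (["name", "year"], [("Ana", 2014), ("Ana", 2015), ("Chen", 2015)])
pred = (["name", "year"], [("Chen", 2015), ("Ana", 2014), ("Bo", 2015)])

print(table_exact_match(pred, gold), row_f1(pred, gold)[2], cell_f1(pred, gold)[2])
```

Here the prediction fails exact match. It recovers two of three gold rows (row-level F1 of 2/3) and five of six gold cells (cell-level F1 of 5/6). Graded scores like these separate a nearly correct table from a completely wrong one, which a single exact-match score cannot do.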