Open-domain table question answering (QA) aims to answer a question by retrieving and extracting information from a large collection of tables. Existing studies of open-domain table QA either directly adopt text retrieval methods or consider table structure only in the encoding layer of the table retriever, which may lose syntactical and structural information when tables are scored. To address this issue, we propose a syntax- and structure-aware retrieval method for open-domain table QA. It builds syntactical representations for the question and separate structural representations for table headers and cell values, so that fine-grained syntactical and structural information is preserved. A syntactical-to-structural aggregator then computes the matching score between the question and a candidate table by mimicking the human retrieval process. Experimental results show that our method achieves state-of-the-art performance on the NQ-tables dataset and substantially outperforms strong baselines on a newly curated open-domain Text-to-SQL dataset.
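The abstract does not say how the syntactical-to-structural aggregator combines its inputs. Purely as an illustration, the sketch below assumes that the question has already been encoded into one vector per syntactic unit and that each table has been encoded into header vectors and cell-value vectors. It then scores a table with ColBERT-style maximum-similarity (MaxSim) late interaction: each question unit is matched against its best header and its best cell value, loosely echoing how a person scans column names and cells. MaxSim is a swapped-in technique, not the authors' aggregator, and the names `match_score`, `rank_tables`, and `header_weight` are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(u, v))


def max_sim(query_vec: Vector, targets: Sequence[Vector]) -> float:
    """Highest similarity between one question unit and any target vector."""
    return max(dot(query_vec, t) for t in targets)


def match_score(
    question_units: Sequence[Vector],
    headers: Sequence[Vector],
    values: Sequence[Vector],
    header_weight: float = 0.5,  # hypothetical mixing weight, not from the paper
) -> float:
    """Hypothetical question-to-table score (MaxSim stand-in, not the paper's method).

    For each question unit, take its best-matching header and its best-matching
    cell value, mix the two similarities, and average over all question units.
    """
    if not question_units or not headers or not values:
        raise ValueError("question, headers, and values must be non-empty")
    total = 0.0
    for q in question_units:
        header_sim = max_sim(q, headers)
        value_sim = max_sim(q, values)
        total += header_weight * header_sim + (1.0 - header_weight) * value_sim
    return total / len(question_units)


def rank_tables(
    question_units: Sequence[Vector],
    tables: Dict[str, Dict[str, List[List[float]]]],
) -> List[str]:
    """Return table ids sorted by descending score (ties broken by id)."""
    scores = {
        tid: match_score(question_units, t["headers"], t["values"])
        for tid, t in tables.items()
    }
    return sorted(scores, key=lambda tid: (-scores[tid], tid))


if __name__ == "__main__":
    # Toy 2-d embeddings standing in for real encoder outputs.
    question = [[1.0, 0.0], [0.0, 1.0]]
    tables = {
        "A": {"headers": [[1.0, 0.0]], "values": [[0.0, 1.0]]},
        "B": {"headers": [[0.0, 0.0]], "values": [[0.5, 0.5]]},
    }
    print(rank_tables(question, tables))  # ['A', 'B']
```

In this toy example, table A scores 0.5 because one question unit aligns with its header and the other with its cell value, while table B scores 0.25 because neither unit finds a strong match. Keeping headers and values as separate target sets, rather than flattening the table into one text sequence, is what lets the score reflect table structure.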