Open-domain question-answering (QA) systems over tables depend on retrieving the tables that contain the information needed to answer a given question. Previous methods assume that the answer can be found either in a single table or in multiple tables identified through question decomposition or rewriting. Neither assumption is sufficient, because many questions require retrieving multiple tables and joining them through a join plan that cannot be discerned from the user query alone. If the join plan is not considered during retrieval, the subsequent reasoning and answering steps that rely on the retrieved tables are likely to be incorrect. To address this problem, we introduce a method that uncovers useful join relations for any query and database during table retrieval. We use a novel re-ranking method, formulated as a mixed-integer program, that considers not only table-query relevance but also table-table relevance, which requires inferring join relationships. Our method outperforms state-of-the-art approaches by up to 9.3% in F1 score for table retrieval and by up to 5.4% in accuracy for end-to-end QA.
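The idea of scoring a candidate set of tables by both table-query relevance and table-table (join) relevance can be illustrated with a toy sketch. This is not the formulation from the paper: the abstract does not give the objective or constraints, so the additive objective, the `join_weight` parameter, the function name `rerank_tables`, and all scores below are assumptions made for illustration. The sketch also swaps in exhaustive search over subsets for a mixed-integer solver, which is only practical for a handful of candidate tables.

```python
from itertools import combinations


def rerank_tables(query_scores, join_scores, k, join_weight=1.0):
    """Select k tables maximizing relevance plus pairwise join compatibility.

    query_scores: dict mapping table name -> relevance to the query
                  (hypothetical scores, e.g. from a first-stage retriever).
    join_scores:  dict mapping frozenset({a, b}) -> join compatibility
                  (hypothetical scores; missing pairs count as 0).
    join_weight:  assumed trade-off between the two terms.
    """
    tables = sorted(query_scores)
    if not 0 < k <= len(tables):
        raise ValueError("k must be between 1 and the number of tables")

    best_subset, best_value = None, float("-inf")
    # Brute-force stand-in for a mixed-integer program: try every k-subset.
    for subset in combinations(tables, k):
        relevance = sum(query_scores[t] for t in subset)
        joinability = sum(
            join_scores.get(frozenset(pair), 0.0)
            for pair in combinations(subset, 2)
        )
        value = relevance + join_weight * joinability
        if value > best_value:
            best_subset, best_value = subset, value
    return list(best_subset), best_value


# Made-up scores: "reviews" looks relevant on its own but joins with nothing.
query_scores = {"orders": 0.9, "customers": 0.4, "reviews": 0.7, "products": 0.3}
join_scores = {
    frozenset({"orders", "customers"}): 0.8,
    frozenset({"orders", "products"}): 0.5,
}

selected, score = rerank_tables(query_scores, join_scores, k=2)
print(selected, score)
```

With these made-up scores, ranking by query relevance alone would pick `orders` and `reviews`, whereas the joint objective selects `customers` and `orders` because that pair is joinable. A real system would replace the exhaustive loop with a solver and derive the join scores from inferred join relationships.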