Open-domain text-to-SQL is the task of retrieving question-relevant tables from massive databases and then generating SQL over them. Existing retrieval methods, however, retrieve in a single hop and overlook schema linking, the text-to-SQL challenge of aligning entities in the question with entities in the tables. This challenge shows up in two ways: similar but irrelevant entities, and domain-mismatched entities. We therefore propose multi-hop table retrieval with rewrite and beam search (Murre). To reduce the effect of similar but irrelevant entities, Murre focuses on the not-yet-retrieved entities at each hop and uses beam search to keep lower-ranked tables under consideration. To mitigate domain-mismatched entities, Murre rewrites the question over multiple hops based on the tables retrieved so far, narrowing the domain gap between the question and the relevant tables. Experiments on SpiderUnion and BirdUnion+ reach new state-of-the-art results, with an average improvement of 6.38%.
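The retrieval loop described above can be illustrated with a small sketch. This is not the authors' implementation: the token-overlap `score`, the rule-based `rewrite` (which simply drops question words already covered by a retrieved table, standing in for a model-based rewrite), the multiplicative path score, and the names `multi_hop_retrieve`, `hops`, `beam_size`, and `TABLES` are all assumptions made for illustration. It only shows how rewriting after each hop and keeping several candidate paths in a beam fit together.

```python
import re

# Hypothetical toy schema: table name -> flattened table text (an assumption).
TABLES = {
    "singer": "singer name country age",
    "concert": "concert name theme year",
    "stadium": "stadium location capacity",
}

def tokenize(text):
    """Lowercased alphanumeric tokens, in order."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score(question, table_text):
    """Toy relevance score: Jaccard overlap of question and table tokens."""
    q, t = set(tokenize(question)), set(tokenize(table_text))
    return len(q & t) / len(q | t) if q and t else 0.0

def rewrite(question, table_text):
    """Stand-in for the rewrite step: keep only the question tokens that the
    retrieved table does not cover, so the next hop targets what is missing."""
    covered = set(tokenize(table_text))
    return " ".join(w for w in tokenize(question) if w not in covered)

def multi_hop_retrieve(question, tables, hops=2, beam_size=2):
    """Return up to beam_size paths as (table names, path score, remaining question)."""
    beams = [((), 1.0, question)]
    for _ in range(hops):
        candidates = []
        for path, path_score, q in beams:
            # Nothing left to look for, or no table left: carry the path forward.
            if not q or len(path) == len(tables):
                candidates.append((path, path_score, q))
                continue
            for name, text in tables.items():
                if name in path:
                    continue
                candidates.append(
                    (path + (name,), path_score * score(q, text), rewrite(q, text))
                )
        # Beam search: keep the best few paths rather than only the top one,
        # so a table ranked lower at one hop can still end up in the result.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams

if __name__ == "__main__":
    for path, path_score, rest in multi_hop_retrieve(
        "singer country and concert theme", TABLES
    ):
        print(path, round(path_score, 4), repr(rest))
```

In this sketch the second hop scores tables against the rewritten question rather than the original one, which is the point of rewriting: once one table is retrieved, its entities no longer dominate the query. A real system would replace `score` with a dense retriever and `rewrite` with a language-model call.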