Open-domain text-to-SQL is an important task that retrieves question-relevant tables from massive databases and then generates SQL. However, existing methods retrieve in a single hop and overlook a core text-to-SQL challenge, schema linking: aligning the entities in the question with the entities in the tables. This challenge shows up in two ways: similar but irrelevant entities, and domain-mismatched entities. We therefore propose multi-hop table retrieval with rewrite and beam search (Murre). To reduce the effect of similar irrelevant entities, Murre focuses on the not-yet-retrieved entities at each hop and uses beam search to keep lower-ranked tables under consideration. To alleviate domain mismatch, Murre rewrites the question over multiple hops based on the retrieved tables, narrowing the domain gap between the question and the relevant tables. Experiments on SpiderUnion and BirdUnion+ show new state-of-the-art results, with an average improvement of 6.38%.
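The control flow described in the abstract (retrieve, rewrite the question around what is still missing, keep several candidate paths alive with beam search) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the retriever, the rewriter, or how path scores are combined, so `overlap_score` (token overlap), `strip_covered` (dropping question tokens that a retrieved table already covers), the multiplicative path score, and the toy `TABLES` schema are all hypothetical stand-ins chosen only to keep the example self-contained.

```python
from typing import Callable, Dict, List, Tuple


def overlap_score(question: str, table_text: str) -> float:
    """Toy relevance (assumption): fraction of question tokens found in the table text."""
    q = set(question.lower().split())
    t = set(table_text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def strip_covered(question: str, table_text: str) -> str:
    """Toy rewrite (assumption): drop question tokens the retrieved table already covers."""
    t = set(table_text.lower().split())
    return " ".join(w for w in question.lower().split() if w not in t)


def multi_hop_retrieve(
    question: str,
    tables: Dict[str, str],
    hops: int = 2,
    beam: int = 2,
    score: Callable[[str, str], float] = overlap_score,
    rewrite: Callable[[str, str], str] = strip_covered,
) -> List[Tuple[Tuple[str, ...], float]]:
    """Return up to `beam` retrieval paths as (table names, accumulated score)."""
    # Each beam entry: (path of table names, accumulated score, current question).
    beams = [((), 1.0, question)]
    for _ in range(hops):
        candidates = []
        for path, acc, q in beams:
            if not q:
                # Every question token is covered; carry the path forward unchanged.
                candidates.append((path, acc, q))
                continue
            ranked = sorted(
                ((score(q, text), name) for name, text in tables.items() if name not in path),
                reverse=True,
            )
            # Expanding the top `beam` tables (not only the best one) keeps
            # lower-ranked tables in play for later hops.
            for s, name in ranked[:beam]:
                candidates.append((path + (name,), acc * s, rewrite(q, tables[name])))
        # Keep the `beam` highest-scoring paths for the next hop.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam]
    return [(path, acc) for path, acc, _ in beams]


# Hypothetical three-table schema, described by flat text for the toy scorer.
TABLES = {
    "singer": "singer name age country",
    "concert": "concert name year stadium",
    "stadium": "stadium location capacity",
}

if __name__ == "__main__":
    for path, acc in multi_hop_retrieve("singer age concert year", TABLES):
        print(path, acc)
```

In this sketch, the rewritten question after the first hop contains only the tokens that the first table did not cover, so the second hop is steered toward the table that is still missing. A real system would replace both stand-in functions with a learned retriever and a proper question rewriter.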