Practical semantic parsers are expected to understand user utterances and map them to executable programs, even when the utterances are ambiguous. We introduce a new benchmark, AMBROSIA, which we hope will inform and inspire the development of text-to-SQL parsers capable of recognizing and interpreting ambiguous requests. Our dataset contains questions showcasing three different types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness), their interpretations, and corresponding SQL queries. In each case, the ambiguity persists even when the database context is provided. This is achieved through a novel approach involving the controlled generation of databases from scratch. We benchmark various LLMs on AMBROSIA, revealing that even the most advanced models struggle to identify and interpret ambiguity in questions.