We present SpotIt+, an open-source tool for evaluating Text-to-SQL systems via bounded equivalence verification. Given a generated SQL query and the corresponding gold (ground-truth) query, SpotIt+ actively searches for database instances on which the two queries return different results. To ensure that the generated counterexamples reflect practically relevant discrepancies, we introduce a constraint-mining pipeline that combines rule-based specification mining over example databases with LLM-based validation. Experimental results on the BIRD dataset show that the mined constraints enable SpotIt+ to generate more realistic differentiating databases while preserving its ability to efficiently uncover numerous discrepancies between generated and gold SQL queries that standard test-based evaluation misses.
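To make the idea of bounded equivalence checking concrete, the following is a minimal sketch and not SpotIt+'s actual implementation: it assumes a brute-force enumeration of all single-table databases up to a small row bound, executed with Python's built-in sqlite3 module. The function name `find_counterexample`, the `row_ok` constraint hook, and the example schema and queries are all hypothetical and chosen only for illustration.

```python
import itertools
import sqlite3
from collections import Counter

# Hypothetical example schema and queries (not taken from BIRD).
SCHEMA = "CREATE TABLE emp (id INTEGER, salary INTEGER)"
GOLD = "SELECT id FROM emp WHERE salary > 0"
GENERATED = "SELECT id FROM emp WHERE salary <> 0"


def find_counterexample(schema, table, n_cols, domain, q1, q2,
                        max_rows=2, row_ok=lambda row: True):
    """Return the first table contents (at most max_rows rows) on which
    q1 and q2 disagree, or None if they agree on every enumerated instance."""
    # Candidate rows over the finite domain that satisfy the constraint.
    candidates = [r for r in itertools.product(domain, repeat=n_cols)
                  if row_ok(r)]
    placeholders = ", ".join("?" for _ in range(n_cols))
    for n in range(max_rows + 1):
        # Duplicated rows are allowed, so tables are treated as bags.
        for rows in itertools.combinations_with_replacement(candidates, n):
            conn = sqlite3.connect(":memory:")
            try:
                conn.execute(schema)
                conn.executemany(
                    f"INSERT INTO {table} VALUES ({placeholders})", rows)
                # Compare results as multisets, ignoring row order.
                r1 = Counter(conn.execute(q1).fetchall())
                r2 = Counter(conn.execute(q2).fetchall())
            finally:
                conn.close()
            if r1 != r2:
                return list(rows)
    return None


# Unconstrained search: a negative salary separates the two queries.
unconstrained = find_counterexample(
    SCHEMA, "emp", 2, (-1, 0, 1), GOLD, GENERATED)

# With an assumed "mined" constraint that ids and salaries are non-negative,
# no differentiating database exists within the bound.
constrained = find_counterexample(
    SCHEMA, "emp", 2, (-1, 0, 1), GOLD, GENERATED,
    row_ok=lambda row: row[0] >= 0 and row[1] >= 0)
```

In this toy setting the unconstrained search reports a database containing a negative salary, which is the kind of counterexample a domain constraint would rule out as unrealistic. A verifier built on a constraint solver would explore the bounded space symbolically rather than by enumeration, so the sketch conveys only the search problem, not how it scales.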