We present SpotIt+, an open-source tool for evaluating Text-to-SQL systems via bounded equivalence verification. Given a generated SQL query and its ground-truth (gold) query, SpotIt+ actively searches for database instances on which the two queries return different results. To ensure that the generated counterexamples reflect practically relevant discrepancies, we introduce a constraint-mining pipeline that combines rule-based specification mining over example databases with LLM-based validation. Experimental results on the BIRD dataset show that the mined constraints enable SpotIt+ to generate more realistic differentiating databases while preserving its ability to efficiently uncover numerous discrepancies between generated and gold SQL queries that standard test-based evaluation misses.
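To make the idea of searching for a differentiating database concrete, the following is a minimal sketch and not SpotIt+'s actual search procedure, which the abstract does not detail. It swaps in plain brute-force enumeration over SQLite: every database up to a small bound is tried, and the first one on which the two queries disagree is returned. The table `t(id, score)`, the function names `run` and `find_counterexample`, and the `constraint` hook (standing in for a mined constraint) are all hypothetical and chosen only for illustration.

```python
import itertools
import sqlite3


def run(query, rows):
    """Run `query` on a fresh in-memory table t(id, score) holding `rows`."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, score INTEGER)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", rows)
    # Sort so that results are compared as multisets (bag semantics).
    result = sorted(conn.execute(query).fetchall(), key=repr)
    conn.close()
    return result


def find_counterexample(q1, q2, domain=(0, 1, 2), max_rows=2,
                        constraint=lambda rows: True):
    """Brute-force bounded search for a database that differentiates q1 and q2.

    Enumerates every table with at most `max_rows` rows whose values come
    from `domain`, skipping tables that violate `constraint` (a hypothetical
    stand-in for a mined database constraint). Returns the first
    differentiating table, or None if the queries agree within the bound.
    """
    candidates = list(itertools.product(domain, repeat=2))
    for n in range(max_rows + 1):
        for rows in itertools.combinations_with_replacement(candidates, n):
            if not constraint(rows):
                continue
            if run(q1, rows) != run(q2, rows):
                return list(rows)
    return None


gold = "SELECT id FROM t WHERE score > 1"
generated = "SELECT id FROM t WHERE score >= 1"
print(find_counterexample(gold, generated))  # [(0, 1)]

# Assumed constraint: id behaves like a key, so no two rows share an id.
unique_ids = lambda rows: len({r[0] for r in rows}) == len(rows)
print(find_counterexample("SELECT DISTINCT id FROM t", "SELECT id FROM t",
                          constraint=unique_ids))  # None
```

In the second call, the only databases that separate the two queries contain duplicate `id` values, so the assumed uniqueness constraint rules them out and no counterexample is found within the bound. This mirrors, in miniature, how constraints can steer the search away from unrealistic differentiating databases. A result of `None` only says the queries agree up to the chosen bound, not that they are equivalent in general.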