Large language models (LLMs) allow users to query databases in natural language by translating questions into executable queries. Tasks such as Text2SQL, Text2SPARQL, and Text2Cypher have seen strong progress, but most existing methods focus on better prompting, fine-tuning, or iterative refinement and often do not explicitly enforce structural constraints such as syntactic validity and schema consistency. This can reduce reliability, because a generated query is executable only if it satisfies both the syntax rules of the query language and the constraints of the database schema. In this work, we study how structural constraints can be used in test-time inference for Text2Cypher, focusing on post-generation validation to improve query correctness. We extend a confidence-based inference framework with a sequential filtering process that applies confidence scoring, grammar validation, and schema constraints before final aggregation. This lets us analyze how each constraint type affects the generated queries. Our experiments with two instruction-tuned models show that grammar-based filtering improves syntactic validity, and that schema-aware filtering further improves execution quality by enforcing consistency with the database structure. However, stronger filtering also increases the number of empty predictions and reduces execution coverage. Overall, we show that adding simple structural checks at test time improves the reliability of Text2Cypher generation, and we provide a clearer view of how syntax and schema constraints contribute differently.
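As a rough illustration of the sequential filtering described above, the following Python sketch passes sampled candidate queries through a confidence threshold, a grammar check, and a schema check before aggregating the survivors. It is a minimal sketch under stated assumptions, not the authors' implementation: the names `Candidate`, `passes_grammar`, `passes_schema`, and `select_query`, the `min_confidence` threshold, the regex-based checks (stand-ins for a real Cypher parser and schema validator), and the confidence-weighted vote are all hypothetical choices made for illustration.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Candidate:
    query: str
    confidence: float  # assumed to be some per-query score, e.g. mean token probability


def passes_grammar(query: str) -> bool:
    """Toy syntax check (assumption): balanced brackets and a MATCH ... RETURN shape.

    A real system would call a Cypher parser; string literals containing
    brackets are not handled here.
    """
    closers = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in query:
        if ch in "([{":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    if stack:
        return False
    shape = re.match(r"\s*MATCH\b.*\bRETURN\b", query, re.IGNORECASE | re.DOTALL)
    return shape is not None


def passes_schema(query: str, labels: set, rel_types: set) -> bool:
    """Toy schema check (assumption): the first label of each node pattern and
    every relationship type in the query must exist in the schema."""
    used_labels = set(re.findall(r"\(\s*\w*\s*:\s*(\w+)", query))
    used_rels = set(re.findall(r"\[\s*\w*\s*:\s*(\w+)", query))
    return used_labels <= labels and used_rels <= rel_types


def select_query(candidates, labels, rel_types, min_confidence=0.5) -> str:
    """Apply the three filters in order, then aggregate by confidence-weighted vote."""
    kept = [c for c in candidates if c.confidence >= min_confidence]
    kept = [c for c in kept if passes_grammar(c.query)]
    kept = [c for c in kept if passes_schema(c.query, labels, rel_types)]
    if not kept:
        return ""  # an empty prediction: every candidate was filtered out
    votes = defaultdict(float)
    for c in kept:
        votes[" ".join(c.query.split())] += c.confidence  # whitespace-normalised key
    return max(votes, key=votes.get)


if __name__ == "__main__":
    schema_labels = {"Person", "Movie"}
    schema_rels = {"ACTED_IN"}
    samples = [
        Candidate("MATCH (p:Person)-[:ACTED_IN]->(m:Movie) RETURN p.name", 0.9),
        Candidate("MATCH (p:Actor)-[:ACTED_IN]->(m:Movie) RETURN p.name", 0.95),
        Candidate("MATCH (p:Person RETURN p.name", 0.8),
    ]
    print(select_query(samples, schema_labels, schema_rels))
```

In this sketch the highest-confidence candidate is rejected because it uses a label (`Actor`) that is absent from the assumed schema, and the empty-string return path corresponds to the empty predictions that become more frequent as filtering gets stricter.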