Language models acting as agents over knowledge graphs generate Cypher queries that can fail structurally (crashing at the database) or semantically (executing but returning wrong results). We place a pre-execution gate between query generation and a production Neo4j database. The gate validates structure through a chain of four backends, culminating in execution against a mirror graph (median latency 5.6 ms). Structurally broken queries are routed to a corrector that iteratively feeds structured error messages back to a language model. On seven CypherBench schemas (2348 questions, ACL 2025), the pipeline preserves generation accuracy for every model tested, showing that it acts as a safe defensive layer. The corrector achieves a success rate of 81% to 95% across five models (mean 89%). On a template-generated corpus spanning nine schemas, the gate catches 100% of parse errors, 100% of constraint violations, and 100% of schema-reference errors in path queries with labelled endpoints, with zero false positives across 1135 queries. It catches 0% of property sibling-swaps in which the substituted property name is also valid on the target label; this marks the formal boundary where structural validation ends and semantic validation must begin. A planner-based cost gate flags catastrophic plan structures before execution.
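The validate-then-correct loop, and the reason sibling-swaps slip through, can be illustrated with a minimal sketch. It assumes a toy schema (a dict from node label to property names) and a single regex-based schema-reference check standing in for the four-backend chain; there is no parser, constraint check, mirror graph, or cost gate here. The names `SCHEMA`, `check_schema_refs`, `gate_and_correct`, and `toy_fix` are hypothetical, and `toy_fix` is a rule-based stand-in for the language-model corrector.

```python
import re
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical toy schema: node label -> set of valid property names.
SCHEMA = {
    "Person": {"name", "born"},
    "Movie": {"title", "released"},
}


@dataclass
class GateResult:
    ok: bool
    errors: list = field(default_factory=list)  # structured error records


def check_schema_refs(query: str, schema: dict) -> GateResult:
    """Toy stand-in for one validation backend: every node label must exist,
    and every var.prop must name a property defined on var's label."""
    # Bind variables to labels from node patterns such as (p:Person).
    bindings = dict(re.findall(r"\((\w+):(\w+)", query))
    errors = []
    for label in sorted(set(bindings.values())):
        if label not in schema:
            errors.append({"kind": "unknown_label", "label": label})
    for var, prop in re.findall(r"\b(\w+)\.(\w+)\b", query):
        label = bindings.get(var)
        if label in schema and prop not in schema[label]:
            errors.append({"kind": "unknown_property", "label": label, "property": prop})
    return GateResult(ok=not errors, errors=errors)


def gate_and_correct(query: str, schema: dict,
                     fix: Callable[[str, list], str], max_iters: int = 3):
    """Validate; on failure, pass the structured errors to `fix` and
    re-validate, up to max_iters rounds. Caller checks result.ok."""
    result = check_schema_refs(query, schema)
    for _ in range(max_iters):
        if result.ok:
            break
        query = fix(query, result.errors)
        result = check_schema_refs(query, schema)
    return query, result


def toy_fix(query: str, errors: list) -> str:
    # Hypothetical rule-based fixer standing in for the LLM corrector.
    for e in errors:
        if e["kind"] == "unknown_property" and e["property"] == "year":
            query = query.replace(".year", ".released")
    return query


fixed, res = gate_and_correct("MATCH (m:Movie) RETURN m.year", SCHEMA, toy_fix)
print(fixed, res.ok)  # MATCH (m:Movie) RETURN m.released True

# Sibling-swap: intended p.born, generated p.name. 'name' is valid on
# Person, so a purely structural check accepts the wrong query.
print(check_schema_refs("MATCH (p:Person) RETURN p.name", SCHEMA).ok)  # True
```

The last line shows the boundary stated above in miniature: a schema-reference check can only ask whether a property exists on a label, not whether it is the property the question intended.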