Foundation Models have achieved impressive results in converting natural language to SQL (NL2SQL), but adapting these approaches to converting natural language to Graph Query Language (NL2GQL) is difficult, both because GQL differs fundamentally from SQL and because GQL itself takes diverse forms. Moving away from traditional rule-based and slot-filling methods, we introduce $R^3$-NL2GQL, an approach that combines small and large Foundation Models for ranking, rewriting, and refining. It uses the interpretative strengths of smaller models for the initial ranking and rewriting stages, and the stronger generalization and query-generation ability of larger models for the final transformation of natural language queries into GQL. To address the scarcity of datasets in this emerging field, we have built a bilingual dataset sourced from graph database manuals and selected open-source Knowledge Graphs (KGs). Evaluation on this dataset shows that the method is promising in both effectiveness and robustness.