While current NL2SQL approaches built on Foundation Models have achieved commendable results, applying them directly to Natural Language to Graph Query Language (NL2GQL) tasks is challenging because GQL expressions differ significantly from SQL and because GQL comes in many variants. Our extensive experiments reveal that in NL2GQL tasks, larger Foundation Models demonstrate superior cross-schema generalization, while smaller Foundation Models struggle to improve their GQL generation capabilities through fine-tuning. However, after fine-tuning, smaller models exhibit better intent comprehension and higher grammatical accuracy. Diverging from rule-based and slot-filling techniques, we introduce R3-NL2GQL, which employs both smaller and larger Foundation Models as reranker, rewriter, and refiner. The approach harnesses the comprehension ability of smaller models for information reranking and rewriting, and the exceptional generalization and generation capabilities of larger models to transform input natural language queries and code-structured schemas into any form of GQL. Recognizing the lack of established datasets in this nascent domain, we have created a bilingual dataset derived from graph database documentation and several open-source Knowledge Graphs (KGs). We tested our approach on this dataset, and the experimental results show that it delivers promising performance and robustness. Our code and dataset are available at https://github.com/zhiqix/NL2GQL