Large Language Models enable users to access databases through natural-language interfaces such as Text2SQL, Text2SPARQL, and Text2Cypher, which translate user questions into structured database queries. While these systems improve database accessibility, most research focuses on English, with limited multilingual support. This work investigates scalable multilingual Text2Cypher, aiming to support new languages without re-running full fine-tuning, avoiding manual hyper-parameter tuning, and maintaining performance close to that of joint multilingual fine-tuning. We train language-specific LoRA adapters for English, Spanish, and Turkish and combine them via either uniform linear merging or a learned fusion MLP with dynamic gating. Experimental results show that the fusion MLP recovers around 75\% of the accuracy gains of joint multilingual fine-tuning while requiring only a small subset of the data, and outperforms linear merging across all three languages. The approach enables incremental expansion to new languages, requiring only one additional LoRA adapter and a lightweight MLP retraining. Learned adapter fusion thus offers a practical alternative to expensive joint fine-tuning, balancing performance, data efficiency, and scalability for the multilingual Text2Cypher task.
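To make the two combination strategies concrete, the sketch below contrasts uniform linear merging of adapter deltas with a learned gating function that mixes them dynamically. This is a minimal toy illustration, not the paper's implementation: the 2x2 "adapter" matrices and the gate logits are invented placeholders, and in practice the gating MLP would be trained and applied over model representations.

```python
import math

# Hypothetical per-language LoRA adapter deltas (toy 2x2 weight updates).
adapters = {
    "en": [[0.2, 0.0], [0.0, 0.1]],
    "es": [[0.1, 0.1], [0.0, 0.2]],
    "tr": [[0.0, 0.2], [0.1, 0.1]],
}

def uniform_merge(adapters):
    """Uniform linear merging: average the adapter deltas element-wise."""
    n = len(adapters)
    rows = len(next(iter(adapters.values())))
    cols = len(next(iter(adapters.values()))[0])
    merged = [[0.0] * cols for _ in range(rows)]
    for delta in adapters.values():
        for i in range(rows):
            for j in range(cols):
                merged[i][j] += delta[i][j] / n
    return merged

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def gated_merge(adapters, gate_logits):
    """Learned fusion with dynamic gating: a gating MLP (assumed trained,
    not shown) emits one logit per adapter from the input representation;
    the softmax of those logits gives the mixing weights."""
    weights = softmax(gate_logits)
    keys = list(adapters)
    rows = len(adapters[keys[0]])
    cols = len(adapters[keys[0]][0])
    merged = [[0.0] * cols for _ in range(rows)]
    for w, key in zip(weights, keys):
        for i in range(rows):
            for j in range(cols):
                merged[i][j] += w * adapters[key][i][j]
    return merged

# With equal logits the gate reduces to uniform merging; a trained gate
# would instead up-weight the adapter matching the input language.
uniform = uniform_merge(adapters)
gated = gated_merge(adapters, [0.0, 0.0, 0.0])
```

The design point the sketch captures: uniform merging is a fixed, input-independent combination, whereas the gating path lets the mixture adapt per input, which is why retraining only the lightweight gate suffices when a new language's adapter is added.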