Large Language Models enable users to access databases through natural language interfaces built with tools such as Text2SQL, Text2SPARQL, and Text2Cypher, which translate user questions into structured database queries. While these systems improve database accessibility, most research focuses on English, and multilingual support remains limited. This work investigates a scalable approach to multilingual Text2Cypher that aims to support new languages without re-running full fine-tuning, avoid manual hyperparameter tuning, and maintain performance close to that of joint multilingual fine-tuning. We train language-specific LoRA adapters for English, Spanish, and Turkish and combine them via either uniform linear merging or a learned fusion MLP with dynamic gating. Experimental results show that the fusion MLP recovers around 75\% of the accuracy gains of joint multilingual fine-tuning while requiring only a smaller subset of the data, and that it outperforms linear merging across all three languages. The approach enables incremental expansion to new languages, requiring only one additional LoRA adapter and lightweight retraining of the MLP. Learned adapter fusion thus offers a practical alternative to expensive joint fine-tuning, balancing performance, data efficiency, and scalability for the multilingual Text2Cypher task.
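The abstract names two ways of combining per-language LoRA adapters but does not spell out their form. The following is a minimal pure-Python sketch under stated assumptions, not the authors' implementation: each adapter contributes a low-rank update ΔW_k = scale · B_k A_k; uniform linear merging averages the K updates into one matrix; and the "fusion MLP with dynamic gating" is assumed to be a small hypothetical MLP that maps the input hidden state h to softmax weights g_k(h), so that the output is W0·h + Σ_k g_k(h)·ΔW_k·h. All function and class names (`lora_delta`, `uniform_linear_merge`, `GatingMLP`, `fused_forward`) are illustrative.

```python
import math
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (rows x n) @ (n x cols).
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def lora_delta(b: Matrix, a: Matrix, scale: float = 1.0) -> Matrix:
    # LoRA update Delta W = scale * B @ A, with B: d_out x r and A: r x d_in.
    return [[scale * x for x in row] for row in matmul(b, a)]


def uniform_linear_merge(deltas: List[Matrix]) -> Matrix:
    # Uniform merging: every adapter gets the same weight 1/K.
    k = len(deltas)
    rows, cols = len(deltas[0]), len(deltas[0][0])
    return [[sum(d[i][j] for d in deltas) / k for j in range(cols)]
            for i in range(rows)]


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in z]
    s = sum(e)
    return [x / s for x in e]


class GatingMLP:
    """Hypothetical one-hidden-layer MLP mapping a hidden state to K gate weights."""

    def __init__(self, d_in: int, d_hidden: int, k: int, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0, 0.1) for _ in range(d_in)] for _ in range(d_hidden)]
        self.w2 = [[rng.gauss(0, 0.1) for _ in range(d_hidden)] for _ in range(k)]

    def gates(self, h: Vector) -> Vector:
        hidden = [max(0.0, x) for x in matvec(self.w1, h)]  # ReLU
        return softmax(matvec(self.w2, hidden))  # input-dependent weights, sum to 1


def fused_forward(w0: Matrix, deltas: List[Matrix], gate_mlp: GatingMLP, h: Vector) -> Vector:
    # Frozen base output plus a gated mixture of the frozen adapter outputs.
    g = gate_mlp.gates(h)
    out = matvec(w0, h)
    for gk, dk in zip(g, deltas):
        out = [o + gk * c for o, c in zip(out, matvec(dk, h))]
    return out
```

In this sketch only the gating MLP's weights would be trained, while the base weights and the language adapters stay frozen. Under that assumption, adding a language means training one more adapter, appending its ΔW to `deltas`, and retraining a `GatingMLP` with one extra output. When all gates are equal (1/K), `fused_forward` reduces to applying the uniformly merged update, which is one way to see linear merging as a fixed-weight special case of gated fusion.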