Recent studies have shown that large language models (LLMs) are powerful tools for natural language processing, driving advances across many areas of computational linguistics. However, these models struggle with low-resource languages, where training data is scarce and cultural nuances are hard to capture. In this paper, we propose QueEn, a novel approach to Quechua-English translation that combines Retrieval-Augmented Generation (RAG) with parameter-efficient fine-tuning. Our method draws on external linguistic resources through RAG and uses Low-Rank Adaptation (LoRA) for efficient model adaptation. Experimental results show that our approach substantially outperforms baseline models, achieving a BLEU score of 17.6 versus 1.5 for standard GPT models. Integrating RAG with fine-tuning allows our system to address the challenges of low-resource translation while remaining computationally efficient. This work contributes to the broader goal of preserving endangered languages through advanced language technology.
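To make the recipe concrete, the sketch below shows one plausible way to wire RAG together with LoRA for this kind of translation task; it is not the paper's implementation. The base model name, the toy dictionary entries, the TF-IDF retriever, and all LoRA hyperparameters are placeholder assumptions chosen for illustration.

```python
# Minimal sketch of a RAG + LoRA translation pipeline.
# Assumptions (not from the paper): a small open Hugging Face causal LM,
# a TF-IDF retriever over a toy Quechua-English lexicon, and illustrative
# LoRA hyperparameters (r=16, alpha=32).

from peft import LoraConfig, get_peft_model
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical external linguistic resource: dictionary/grammar snippets.
RESOURCES = [
    "allqu -> dog (noun)",
    "mikhuy -> to eat (verb root)",
    "wasi -> house (noun)",
]

vectorizer = TfidfVectorizer().fit(RESOURCES)
resource_matrix = vectorizer.transform(RESOURCES)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k lexicon entries most similar to the query."""
    sims = cosine_similarity(vectorizer.transform([query]), resource_matrix)[0]
    return [RESOURCES[i] for i in sims.argsort()[::-1][:k]]

base_name = "bigscience/bloom-560m"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_name)
model = AutoModelForCausalLM.from_pretrained(base_name)

# Parameter-efficient adaptation: wrap the base model with LoRA adapters
# so only the low-rank update matrices would be trained.
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["query_key_value"], task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)

def translate(sentence: str) -> str:
    """Retrieve supporting entries, then prompt the adapted model."""
    context = "\n".join(retrieve(sentence))
    prompt = (
        f"Relevant dictionary entries:\n{context}\n\n"
        f"Translate from Quechua to English: {sentence}\nEnglish:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(out[0][inputs["input_ids"].shape[1]:],
                            skip_special_tokens=True)
```

In a real setup the LoRA adapters would first be trained on parallel Quechua-English data before inference; here they are untrained placeholders, and the retriever would index a full dictionary or grammar corpus rather than three hand-written entries.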