Large Language Models (LLMs) offer new potential for translation tasks, but their performance often degrades on low-resource languages. To address this limitation, we propose an approach for fine-tuning LLMs on a low-resource language, Kupang Malay. Our approach involves designing a set of instructions that leverage explicit lexical and semantic features from a bilingual dictionary, and introducing Continual Instruction Tuning (CIT), a training paradigm that enables iterative instruction-based training. Experimental results show that our model, named Lius, outperforms standard instruction-tuned models by 4-6 points and surpasses both Neural Machine Translation (NMT) models and multilingual LLMs by 10-13 points on several evaluation metrics. These findings highlight the potential of our approach to reduce the reliance on large-scale parallel data in low-resource language translation.