In the evolving landscape of crisis communication, the need for robust and adaptable Machine Translation (MT) systems is more pressing than ever, particularly for low-resource languages. This study explores how Large Language Models (LLMs) and Multilingual LLMs (MLLMs) can be leveraged to enhance MT capabilities in such scenarios. Focusing on the unique challenges posed by crisis situations, where speed, accuracy, and the ability to handle a wide range of languages are paramount, this research outlines a novel approach that combines the capabilities of LLMs with fine-tuning techniques and community-driven corpus development strategies. At the core of this study is the development and empirical evaluation of MT systems tailored for two low-resource language pairs, illustrating the process from initial model selection and fine-tuning through to deployment. Bespoke systems are developed and modelled on the recent COVID-19 pandemic. The research highlights the importance of community involvement in creating highly specialised, crisis-specific datasets and compares custom GPTs with NLLB-adapted MLLMs. It identifies fine-tuned MLLMs as offering superior performance compared with their LLM counterparts. A scalable and replicable model for rapid MT system development in crisis scenarios is outlined. Our approach contributes to the field of humanitarian technology by offering a blueprint for developing multilingual communication systems during emergencies.