While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task. In this paper, we propose a recipe for tailoring LLMs to multiple tasks present in translation workflows. We perform continued pretraining on a multilingual mixture of monolingual and parallel data, creating TowerBase, followed by finetuning on instructions relevant for translation processes, creating TowerInstruct. Our final model surpasses open alternatives on several tasks relevant to translation workflows and is competitive with general-purpose closed LLMs. To facilitate future research, we release the Tower models, our specialization dataset, an evaluation framework for LLMs focusing on the translation ecosystem, and a collection of model generations, including ours, on our benchmark.