In this paper, we introduce SCALE, a collaborative framework that connects compact Specialized Translation Models (STMs) and general-purpose Large Language Models (LLMs) as one unified translation engine. By introducing the STM's translation into triplet in-context demonstrations, SCALE unlocks the refinement and pivoting abilities of the LLM, thus mitigating the language bias of the LLM and the parallel-data bias of the STM, enhancing LLM speciality without sacrificing generality, and facilitating continual learning without expensive LLM fine-tuning. Our comprehensive experiments show that SCALE significantly outperforms both few-shot LLMs (GPT-4) and specialized models (NLLB) in challenging low-resource settings. Moreover, in Xhosa-to-English translation, SCALE achieves a consistent improvement of 4 BLEURT points without tuning the LLM, and surpasses few-shot GPT-4 by 2.5 COMET points and 3.8 BLEURT points when equipped with a compact model of merely 600M parameters. SCALE can also effectively exploit the existing language bias of LLMs by using an English-centric STM as a pivot for translation between any language pair, outperforming few-shot GPT-4 by an average of 6 COMET points across eight translation directions. Furthermore, we provide an in-depth analysis of SCALE's robustness, translation characteristics, and latency costs, providing a solid foundation for future studies exploring the potential synergy between LLMs and more specialized, task-specific models.
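The triplet in-context demonstrations mentioned in the abstract can be illustrated with a minimal sketch. It assumes each demonstration consists of a source sentence, the STM's draft translation, and a reference translation, and that the test input is appended with the final translation left for the LLM to fill in. The names `Triplet` and `build_triplet_prompt`, the field labels, and the prompt layout are hypothetical choices made for illustration; the abstract does not specify the actual template.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Triplet:
    """One in-context demonstration (hypothetical structure)."""
    source: str     # sentence in the source language
    stm_draft: str  # translation produced by the specialized model (STM)
    reference: str  # reference translation in the target language


def build_triplet_prompt(
    demos: List[Triplet],
    source: str,
    stm_draft: str,
    src_lang: str = "Xhosa",
    tgt_lang: str = "English",
    draft_lang: str = "",
) -> str:
    """Assemble a few-shot prompt from triplet demonstrations.

    draft_lang defaults to tgt_lang (refinement). Setting it to another
    language, e.g. "English" with a non-English target, sketches the
    pivoting use described in the abstract.
    """
    draft_lang = draft_lang or tgt_lang
    blocks = []
    for d in demos:
        blocks.append(
            f"{src_lang}: {d.source}\n"
            f"Draft {draft_lang}: {d.stm_draft}\n"
            f"Final {tgt_lang}: {d.reference}"
        )
    # The test instance ends with an open slot for the LLM to complete.
    blocks.append(
        f"{src_lang}: {source}\n"
        f"Draft {draft_lang}: {stm_draft}\n"
        f"Final {tgt_lang}:"
    )
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demos = [Triplet("<source 1>", "<STM draft 1>", "<reference 1>")]
    print(build_triplet_prompt(demos, "<test source>", "<STM draft for test>"))
```

In this sketch the LLM itself is never tuned: only the STM that supplies `stm_draft` changes, which is one way to read the abstract's claim that continual learning is possible without LLM fine-tuning.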