Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning

Large language models (LLMs) are a promising avenue for machine translation (MT). However, current LLM-based MT systems are brittle: their effectiveness depends heavily on the choice of few-shot examples, and they often require extra post-processing because of overgeneration. Alternatives such as finetuning on translation instructions are computationally expensive and may weaken in-context learning capabilities through overspecialization. In this paper, we take a closer look at this problem. We start by showing that adapter-based finetuning with LoRA matches the performance of traditional finetuning while reducing the number of training parameters by a factor of 50. This method also outperforms few-shot prompting and eliminates the need for post-processing or in-context examples. However, we show that finetuning generally degrades few-shot performance, hindering adaptation capabilities. Finally, to obtain the best of both worlds, we propose a simple approach that incorporates few-shot examples during finetuning. Experiments on 10 language pairs show that the proposed approach recovers the original few-shot capabilities while keeping the added benefits of finetuning.
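The idea of incorporating few-shot examples during finetuning can be illustrated with a minimal data-preparation sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the prompt template, the helper names `format_example` and `build_training_record`, the uniform sampling of the number of shots, and the `max_shots` default are all hypothetical. The sketch only shows one plausible way to build training records that sometimes contain in-context examples and sometimes contain none, so that a finetuned model sees both formats.

```python
import random


def format_example(src, tgt, src_lang, tgt_lang):
    """Render one solved translation example (hypothetical template)."""
    return (
        f"Translate the source text from {src_lang} to {tgt_lang}.\n"
        f"Source: {src}\n"
        f"Target: {tgt}"
    )


def build_training_record(pair, pool, src_lang, tgt_lang, max_shots=5, rng=None):
    """Build one finetuning record with a random number of in-context examples.

    `pair` is the (source, target) pair to train on; `pool` holds candidate
    (source, target) pairs to draw few-shot examples from. The number of
    shots is sampled uniformly from 0..max_shots (an assumption), so some
    records are zero-shot and others are few-shot.
    """
    rng = rng or random
    # Never use the training pair itself as one of its own examples.
    candidates = [p for p in pool if p != pair]
    k = rng.randint(0, min(max_shots, len(candidates)))
    shots = rng.sample(candidates, k)

    blocks = [format_example(s, t, src_lang, tgt_lang) for s, t in shots]
    # The final block leaves the target empty; the model learns to fill it.
    query = (
        f"Translate the source text from {src_lang} to {tgt_lang}.\n"
        f"Source: {pair[0]}\n"
        f"Target:"
    )
    prompt = "\n\n".join(blocks + [query])
    return {"prompt": prompt, "completion": " " + pair[1], "num_shots": k}


if __name__ == "__main__":
    pool = [
        ("Guten Morgen.", "Good morning."),
        ("Wie geht es dir?", "How are you?"),
        ("Danke schön.", "Thank you very much."),
        ("Bis bald.", "See you soon."),
    ]
    record = build_training_record(
        pool[0], pool, "German", "English", rng=random.Random(0)
    )
    print(record["prompt"])
    print(record["completion"])
```

In a setup like this, the loss would typically be computed only on the `completion` tokens, so the in-context examples act as conditioning rather than as additional targets; that choice is also an assumption and is not stated in the abstract.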

