Large language models (LLMs) such as ChatGPT and GPT-4 have exhibited remarkable abilities on a wide range of natural language processing (NLP) tasks, including machine translation performed during chat. However, these models are accessible only through restricted APIs, which creates barriers to new research and advancement in the field. We therefore propose the $\mathbf{ParroT}$ framework to enhance and regulate translation abilities during chat, based on open-source LLMs (i.e., LLaMA-7b) and human-written translation and evaluation data. Specifically, ParroT reformulates translation data into an instruction-following style and introduces a "Hint" field for incorporating extra requirements that regulate the translation process. Accordingly, we propose three instruction types for finetuning ParroT models: translation instruction, contrastive instruction, and error-guided instruction. Experiments on Flores subsets and WMT22 test sets suggest that translation instruction significantly improves the translation performance of vanilla LLMs, while error-guided instruction can lead to further improvement, which demonstrates the importance of learning from low-quality translations annotated by humans. Meanwhile, the ParroT models also preserve their ability on general tasks when the Alpaca multi-task dataset is included in finetuning. Code: https://github.com/wxjiao/ParroT
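As a rough illustration of the instruction-following reformulation with a "Hint" field, the sketch below builds one prompt per instruction type. The function name `build_prompt`, the Alpaca-like section headers, and the hint wordings are all assumptions made for this example; they are not taken from the ParroT code, and the repository should be consulted for the actual templates.

```python
from typing import Optional


def build_prompt(instruction: str, source: str, hint: Optional[str] = None) -> str:
    """Format one translation example in an instruction-following layout.

    The section headers are an assumed, Alpaca-like layout used only for
    illustration; they are not the template from the ParroT repository.
    """
    parts = [
        "### Instruction:\n" + instruction.strip(),
        "### Input:\n" + source.strip(),
    ]
    if hint:
        # Optional field carrying an extra requirement for the translation.
        parts.append("### Hint: " + hint.strip())
    # The model's output (the translation) would follow this header.
    parts.append("### Response:")
    return "\n\n".join(parts)


instruction = "Translate the following sentence from German to English."
source = "Guten Morgen."

# Translation instruction: no hint, plain translation request.
translation = build_prompt(instruction, source)

# Contrastive instruction: hypothetical hint asking for the preferred output.
contrastive = build_prompt(instruction, source, hint="We prefer to translate it to")

# Error-guided instruction: hypothetical hint describing the error level.
error_guided = build_prompt(
    instruction, source, hint="A translation with major errors could be"
)

print(error_guided)
```

Keeping the hint optional means the same formatter can serve all three instruction types, so plain translation data and human-annotated low-quality translations could be mixed in one finetuning set under this assumed layout.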