Large language models (LLMs) like ChatGPT and GPT-4 have exhibited remarkable abilities on a wide range of natural language processing (NLP) tasks, including machine translation performed during chat. However, these models are only accessible through restricted APIs, which creates barriers to new research and advancements in the field. Therefore, we propose the $\mathbf{ParroT}$ framework to enhance and regulate translation abilities during chat, based on open-source LLMs (i.e., LLaMA-7b) and human-written translation and evaluation data. Specifically, ParroT reformulates translation data into the instruction-following style and introduces a "Hint" field for incorporating extra requirements to regulate the translation process. Accordingly, we propose three instruction types for finetuning ParroT models: translation instruction, contrastive instruction, and error-guided instruction. Experiments on two Flores subsets and the WMT22 test sets suggest that translation instruction significantly improves the translation performance of vanilla LLMs, while error-guided instruction can lead to a further improvement, which demonstrates the importance of learning from low-quality translations annotated by humans. Meanwhile, the ParroT models also preserve their ability on general tasks when the Alpaca multi-task dataset is included in finetuning. Code: https://github.com/wxjiao/ParroT
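To make the reformulation concrete, the following is a minimal sketch of how a translation pair could be wrapped in an instruction-following record with an optional "Hint" field. It is an illustration under stated assumptions, not the released implementation: the class `InstructionExample`, the helpers `render_prompt` and `to_training_text`, the `###` section markers (borrowed from the Alpaca-style layout), and the hint wording are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstructionExample:
    """One finetuning record; all field names are illustrative assumptions."""
    instruction: str            # task description shown to the model
    source: str                 # sentence to translate
    response: str               # target text the model should produce
    hint: Optional[str] = None  # extra requirement that regulates the output


def render_prompt(ex: InstructionExample) -> str:
    """Build the prompt; the Hint section is emitted only when a hint is set."""
    parts = [
        f"### Instruction:\n{ex.instruction}",
        f"### Input:\n{ex.source}",
    ]
    if ex.hint:
        parts.append(f"### Hint:\n{ex.hint}")
    parts.append("### Response:\n")
    return "\n\n".join(parts)


def to_training_text(ex: InstructionExample) -> str:
    """Concatenate prompt and response into one causal-LM training string."""
    return render_prompt(ex) + ex.response


if __name__ == "__main__":
    # Plain translation instruction: no hint is attached.
    plain = InstructionExample(
        instruction="Translate the following sentence from German to English.",
        source="Guten Morgen.",
        response="Good morning.",
    )
    # Same pair with a hint; the hint wording here is made up for illustration.
    hinted = InstructionExample(
        instruction=plain.instruction,
        source=plain.source,
        response=plain.response,
        hint="A translation with no errors could be",
    )
    print(to_training_text(plain))
    print("-" * 40)
    print(to_training_text(hinted))
```

Keeping the hint as a separate optional field means the same record type can serve a plain translation instruction (no hint) as well as instructions that attach a requirement, such as a quality or error description.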