We demonstrate the potential of few-shot translation systems, trained with unpaired language data, for both high- and low-resource language pairs. We show that with only five examples of high-quality translation data shown at inference, a decoder-only Transformer model trained solely with self-supervised learning is able to match specialized supervised state-of-the-art models as well as more general commercial translation systems. In particular, we outperform the best-performing system on the WMT'21 English-Chinese news translation task using only five examples of English-Chinese parallel data at inference. Moreover, our approach to building these models does not require joint multilingual training or back-translation, is conceptually simple, and shows the potential to extend to the multilingual setting. Furthermore, the resulting models are two orders of magnitude smaller than state-of-the-art language models. We then analyze the factors that impact the performance of few-shot translation systems, and highlight that the quality of the few-shot demonstrations heavily determines the quality of the translations generated by our models. Finally, we show that the few-shot paradigm also provides a way to control certain attributes of the translation: we are able to control for regional varieties and formality using only five examples at inference, paving the way towards controllable machine translation systems.
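The few-shot setup described above can be illustrated with a minimal sketch of how five demonstration pairs might be assembled into a prompt for a decoder-only model. The abstract does not specify the prompt format, so the `Language: sentence` template, the blank-line separator, and the function name `build_few_shot_prompt` are assumptions made purely for illustration.

```python
from typing import List, Tuple


def build_few_shot_prompt(
    demonstrations: List[Tuple[str, str]],
    source_sentence: str,
    source_lang: str = "English",
    target_lang: str = "Chinese",
) -> str:
    """Assemble a few-shot translation prompt (hypothetical template).

    Each demonstration is a (source, target) pair. The template used here
    is an assumption for illustration, not the format used by the authors.
    """
    # One block per demonstration: source line followed by its translation.
    blocks = [
        f"{source_lang}: {src}\n{target_lang}: {tgt}"
        for src, tgt in demonstrations
    ]
    # The final block leaves the target side empty for the model to complete.
    blocks.append(f"{source_lang}: {source_sentence}\n{target_lang}:")
    return "\n\n".join(blocks)
```

Under this sketch, attributes such as formality or regional variety would be steered by the choice of demonstration pairs alone: swapping in five formal or region-specific translations changes the prompt while the model and the template stay the same.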