Customizing machine translation models to comply with fine-grained attributes such as formality has seen tremendous progress recently. However, current approaches mostly rely on at least some supervised data with attribute annotation. Data scarcity therefore remains a bottleneck to extending such customization to a wider range of languages, particularly lower-resource ones. Given recent progress in pretrained massively multilingual translation models, we use them as a foundation to transfer attribute-controlling capabilities to languages without supervised data. In this work, we present a comprehensive analysis of transferring attribute controllers based on a pretrained NLLB-200 model. We investigate both training- and inference-time control techniques under various data scenarios and uncover their relative strengths and weaknesses in zero-shot performance and domain robustness. We show that the two paradigms are complementary, as evidenced by consistent improvements on 5 zero-shot directions. Moreover, a human evaluation on a real low-resource language, Bengali, confirms our findings on zero-shot transfer to new target languages. The code is available $\href{https://github.com/dannigt/attribute-controller-transfer}{\text{here}}$.