Customizing machine translation models to comply with fine-grained attributes such as formality has seen tremendous progress recently. However, current approaches mostly rely on at least some supervised data with attribute annotation. Data scarcity therefore remains a bottleneck to extending such customization to a wider range of languages, lower-resource ones in particular. Given recent progress in pretrained massively multilingual translation models, we use them as a foundation to transfer attribute-controlling capabilities to languages without supervised data. In this work, we present a comprehensive analysis of transferring attribute controllers based on a pretrained NLLB-200 model. We investigate both training- and inference-time control techniques under various data scenarios, and uncover their relative strengths and weaknesses in zero-shot performance and domain robustness. We find that the two paradigms are complementary, as shown by consistent improvements on 5 zero-shot directions. Moreover, a human evaluation on a real low-resource language, Bengali, confirms our findings on zero-shot transfer to new target languages. The code is available at https://github.com/dannigt/attribute-controller-transfer.