Customizing machine translation models to comply with desired attributes (e.g., formality or grammatical gender) is a well-studied topic. However, most current approaches rely on (semi-)supervised data with attribute annotations. The scarcity of such data limits extending this customization to a wider range of languages, particularly lower-resource ones. This limitation is at odds with recent progress in pretrained massively multilingual translation models. In response, we transfer attribute-controlling capabilities to languages without attribute-annotated data, using an NLLB-200 model as a foundation. Inspired by techniques from controllable generation, we employ a gradient-based inference-time controller to steer the pretrained model. The controller transfers well to zero-shot conditions, as it operates on pretrained multilingual representations and is attribute-specific rather than language-specific. In a comprehensive comparison with finetuning-based control, we demonstrate that, despite finetuning's clear dominance in supervised settings, its advantage over inference-time control narrows when moving to zero-shot conditions, especially with new and distant target languages. Inference-time control also shows stronger domain robustness. We further show that our inference-time control complements finetuning. A human evaluation on a real low-resource language, Bengali, confirms our findings. Our code is available at https://github.com/dannigt/attribute-controller-transfer
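To make the idea of gradient-based inference-time control concrete, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes a toy logistic-regression attribute classifier over a single hidden-state vector; the names `attribute_prob`, `steer_hidden`, `step_size`, and `num_steps` are hypothetical and chosen only for illustration. The hidden state is nudged along the gradient of the classifier's log-probability for the desired attribute, while the classifier's parameters (and, in a real system, the translation model's) stay frozen.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def attribute_prob(hidden, weights, bias):
    """P(attribute = 1 | hidden) under a toy logistic classifier (assumption)."""
    z = sum(w * h for w, h in zip(weights, hidden)) + bias
    return sigmoid(z)


def steer_hidden(hidden, weights, bias, target=1, step_size=0.5, num_steps=5):
    """Gradient ascent on log P(target | hidden) with respect to the hidden state.

    Only the hidden state changes; the classifier parameters stay fixed.
    """
    steered = list(hidden)
    for _ in range(num_steps):
        p = attribute_prob(steered, weights, bias)
        # For a logistic classifier, d/dh log P(target | h) = (target - p) * w.
        coeff = target - p
        steered = [h + step_size * coeff * w for h, w in zip(steered, weights)]
    return steered


# Illustrative values only; a real hidden state would come from the decoder.
hidden = [0.2, -0.4, 0.1]
weights = [1.0, -2.0, 0.5]
bias = -1.5

toward_one = steer_hidden(hidden, weights, bias, target=1)
toward_zero = steer_hidden(hidden, weights, bias, target=0)
print(attribute_prob(hidden, weights, bias))
print(attribute_prob(toward_one, weights, bias))
print(attribute_prob(toward_zero, weights, bias))
```

In this sketch the classifier depends only on the hidden representation and not on any language identifier, which mirrors the abstract's argument that an attribute-specific controller operating on shared multilingual representations can transfer to languages without attribute-annotated data.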