Modularity is a machine-translation paradigm with the potential to yield models that are large at training time and small at inference time. Within this field of study, modular approaches, and attention bridges in particular, have been argued to improve the generalization capabilities of models by fostering language-independent representations. In this paper, we study whether modularity affects translation quality, as well as how well modular architectures generalize across different evaluation scenarios. For a given computational budget, we find non-modular architectures to be always comparable or preferable to all modular designs we study.
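To make the attention-bridge idea concrete, here is a minimal NumPy sketch: a small set of learned query vectors attends over the encoder states, producing a fixed-size, source-length-independent representation for the decoder. All names and dimensions here are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_bridge(enc_states, bridge_queries):
    """Map variable-length encoder states to a fixed-size representation.

    enc_states:     (src_len, d) encoder outputs for one sentence
    bridge_queries: (k, d) learned bridge query vectors (k is fixed)
    returns:        (k, d) bridge output, independent of src_len
    """
    d = enc_states.shape[1]
    scores = bridge_queries @ enc_states.T / np.sqrt(d)  # (k, src_len)
    weights = softmax(scores, axis=-1)                   # attention over source positions
    return weights @ enc_states                          # (k, d)

rng = np.random.default_rng(0)
d, k = 8, 4
queries = rng.standard_normal((k, d))
# Two sentences of different lengths map to the same output shape.
for src_len in (5, 12):
    out = attention_bridge(rng.standard_normal((src_len, d)), queries)
    print(out.shape)
```

Because the output shape depends only on the number of bridge queries `k`, the decoder never sees source-language-specific structure directly, which is the basis of the language-independence argument the abstract refers to.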