Multilingual Machine Translation promises to improve translation quality between non-English languages by translating directly rather than pivoting through English. Direct translation is advantageous for several reasons, namely lower latency (no need to translate twice) and reduced error cascades (e.g., avoiding the loss of gender and formality information when translating through English). On the downside, adding more languages reduces model capacity per language, which is usually countered by increasing the overall model size, making training harder and inference slower. In this work, we introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity while keeping the amount of computation and the number of parameters used in the forward pass constant. The key idea is to have some layers of the encoder be source or target language-specific, while keeping the remaining layers shared. We study the best way to place these layers using a neural architecture search-inspired approach, and achieve an improvement of 1.3 chrF (1.5 spBLEU) points over not using LSLs on a separate decoder architecture, and 1.9 chrF (2.2 spBLEU) points on a shared decoder one.
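The routing idea behind LSLs can be illustrated with a minimal sketch in plain Python. It is not the implementation used in this work: the class names (`ToyLayer`, `LanguageSpecificLayer`, `ToyEncoder`), the tiny dense layer standing in for a full Transformer layer, and the layer layout are all hypothetical and chosen only for illustration. The sketch shows that the total parameter count grows with the number of languages, while the parameters touched in any single forward pass stay the same as in a fully shared encoder of equal depth.

```python
import random


class ToyLayer:
    """Toy dense layer standing in for a full Transformer layer (assumption)."""

    def __init__(self, dim, rng):
        self.weight = [[rng.uniform(-0.1, 0.1) for _ in range(dim)]
                       for _ in range(dim)]

    def num_params(self):
        return len(self.weight) * len(self.weight[0])

    def __call__(self, x):
        # Plain matrix-vector product.
        return [sum(w * v for w, v in zip(row, x)) for row in self.weight]


class LanguageSpecificLayer:
    """Holds one sublayer per language; only one of them runs per input."""

    def __init__(self, dim, languages, rng):
        self.sublayers = {lang: ToyLayer(dim, rng) for lang in languages}

    def num_params(self):
        return sum(layer.num_params() for layer in self.sublayers.values())

    def __call__(self, x, lang):
        return self.sublayers[lang](x)


class ToyEncoder:
    """Encoder whose layers are 'shared', 'src'-specific, or 'tgt'-specific."""

    def __init__(self, dim, languages, layout, seed=0):
        rng = random.Random(seed)
        self.layout = layout
        self.layers = [
            ToyLayer(dim, rng) if kind == "shared"
            else LanguageSpecificLayer(dim, languages, rng)
            for kind in layout
        ]

    def total_params(self):
        return sum(layer.num_params() for layer in self.layers)

    def forward(self, x, src_lang, tgt_lang):
        """Return the output and the number of parameters actually used."""
        used = 0
        for kind, layer in zip(self.layout, self.layers):
            if kind == "shared":
                x = layer(x)
                used += layer.num_params()
            else:
                # Index the layer by source or target language.
                lang = src_lang if kind == "src" else tgt_lang
                x = layer(x, lang)
                used += layer.sublayers[lang].num_params()
        return x, used


languages = ["de", "en", "fr"]
# Hypothetical layout; the placement studied in this work is found by search.
lsl_encoder = ToyEncoder(4, languages, ["shared", "src", "tgt", "shared"])
baseline = ToyEncoder(4, languages, ["shared"] * 4)

x = [1.0, 0.5, -0.5, 2.0]
out, used_lsl = lsl_encoder.forward(x, src_lang="de", tgt_lang="fr")
_, used_base = baseline.forward(x, src_lang="de", tgt_lang="fr")

print(lsl_encoder.total_params(), baseline.total_params())  # 128 64
print(used_lsl, used_base)                                  # 64 64
```

In this toy setting the LSL encoder stores twice as many parameters as the shared baseline, yet each forward pass touches the same number of parameters, because every language-specific layer selects exactly one sublayer per input.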