Neural machine translation benefits from semantically rich representations. Considerable progress in learning such representations has been achieved through language modelling objectives and through mutual information maximization objectives based on contrastive learning. However, the language-dependent nature of language modelling introduces a trade-off between the universality of the learned representations and the model's performance on language modelling tasks. Contrastive learning, for its part, improves performance, but its success cannot be attributed to mutual information maximization alone. We propose a novel Context Enhancement step that improves neural machine translation by maximizing mutual information with the Barlow Twins loss. Unlike other approaches, we do not explicitly augment the data; instead, we view languages as implicit augmentations, which eliminates the risk of disrupting semantic information. Furthermore, our method does not learn embeddings from scratch and can be generalised to any set of pre-trained embeddings. Finally, we evaluate the language-agnosticism of our embeddings through language classification and use them for neural machine translation to compare against state-of-the-art approaches.
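To make the "languages as implicit augmentations" idea concrete, the following is a minimal sketch of the standard Barlow Twins objective in plain Python. It assumes that row k of `za` and row k of `zb` are embeddings of the same sentence in two different languages (for example, a source sentence and its translation), so the two languages play the role of the two augmented views. The function names and the off-diagonal weight `lambd=5e-3` are illustrative assumptions, not the authors' implementation. Each embedding dimension is standardized over the batch, the cross-correlation matrix between the two views is computed, and the loss pushes its diagonal towards 1 (invariance across languages) and its off-diagonal entries towards 0 (redundancy reduction).

```python
import math
from typing import List

Matrix = List[List[float]]  # batch_size x embedding_dim


def standardize_columns(z: Matrix, eps: float = 1e-8) -> Matrix:
    """Give each embedding dimension zero mean and unit variance over the batch."""
    n, d = len(z), len(z[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [z[i][j] for i in range(n)]
        mean = sum(col) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / n) + eps
        for i in range(n):
            out[i][j] = (col[i] - mean) / std
    return out


def cross_correlation(za: Matrix, zb: Matrix) -> Matrix:
    """d x d cross-correlation between two standardized views of the same batch."""
    n, d = len(za), len(za[0])
    a, b = standardize_columns(za), standardize_columns(zb)
    return [
        [sum(a[k][i] * b[k][j] for k in range(n)) / n for j in range(d)]
        for i in range(d)
    ]


def barlow_twins_loss(za: Matrix, zb: Matrix, lambd: float = 5e-3) -> float:
    """Invariance term on the diagonal plus weighted redundancy-reduction term.

    Assumption: za[k] and zb[k] embed the same sentence in two languages.
    """
    c = cross_correlation(za, zb)
    d = len(c)
    on_diag = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off_diag = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on_diag + lambd * off_diag


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings for four parallel sentences.
    english = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
    german = [[0.9, 1.1], [1.1, -0.9], [-1.0, 1.0], [-1.0, -1.0]]
    print(barlow_twins_loss(english, german))
```

Because the loss only compares the two views through their cross-correlation, it needs no explicit data augmentation, and it can be applied on top of any pre-trained encoder that produces fixed-size sentence embeddings.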