Cross-lingual alignment, the meaningful similarity of representations across languages in multilingual language models, has been an active field of research in recent years. We survey the literature on techniques to improve cross-lingual alignment, providing a taxonomy of methods and summarising insights from across the field. We present different understandings of cross-lingual alignment and their limitations. We provide a qualitative summary of results from a large number of surveyed papers. Finally, we discuss how these insights may be applied not only to encoder models, where this topic has been heavily studied, but also to encoder-decoder or even decoder-only models, and argue that an effective trade-off between language-neutral and language-specific information is key.