The paper presents a three-step transfer learning framework that improves cross-lingual transfer from high-resource to low-resource languages for the downstream task of Automatic Speech Translation. The approach inserts a semantic knowledge-distillation step into the existing two-step XLS-R cross-lingual transfer learning framework. This extra step aims to encode semantic knowledge in the multilingual speech encoder, which is pre-trained via Self-Supervised Learning on unlabeled speech. The three-step framework targets the large cross-lingual transfer gap (TRFGap) that the XLS-R framework exhibits between high-resource and low-resource languages. The authors validate the proposal through extensive experiments and comparisons on the CoVoST-2 benchmark, reporting significant improvements in translation performance, especially for low-resource languages, and a notable reduction in the TRFGap.
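To make the semantic knowledge-distillation step more concrete, here is a minimal sketch of one way such an objective could look. The summary above does not specify the teacher model, the pooling method, or the loss, so everything here is an assumption for illustration: frame-level outputs of the speech encoder (the student) are mean-pooled into one utterance vector, which is pulled toward a sentence embedding from a frozen text encoder (the teacher) using a cosine-distance loss. The function names `mean_pool`, `cosine_distance`, and `distillation_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def mean_pool(frames: Sequence[Sequence[float]]) -> List[float]:
    """Average frame-level vectors into a single utterance-level vector."""
    if not frames:
        raise ValueError("need at least one frame")
    dim = len(frames[0])
    return [sum(frame[i] for frame in frames) / len(frames) for i in range(dim)]


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cos(u, v); 0.0 means the two vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def distillation_loss(
    student_frames: Sequence[Sequence[float]],
    teacher_embedding: Sequence[float],
) -> float:
    """Hypothetical distillation objective (an assumption, not the paper's
    stated loss): cosine distance between the pooled speech-encoder output
    and a frozen text-encoder sentence embedding of the transcript."""
    return cosine_distance(mean_pool(student_frames), teacher_embedding)
```

In a real training loop the loss would be computed on tensors and backpropagated only into the speech encoder, with the teacher kept frozen; the plain-Python version above shows only the arithmetic of the objective.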