This work evaluated several cutting-edge large-scale foundation models based on self-supervision or weak supervision, including SeamlessM4T, SeamlessM4T v2, and Whisper-large-v3, on three code-switched corpora. We found that the self-supervised models achieved performance close to that of the supervised model, indicating the effectiveness of multilingual self-supervised pre-training. We also observed that these models still have room for improvement: they kept making similar mistakes and performed unsatisfactorily when modeling intra-sentential code-switching. In addition, we explored the validity of several variants of Whisper and concluded that they remain effective in a code-switching scenario, and that similar techniques for self-supervised models are worth studying to boost performance on code-switched tasks.