Code-switching (CS), i.e., mixing different languages in a single sentence, is a common phenomenon in communication and poses a challenge in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results for end-to-end speech translation (ST), but they have been limited to offline scenarios and to translation into one of the languages present in the source (\textit{monolingual transcription}). In this paper, we focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings, and translation into a third language (i.e., a language not included in the source). To this end, we extend the Fisher and Miami test and validation datasets to include new targets in Spanish and German. Using this data, we train a model for both offline and streaming ST, and we establish baseline results for the two settings introduced above.