Code-switching (CS), i.e., mixing different languages in a single sentence, is a common phenomenon in communication and can be challenging in many Natural Language Processing (NLP) settings. Previous studies on CS speech have shown promising results for end-to-end speech translation (ST), but they have been limited to offline scenarios and to translation into one of the languages present in the source (\textit{monolingual transcription}). In this paper, we focus on two essential yet unexplored areas for real-world CS speech translation: streaming settings, and translation into a third language (i.e., a language not included in the source). To this end, we extend the Fisher and Miami test and validation datasets to include new targets in Spanish and German. Using this data, we train a model for both offline and streaming ST, and we establish baseline results for the two settings mentioned above.