Many existing speech translation benchmarks focus on native-English speech recorded in high-quality conditions, which often do not match the conditions of real-life use cases. In this paper, we describe our speech translation system for the multilingual track of IWSLT 2023, which focuses on the translation of scientific conference talks. The test condition features accented input speech and terminology-dense content. The task requires translation into 10 languages with varying amounts of resources. In the absence of training data from the target domain, we use a retrieval-based approach (kNN-MT) for effective adaptation (+0.8 BLEU for speech translation). We also use adapters to easily integrate incremental training data from data augmentation, and show that this matches the performance of re-training. We observe that cascaded systems are easier to adapt to specific target domains because of their separate modules. Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation, although the two perform similarly on TED talks.
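For readers unfamiliar with kNN-MT, the sketch below illustrates the generic decoding step from the original formulation (Khandelwal et al., 2021). It is not necessarily the configuration used in our system. A datastore maps decoder hidden states (keys) to the target tokens that followed them (values). At each step, the current hidden state retrieves its k nearest keys. A softmax over negative distances turns these neighbours into a token distribution, which is then interpolated with the model's own distribution. All function names, the toy datastore, and the values of k, the temperature, and the interpolation weight `lam` are hypothetical. The brute-force search is also a simplification, since practical systems use an approximate nearest-neighbour index.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical datastore entry: (decoder hidden state used as key, id of the target token that followed it).
Entry = Tuple[Sequence[float], int]


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def retrieve(query: Sequence[float], datastore: Sequence[Entry], k: int) -> List[Tuple[float, int]]:
    """Brute-force k-nearest-neighbour search; returns (distance, token) pairs, closest first."""
    scored = [(squared_l2(query, key), token) for key, token in datastore]
    scored.sort(key=lambda pair: pair[0])
    return scored[:k]


def knn_distribution(neighbours: Sequence[Tuple[float, int]], vocab_size: int, temperature: float) -> List[float]:
    """Softmax over -distance / temperature, with weights of neighbours sharing a token summed together."""
    logits = [-d / temperature for d, _ in neighbours]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(logit - m) for logit in logits]
    total = sum(weights)
    probs = [0.0] * vocab_size
    for w, (_, token) in zip(weights, neighbours):
        probs[token] += w / total
    return probs


def interpolate(p_mt: Sequence[float], p_knn: Sequence[float], lam: float) -> List[float]:
    """Final distribution: lam * p_kNN + (1 - lam) * p_MT."""
    return [lam * pk + (1.0 - lam) * pm for pm, pk in zip(p_mt, p_knn)]
```

As a toy usage example, take a three-token vocabulary and a datastore whose two entries closest to the query both point to token 2. Calling `retrieve` with `k=2` and passing the result to `knn_distribution` concentrates the retrieved mass on token 2. `interpolate` with `lam=0.5` then shifts the model distribution `[0.5, 0.3, 0.2]` towards that token. Because only the datastore changes, this style of adaptation needs no further training of the translation model.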