Many existing speech translation benchmarks focus on native-English speech recorded in high-quality conditions, which often do not match real-life use cases. In this paper, we describe our speech translation system for the multilingual track of IWSLT 2023, which evaluates translation quality on scientific conference talks. The test condition features accented input speech and terminology-dense content. The task requires translation into 10 languages with varying amounts of resources. In the absence of training data from the target domain, we use a retrieval-based approach (kNN-MT) for effective adaptation (+0.8 BLEU for speech translation). We also use adapters to easily integrate incremental training data from data augmentation, and show that this matches the performance of re-training. We observe that cascaded systems are more easily adapted to specific target domains because of their separate modules. Our cascaded speech system substantially outperforms its end-to-end counterpart on scientific talk translation, although their performance remains similar on TED talks.
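The retrieval-based adaptation mentioned in the abstract can be illustrated with a minimal sketch of the standard kNN-MT formulation: at each decoding step, the decoder state is used as a query into a datastore of (hidden state, target token) pairs, and the resulting nearest-neighbor distribution is interpolated with the model's own output distribution. The function names, the toy datastore, and the values of `k`, `temperature`, and `lam` below are assumptions for illustration only; the abstract does not specify the datastore construction or hyperparameters used in the submitted system, and the sketch assumes a non-empty datastore.

```python
import math


def knn_distribution(query, datastore, k, temperature, vocab_size):
    """Build a distribution over target tokens from the k nearest datastore entries.

    datastore: list of (key_vector, target_token_id) pairs (hypothetical layout).
    """
    # Squared L2 distance from the query to every key; keep the k closest.
    neighbors = sorted(
        (sum((q - x) ** 2 for q, x in zip(query, key)), token)
        for key, token in datastore
    )[:k]

    # Softmax over negative distances; subtracting the minimum distance
    # avoids underflow without changing the normalized result.
    d_min = neighbors[0][0]
    weights = [math.exp(-(d - d_min) / temperature) for d, _ in neighbors]
    total = sum(weights)

    probs = [0.0] * vocab_size
    for w, (_, token) in zip(weights, neighbors):
        probs[token] += w / total  # neighbors sharing a token pool their mass
    return probs


def interpolate(p_mt, p_knn, lam):
    """Mix the model distribution with the retrieval distribution."""
    return [lam * pk + (1.0 - lam) * pm for pm, pk in zip(p_mt, p_knn)]


# Toy example: 3-token vocabulary, 2-dimensional decoder states.
toy_datastore = [([0.0, 0.0], 1), ([1.0, 0.0], 1), ([5.0, 5.0], 2)]
p_knn = knn_distribution([0.0, 0.0], toy_datastore, k=2, temperature=1.0, vocab_size=3)
p_final = interpolate([0.2, 0.3, 0.5], p_knn, lam=0.5)
```

In this toy case both retrieved neighbors vote for token 1, so the interpolated distribution shifts probability mass toward it. Because the datastore sits outside the model parameters, in-domain examples can be added without re-training, which is consistent with the abstract's observation that modular systems are easier to adapt.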