This paper describes FBK's participation in the Simultaneous Translation Evaluation Campaign at IWSLT 2024. For this year's submission to the speech-to-text translation (ST) sub-track, we propose SimulSeamless, which combines AlignAtt with SeamlessM4T in its medium configuration. The SeamlessM4T model is used "off-the-shelf", and simultaneous inference is enabled by AlignAtt, a SimulST policy based on cross-attention that can be applied without retraining or adapting the underlying model for the simultaneous task. We participated in all the Shared Task language directions (English->{German, Japanese, Chinese} and Czech->English), achieving results that are acceptable or even better when compared with last year's submissions. SimulSeamless, covering more than 143 source languages and 200 target languages, is released at: https://github.com/hlt-mt/FBK-fairseq/.
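To illustrate how a cross-attention-based policy can decide what to emit without retraining the model, here is a minimal sketch. It assumes the AlignAtt rule as it is usually described: each candidate token is aligned to the audio frame that receives its highest cross-attention weight, and emission stops as soon as a token aligns to one of the last `num_last_frames` frames received so far. The function name, its arguments, and the toy attention matrix are hypothetical and are not taken from the FBK-fairseq code.

```python
from typing import List, Sequence


def alignatt_emit(
    tokens: Sequence[str],
    cross_attention: Sequence[Sequence[float]],
    num_last_frames: int,
) -> List[str]:
    """Return the prefix of `tokens` that the policy allows to be emitted.

    cross_attention[i][j] is the (hypothetical) attention weight of
    candidate token i on audio frame j of the audio received so far.
    """
    emitted: List[str] = []
    for token, weights in zip(tokens, cross_attention):
        num_frames = len(weights)
        # Align the token to its most-attended audio frame.
        aligned_frame = max(range(num_frames), key=weights.__getitem__)
        # If that frame is among the most recent ones, the token may depend
        # on audio that is still incomplete: stop and wait for more input.
        if aligned_frame >= num_frames - num_last_frames:
            break
        emitted.append(token)
    return emitted


# Toy example: three candidate tokens over six received audio frames.
toy_attention = [
    [0.1, 0.6, 0.1, 0.1, 0.05, 0.05],  # peaks at frame 1
    [0.1, 0.1, 0.5, 0.2, 0.05, 0.05],  # peaks at frame 2
    [0.0, 0.1, 0.1, 0.1, 0.2, 0.5],    # peaks at frame 5 (most recent)
]
partial_output = alignatt_emit(["Das", "ist", "gut"], toy_attention, num_last_frames=2)
```

In this sketch a larger `num_last_frames` makes the policy more conservative (higher latency), while a smaller value lets more tokens through before additional audio arrives.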