The challenge of low-latency speech translation has recently drawn significant interest in the research community, as shown by several publications and shared tasks. It is therefore essential to evaluate these different approaches in realistic scenarios. However, currently only specific aspects of the systems are evaluated, and it is often not possible to compare different approaches. In this work, we propose the first framework to perform and evaluate the various aspects of low-latency speech translation under realistic conditions. The evaluation is carried out in an end-to-end fashion, which includes the segmentation of the audio as well as the run-time of the different components. Using this framework, we then compare different approaches to low-latency speech translation. We evaluate models with the option to revise the output as well as methods with fixed output. Furthermore, we directly compare state-of-the-art cascaded and end-to-end systems. Finally, the framework allows automatic evaluation of translation quality as well as latency, and also provides a web interface to show the low-latency model outputs to the user.