Simultaneous speech-to-speech translation (S2ST) holds the promise of breaking down communication barriers and enabling fluid conversations across languages. However, achieving accurate, real-time translation on mobile devices remains a major challenge. We introduce SimulTron, a novel S2ST architecture designed to tackle this task. SimulTron is a lightweight direct S2ST model that builds on the strengths of the Translatotron framework while incorporating key modifications for streaming operation with an adjustable fixed delay. Our experiments show that SimulTron surpasses Translatotron 2 in offline evaluations. Furthermore, real-time evaluations reveal that SimulTron improves upon the performance achieved by Translatotron 1. Additionally, SimulTron achieves higher BLEU scores and lower latency than the previous real-time S2ST method on the MuST-C dataset. Notably, we have successfully deployed SimulTron on a Pixel 7 Pro device, demonstrating its potential for on-device simultaneous S2ST.