SeamlessM4T: Massively Multilingual & Multimodal Machine Translation

Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Celebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang

What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages? While recent breakthroughs in text-based models have pushed machine translation coverage beyond 200 languages, unified speech-to-speech translation models have yet to achieve similar strides. More specifically, conventional speech-to-speech translation systems rely on cascaded systems that perform translation progressively, putting high-performing unified systems out of reach. To address these gaps, we introduce SeamlessM4T, a single model that supports speech-to-speech translation, speech-to-text translation, text-to-speech translation, text-to-text translation, and automatic speech recognition for up to 100 languages. To build this, we used 1 million hours of open speech audio data to learn self-supervised speech representations with w2v-BERT 2.0. Subsequently, we created a multimodal corpus of automatically aligned speech translations. After filtering this corpus and combining it with human-labeled and pseudo-labeled data, we developed the first multilingual system capable of translating from and into English for both speech and text. On FLEURS, SeamlessM4T sets a new standard for translations into multiple target languages, achieving an improvement of 20% BLEU over the previous SOTA in direct speech-to-text translation. Compared to strong cascaded models, SeamlessM4T improves the quality of into-English translation by 1.3 BLEU points in speech-to-text and by 2.6 ASR-BLEU points in speech-to-speech. Tested for robustness, our system performs better against background noise and speaker variations in speech-to-text tasks than the current SOTA model. Critically, we evaluated SeamlessM4T on gender bias and added toxicity to assess translation safety. Finally, all contributions in this work are open-sourced and accessible at https://github.com/facebookresearch/seamless_communication
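The abstract reports gains in two different units: a percentage ("20% BLEU") and absolute points ("1.3 BLEU points", "2.6 ASR-BLEU points"). The sketch below illustrates the difference, assuming the percentage is a relative gain over the baseline score, which the abstract does not state explicitly. The function names and the scores 20.0 and 24.0 are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def absolute_gain(baseline: float, new: float) -> float:
    """Gain in BLEU points: the plain difference between two scores."""
    return new - baseline


def relative_gain_percent(baseline: float, new: float) -> float:
    """Gain as a percentage of the baseline score."""
    if baseline <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (new - baseline) / baseline


# Hypothetical scores, NOT taken from the paper.
baseline_bleu = 20.0
new_bleu = 24.0

print(absolute_gain(baseline_bleu, new_bleu))          # 4.0 (BLEU points)
print(relative_gain_percent(baseline_bleu, new_bleu))  # 20.0 (percent)
```

Under this reading, the same 4-point difference would be a smaller relative gain on a stronger baseline, so the two kinds of figures are not directly comparable.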
