Existing speech-to-speech translation (S2ST) models fall into two camps: textless models trained on hundreds of hours of parallel speech data, or unsupervised models that leverage text as an intermediate step. Both approaches limit S2ST to a narrow set of languages, as they exclude languages that are primarily spoken and language pairs that lack large-scale parallel speech data. We present a new framework for training textless low-resource S2ST systems that need only dozens of hours of parallel speech data. We reformulate S2ST as a unit-to-unit sequence-to-sequence translation task and begin by pretraining a model on large-scale monolingual speech data. We then finetune it on a small amount of parallel speech data (20–60 hours), and finally improve performance with an unsupervised backtranslation objective. We train and evaluate our models for English-to-German, German-to-English, and Marathi-to-English translation in three domains (European Parliament, Common Voice, and All India Radio) with single-speaker synthesized speech data. Evaluated with the ASR-BLEU metric, our models achieve reasonable performance in all three domains, with some coming within 1–2 points of our supervised topline.
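The backtranslation stage described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes speech has already been discretized into unit ID sequences (e.g., cluster indices from self-supervised speech features), and `toy_reverse_model` is a hypothetical stand-in for a trained target-to-source unit translator used to generate synthetic parallel pairs.

```python
# Sketch of the backtranslation data-generation step for unit-to-unit S2ST.
# Assumption: speech is represented as sequences of discrete unit IDs.
from typing import Callable, List, Tuple

Units = List[int]

def backtranslate(
    reverse_model: Callable[[Units], Units],
    mono_target: List[Units],
) -> List[Tuple[Units, Units]]:
    """Turn monolingual target-side unit sequences into synthetic
    (source, target) training pairs for the forward translation model."""
    pairs = []
    for tgt_units in mono_target:
        # Translate target units back into the source language; the
        # real target sequence then serves as the training label.
        synthetic_src = reverse_model(tgt_units)
        pairs.append((synthetic_src, tgt_units))
    return pairs

# Hypothetical stand-in for a trained reverse model: simply reverses the
# sequence, purely to make the data flow concrete.
def toy_reverse_model(units: Units) -> Units:
    return units[::-1]

mono = [[3, 1, 4], [1, 5, 9, 2]]  # monolingual target-side unit sequences
pairs = backtranslate(toy_reverse_model, mono)
```

The forward model would then be finetuned on `pairs` alongside the real parallel data, which is what allows the unsupervised objective to improve performance beyond the 20–60 hours of supervision.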