Most speech translation models rely heavily on parallel data, which is hard to collect, especially for low-resource languages. To tackle this issue, we propose building a cascaded speech translation system without any kind of paired data. We train our unsupervised systems on fully unpaired data and evaluate them on CoVoST 2 and CVSS. The results show that our system is comparable to some early supervised methods on some language pairs. Because cascaded systems are prone to severe error propagation, we propose denoising back-translation (DBT), a novel approach to building robust unsupervised neural machine translation (UNMT). DBT increases the BLEU score by 0.7--0.9 in all three translation directions. Moreover, we simplify the pipeline of our cascaded system to reduce inference latency and conduct a comprehensive analysis of every part of our work. We also showcase our unsupervised speech translation results on a demonstration website.