Towards Zero-Shot Multimodal Machine Translation

Current multimodal machine translation (MMT) systems rely on fully supervised data (i.e., models are trained on sentences paired with their translations and accompanying images). However, this type of data is costly to collect, which limits the extension of MMT to language pairs for which no such data exists. In this work, we propose a method that bypasses the need for fully supervised data when training MMT systems, using multimodal English data only. Our method, called ZeroMMT, consists of adapting a strong text-only machine translation (MT) model by training it on a mixture of two objectives: visually conditioned masked language modelling and the Kullback-Leibler divergence between the outputs of the original MT model and those of the new MMT model. We evaluate on standard MMT benchmarks and on the recently released CoMMuTE, a contrastive benchmark that evaluates how well models use images to disambiguate English sentences. We obtain disambiguation performance close to that of state-of-the-art MMT models trained additionally on fully supervised examples. To show that our method generalizes to languages with no fully supervised training data available, we extend the CoMMuTE evaluation dataset to three new languages: Arabic, Russian and Chinese. We further show that we can control the trade-off between disambiguation capabilities and translation fidelity at inference time using classifier-free guidance, without any additional data. Our code, data and trained models are publicly accessible.
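The two ingredients named in the abstract, a training loss that mixes a masked-language-modelling term with a KL term, and classifier-free guidance at inference, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the models expose next-token logits as plain lists of floats, it assumes the KL direction is from the original MT distribution to the new MMT distribution, and it uses the generic form of classifier-free guidance. The names `combined_loss`, `kl_weight`, `cfg_logits` and `guidance_scale` are hypothetical and chosen only for illustration.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(p_logits, q_logits):
    """KL(p || q) between two token distributions given as logits."""
    log_p = log_softmax(p_logits)
    log_q = log_softmax(q_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def combined_loss(vmlm_loss, mt_logits, mmt_logits, kl_weight=1.0):
    """Mix a masked-LM loss with a KL penalty (assumed form).

    vmlm_loss:  precomputed visually conditioned masked-LM loss (a float).
    mt_logits:  logits from the frozen text-only MT model.
    mmt_logits: logits from the adapted multimodal model.
    kl_weight:  hypothetical mixing coefficient, not taken from the source.
    """
    return vmlm_loss + kl_weight * kl_divergence(mt_logits, mmt_logits)


def cfg_logits(text_only_logits, image_logits, guidance_scale):
    """Generic classifier-free guidance on logits.

    A scale of 0 returns the text-only logits, a scale of 1 returns the
    image-conditioned logits, and larger values extrapolate further
    toward the image-conditioned prediction.
    """
    return [
        t + guidance_scale * (i - t)
        for t, i in zip(text_only_logits, image_logits)
    ]
```

In this sketch, the KL term stays at zero as long as the adapted model reproduces the original MT distribution, which is one way to read the abstract's goal of preserving translation fidelity, while `guidance_scale` plays the role of the inference-time knob that trades fidelity against image-driven disambiguation.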

