Video-Helpful Multimodal Machine Translation

Existing multimodal machine translation (MMT) datasets consist of images and video captions or instructional video subtitles, which rarely contain linguistic ambiguity, making visual information ineffective in generating appropriate translations. Recent work has constructed an ambiguous subtitles dataset to alleviate this problem, but it remains limited in that the videos do not necessarily contribute to disambiguation. We introduce EVA (Extensive training set and Video-helpful evaluation set for Ambiguous subtitles translation), an MMT dataset containing 852k Japanese-English (Ja-En) parallel subtitle pairs, 520k Chinese-English (Zh-En) parallel subtitle pairs, and corresponding video clips collected from movies and TV episodes. In addition to the extensive training set, EVA contains a video-helpful evaluation set in which subtitles are ambiguous and videos are guaranteed to help with disambiguation. Furthermore, we propose SAFA, an MMT model based on the Selective Attention model with two novel methods, Frame attention loss and Ambiguity augmentation, which aim to make full use of the videos in EVA for disambiguation. Experiments on EVA show that visual information and the proposed methods can boost translation performance, and that our model performs significantly better than existing MMT models. The EVA dataset and the SAFA model are available at: https://github.com/ku-nlp/video-helpful-MMT.git.
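
The abstract names the Selective Attention model as SAFA's base but does not describe it. As a rough illustration only, the sketch below assumes the common selective-attention formulation in which each text token runs scaled dot-product attention over visual features (here, one feature vector per video frame, assumed to already share the text dimension d), followed by an element-wise sigmoid gate that mixes the attended visual context back into the text state. This is not the SAFA code: the function names and the gate matrices U and V are hypothetical, and the Frame attention loss and Ambiguity augmentation are omitted because their details are not given here.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(W: Matrix, v: Vector) -> Vector:
    # Multiply a d x d weight matrix by a length-d vector.
    return [dot(row, v) for row in W]


def softmax(scores: Vector) -> Vector:
    # Subtract the max for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def selective_attention(
    text_states: List[Vector], frame_feats: List[Vector]
) -> Tuple[List[Vector], List[Vector]]:
    """Each text state (query) attends over all frame features (keys = values).

    Returns the attended visual context per token and the attention weights.
    Assumption: text states and frame features share dimension d, so no
    projection layers are modeled here.
    """
    d = len(frame_feats[0])
    contexts, weights = [], []
    for h in text_states:
        w = softmax([dot(h, f) / math.sqrt(d) for f in frame_feats])
        weights.append(w)
        # Weighted sum of frame features, one coordinate at a time.
        contexts.append([sum(wi * f[k] for wi, f in zip(w, frame_feats)) for k in range(d)])
    return contexts, weights


def gated_fusion(
    text_states: List[Vector], contexts: List[Vector], U: Matrix, V: Matrix
) -> List[Vector]:
    """Hypothetical gate: lam = sigmoid(U h + V a); out = (1 - lam) * h + lam * a."""
    fused = []
    for h, a in zip(text_states, contexts):
        lam = [sigmoid(x + y) for x, y in zip(matvec(U, h), matvec(V, a))]
        fused.append([(1 - g) * hk + g * ak for g, hk, ak in zip(lam, h, a)])
    return fused


if __name__ == "__main__":
    # Toy inputs: two text tokens, three video frames, d = 2.
    text = [[1.0, 0.0], [0.0, 1.0]]
    frames = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    ctx, attn = selective_attention(text, frames)
    zero = [[0.0, 0.0], [0.0, 0.0]]
    print(attn)
    print(gated_fusion(text, ctx, zero, zero))
```

With U and V set to zero, every gate value is sigmoid(0) = 0.5, so each fused vector is simply the average of the text state and its attended frame context; in a trained model these matrices would be learned so the gate can suppress unhelpful frames.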