3AM: An Ambiguity-Aware Multi-Modal Machine Translation Dataset

Multimodal machine translation (MMT) is a challenging task that seeks to improve translation quality by incorporating visual information. However, recent studies have indicated that the visual information provided by existing MMT datasets is insufficient, causing models to disregard it and overestimate their capabilities. This issue is a significant obstacle to the development of MMT research. This paper addresses it by introducing 3AM, an ambiguity-aware MMT dataset comprising 26,000 English-Chinese parallel sentence pairs, each with a corresponding image. The dataset is specifically designed to include more ambiguity and a greater variety of both captions and images than other MMT datasets. We use a word sense disambiguation model to select ambiguous data from vision-and-language datasets, resulting in a more challenging dataset. We further benchmark several state-of-the-art MMT models on the proposed dataset. Experimental results show that MMT models trained on our dataset exhibit a greater ability to exploit visual information than those trained on other MMT datasets. Our work provides a valuable resource for researchers in the field of multimodal learning and encourages further exploration in this area. The data, code and scripts are freely available at https://github.com/MaxyLee/3AM.
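
The selection step mentioned in the abstract, choosing ambiguous captions from vision-and-language datasets, can be illustrated with a rough sketch. This is not the authors' pipeline: the abstract does not say which word sense disambiguation model is used or how ambiguity is scored. The sense inventory `TOY_SENSE_COUNTS`, the `Caption` type, and the functions `ambiguity_score` and `select_ambiguous` are all hypothetical names introduced here, and a hand-written lookup table stands in for a real model.

```python
from dataclasses import dataclass

# ASSUMPTION: a toy sense inventory (word -> number of distinct senses).
# A real pipeline would use the predictions of a trained word sense
# disambiguation model instead of this hand-written table.
TOY_SENSE_COUNTS = {
    "bank": 2,
    "bat": 2,
    "pitcher": 2,
    "crane": 2,
}


@dataclass
class Caption:
    image_id: str
    text: str


def ambiguity_score(text: str) -> int:
    """Count the tokens that have more than one sense in the toy inventory."""
    tokens = (tok.strip(".,!?") for tok in text.lower().split())
    return sum(1 for tok in tokens if TOY_SENSE_COUNTS.get(tok, 1) > 1)


def select_ambiguous(captions, min_score=1):
    """Keep captions containing at least `min_score` ambiguous tokens."""
    return [c for c in captions if ambiguity_score(c.text) >= min_score]


captions = [
    Caption("img1", "A man holds a bat near the fence."),
    Caption("img2", "Two dogs run across the grass."),
    Caption("img3", "A crane stands by the river bank."),
]

selected = select_ambiguous(captions)
print([c.image_id for c in selected])
```

In this toy setup only the captions that contain a word with multiple senses survive the filter. Those are the cases where the paired image could plausibly help a translation model choose the right sense.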

