TraVLR: Now You See It, Now You Don't! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning - 专知论文

会员服务 ·

0

双模 · 数据集 · 模态 · 跨模态 · 示例 ·

2023 年 4 月 15 日

TraVLR: Now You See It, Now You Don't! A Bimodal Dataset for Evaluating Visio-Linguistic Reasoning

翻译：TraVLR：眼見未必為實！用於評估視覺-語言推理能力的雙模態數據集

Keng Ji Chow,Samson Tan,Min-Yen Kan

from arxiv, The first two authors contributed equally

Numerous visio-linguistic (V+L) representation learning methods have been developed, yet existing datasets do not adequately evaluate the extent to which they represent visual and linguistic concepts in a unified space. We propose several novel evaluation settings for V+L models, including cross-modal transfer. Furthermore, existing V+L benchmarks often report global accuracy scores on the entire dataset, making it difficult to pinpoint the specific reasoning tasks that models fail and succeed at. We present TraVLR, a synthetic dataset comprising four V+L reasoning tasks. TraVLR's synthetic nature allows us to constrain its training and testing distributions along task-relevant dimensions, enabling the evaluation of out-of-distribution generalisation. Each example in TraVLR redundantly encodes the scene in two modalities, allowing either to be dropped or added during training or testing without losing relevant information. We compare the performance of four state-of-the-art V+L models, finding that while they perform well on test examples from the same modality, they all fail at cross-modal transfer and have limited success accommodating the addition or deletion of one modality. We release TraVLR as an open challenge for the research community.

翻译：大量視覺-語言（V+L）表徵學習方法已被提出，然而現有數據集未能充分評估其將視覺與語言概念統一表徵於共同空間的能力。我們為V+L模型提出了多項新穎的評估設定，包括跨模態遷移。此外，現有V+L基準通常報告整個數據集的全局準確率分數，難以精確定位模型成敗的具體推理任務。本文提出TraVLR——一個由四項V+L推理任務組成的合成數據集。TraVLR的合成特性允許我們沿任務相關維度約束其訓練與測試分佈，從而實現對分佈外泛化能力的評估。TraVLR中的每個樣本均以兩種模態冗餘編碼場景信息，因此在訓練或測試過程中可任意移除或添加任一模態而不損失相關信息。我們比較了四種先進V+L模型的性能，發現雖然它們在同模態測試樣本上表現良好，但均無法完成跨模態遷移，且對單一模態的增刪處理能力有限。我們將TraVLR作為開放式挑戰發布給研究社區。

0

相关内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

专知会员服务

13+阅读 · 2022年3月12日

【香港科技大学等】视觉-语言智能:任务、表示学习和大模型，Vision-Language Intelligence: Tasks, Representation Learning, and Large Models

【香港科技大学等】视觉-语言智能:任务、表示学习和大模型，Vision-Language Intelligence: Tasks, Representation Learning, and Large Models

专知会员服务

44+阅读 · 2022年3月8日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【AAAI2021】知识增强的视觉-语言预训练技术 ERNIE-ViL

【AAAI2021】知识增强的视觉-语言预训练技术 ERNIE-ViL

专知会员服务

26+阅读 · 2021年1月29日

【Google-Thang】最新《语言预训练语生成进展》67页ppt，Language Pretraining

【Google-Thang】最新《语言预训练语生成进展》67页ppt，Language Pretraining

专知会员服务

24+阅读 · 2020年9月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

96+阅读 · 2020年4月18日

【ICML2020投稿论文-CMU-DeepMind-Google】用于评估跨语言泛化的大规模多语言多任务基准

【ICML2020投稿论文-CMU-DeepMind-Google】用于评估跨语言泛化的大规模多语言多任务基准

专知会员服务

14+阅读 · 2020年3月27日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

港科大&MSRA新研究：关于图像到图像转换，Fine-tuning is all you need

港科大&MSRA新研究：关于图像到图像转换，Fine-tuning is all you need

PaperWeekly

0+阅读 · 2022年7月5日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

文本+视觉，多篇 Visual/Video BERT 论文介绍

文本+视觉，多篇 Visual/Video BERT 论文介绍

AI科技评论

22+阅读 · 2019年8月30日

「Github」多模态机器学习文章阅读列表

「Github」多模态机器学习文章阅读列表

专知

124+阅读 · 2019年8月15日

视频分析/多模态学习论文、代码、数据集大列表

视频分析/多模态学习论文、代码、数据集大列表

专知

57+阅读 · 2019年7月13日

【泡泡一分钟】用于评估视觉惯性里程计的TUM VI数据集

【泡泡一分钟】用于评估视觉惯性里程计的TUM VI数据集

泡泡机器人SLAM

11+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

卵母细胞区域特异性DNA甲基化擦除技术的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于三苯胺-吲哚[3,2-b]咔唑-富勒烯p-n型染料用于染料敏化太阳能电池的研究

国家自然科学基金

0+阅读 · 2013年12月31日

Tet3蛋白在山羊早期胚胎DNA甲基化重编程过程中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

多源地理数据集成评估中目标的形式化建模及适应性信息融合方法

国家自然科学基金

0+阅读 · 2012年12月31日

天基预警雷达STAP-TBD关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于拉曼光谱的牛ICSI和IVF早期胚胎差异研究

国家自然科学基金

0+阅读 · 2012年12月31日

空间引力波探测关键技术预先研究

国家自然科学基金

0+阅读 · 2011年12月31日

用于二氧化碳捕获与封存的微孔聚合物材料

国家自然科学基金

0+阅读 · 2011年12月31日

原发性肌张力障碍脑结构和功能改变

国家自然科学基金

0+阅读 · 2008年12月31日

曲古菌素A对人类体细胞核移植胚胎表观遗传重编程的影响

国家自然科学基金

0+阅读 · 2008年12月31日

Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization

Arxiv

0+阅读 · 2023年6月2日

Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data

Arxiv

0+阅读 · 2023年6月1日

"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning

Arxiv

0+阅读 · 2023年6月1日

Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective

Arxiv

0+阅读 · 2023年6月1日

ILLUME: Rationalizing Vision-Language Models through Human Interactions

Arxiv

0+阅读 · 2023年5月31日

Large Language Models Are Not Abstract Reasoners

Arxiv

0+阅读 · 2023年5月31日

Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications

Arxiv

20+阅读 · 2023年2月1日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

3D Hand Shape and Pose Estimation from a Single RGB Image

3D Hand Shape and Pose Estimation from a Single RGB Image

Arxiv

17+阅读 · 2019年3月3日

VIP会员

文章信息

相关主题

最新内容

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

2+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

4+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

5+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

6+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

11+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

10+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

7+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

10+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

7+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

13+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

8+阅读 · 6月17日

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

6+阅读 · 6月17日

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

8+阅读 · 6月17日

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

8+阅读 · 6月17日

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

美智库《战术级指挥控制的迫切要求：构建弹性机动式指挥控制网络》报告

专知会员服务

10+阅读 · 6月17日

相关VIP内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

【CVPR 2022】跨模态检索的协同双流视觉-语言前训练模型，COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval

专知会员服务

13+阅读 · 2022年3月12日

【香港科技大学等】视觉-语言智能:任务、表示学习和大模型，Vision-Language Intelligence: Tasks, Representation Learning, and Large Models

【香港科技大学等】视觉-语言智能:任务、表示学习和大模型，Vision-Language Intelligence: Tasks, Representation Learning, and Large Models

专知会员服务

44+阅读 · 2022年3月8日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

【AAAI2021】知识增强的视觉-语言预训练技术 ERNIE-ViL

【AAAI2021】知识增强的视觉-语言预训练技术 ERNIE-ViL

专知会员服务

26+阅读 · 2021年1月29日

【Google-Thang】最新《语言预训练语生成进展》67页ppt，Language Pretraining

【Google-Thang】最新《语言预训练语生成进展》67页ppt，Language Pretraining

专知会员服务

24+阅读 · 2020年9月15日

史上最全！358篇机器学习&自然语言处理综述论文！都这儿了

专知会员服务

129+阅读 · 2020年7月18日

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

【论文翻译】NLP注意力机制综述论文翻译，Attention, please! A Critical Review of Neural Attention Models in Natural Language Processing

专知会员服务

96+阅读 · 2020年4月18日

【ICML2020投稿论文-CMU-DeepMind-Google】用于评估跨语言泛化的大规模多语言多任务基准

【ICML2020投稿论文-CMU-DeepMind-Google】用于评估跨语言泛化的大规模多语言多任务基准

专知会员服务

14+阅读 · 2020年3月27日

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

【亚马逊-WWW2020】不解析,生成!用于面向任务的语义分析的序列到序列体系结构，Don't Parse, Generate! A Sequence to Sequence Architecture for Task-Oriented Semantic Parsing

专知会员服务

15+阅读 · 2020年2月1日

热门VIP内容

开通专知VIP会员享更多权益服务

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

相关资讯

港科大&MSRA新研究：关于图像到图像转换，Fine-tuning is all you need

港科大&MSRA新研究：关于图像到图像转换，Fine-tuning is all you need

PaperWeekly

0+阅读 · 2022年7月5日

VCIP 2022 Call for Demos

VCIP 2022 Call for Demos

CCF多媒体专委会

1+阅读 · 2022年6月6日

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

征稿 | International Joint Conference on Knowledge Graphs (IJCKG)

开放知识图谱

2+阅读 · 2022年5月20日

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

【ACL2020放榜!】事件抽取、关系抽取、NER、Few-Shot 相关论文整理

深度学习自然语言处理

18+阅读 · 2020年5月22日

文本+视觉，多篇 Visual/Video BERT 论文介绍

文本+视觉，多篇 Visual/Video BERT 论文介绍

AI科技评论

22+阅读 · 2019年8月30日

「Github」多模态机器学习文章阅读列表

「Github」多模态机器学习文章阅读列表

专知

124+阅读 · 2019年8月15日

视频分析/多模态学习论文、代码、数据集大列表

视频分析/多模态学习论文、代码、数据集大列表

专知

57+阅读 · 2019年7月13日

【泡泡一分钟】用于评估视觉惯性里程计的TUM VI数据集

【泡泡一分钟】用于评估视觉惯性里程计的TUM VI数据集

泡泡机器人SLAM

11+阅读 · 2019年1月4日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

相关论文

Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization

Arxiv

0+阅读 · 2023年6月2日

Evaluating the Capabilities of Multi-modal Reasoning Models with Synthetic Task Data

Arxiv

0+阅读 · 2023年6月1日

"Let's not Quote out of Context": Unified Vision-Language Pretraining for Context Assisted Image Captioning

Arxiv

0+阅读 · 2023年6月1日

Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective

Arxiv

0+阅读 · 2023年6月1日

ILLUME: Rationalizing Vision-Language Models through Human Interactions

Arxiv

0+阅读 · 2023年5月31日

Large Language Models Are Not Abstract Reasoners

Arxiv

0+阅读 · 2023年5月31日

Multimodality Representation Learning: A Survey on Evolution, Pretraining and Its Applications

Arxiv

20+阅读 · 2023年2月1日

Unifying Vision-and-Language Tasks via Text Generation

Arxiv

10+阅读 · 2021年2月4日

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

UniViLM: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation

Arxiv

19+阅读 · 2020年2月15日

3D Hand Shape and Pose Estimation from a Single RGB Image

3D Hand Shape and Pose Estimation from a Single RGB Image

Arxiv

17+阅读 · 2019年3月3日

相关基金

卵母细胞区域特异性DNA甲基化擦除技术的研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于三苯胺-吲哚[3,2-b]咔唑-富勒烯p-n型染料用于染料敏化太阳能电池的研究

国家自然科学基金

0+阅读 · 2013年12月31日

Tet3蛋白在山羊早期胚胎DNA甲基化重编程过程中的作用

国家自然科学基金

0+阅读 · 2013年12月31日

多源地理数据集成评估中目标的形式化建模及适应性信息融合方法

国家自然科学基金

0+阅读 · 2012年12月31日

天基预警雷达STAP-TBD关键技术研究

国家自然科学基金

0+阅读 · 2012年12月31日

基于拉曼光谱的牛ICSI和IVF早期胚胎差异研究

国家自然科学基金

0+阅读 · 2012年12月31日

空间引力波探测关键技术预先研究

国家自然科学基金

0+阅读 · 2011年12月31日

用于二氧化碳捕获与封存的微孔聚合物材料

国家自然科学基金

0+阅读 · 2011年12月31日

原发性肌张力障碍脑结构和功能改变

国家自然科学基金

0+阅读 · 2008年12月31日

曲古菌素A对人类体细胞核移植胚胎表观遗传重编程的影响

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员