Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge - 专知论文

会员服务 ·

0

知识 (knowledge) · 视觉问答 · MoDELS · 自动问答 · Extensibility ·

2023 年 5 月 30 日

Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge

翻译：生成然后选择：基于世界知识的开放式视觉问答

Xingyu Fu,Sheng Zhang,Gukyeong Kwon,Pramuditha Perera,Henghui Zhu,Yuhao Zhang,Alexander Hanbo Li,William Yang Wang,Zhiguo Wang,Vittorio Castelli,Patrick Ng,Dan Roth,Bing Xiang

from arxiv, Accepted to ACL 2023 Findings

The open-ended Visual Question Answering (VQA) task requires AI models to jointly reason over visual and natural language inputs using world knowledge. Recently, pre-trained Language Models (PLM) such as GPT-3 have been applied to the task and shown to be powerful world knowledge sources. However, these methods suffer from low knowledge coverage caused by PLM bias -- the tendency to generate certain tokens over other tokens regardless of prompt changes, and high dependency on the PLM quality -- only models using GPT-3 can achieve the best result. To address the aforementioned challenges, we propose RASO: a new VQA pipeline that deploys a generate-then-select strategy guided by world knowledge for the first time. Rather than following the de facto standard to train a multi-modal model that directly generates the VQA answer, RASO first adopts PLM to generate all the possible answers, and then trains a lightweight answer selection model for the correct answer. As proved in our analysis, RASO expands the knowledge coverage from in-domain training data by a large margin. We provide extensive experimentation and show the effectiveness of our pipeline by advancing the state-of-the-art by 4.1% on OK-VQA, without additional computation cost. Code and models are released at http://cogcomp.org/page/publication_view/1010

翻译：开放式视觉问答（VQA）任务要求人工智能模型利用世界知识，联合推理视觉和自然语言输入。最近，预训练语言模型（PLM），如GPT-3，已被应用于该任务，并显示出作为强大世界知识源的能力。然而，这些方法存在因PLM偏差——即无论提示如何变化，模型倾向于生成某些标记而非其他标记——导致的低知识覆盖率问题，以及对PLM质量的高度依赖性——只有使用GPT-3的模型才能达到最佳效果。为解决上述挑战，我们提出RASO：一种全新的VQA流水线，首次采用基于世界知识指导的“生成然后选择”策略。RASO并未遵循训练一个直接生成VQA答案的多模态模型的常规标准，而是首先采用PLM生成所有可能的答案，然后训练一个轻量级的答案选择模型以选出正确答案。如我们的分析所示，RASO大幅扩展了来自领域内训练数据的知识覆盖率。我们提供了广泛的实验，并通过在OK-VQA上将最新技术水平提升4.1%来展示我们流水线的有效性，且无需额外计算成本。代码和模型已在http://cogcomp.org/page/publication_view/1010上发布。

0

相关内容

知识 (knowledge)

知识 (knowledge)

通过学习、实践或探索所获得的认识、判断或技能。

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

254+阅读 · 2020年4月19日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

专知

15+阅读 · 2018年6月29日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

专知

15+阅读 · 2018年5月28日

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

专知

33+阅读 · 2018年4月23日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

胰岛素/胰岛素样生长因子-1信号通路相关基因多态性与长寿的遗传关联分析及功能研究

国家自然科学基金

0+阅读 · 2015年12月31日

旋转通道等径角平行挤压工艺及其制备UFG时效硬化铝合金损伤机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

水稻苗期耐盐基因qSTS8的精细定位与候选基因分析

国家自然科学基金

0+阅读 · 2013年12月31日

SIRT5介导的线粒体蛋白去琥珀酰化修饰在细胞氧化应激调控和癌症发生过程中的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Spiked模型中特征值和特征向量的理论分析与推断

国家自然科学基金

1+阅读 · 2012年12月31日

PIM-1信号通路在非小细胞肺癌EGFR-TKI获得性耐药中的作用及其分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

ATP和ROS在BCL-2基因抑癌活性中的作用机制

国家自然科学基金

0+阅读 · 2011年12月31日

知识驱动的软件需求和体系结构文档的归档方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

“#32511;洲—#33618;漠”#23707;屿生态种群的扩散模型研究

国家自然科学基金

0+阅读 · 2009年12月31日

面向查询的XML文本自动文摘研究

国家自然科学基金

0+阅读 · 2008年12月31日

Manifold-Guided Sampling in Diffusion Models for Unbiased Image Generation

Arxiv

0+阅读 · 2023年7月17日

Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning?

Arxiv

0+阅读 · 2023年7月15日

On Grounded Planning for Embodied Tasks with Language Models

Arxiv

0+阅读 · 2023年7月15日

Unifying Structure Reasoning and Language Model Pre-training for Complex Reasoning

Arxiv

0+阅读 · 2023年7月15日

Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs

Arxiv

1+阅读 · 2023年7月14日

Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training

Arxiv

0+阅读 · 2023年7月14日

Augmented Co-Speech Gesture Generation: Including Form and Meaning Features to Guide Learning-Based Gesture Synthesis

Arxiv

0+阅读 · 2023年7月13日

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Arxiv

10+阅读 · 2021年12月14日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

VIP会员

文章信息

相关主题

知识 (knowledge)

最新内容

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

专知会员服务

8+阅读 · 6月20日

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

专知会员服务

4+阅读 · 6月19日

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

专知会员服务

6+阅读 · 6月19日

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

ICML 2026 Spotlight | SmoothSMoE：解析稀疏 MoE 路由不连续

专知会员服务

6+阅读 · 6月18日

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

综述 | 周期表视角下的大模型推理：范式、方法与失败模式

专知会员服务

7+阅读 · 6月18日

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

《廉价自杀式无人机战争的军事战略影响：乌克兰和伊朗案例研究》

专知会员服务

11+阅读 · 6月18日

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

《面向反无人机作战的联邦式可解释射频–光电/红外情报融合：边缘人工智能优化、电子战韧性及分布式监视验证》

专知会员服务

11+阅读 · 6月18日

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

ICML 2026 | FR3D：解耦自车运动的未来动态三维重建世界模型

专知会员服务

7+阅读 · 6月17日

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

【伯克利博士论文】迈向可扩展与自我演进的大语言模型智能体

专知会员服务

11+阅读 · 6月17日

学习数据的几何：形状空间分析数学综述

学习数据的几何：形状空间分析数学综述

专知会员服务

8+阅读 · 6月17日

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

《现代防空系统综述：架构、传感器、拦截器及新兴威胁环境对基础设施受限防御环境的影响》2026最新长综述

专知会员服务

19+阅读 · 6月17日

定向能反无人机系统最新发展动态

定向能反无人机系统最新发展动态

专知会员服务

10+阅读 · 6月17日

从燃煤战舰到算法战争：水面指挥的永恒要求

从燃煤战舰到算法战争：水面指挥的永恒要求

专知会员服务

6+阅读 · 6月17日

《短程弹道再入飞行器拦截时间中的一项异常现象》

《短程弹道再入飞行器拦截时间中的一项异常现象》

专知会员服务

8+阅读 · 6月17日

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

《基于回归方法与任务上下文的对抗环境动态战术网络报文优先级排序》

专知会员服务

8+阅读 · 6月17日

相关VIP内容

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

ICLR 2022杰出论文公布：7篇论文获得，清华朱军课题组摘得

专知会员服务

60+阅读 · 2022年4月22日

零样本文本分类，Zero-Shot Learning for Text Classification

零样本文本分类，Zero-Shot Learning for Text Classification

专知会员服务

97+阅读 · 2020年5月31日

因果图，Causal Graphs，52页ppt

因果图，Causal Graphs，52页ppt

专知会员服务

254+阅读 · 2020年4月19日

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

【微软研究院】IMAGEBERT: CROSS-MODAL PRE-TRAINING WITH LARGE-SCALE WEAK-SUPERVISED IMAGE-TEXT DATA

专知会员服务

43+阅读 · 2020年1月28日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

《DeepGCNs: Making GCNs Go as Deep as CNNs》

《DeepGCNs: Making GCNs Go as Deep as CNNs》

专知会员服务

32+阅读 · 2019年10月17日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

深入解读 Palantir AIP：全球最具争议的人工智能平台究竟如何运作

ICML 2026 | 多任务贝叶斯上下文学习：让 Transformer 在测试时显式适应新先验

ACL 2026综述 | 大规模手语数据集：资源、基准与标注标准

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

【论文推荐】最新九篇自动问答相关论文—可解释推理网络、上下文知识图谱嵌入、注意力RNN、Multi-Cast注意力网络

专知

15+阅读 · 2018年6月29日

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

【论文推荐】最新八篇情感分析相关论文—Pair-wise判别器、多模态情感分析、上下文语境、Gated 卷积网络

专知

20+阅读 · 2018年6月29日

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

【论文推荐】最新六篇图像描述生成相关论文—字符级推断、视觉解释、语义对齐、实体感知、确定性非自回归

专知

15+阅读 · 2018年5月28日

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

【论文推荐】最新八篇图像检索相关论文—三元组、深度特征图、判别式、卷积特征聚合、视觉-关系知识图谱、大规模图像检索

专知

33+阅读 · 2018年4月23日

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

【论文推荐】最新7篇视觉问答（VQA）相关论文—解释、读写记忆网络、逆视觉问答、视觉推理、可解释性、注意力机制、计数

专知

30+阅读 · 2018年3月22日

相关论文

Manifold-Guided Sampling in Diffusion Models for Unbiased Image Generation

Arxiv

0+阅读 · 2023年7月17日

Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning?

Arxiv

0+阅读 · 2023年7月15日

On Grounded Planning for Embodied Tasks with Language Models

Arxiv

0+阅读 · 2023年7月15日

Unifying Structure Reasoning and Language Model Pre-training for Complex Reasoning

Arxiv

0+阅读 · 2023年7月15日

Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs

Arxiv

1+阅读 · 2023年7月14日

Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training

Arxiv

0+阅读 · 2023年7月14日

Augmented Co-Speech Gesture Generation: Including Form and Meaning Features to Guide Learning-Based Gesture Synthesis

Arxiv

0+阅读 · 2023年7月13日

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Arxiv

10+阅读 · 2021年12月14日

K-AID: Enhancing Pre-trained Language Models with Domain Knowledge for Question Answering

Arxiv

15+阅读 · 2021年9月22日

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Arxiv

12+阅读 · 2020年2月19日

相关基金

胰岛素/胰岛素样生长因子-1信号通路相关基因多态性与长寿的遗传关联分析及功能研究

国家自然科学基金

0+阅读 · 2015年12月31日

旋转通道等径角平行挤压工艺及其制备UFG时效硬化铝合金损伤机理研究

国家自然科学基金

0+阅读 · 2013年12月31日

水稻苗期耐盐基因qSTS8的精细定位与候选基因分析

国家自然科学基金

0+阅读 · 2013年12月31日

SIRT5介导的线粒体蛋白去琥珀酰化修饰在细胞氧化应激调控和癌症发生过程中的作用及机制研究

国家自然科学基金

0+阅读 · 2012年12月31日

Spiked模型中特征值和特征向量的理论分析与推断

国家自然科学基金

1+阅读 · 2012年12月31日

PIM-1信号通路在非小细胞肺癌EGFR-TKI获得性耐药中的作用及其分子机制

国家自然科学基金

0+阅读 · 2012年12月31日

ATP和ROS在BCL-2基因抑癌活性中的作用机制

国家自然科学基金

0+阅读 · 2011年12月31日

知识驱动的软件需求和体系结构文档的归档方法研究

国家自然科学基金

0+阅读 · 2011年12月31日

“#32511;洲—#33618;漠”#23707;屿生态种群的扩散模型研究

国家自然科学基金

0+阅读 · 2009年12月31日

面向查询的XML文本自动文摘研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员