We propose VLR-Bench, a visual question answering (VQA) benchmark for evaluating vision-language models (VLMs) based on retrieval-augmented generation (RAG). Unlike existing evaluation datasets for external-knowledge-based VQA, VLR-Bench provides five input passages for each query. This makes it possible to test a model's ability to determine which passages are useful for answering a given query, a capability that prior work has not evaluated. To support this setting, we constructed VLR-IF, a dataset of 32,000 automatically generated instruction-following examples designed specifically to strengthen the RAG capabilities of VLMs by training them to generate answers grounded in the input passages. We validated the proposed benchmark and training data and verified their effectiveness with LLaVA-Llama-3, a state-of-the-art Llama 3-based VLM. Both the VLR-Bench and VLR-IF datasets are publicly available online.
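To make the task format concrete, below is a minimal Python sketch of what a VLR-Bench-style example and a passage-selection check might look like. All names here (`VLRBenchExample`, `gold_passage_ids`, `passage_selection_score`, and the sample content) are illustrative assumptions, not the dataset's actual schema or the paper's evaluation metric.

```python
# Hypothetical sketch of a five-passage RAG-style VQA example.
# Field names and the toy metric are assumptions for illustration only.
from dataclasses import dataclass
from typing import List

@dataclass
class VLRBenchExample:
    image_path: str              # image the query is grounded in
    query: str                   # external-knowledge question about the image
    passages: List[str]          # five retrieved passages, only some relevant
    gold_passage_ids: List[int]  # indices of passages needed to answer
    answer: str                  # reference answer

def passage_selection_score(predicted_ids: List[int],
                            gold_ids: List[int]) -> float:
    """Toy metric: fraction of gold passages the model identified."""
    if not gold_ids:
        return 0.0
    hits = len(set(predicted_ids) & set(gold_ids))
    return hits / len(gold_ids)

example = VLRBenchExample(
    image_path="images/eiffel_tower.jpg",
    query="In what year was the structure in the image completed?",
    passages=[
        "The Eiffel Tower was completed in 1889 for the World's Fair.",
        "Paris is the capital of France.",
        "The Statue of Liberty was dedicated in 1886.",
        "Gustave Eiffel's company designed the tower.",
        "The Louvre is the world's largest art museum.",
    ],
    gold_passage_ids=[0],
    answer="1889",
)

# Suppose a VLM selected passages 0 and 3 as useful:
print(passage_selection_score([0, 3], example.gold_passage_ids))  # -> 1.0
```

The point of the five-passage design, as the abstract describes it, is that an answer-quality score alone cannot reveal whether a model attended to the right evidence; a selection-style check such as the one sketched above isolates that ability.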