Open-Set Knowledge-Based Visual Question Answering with Inference Paths

Given an image and an associated textual question, the purpose of Knowledge-Based Visual Question Answering (KB-VQA) is to provide a correct answer to the question with the aid of external knowledge bases. Prior KB-VQA models are usually formulated as a retriever-classifier framework, where a pre-trained retriever extracts textual or visual information from knowledge graphs and a classifier then makes a prediction among the candidates. Despite promising progress, existing models have two drawbacks. Firstly, modeling question answering as multi-class classification limits the answer space to a preset corpus and leaves no room for flexible reasoning. Secondly, the classifier considers only "what is the answer" and not "how to get the answer", so it cannot ground the answer in an explicit reasoning path. In this paper, we confront the challenge of \emph{explainable open-set} KB-VQA, where the system is required to answer questions with entities in the wild and retain an explainable reasoning path. To resolve the aforementioned issues, we propose a new retriever-ranker paradigm for KB-VQA, Graph pATH rankER (GATHER for brevity). Specifically, it comprises graph construction, pruning, and path-level ranking, which not only retrieves accurate answers but also provides inference paths that explain the reasoning process. To comprehensively evaluate our model, we reformulate the benchmark dataset OK-VQA with manually corrected entity-level annotations and release it as ConceptVQA. Extensive experiments on real-world questions demonstrate that our framework not only performs open-set question answering across the whole knowledge base but also provides explicit reasoning paths.
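
The three stages named above (graph construction, pruning, and path-level ranking) can be illustrated with a minimal toy sketch. It is not the GATHER implementation: the triples, the relation names, the `RELATION_HINTS` table, and the hand-written `score_path` function are all hypothetical stand-ins, and a real system would presumably use a learned ranker over a full knowledge base. The sketch only shows how ranking whole paths returns both an answer entity and the chain of edges that leads to it.

```python
from collections import defaultdict

# Hypothetical toy triples (head, relation, tail); not taken from ConceptVQA.
TRIPLES = [
    ("dog", "IsA", "pet"),
    ("dog", "CapableOf", "bark"),
    ("dog", "Desires", "play"),
    ("pet", "AtLocation", "house"),
    ("frisbee", "UsedFor", "play"),
    ("play", "HasSubevent", "run"),
    ("car", "AtLocation", "garage"),
]

# Hypothetical mapping from question words to relations, standing in for a learned scorer.
RELATION_HINTS = {"where": "AtLocation", "why": "UsedFor"}


def build_graph(triples):
    """Graph construction: adjacency list of (relation, tail) pairs per head."""
    graph = defaultdict(list)
    for head, rel, tail in triples:
        graph[head].append((rel, tail))
    return graph


def prune(graph, seeds, max_hops):
    """Pruning: keep only nodes reachable from the seed entities within max_hops."""
    keep = set(seeds)
    frontier = set(seeds)
    for _ in range(max_hops):
        reached = {tail for node in frontier for _rel, tail in graph.get(node, [])}
        frontier = reached - keep
        keep |= frontier
    return {n: [(r, t) for r, t in graph.get(n, []) if t in keep] for n in keep}


def enumerate_paths(graph, seeds, max_hops):
    """List every cycle-free path [node, rel, node, ...] of 1..max_hops edges."""
    paths = []

    def dfs(node, path, visited):
        for rel, nxt in graph.get(node, []):
            if nxt in visited:
                continue
            new_path = path + [rel, nxt]
            paths.append(new_path)
            if len(new_path) // 2 < max_hops:
                dfs(nxt, new_path, visited | {nxt})

    for seed in seeds:
        dfs(seed, [seed], {seed})
    return paths


def score_path(path, question_words):
    """Toy path score: question overlap + relation-hint bonus - length penalty."""
    nodes, relations = path[0::2], path[1::2]
    overlap = sum(1.0 for n in nodes if n in question_words)
    hinted = {RELATION_HINTS[w] for w in question_words if w in RELATION_HINTS}
    bonus = 2.0 if relations[-1] in hinted else 0.0
    return overlap + bonus - 0.1 * len(relations)


def answer(question, seeds, triples, max_hops=2):
    """Return (answer_entity, inference_path) for the best-scoring path."""
    graph = prune(build_graph(triples), seeds, max_hops)
    words = set(question.lower().replace("?", "").split())
    ranked = sorted(
        enumerate_paths(graph, seeds, max_hops),
        key=lambda p: score_path(p, words),
        reverse=True,
    )
    best = ranked[0]
    return best[-1], best


# Seeds play the role of entities detected in the image (assumed here).
entity, path = answer("Where does the pet dog live?", ["dog", "frisbee"], TRIPLES)
print(entity, path)
```

Because the candidate answers are the end nodes of paths in the pruned graph rather than classes from a fixed vocabulary, any entity reachable in the knowledge base can be returned, and the winning path itself serves as the explanation. In this toy run the selected path is `dog -IsA-> pet -AtLocation-> house`.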
