Large Language Models (LLMs) have achieved impressive results in knowledge-based Visual Question Answering (VQA). However, existing methods still face two challenges: the inability to use external tools autonomously, and the inability to work in teams. Humans generally know whether they need external tools when encountering a new question: they can answer a familiar question directly, whereas they turn to tools such as search engines for unfamiliar ones. In addition, humans tend to collaborate and discuss with others to obtain better answers. Inspired by this, we propose a multi-agent voting framework. We design three LLM-based agents that simulate different levels of staff in a team, and assign the available tools according to these levels. Each agent provides a candidate answer, and the final answer is obtained by voting over all agents' answers. Experiments on OK-VQA and A-OKVQA show that our approach outperforms other baselines by 2.2 and 1.0 points, respectively.
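The voting step described above can be sketched minimally as follows; this is an illustrative assumption, not the paper's implementation, and the agent answers are hypothetical placeholders for the outputs of the three tool-equipped LLM agents.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent answer among the agents' candidates.

    Ties are broken by first occurrence, one plausible convention;
    the paper does not specify its tie-breaking rule here.
    """
    counts = Counter(answers)
    return counts.most_common(1)[0][0]

# Hypothetical outputs from three agents with different tool access
# (e.g., the senior agent may have consulted a search engine).
agent_answers = ["umbrella", "umbrella", "parasol"]
print(majority_vote(agent_answers))  # -> umbrella
```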