Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification

Large Visual Language Models (LVLMs) struggle with hallucinations in visual instruction-following tasks, which limits their trustworthiness and real-world applicability. We propose Pelican, a novel framework designed to detect and mitigate hallucinations through claim verification. Pelican first decomposes a visual claim into a chain of sub-claims based on first-order predicates. These sub-claims consist of (predicate, question) pairs and can be conceptualized as nodes of a computational graph. We then use Program-of-Thought prompting to generate Python code that answers these questions through flexible composition of external tools. Pelican improves over prior work by introducing (1) intermediate variables for precise grounding of object instances, and (2) shared computation for answering the sub-questions, which enables adaptive corrections and the identification of inconsistencies. Finally, we use the reasoning abilities of an LLM to verify the correctness of the claim by considering the consistency and confidence of the (question, answer) pairs from each sub-claim. Our experiments show a drop in hallucination rate of roughly 8% to 32% across various baseline LVLMs, and a 27% drop compared to approaches proposed for hallucination mitigation on MMHal-Bench. Results on two other benchmarks further corroborate these findings.
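The decomposition-and-verification idea can be illustrated with a minimal, hypothetical Python sketch. This is not Pelican's implementation. The toy `SCENE`, the `detect` tool, the `cached` helper, and the `SubClaim`/`verify` names are all assumptions made for illustration. Pelican's final LLM-based reasoning step is replaced here by a simple rule: a claim holds only if every sub-claim holds.

The sketch shows how sub-claims become (predicate, question) nodes whose programs share intermediate variables. For example, the dog instances are detected once and then reused when the dog's color is queried.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Tuple

# Hypothetical toy scene standing in for a real image; a real system would
# call external vision tools (detectors, VQA models) instead of this list.
SCENE = [
    {"label": "dog", "color": "brown"},
    {"label": "frisbee", "color": "red"},
]


def detect(label: str) -> List[Dict[str, str]]:
    """Hypothetical detector tool: all instances of `label` in the toy scene."""
    return [obj for obj in SCENE if obj["label"] == label]


def cached(env: Dict[str, Any], name: str, fn: Callable[[], Any]) -> Any:
    """Compute an intermediate variable once and share it across sub-claims."""
    if name not in env:
        env[name] = fn()
    return env[name]


@dataclass
class SubClaim:
    predicate: str                            # first-order predicate, e.g. "Color(dog, brown)"
    question: str                             # question that tests the predicate
    program: Callable[[Dict[str, Any]], Any]  # code that answers the question
    expected: Any                             # answer that would support the claim


def verify(sub_claims: List[SubClaim]) -> Tuple[bool, List[Tuple[str, Any, bool]]]:
    env: Dict[str, Any] = {}  # shared namespace holding intermediate variables
    trace = []
    for sc in sub_claims:
        answer = sc.program(env)
        trace.append((sc.predicate, answer, answer == sc.expected))
    # Simplification: Pelican reasons over (question, answer) pairs with an LLM;
    # here the claim is supported only if every sub-claim matches.
    return all(ok for _, _, ok in trace), trace


def first_color(env: Dict[str, Any], var: str, label: str) -> Optional[str]:
    """Color of the first grounded instance stored under intermediate variable `var`."""
    objs = cached(env, var, lambda: detect(label))
    return objs[0]["color"] if objs else None


# Claim to check: "There is a brown dog and a blue frisbee."
sub_claims = [
    SubClaim("Exists(dog)", "Is there a dog?",
             lambda env: len(cached(env, "dogs", lambda: detect("dog"))) > 0, True),
    SubClaim("Color(dog, brown)", "What color is the dog?",
             lambda env: first_color(env, "dogs", "dog"), "brown"),
    SubClaim("Exists(frisbee)", "Is there a frisbee?",
             lambda env: len(cached(env, "frisbees", lambda: detect("frisbee"))) > 0, True),
    SubClaim("Color(frisbee, blue)", "What color is the frisbee?",
             lambda env: first_color(env, "frisbees", "frisbee"), "blue"),
]

supported, trace = verify(sub_claims)
for predicate, answer, ok in trace:
    print(f"{predicate:<22} answer={answer!r:<9} {'OK' if ok else 'MISMATCH'}")
print("claim supported:", supported)
```

In this toy run the detected frisbee is red, so the `Color(frisbee, blue)` node fails and the claim is reported as unsupported. The other three nodes pass. Keeping the `dogs` and `frisbees` variables in a shared namespace mirrors the idea of grounding each question in the same object instances rather than re-detecting them independently.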
