ALOHa: A New Measure for Hallucination in Captioning Models

Despite recent advances in multimodal pre-training for visual description, state-of-the-art models still produce captions containing errors, such as hallucinating objects not present in a scene. The existing prominent metric for object hallucination, CHAIR, is limited to a fixed set of MS COCO objects and synonyms. In this work, we propose a modernized open-vocabulary metric, ALOHa, which leverages large language models (LLMs) to measure object hallucinations. Specifically, we use an LLM to extract groundable objects from a candidate caption, measure their semantic similarity to reference objects from captions and object detections, and use Hungarian matching to produce a final hallucination score. We show that ALOHa correctly identifies 13.6% more hallucinated objects than CHAIR on HAT, a new gold-standard subset of MS COCO Captions annotated for hallucinations, and 30.8% more on nocaps, where objects extend beyond MS COCO categories. Our code is available at https://davidmchan.github.io/aloha/.
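The extract, compare, and match pipeline described above can be illustrated with a minimal sketch. Everything in it is an assumption for illustration rather than the authors' implementation: object extraction by an LLM is skipped (the object lists are given directly), `toy_similarity` is a word-overlap stand-in for a learned semantic similarity, a brute-force search over assignments stands in for Hungarian matching (it yields the same optimal assignment on small inputs), and taking the minimum matched similarity as the caption-level score is one plausible aggregation that the abstract does not specify.

```python
from itertools import permutations


def toy_similarity(a: str, b: str) -> float:
    """Hypothetical stand-in for semantic similarity: Jaccard overlap of words."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def match_objects(candidates, references, sim=toy_similarity):
    """Find the one-to-one assignment of candidates to references that
    maximizes total similarity (brute force; fine for a handful of objects).

    Returns a list of (candidate, matched_reference_or_None, similarity).
    """
    # Pad with None so every candidate gets a slot even if references run out.
    padded = list(references) + [None] * max(0, len(candidates) - len(references))

    def score(c, r):
        return 0.0 if r is None else sim(c, r)

    best_total, best_perm = -1.0, ()
    for perm in permutations(range(len(padded)), len(candidates)):
        total = sum(score(c, padded[j]) for c, j in zip(candidates, perm))
        if total > best_total:
            best_total, best_perm = total, perm
    return [(c, padded[j], score(c, padded[j])) for c, j in zip(candidates, best_perm)]


def caption_score(matches):
    """Assumed aggregation: the weakest match decides the caption score.
    A caption with no extracted objects is treated as hallucination-free."""
    return min((s for _, _, s in matches), default=1.0)


if __name__ == "__main__":
    # Objects assumed to have been extracted from a candidate caption
    # and from reference captions / detections.
    cand = ["dog", "frisbee", "unicorn"]
    refs = ["dog", "frisbee", "grass"]
    for c, r, s in match_objects(cand, refs):
        print(f"{c!r} -> {r!r} (similarity {s:.2f})")
    print("caption score:", caption_score(match_objects(cand, refs)))
```

In this toy setup a low per-object similarity flags a likely hallucination ("unicorn" has no close reference object). For realistic object counts, a polynomial-time solver such as `scipy.optimize.linear_sum_assignment` would replace the brute-force loop.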

