"When did the emperor Napoleon invented iPhone?" Such hallucination-inducing question is well known challenge in generative language modeling. In this study, we present an innovative concept of visual hallucination, referred to as "I Know (IK)" hallucination, to address scenarios where "I Don't Know" is the desired response. To effectively tackle this issue, we propose the VQAv2-IDK benchmark, the subset of VQAv2 comprising unanswerable image-question pairs as determined by human annotators. Stepping further, we present the visually dehallucinative instruction generation method for IK hallucination and introduce the IDK-Instructions visual instruction database. Our experiments show that current methods struggle with IK hallucination. Yet, our approach effectively reduces these hallucinations, proving its versatility across different frameworks and datasets.