GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs

Object hallucination in Multimodal Large Language Models (MLLMs) is a persistent failure mode that causes the model to perceive objects absent from the image. This weakness of MLLMs is currently studied using static benchmarks with fixed visual scenarios, which precludes uncovering model-specific or unanticipated hallucination vulnerabilities. We introduce GHOST (Generating Hallucinations via Optimizing Stealth Tokens), a method designed to stress-test MLLMs by actively generating images that induce hallucination. GHOST is fully automatic and requires no human supervision or prior knowledge. It operates by optimizing in the image embedding space to mislead the model while keeping the target object absent, and then guiding a diffusion model conditioned on the embedding to generate natural-looking images. The resulting images remain visually natural and close to the original input, yet introduce subtle misleading cues that cause the model to hallucinate. We evaluate our method across a range of models, including reasoning models like GLM-4.1V-Thinking, and achieve a hallucination success rate exceeding 28%, compared to around 1% in prior data-driven discovery methods. We confirm that the generated images are both high-quality and object-free through quantitative metrics and human evaluation. GHOST also uncovers transferable vulnerabilities: images optimized for Qwen2.5-VL induce hallucinations in GPT-4o at a 66.5% rate. Finally, we show that fine-tuning on our images mitigates hallucination, positioning GHOST as both a diagnostic and corrective tool for building more reliable multimodal systems.
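The embedding-space step described above can be illustrated with a toy sketch. This is not the authors' objective or implementation: a hypothetical linear probe (`w`, `b`) stands in for the MLLM's "the object is present" response, a unit vector `t` stands in for a detector of the target object, and the diffusion decoding stage is omitted entirely. All names, loss terms, and hyperparameters here are assumptions chosen for illustration only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def optimize_embedding(z0, w, b, t, tau=0.1, lam_prox=0.05,
                       lam_obj=5.0, lr=0.05, steps=500):
    """Toy gradient descent on an image embedding (all terms are assumptions).

    Minimizes:
      -log sigmoid(w.z + b)            push the stand-in model toward "yes"
      + lam_prox * ||z - z0||^2        stay close to the original embedding
      + lam_obj * relu(t.z - tau)^2    keep the target-object score low
    """
    z = z0.copy()
    for _ in range(steps):
        p_yes = sigmoid(w @ z + b)
        grad_mislead = -(1.0 - p_yes) * w          # gradient of -log sigmoid
        grad_prox = 2.0 * lam_prox * (z - z0)      # gradient of proximity term
        excess = max(0.0, float(t @ z) - tau)      # hinge on the object score
        grad_obj = 2.0 * lam_obj * excess * t
        z = z - lr * (grad_mislead + grad_prox + grad_obj)
    return z

# Hypothetical 4-dimensional setup: the probe responds to dimensions 0 and 1,
# but only dimension 0 corresponds to the object actually being present.
z0 = np.zeros(4)
w = np.array([1.0, 1.0, 0.0, 0.0])
b = -2.0
t = np.array([1.0, 0.0, 0.0, 0.0])

z_adv = optimize_embedding(z0, w, b, t)
p_before = sigmoid(w @ z0 + b)
p_after = sigmoid(w @ z_adv + b)
object_score = float(t @ z_adv)
print(f"p(yes) before: {p_before:.3f}, after: {p_after:.3f}, "
      f"object score: {object_score:.3f}")
```

In this toy setting the optimizer raises the stand-in model's "yes" probability mainly by moving along the dimension that correlates with the object without representing it, which loosely mirrors the idea of misleading cues that leave the target object absent.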
