Multimodal large language models (MLLMs) have demonstrated remarkable capabilities in aligning visual inputs with natural language outputs. Yet the extent to which generated tokens depend on the visual modality remains poorly understood, limiting interpretability and reliability. In this work, we present EAGLE, a lightweight black-box framework for explaining autoregressive token generation in MLLMs. EAGLE attributes any selected token to compact perceptual regions while quantifying the relative influence of language priors and perceptual evidence. The framework introduces an objective function that unifies sufficiency (insight score) and indispensability (necessity score), optimized via greedy search over sparsified image regions for faithful and efficient attribution. Beyond spatial attribution, EAGLE performs modality-aware analysis that disentangles whether a token is driven by language priors or by perceptual evidence, providing fine-grained interpretability of model decisions. Extensive experiments across open-source MLLMs show that EAGLE consistently outperforms existing methods in faithfulness, localization, and hallucination diagnosis, while requiring substantially less GPU memory. These results highlight its effectiveness and practicality for advancing the interpretability of MLLMs. The code will be released at https://ruoyuchen10.github.io/EAGLE/.
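To make the attribution procedure concrete, the following is a minimal sketch of a greedy region-selection loop of the kind the abstract describes. It is not EAGLE's actual implementation: the function names (`greedy_attribution`, `score_with_regions`), the budget `k`, and the assumption that a single black-box callable already combines the sufficiency (insight) and necessity scores into one scalar objective are all illustrative.

```python
# Hypothetical sketch of greedy attribution over sparsified image regions.
# score_with_regions(S) is assumed to query the black-box MLLM and return a
# scalar combining sufficiency (token probability when only regions in S are
# visible) and necessity (probability drop when regions in S are masked out).
from typing import Callable, List, Set


def greedy_attribution(
    regions: List[int],
    score_with_regions: Callable[[Set[int]], float],
    k: int,
) -> List[int]:
    """Greedily pick up to k regions that maximize the combined objective."""
    selected: Set[int] = set()
    order: List[int] = []
    for _ in range(k):
        base = score_with_regions(selected)
        best_region, best_gain = None, float("-inf")
        for r in regions:
            if r in selected:
                continue
            # Marginal gain of adding region r to the current explanation.
            gain = score_with_regions(selected | {r}) - base
            if gain > best_gain:
                best_region, best_gain = r, gain
        if best_region is None or best_gain <= 0:
            break  # no remaining region improves the objective
        selected.add(best_region)
        order.append(best_region)
    return order
```

Under this reading, each query to `score_with_regions` corresponds to one black-box forward pass with a masked image, which is consistent with the abstract's claim that the method needs no internal model access and keeps GPU memory requirements low.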