Variational Information Pursuit with Large Language and Multimodal Models for Interpretable Predictions

Variational Information Pursuit (V-IP) is a framework for making predictions that are interpretable by design: it sequentially selects a short chain of task-relevant, user-defined, interpretable queries about the data that are the most informative for the task. While this builds interpretability into predictive models, applying V-IP to any task requires data samples densely labeled with concepts by domain experts, which limits V-IP to small-scale tasks where manual annotation is feasible. In this work, we extend the V-IP framework with Foundational Models (FMs) to address this limitation. Specifically, we use a two-step process: we first leverage Large Language Models (LLMs) to generate a sufficiently large candidate set of task-relevant, interpretable concepts, and then use Large Multimodal Models to annotate each data sample by its semantic similarity to each concept in the generated set. While other interpretable-by-design frameworks such as Concept Bottleneck Models (CBMs) require an additional step that removes repetitive and non-discriminative concepts to achieve good interpretability and test performance, we justify mathematically and empirically that, given a sufficiently informative and task-relevant query (concept) set, the proposed FM+V-IP method does not require any type of concept filtering. In addition, we show that FM+V-IP with LLM-generated concepts can achieve better test performance than V-IP with human-annotated concepts, demonstrating the effectiveness of LLMs at generating efficient query sets. Finally, compared with other interpretable-by-design frameworks such as CBMs, FM+V-IP can achieve competitive test performance using fewer concepts/queries, with both filtered and unfiltered concept sets.
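
The annotation step and the sequential query selection described above can be illustrated with a small sketch. This is a simplified illustration under stated assumptions, not the authors' implementation. It assumes a CLIP-style multimodal model has already embedded data samples and LLM-generated concept strings into a shared space (the embeddings are plain inputs here), and it scores each sample against each concept by cosine similarity. For selection, it swaps in classic greedy information pursuit: at each step it picks the query with the largest empirical mutual information with the label among training samples consistent with the answers so far. V-IP itself instead learns a querier and classifier with a variational objective. The function names `annotate`, `entropy` and `greedy_pursuit` are hypothetical.

```python
from typing import List, Tuple

import numpy as np


def annotate(sample_emb: np.ndarray, concept_emb: np.ndarray) -> np.ndarray:
    """Score every sample against every concept by cosine similarity.

    sample_emb: (n, d) embeddings of data samples (e.g. images).
    concept_emb: (k, d) embeddings of concept strings in the same space.
    Returns an (n, k) matrix of query answers in [-1, 1].
    """
    s = sample_emb / np.linalg.norm(sample_emb, axis=1, keepdims=True)
    c = concept_emb / np.linalg.norm(concept_emb, axis=1, keepdims=True)
    return s @ c.T


def entropy(labels: np.ndarray) -> float:
    """Empirical Shannon entropy (in bits) of an array of labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())


def greedy_pursuit(
    answers: np.ndarray, labels: np.ndarray, x_answers: np.ndarray, max_queries: int
) -> Tuple[List[int], int]:
    """Greedy information pursuit over discrete query answers.

    answers: (n, k) discrete answers of n training samples to k queries.
    labels: (n,) non-negative integer class labels of the training samples.
    x_answers: (k,) answers of the test sample to the same k queries.
    Returns the chosen query indices and the predicted label.
    """
    mask = np.ones(len(labels), dtype=bool)  # training samples consistent so far
    asked: List[int] = []
    for _ in range(max_queries):
        y = labels[mask]
        if len(np.unique(y)) <= 1:
            break  # the history already determines the label
        best_q, best_mi = None, 1e-12  # ignore queries with ~zero information
        for q in range(answers.shape[1]):
            if q in asked:
                continue
            a = answers[mask, q]
            # H(Y | Q_q, history), estimated on the consistent subset
            cond = sum((a == v).mean() * entropy(y[a == v]) for v in np.unique(a))
            mi = entropy(y) - cond
            if mi > best_mi:
                best_q, best_mi = q, mi
        if best_q is None:
            break  # no remaining query is informative
        asked.append(best_q)
        new_mask = mask & (answers[:, best_q] == x_answers[best_q])
        if not new_mask.any():
            break  # no training sample matches; keep the previous subset
        mask = new_mask
    # Predict the majority label among the consistent training samples
    return asked, int(np.argmax(np.bincount(labels[mask])))
```

The similarity scores returned by `annotate` are continuous, so in this sketch they would need to be discretized (for example `(scores > tau).astype(int)` for some threshold `tau`, itself an assumption) before being passed to `greedy_pursuit`. One design consequence is worth noting: once a query has been asked, an exact duplicate of it has a constant answer on the remaining subset and therefore zero conditional mutual information, so it is never selected, which loosely mirrors the abstract's point that a sufficiently informative query set need not be filtered.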
