Trained on vast amounts of data, large language models (LLMs) have achieved unprecedented success and generalization in modeling fairly complex textual inputs in an abstract space, making them powerful tools for zero-shot learning. Cross-modal foundation models such as CLIP extend this capability to other modalities, including the visual domain, so that semantically meaningful representations can be extracted from visual inputs. In this work, we leverage this capability and propose an approach that provides semantic insights into a model's patterns of failure and bias. Given a black-box model, its training data, and a task definition, we first calculate the model's task-related loss for each data point. We then extract a semantically meaningful representation for each training data point (such as a CLIP embedding from its visual encoder) and train a lightweight diagnosis model that maps this representation to the data point's task loss. We show that an ensemble of such lightweight models can be used to generate insights into the performance of the black-box model by identifying its patterns of failure and bias.
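The pipeline described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: random vectors stand in for CLIP embeddings, a synthetic function of one embedding direction stands in for the black-box model's per-sample loss, and bootstrapped ridge regressors are one assumed choice of lightweight diagnosis model. All function names (`fit_ridge`, `fit_ensemble`, `predict_loss`) are hypothetical.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an intercept; returns (weights, bias)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def fit_ensemble(X, y, n_models=10, alpha=1.0, seed=0):
    """Fit each ensemble member on a bootstrap resample of (embedding, loss) pairs."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))
        models.append(fit_ridge(X[idx], y[idx], alpha))
    return models

def predict_loss(models, X):
    """Return the ensemble mean and standard deviation of the predicted loss."""
    preds = np.stack([X @ w + b for w, b in models])
    return preds.mean(axis=0), preds.std(axis=0)

# Assumption: synthetic stand-ins. In practice, `embeddings` would come from a
# semantic encoder such as CLIP, and `losses` from evaluating the black-box model.
rng = np.random.default_rng(42)
n, d = 2000, 32
embeddings = rng.normal(size=(n, d))
failure_dir = np.zeros(d)
failure_dir[0] = 1.0  # hypothetical semantic direction along which the model fails
losses = np.log1p(np.exp(2.0 * embeddings @ failure_dir)) + 0.1 * rng.normal(size=n)

# Hold out part of the data to check that the diagnosis ensemble generalizes.
X_train, y_train = embeddings[:1500], losses[:1500]
X_test, y_test = embeddings[1500:], losses[1500:]

models = fit_ensemble(X_train, y_train)
mean_pred, std_pred = predict_loss(models, X_test)

# Correlation between predicted and actual loss on held-out points.
corr = np.corrcoef(mean_pred, y_test)[0, 1]

# Held-out points the ensemble flags as most likely to fail.
worst = np.argsort(mean_pred)[-50:]
print(f"held-out correlation: {corr:.2f}")
print(f"mean loss of flagged points: {y_test[worst].mean():.2f} "
      f"vs overall: {y_test.mean():.2f}")
```

In a real setting, the points with the highest predicted loss (here `worst`) would be the ones to inspect for shared semantic attributes, and the spread across ensemble members (`std_pred`) gives a rough sense of how much to trust each prediction.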